Big Idea

The argument in one line.

A single CLAUDE.md file derived from Karpathy's coding principles fixes the four most expensive AI agent failure modes: silent assumptions, overengineering, scope creep, and unverified output.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…

Claude Code users who keep getting over-engineered, scope-creeping outputs that solve problems they didn't have
Developers who want the Karpathy-derived CLAUDE.md principles explained and demonstrated before they copy the file
Anyone using coding agents daily who wants a 9-minute fix for silent assumptions and unverified output

SKIP IF…

Developers not using Claude Code or AI coding agents — the demo is specific to the Claude Code workflow
People who already have a well-tuned CLAUDE.md and agent behavior they're happy with

TL;DR

The full version, fast.

AI coding agents have stopped writing broken syntax — the new failure modes are subtler and more expensive: silent assumptions where the agent picks one interpretation and commits without asking, overengineering where a simple function becomes a configurable utility class, scope creep where adjacent code gets refactored uninvited, and unverified output where the agent reports success without confirming the thing actually works. The andrej-karpathy-skills GitHub repo addresses all four with a single CLAUDE.md file derived from Karpathy's public thread on how his workflow shifted to 80% agent coding. The file installs in minutes, runs as a behavioral specification for Claude Code, and is demonstrated live against an ecommerce dashboard build to show the measurable before/after difference in output quality.

Chapters

Where the time goes.

00:00 – 01:15

01 · Karpathy thread + the four problems

Opens with Karpathy workflow flip (80% manual to 80% agent). Frames story around failures. Names four agent failure modes: silent assumptions, overengineering, scope creep, no verification.

01:15 – 03:35

02 · Excalidraw: problems mapped to cost

Three-column diagram: Your Request / What the Agent Does / What You Get. Walks each failure mode with concrete cost examples (400-line OAuth, 200-line date formatter, 40-line diff, untested validation).

03:35 – 04:11

03 · The four principles (GitHub README)

Introduces the Karpathy Skills repo. Maps each principle to the problem it solves: Think Before Coding, Simplicity First, Surgical Changes, Goal-Driven Execution.

04:11 – 05:31

04 · Installation walkthrough

Two paths: Claude Code plugin (global, recommended) via /plugin marketplace add + /plugin install; and per-project curl with append support for existing CLAUDE.md.

05:31 – 07:32

05 · Live demo: ecommerce dashboard

Builds dashboard with guidelines active. Agent asks 3 clarifying questions. Output: 1 file, 120 lines. Without guidelines: 6-8 files, 500+ lines, unasked-for features.

07:32 – 08:07

06 · VS Code: clean diff

Shows actual code output. Every changed line traces to what was asked. No renamed variables, no reformatted comments, no drive-by refactors.

08:07 – 09:04

07 · Tradeoff + close

Guidelines bias toward caution, not speed. For trivial tasks they are overkill. For nontrivial work where wrong assumptions cost hours, they are the fix.
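
To make the 200-line date formatter from chapter 02 concrete, here is a minimal sketch of what the Simplicity First principle would favor instead. The function name `format_date` and the DD/MM/YYYY output format are assumptions chosen for illustration; the video does not show this code.

```python
from datetime import date


def format_date(iso_date: str) -> str:
    """Turn an ISO date such as '2024-03-09' into '09/03/2024'.

    Hypothetical helper: one function, no utility class, no builder.
    Invalid input raises ValueError from date.fromisoformat, which is
    left to the caller rather than wrapped in speculative handling.
    """
    d = date.fromisoformat(iso_date)
    return f"{d.day:02d}/{d.month:02d}/{d.year}"


print(format_date("2024-03-09"))
```

If a second output format is ever needed, that is the moment to add a parameter, not before.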

Atomic Insights

Lines worth screenshotting.

Coding agents have stopped making syntax errors — the new failure mode is silent assumptions that produce confident, wrong answers.
When you ask for user authentication, the agent picks OAuth with refresh tokens and role-based access control because it was trained on codebases where abstraction is rewarded.
A prompt to fix a bug in one function often produces a 40-line diff touching things you never asked about.
Agents write code and call it done without verifying it handles empty strings, edge cases, or any real input at all.
A CLAUDE.md file with 50 lines of behavioral rules changes the agent's default from guess-and-build to ask-first-then-build.
The Karpathy Skills repo has 26,000+ stars because it solves a specific, expensive problem that most AI coding tutorials never address.
Installing the guidelines as a plugin applies them across every project automatically — no per-project setup required.
A dashboard built with the guidelines active produces one file and 120 lines; the same prompt without them typically yields 6-8 files and 500+ lines.
Every line in the output should trace back to something you asked for — surprise edits are a signal the agent is operating outside its guidelines.
The guidelines bias toward caution over speed, which is the right trade-off for any non-trivial feature where wrong assumptions cost hours.
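
The unverified-output insight above can be shown in a few lines. This is a sketch, not code from the video: `validate_username`, the 30-character limit, and the allowed character set are all assumptions. The point is that the success criteria (empty strings, special characters, values that are too long) are written down as checks that run, instead of being assumed.

```python
import re

MAX_LENGTH = 30  # assumed limit, for illustration only


def validate_username(value: str) -> list[str]:
    """Return a list of problems; an empty list means the value is valid."""
    if not value.strip():
        return ["required"]
    errors = []
    if len(value) > MAX_LENGTH:
        errors.append("too long")
    if not re.fullmatch(r"[A-Za-z0-9_]+", value):
        errors.append("invalid characters")
    return errors


# The verification step: each case named in the prompt gets a check.
assert validate_username("") == ["required"]
assert validate_username("   ") == ["required"]
assert validate_username("a" * 31) == ["too long"]
assert validate_username("bob!") == ["invalid characters"]
assert validate_username("bob_42") == []
```

An agent that reports "done" after running checks like these has verified something; one that only wrote the function has not.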

Takeaway

Your CLAUDE.md is a behavioral contract, not a hint.

Steal this for JoeFlow / MCN+ / any Claude Code project

The agent does not need better prompts — it needs explicit rules about what NOT to do.

Install the plugin route (global, 10 seconds): /plugin marketplace add forrestchang/andrej-karpathy-skills then /plugin install andrej-karpathy-skills@karpathy-skills
Or append to your existing CLAUDE.md with the curl append command — your rules stay on top, Karpathy principles go at the bottom
The four failure modes (silent assumptions, overengineering, scope creep, no verification) are the ones that kill review time — name them in your own CLAUDE.md
The Excalidraw diagram format (Your Request / What Agent Does / What You Get) is a reusable content frame for any AI tool breakdown video
The tradeoff is honest and worth saying out loud: these rules slow down trivial tasks, so scope them to nontrivial work
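
The append route above (your rules on top, a blank line, then the guidelines) can be sketched as a pure function. This is not the repo's command: `append_guidelines` is a hypothetical helper, it works on text you have already loaded, and the skip-if-already-present check is an added assumption rather than something the video describes.

```python
def append_guidelines(existing: str, guidelines: str) -> str:
    """Return the existing rules followed by a blank line and the guidelines.

    Hypothetical helper: reading and writing CLAUDE.md is left to the caller.
    """
    body = guidelines.strip("\n")
    # Assumed safeguard: do nothing if the guidelines are already there.
    if body in existing:
        return existing
    # No existing rules: the guidelines become the whole file.
    if not existing.strip():
        return body + "\n"
    # Existing rules stay on top, separated by exactly one blank line.
    return existing.rstrip("\n") + "\n\n" + body + "\n"


merged = append_guidelines("# My rules\n", "# Guidelines\n")
print(merged)
```

Running it a second time on the merged text returns the text unchanged, so the guidelines are not duplicated.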

Glossary

Terms worth knowing.

CLAUDE.md: A Markdown file placed in a project folder (or user config) that gives Claude Code standing behavioral instructions — role, constraints, coding style rules — that apply to every prompt in that project without needing to be repeated.
Claude Code: Anthropic's AI coding assistant that can read a project's files, write and edit code, run terminal commands, and build software through natural-language conversation.
Coding agent: An AI system that can autonomously plan and execute multi-step software development tasks — reading files, writing code, running tests, and iterating — with minimal human intervention between steps.
Pull request: A request to merge a set of code changes from one branch into another in a version control system like Git, typically reviewed by other developers before being accepted.
Silent assumption: When an AI agent picks one interpretation of an ambiguous request and acts on it without flagging the ambiguity or asking for clarification, often producing output that solves the wrong problem.
Scope creep (in edits): When an AI makes unrequested changes beyond the stated task — reformatting files, renaming variables, or refactoring unrelated code — inflating the diff and adding review burden.
Builder pattern: A software design pattern where an object is constructed step-by-step through a chain of method calls rather than a single constructor, used to handle complex configurations.
Diff: A view showing exactly which lines of code were added, changed, or removed between two versions of a file, used in code review to understand what a change actually does.
Claude Code plugin: A packaged extension that adds reusable instructions, skills, or tool integrations to Claude Code, installable via the plugin marketplace and applying across all projects.
Tailwind CSS: A utility-first CSS framework that lets developers style HTML elements by applying small, single-purpose class names directly in markup rather than writing custom stylesheets.
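
The Diff and Scope creep entries can be tied together with a small measurement. The sketch below uses Python's standard `difflib.ndiff` to count added and removed lines; `changed_line_count` and the sample `total` function are hypothetical and not taken from the video.

```python
import difflib


def changed_line_count(before: str, after: str) -> int:
    """Count the lines a reviewer would see as added or removed."""
    delta = difflib.ndiff(before.splitlines(), after.splitlines())
    # ndiff prefixes removed lines with "- " and added lines with "+ ".
    return sum(1 for line in delta if line.startswith(("+ ", "- ")))


ORIGINAL = """\
def total(items):
    s = 0
    for item in items:
        s += item.price
    return s + 1
"""

# Surgical fix: only the buggy return line changes.
SURGICAL = ORIGINAL.replace("return s + 1", "return s")

# Same fix plus an unrequested rename of `s` to `subtotal`.
CREEPING = """\
def total(items):
    subtotal = 0
    for item in items:
        subtotal += item.price
    return subtotal
"""

print(changed_line_count(ORIGINAL, SURGICAL))  # 2
print(changed_line_count(ORIGINAL, CREEPING))  # 6
```

Both versions fix the bug, but the second triples what has to be reviewed, which is the cost the scope creep entry describes.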

Resources

Things they pointed at.

00:57 · tool · andrej-karpathy-skills (GitHub)

00:00 · link · Karpathy X thread on coding workflow flip

Quotables

Lines you could clip.

08:28

“Coding agents are capable, but they behave badly. They make silent mistakes that look correct on the surface. They build too much when you need too little. They touch things they should not touch, and they do not verify their own work. This file corrects those patterns in about 50 lines of markdown.”

Self-contained thesis, no setup needed, lands hard → TikTok hook or IG reel cold open

08:08

“Every line that changed traces back to what I asked for. There are no surprise edits, no renamed variables in other files, no reformatted comments, no drive-by refactors.”

Concrete payoff after the demo: shows rather than tells → Newsletter pull-quote

00:48

“And then you end up reviewing a giant pull request that solves a problem you never actually had.”

Tight punchline, universal pain point for any Claude Code user → TikTok hook

The Script

Word for word.

00:00 Andrej Karpathy posted a long thread on X about how his coding workflow completely flipped. He went from 80% manual coding to 80% agent coding in just a few weeks, and he was pretty honest about it.

00:14 He said he is now mostly programming in English, telling the model what to write in plain words. But the interesting part was not the productivity gains. It was the failures.

00:25 He laid out clearly how coding agents keep messing up in ways that are not obvious. They are not writing broken syntax anymore. The mistakes are deeper.

00:35 They make assumptions about what you meant and keep going without checking. They pick one interpretation out of three possible ones and commit to it silently. They do not ask you to clarify.

00:46 They just act confident and move forward. And then you end up reviewing a giant pull request that solves a problem you never actually had. So that thread got a lot of attention, and a developer named Forrest Chang took the core ideas from that post and turned them into a single file, a file called CLAUDE.md.

01:05 It is a set of behavioral guidelines for Claude Code. The repo is called Andrej Karpathy Skills, and it is on GitHub right now. It has more than 26 thousand stars, and honestly, it is one of the most practical things I have seen in this space recently.

01:20 Not because it does anything fancy, but because it targets the exact problems that waste the most time when you are working with AI coding agents every day. Okay.

01:31 So before I show you how to install it and use it, I want to explain why this matters. Because if you have not run into these problems yet, you probably will soon. And if you already have, then you know exactly what I am about to describe.

01:44 The first problem is silent assumptions. You ask your agent to add user authentication. There are 10 different ways to interpret that.

01:53 Session-based? Token-based? OAuth?

01:55 The agent does not know. But instead of asking you which direction to go, it just picks one and starts building. And it picks the most complex version.

02:05 So twenty minutes later, you have a 400-line auth system with OAuth, refresh tokens, and role-based access control. All you needed was basic email and password for a prototype.

02:16 The model guessed instead of asking. The second problem is overengineering. You ask for a simple function that formats a date string.

02:25 You get back a configurable date formatting utility class with six methods, a builder pattern, and error handling for edge cases that will never happen. The model writes 200 lines when 30 would do. It is trained on massive codebases where abstraction is valued, so it defaults to that style even when the task is small.

02:45 The third problem is scope creep in edits. You ask the agent to fix a bug in one function. It fixes the bug. But it also reformats the file, renames variables, cleans up comments that were not part of the task, and refactors an adjacent function that was fine.

03:01 Now your diff is 40 lines instead of four, and you have to review every change to make sure nothing broke. The fourth problem is the lack of verification. You tell the agent to add form validation.

03:14 It adds the validation code and says done, but did it actually test it? Did it check if it handles empty strings, special characters, values that are too long?

03:23 Usually no. It just writes the code and moves on. There is no verification step.

03:29 There is no success criteria. It does what you asked in the most literal way and calls it finished. So those are the four core problems, and the Karpathy Skills repo addresses each one with a matching principle.

03:42 The first principle is called think before coding. It basically tells the agent to stop and surface any ambiguity before writing a single line. If there are multiple ways to interpret a request, list them.

03:54 If something is unclear, ask. Do not guess. The second principle is simplicity first.

04:00 Write the minimum code needed to solve the problem. No speculative features. No abstractions for things that are only used once.

04:08 Alright. So now let me show you how to actually set this up. I am going to go through the installation, and then we are going to build a small project with it so you can see the difference.

04:18 So I am on my screen right now. There are two ways to install this. The first way is the recommended one, and that is the Claude Code plugin route.

04:27 You open Claude Code and run slash plugin marketplace add Karpathy skills. That adds the marketplace. Then you run slash plugin install skills. And that is it.

04:38 Now the guidelines are installed as a plugin, which means they apply across all your projects automatically. You do not have to copy any file into each project folder. It just works everywhere.

04:50 The second way is the per-project route. This is for when you only want the guidelines inside one specific project. You open your terminal, go to your project folder, and run this curl command that downloads the file straight into your project root.

05:04 Now if you already have a CLAUDE.md file and you do not want to lose your existing rules, you can append instead of overwrite. The repo shows the command for that.

05:14 You run echo to add a blank line and then curl the file content and pipe it into your existing CLAUDE.md. Your project rules stay on top and the Karpathy guidelines get added at the bottom.

05:27 Either way takes about ten seconds. Okay. So now that it is installed, let me actually build something with it so you can see how the agent behaves differently.

05:35 I am going to ask Claude Code to build a simple ecommerce dashboard. Nothing crazy. Just a front-end page that shows total revenue, number of orders, top selling products, and maybe a recent orders table.

05:49 The kind of thing you would want to glance at to see how your store is doing. So I type my prompt: build an ecommerce dashboard page that shows total revenue, order count, top products, and a recent orders table. Use React and Tailwind. Keep it simple. And now watch what happens. With the Karpathy guidelines active, the first thing the agent does is ask me questions.

06:10 It wants to know if the data should come from a real API or if hard-coded sample data is fine for now. It asks if I want the dashboard to be responsive or desktop only. It asks if I need any filtering or date range selector or if this is just a static snapshot view.

06:26 This is exactly the behavior we want. It is not guessing. It is asking.

06:31 I tell it sample data is fine, desktop only for now, and no filters needed. Just the basic overview. And then it builds exactly that, nothing more.

06:41 The output is clean. One file, maybe 120 lines of code.

06:45 Four stat cards at the top, a simple table for recent orders, and a short list of top products. No router setup, no state management library, no API service layer, no authentication wrapper, no dark mode toggle.

06:59 Just the dashboard I asked for. And if you have ever asked an agent to build a dashboard without these guidelines, you know what usually happens. You get six to eight files, maybe 500 lines, with a full component tree, context providers, a mock API with fetch hooks, loading skeletons, pagination, and a sidebar navigation for pages that do not even exist yet.

07:21 For a prototype. For something you just needed to check a layout idea. That is the difference these guidelines make.

07:28 And here is the other thing I noticed. The diff is exactly what I expected. Every line that changed traces back to what I asked for.

07:35 There are no surprise edits, no renamed variables in other files, no reformatted comments, no drive-by refactors. Just the code I asked for, and nothing else. That is the surgical changes principle doing its work.

07:49 You spend way less time reviewing because there is less noise in the output. And now there is a trade-off the repo is honest about. These guidelines bias toward caution over speed.

08:00 For simple stuff like fixing a typo, you do not need the full rigor. The guidelines are meant for nontrivial work. The kind where wrong assumptions cost you hours and overengineering means throwing away the output.

08:14 And I think that is what makes this repo different from a lot of other stuff in the AI coding space right now. It is not trying to impress you. It is solving a very specific, very real problem.

08:25 Coding agents are capable, but they behave badly. They make silent mistakes that look correct on the surface. They build too much when you need too little.

08:33 They touch things they should not touch, and they do not verify their own work. This file corrects those patterns in about 50 lines of markdown. If you are using Claude Code regularly, this is worth trying.

08:46 It takes ten seconds to install, and the worst case is you delete the file. But once you see the difference in your diffs and code reviews, you will want to keep it. Alright, so that's it from the video, and I hope you enjoyed it.

08:58 If you did, please like this video and subscribe to the channel, and I'll see you in the next video.

The Hook

The bait, then the rug-pull.

Andrej Karpathy posted a thread saying he now programs mostly in English — and then catalogued exactly how that breaks. Not broken syntax. Something worse: agents that guess silently, build too much, touch things they should not, and call it done without checking. One developer turned those observations into a single file. This is a breakdown of that file.

Frameworks