Big Idea

The argument in one line.

Getting 10x output from Claude requires not better prompts but a three-layer system: a detailed spec that bridges your context and AI's computation, a verification loop that treats AI as a statistical machine rather than a person, and a compounding environment built from CLAUDE.md rules, an LLM knowledge base, and reusable skills.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…

You use Claude Code daily and keep hitting the ceiling where the output drifts from what you actually wanted.
You've tried plan mode and custom instructions but the sessions still feel like prompting from scratch each time.
You're a builder or solo founder who wants a repeatable workflow for shipping with AI agents, not a one-off trick.
You've heard of Karpathy's spec/verifier ideas but want a concrete, worked example you can run today.

SKIP IF…

You're brand-new to Claude — this assumes you already have a working Claude Code setup.
You want a comparison of AI tools or models — this is Karpathy's method applied specifically to Claude Code.

TL;DR

The full version, fast.

Most people treat Claude like a smart employee you can boss around with better wording. Karpathy's framing is that it's closer to a ghost than an animal, which the video recasts as a robot librarian: brilliant within its library, confidently wrong outside it, and indifferent to emotional pressure. The three-layer method closes that gap. Layer 1 (Spec) extracts your goals and context into a structured, agile-scoped document before a single line is written. Layer 2 (Verifier) sets evaluation criteria upfront, uses a second model as critic, and pulls external signals to give the agent a feedback loop. Layer 3 (Environment) turns your workspace into a compounding asset: a tuned CLAUDE.md, an LLM knowledge base of your own data, reusable skill files, and rule-based guardrails the agent cannot bypass.

Chapters

Where the time goes.

00:00 – 01:23

01 · Hook + Karpathy's three-layer promise

Karpathy's AI Ascent talk as the source. Car-wash question proves AI's context blindness. Promise: three layers that unlock 10x speed.

01:23 – 03:30

02 · Layer 1 — The Spec

What a spec is vs. plan mode. Three steps: uncover the goal (have Claude interview you), be agile (small scoped chunks), be precise (make Claude surface key decisions).

03:30 – 06:14

03 · Layer 2 — The Verifier

Animals vs. ghosts mental model. Three verification levers: set evaluation criteria upfront, use a second AI as critic (Codex plugin in Claude Code), pull external signal (connect deployment system or reference historical outputs).

06:14 – 08:22

04 · Mid-video re-hook + channel CTA

Anti-SLOP agreement, subscribe ask, Claude Max giveaway. Filler — skip.

08:22 – 12:30

05 · Layer 3 — The Environment

Four steps: tune CLAUDE.md, build an LLM knowledge base, create reusable skills, create rule-based guardrails (pre-tool-use hooks). Three-bucket framework: always do / ask first / never do.

12:30 – 13:18

06 · Karpathy's final answer + outro CTA

The one thing to focus on: you can outsource thinking, not understanding. Points to a follow-up video on four Claude projects.

Atomic Insights

Lines worth screenshotting.

State-of-the-art AI models will tell you to walk to a car wash 50 meters away — proving they're brilliant at what can be measured and blind to everything else.
A task and a goal are different things: 'create an end-of-month report' is a task; the conclusion it drives is the goal, and AI will never infer the goal on its own.
Giving an AI agent everything at once is waterfall thinking — the better move is agile specking: tight scope, clear checkpoint, review, adjust, repeat.
Yelling at Claude, pleading with it, or saying 'make this better' doesn't work — it's not an animal with intrinsic motivations, it's a robot librarian that can only pull from books it has.
The only lever most people ignore is the verification lever — and the creator of Claude Code says a feedback loop 2-3x's the quality of the final result.
Setting evaluation criteria before Claude touches anything is identical to the spec discipline in Layer 1 — precision upfront leaves less room for drift.
Using a second AI model as a critic is like asking a librarian from a different library to grade the first one's answer — a different training set catches what the first missed.
Your CLAUDE.md file is injected automatically at the start of every Claude session, so instructions you put there don't rely on memory; they are re-read every time.
A rule in CLAUDE.md that says 'don't touch /important' is a request; a pre-tool-use hook that blocks writes to that folder is an actual rule the agent cannot bypass.
Bucket every AI action into three categories: always do (autopilot), ask first (consequences), never do (hard lines) — then enforce the third category at the tool level, not the prompt level.
Karpathy's LLM knowledge base is a folder system on your machine where you incrementally build your own training data — your data becomes your intellectual property the AI works from.
Skills are reusable handbooks for repeatable tasks — the more you run water through a skill, the faster you find where it leaks and the better it compounds over time.
Karpathy's one-line thesis: you can outsource your thinking, but you can't outsource your understanding.
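
The knowledge-base insight above describes a folder system that makes it easy for Claude to see where information lives. One way to picture that is a small script that walks the folder and emits a plain-text index you could paste into, or reference from, CLAUDE.md. This is only a sketch: the `build_index` helper, the folder names, the file names, and the index format are all illustrative assumptions, not Karpathy's or the video's actual setup.

```python
from pathlib import Path
import tempfile


def build_index(root: Path) -> str:
    """Return a plain-text index of every subfolder under root.

    Each non-empty subfolder becomes a heading line followed by one
    indented bullet per file. Files sitting directly in root are skipped.
    """
    lines = []
    # Sort folders and files so the index is stable between runs.
    for folder in sorted(p for p in root.rglob("*") if p.is_dir()):
        files = sorted(f.name for f in folder.iterdir() if f.is_file())
        if files:
            lines.append(f"{folder.relative_to(root).as_posix()}/")
            lines.extend(f"  - {name}" for name in files)
    return "\n".join(lines)


# Demo on a throwaway folder with hypothetical contents.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "transcripts").mkdir()
    (root / "notes").mkdir()
    (root / "transcripts" / "ai-ascent-talk.md").write_text("...")
    (root / "notes" / "spec-ideas.md").write_text("...")
    index = build_index(root)
    print(index)
```

Regenerating the index whenever files are added keeps the map Claude reads in step with the folder itself.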

Takeaway

The three things that make Claude actually work

WHAT TO LEARN

Better prompts hit a ceiling — what compounds is a spec that transfers your context, a verification loop the agent can act on, and a workspace that improves every session.

AI can't infer your goals from a task description alone — the gap between 'create a report' and 'the decision this report drives' is yours to close, not Claude's.
Agile specking beats waterfall delegation: small, scoped chunks with a review checkpoint at each step produce better output than handing an agent everything at once.
Set evaluation criteria before Claude touches anything — defining what 'good' looks like upfront removes the biggest source of drift.
A second AI model acting as critic catches what the first missed, because a different training corpus means different blind spots.
Connecting Claude to the real system it's affecting (a deployment environment, a historical dataset) closes the feedback loop better than any prompt instruction.
CLAUDE.md instructions are loaded on every session without re-stating, so investing in that file pays compound returns across every future build.
A 'don't touch this folder' rule in CLAUDE.md is a request; a pre-tool-use hook that blocks the write is an actual constraint the agent cannot reason around.
Build reusable skills for any task you repeat — the more you run water through a skill, the faster you find the leaks and the better the system compounds.
Karpathy's deepest point: delegating thinking to AI is fine, but surrendering understanding of the goal means you've lost the only lever that makes delegation work.

Glossary

Terms worth knowing.

Spec: A structured document that translates your context, goals, and constraints into a format Claude can act on. More granular than plan mode — it includes your actual goal, agile checkpoints, and explicit decisions.
Agile specking: Breaking a spec into small, reviewable chunks rather than handing the entire task to an agent at once. Each chunk has a clear checkpoint where you review and adjust before proceeding.
Robot librarian: A mental model for how AI agents work: they retrieve and recombine from a fixed training corpus brilliantly, but confidently confabulate when the answer isn't in their library.
Pre-tool-use hook: A script that fires before Claude executes a write or edit command, allowing you to inspect the target file and block the action if it violates a rule — enforced at the tool level, not the prompt level.
LLM knowledge base: A local folder system where you store your own curated data — documents, transcripts, notes — structured so an LLM can be pointed at it as a context source, making your unique knowledge retrievable.
CLAUDE.md: A project-level configuration file that Claude reads automatically at the start of every session. Instructions placed here apply to every interaction without needing to be re-stated.
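
To make the pre-tool-use hook entry concrete, here is a minimal Python sketch of the decision such a hook script might make. It rests on assumptions you should check against the current Claude Code hooks documentation: that the hook receives a JSON payload on stdin containing `tool_name` and `tool_input.file_path`, and that exiting with code 2 blocks the tool call. The folder name `important-dont-edit` is a hypothetical stand-in for the example in the video.

```python
import json
import sys
from pathlib import PurePosixPath

# Hypothetical protected folder, named after the example in the video.
PROTECTED_DIRS = ["important-dont-edit"]
# Assumed tool names for file modifications.
GUARDED_TOOLS = {"Write", "Edit"}


def decide(payload: dict, protected: list[str]) -> tuple[int, str]:
    """Return (exit_code, message); exit code 2 is assumed to mean 'block'."""
    if payload.get("tool_name") not in GUARDED_TOOLS:
        return 0, ""
    file_path = payload.get("tool_input", {}).get("file_path", "")
    # Compare whole path components so 'important-dont-edit-2' does not match.
    # Note: '..' segments and symlinks are not resolved in this sketch.
    parts = PurePosixPath(file_path).parts
    for folder in protected:
        if folder in parts:
            return 2, f"Blocked: {file_path} is inside protected folder '{folder}'."
    return 0, ""


def run_as_hook() -> None:
    """Entry point for real use: read the payload from stdin and exit."""
    code, message = decide(json.load(sys.stdin), PROTECTED_DIRS)
    if message:
        print(message, file=sys.stderr)
    sys.exit(code)


# Demo with hand-written payloads instead of a live session.
blocked = decide(
    {"tool_name": "Edit", "tool_input": {"file_path": "/repo/important-dont-edit/keys.md"}},
    PROTECTED_DIRS,
)
allowed = decide(
    {"tool_name": "Edit", "tool_input": {"file_path": "/repo/src/app.py"}},
    PROTECTED_DIRS,
)
print(blocked)
print(allowed)
```

Keeping the decision in a pure function makes the rule easy to test on its own; a real hook script would call `run_as_hook()` instead of the demo lines.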

Resources

Things they pointed at.

00:26 · link · AI Ascent 2026

06:41 · tool · OpenAI Codex plugin for Claude Code

Quotables

Lines you could clip.

12:54

“You can outsource your thinking, but you can't outsource your understanding.”

Karpathy's one-line answer to what still matters when AI gets cheap: quotable, standalone, zero context needed. → TikTok hook

03:30

“The only lever you have, which most people don't even think to use, is the verification lever.”

Contrarian claim that reframes the whole AI quality problem in one sentence. → IG reel cold open

01:38

“A task and a goal are not the same thing. The actual goal is the conclusion you're trying to draw — and AI will literally never be able to decide what that goal is.”

Sharp distinction that most AI content glosses over: actionable and specific. → newsletter pull-quote

The Script

Word for word.

00:00I just listened to Andrej Karpathy speak at AI Ascent 2026 and I learned something that I wasn't expecting. Almost everyone is prompting Claude wrong. So I decided to dig deeper and see exactly how Karpathy, the former head of AI at Tesla, uses AI in 2026.

00:13And it turns out that Karpathy's method for building 10 times faster can be broken down into three simple layers. So in today's video, I'll be breaking down each layer so that anybody can apply them. And then I'll show you the one thing that Karpathy said to focus on in the age of AI.

00:27So layer one is the spec. AI models are incredibly smart, but they're still missing something. To showcase their current limitation, Karpathy explained a simple question AI will get wrong.

00:37I wanna go to a car wash to wash my car and it's 50 meters away. Should I drive or should I walk? And state of the art models today will tell you to walk because it's so close.

00:48At first, I actually didn't believe this, so I went to Claude, Gemini, Grok, and ChatGPT, asked them the same question, and they all gave me the same answer. And it reveals the whole foundation of this video.

00:57AI is brilliant at what can be measured. But for context driven things like needing a car for a car wash, it has no signal to act on. So how do you bridge this gap between your understanding and your contextual information and AI's computational power?

01:11That's where the spec comes in. And a spec is how you deliver your understanding to Claude in a format it can use. A term you may have heard is Claude's plan mode, which essentially can be used to help you create a plan before building anything.

01:22But Karpathy thinks that this is too high level. “I actually don't even like the plan mode. I mean, obviously, it's very useful, but I think there's something more general here where you have to work with your agent to design a spec that is very detailed.”

01:34Now, Karpathy isn't telling you that plan mode is bad. What he's actually saying is you have to go deeper. Work with these AI tools to design the actual spec.

01:43So how do you create a spec that Claude can successfully use to build what you're trying to build? The first step is you have to uncover your goal. If you just say create an end-of-month report, that's a task.

01:54But the actual goal is a conclusion you're trying to draw, the decision the report drives. And what the goal actually is is something AI will literally never be able to decide. So to help you do this, we'll tell Claude to interview me to identify the goal of this project.

02:09This is the way to get the information out of you and into the spec. Now, step two is be agile with how you work. There are two methods of completing any task.

02:17The first is waterfall and the other is agile. Waterfall is you take a big task and you complete the entire thing and then you show the final product. Agile on the other hand is you break that same task into small buckets and you show the result throughout the entire process to make sure you're going in the right direction.

02:34And people are extremely susceptible to using AI agents in a waterfall manner because they wanna give them everything to do at once. The better move is agile specking. You wanna have a tight scope, a clear checkpoint, you wanna review the output, adjust it, and then repeat.

02:48To help with this, we'll tell Claude to bias towards smaller and more compartmentalized specs. Step three is you wanna be precise and use your brain.

02:56The more precise you are, the less AI has to assume. And every assumption that AI makes is a chance for it to drift from the final product you actually want. And when you have AI create a spec for you, you have to use your brain to think critically about what that spec actually says.

03:11So to help you use your brain, you can say, make me verify key decisions explicitly to ensure nothing is missed. And when you put these three pieces together, we have a final prompt we can use in Claude to help create a tightly scoped, well thought out spec that aligns with our actual goal.

03:27This is a process that I call modern engineering, which every successful person has to become. Now, layer two is the verifier.

03:34Layer two sits on top of the spec. This is the verification process. One of the most frustrating things about AI is reviewing and verifying the output.

03:42And unlike a human, it can't grasp non measurable things. So how can we help AI verify its own outputs? Well, first you need to understand the mental model behind this.

03:51And Karpathy explains it as animals versus ghosts. Here's him getting asked a question about this in a recent interview, and if it sounds confusing, don't worry. I will simplify it after.

04:01And the idea is that we're not building animals. We are summoning ghosts. Why does that framing matter?

04:06And what does it actually change about how you build and deploy and evaluate or even trust them? Yeah. I think the reason I wrote about this is because I'm trying to wrap my head around what these things are.

04:16Right? Because if you have a good model of what they are or are not, then you're going to be more competent at using them. I think it's just coming to terms with the fact that these things are not, you know, animal intelligences.

04:27Like, if you yell at them, they're not gonna work better or worse. It doesn't have any impact.

04:32And it's all just kind of like these statistical simulation circuits. It's more just being suspicious of it and

figuring out over time. Now that's some gigabrain stuff, but let me simplify it. People, me and you, are used to interacting with people, which Karpathy is calling animals.

04:48These animals are driven by different motivators and emotions, which help produce the final product and output within a team setting. And if you say to a person, become an expert at SEO marketing in the next fourteen days or you're fired, they're going to figure it out. That's because they have these intrinsic motivations.

05:04But AI is not that. Karpathy describes it as a ghost. But in my eyes, that's a little too confusing, so throw it out the window.

05:10Instead, think of it like a robot librarian. If you ask it that same SEO question, the librarian will only suggest resources and answers based on the books in its library. If it doesn't have a book, it can't help you.

05:22And part of the challenge here is that the librarian doesn't know when it's missing a specific book. So it may just confidently make something up. And that's what's happening when AI nails math and fumbles things with context.

05:34It's brilliant because the library has the clear answers. But if it doesn't, then it's confidently wrong or uncertain, which means interacting with it like it's an animal, i.e., a human, doesn't help.

05:45Right? Yelling at it, pleading, just saying make this better doesn't necessarily work. Really, the only lever you have, which most people don't even think to use, is the verification lever.

05:54Because by optimizing this, it makes it so that you're playing within the actual rules that the AI follows. So how do you help AI verify the output so it's up to the standard you want? Well, there are three places to focus on.

06:05First, you wanna set the evaluation criteria upfront. Before Claude touches a single thing, whether that's technical or nontechnical tasks, define what good looks like with precision. For example, a vague way to evaluate an output is make this report look good.

06:19Whereas, a precise way would say, the report must have three sections, each ends with a recommendation. And if you're making the connection, this is very similar to what we covered in layer one. The more precise you are upfront, the less room Claude will have to make mistakes.

06:32To help enforce this, we'll add this to our verification Claude prompt. Outline the evaluation criteria you will use to ensure a high quality final product. Be precise.

06:41The second step is use a second AI model as the critic. Think of this like a second robot librarian from a different library. You use that librarian to grade the output of the first librarian.

06:52This other librarian has a whole different set of books, and that may give them insight into why this first librarian is right or wrong. Now a tactical way to do this, if you use Claude Code, you can install the Codex plugin, which will allow you to directly ask Codex questions within your Claude Code session. So you could say something like, if this turns into a complex build, run the final output by Codex to ensure both systems agree.

07:15And step three is pull external signal where possible. The question here is how can you bring in additional context that will help you verify an output? Here are two concrete examples.

07:24Let's say you're deploying an app and you're not sure if it's successfully deployed. What you can do instead is connect your Claude session with the system where it's deployed so it can verify that it has been deployed successfully. We are making a connection to pull external data to enhance our verification layer.

07:39And now, if it says that the deployment was successful, we know for certain that it actually was. In a nontechnical example, let's say you're working on a monthly report.

07:47You could bring in your historical reports to use as reference for the exact format that the final output should be in, pulling in data and empowering the verification process. Now, combining this third point with the first two, here is a prompt that you can run in Claude, which will help ensure that you are adding a proper evaluation layer where it makes sense.

08:07I can't stress how important this is. The creator of Claude Code, Boris Cherny, said it best. If Claude has a feedback loop, it will two to three x the quality of the final result.

08:15So layer one and layer two are about creating specs and evaluating the output. The third layer, however, is where we build a foundation that can't be replicated. But before we get to that, if this is your first video of mine, welcome to the channel.

08:26If it's your second or more, here is our anti SLOP agreement. The visuals, the testing, the hours of research that went into this video, this is entirely built for humans, not for AI clankers.

08:36So all that I ask is that you subscribe as part of this agreement because it helps it reach more people so that I can keep making videos like this. Also, every couple of weeks, I give away a Claude Max subscription, so comment below with whatever you're building to enter. Layer three, the environment.

08:49So layer one and layer two need somewhere to live, and that's layer three, which is the environment that you build in. Think of this layer as a workshop. The spec is a blueprint pinned to the wall, the verifier is the quality check stationed by the door, and then the environment is the workshop itself.

09:03You need to create the proper tooling and the proper system so that the whole thing can function at a high level. Now, the problem here is that most people use a workshop from scratch every time they use AI. And, no, if you have a single chat with your entire conversation history, that is not what I'm talking about.

09:17So, how do you create a proper workspace that improves over time? First is you need to set up a proper CLAUDE.md file. Every time you prompt Claude, your CLAUDE.md file gets injected automatically.

09:28It's essentially the first thing that Claude reads to help determine how it should operate. For example, you can add to your CLAUDE.md: before building anything multi-step, include a verification plan. Now verification is forced into every build, not something that you have to remember to say.

09:41This is just one of the ways that you can improve this CLAUDE.md, and here's actually mine on the screen, and I'm gonna call out a couple of sections. The first is I outline how this repo works. So think of my repo as my workspace.

09:52It gives high level to the details around it. I then tell it the custom skills and how they're routed, how to use them. I then outline the architecture of the training data or knowledge architecture so that the AI knows where to look for certain information.

10:05And then I have key working rules that it should follow no matter what. Make this your environment. It's your world and AI is living in it.

10:12It should not feel like the other way around. The second step is you need to build your LLM knowledge base. Karpathy went viral for this concept on Twitter that he calls his LLM knowledge base.

10:21And this is essentially creating a folder system on your machine where you're able to ingest your own training data in a way that makes it really easy for Claude to understand where information is. This is so important because your data is your moat, and this begins the process of building out your own intellectual data property.

10:40And step three is you have to start building out your skill set. A general rule of thumb that I have is if you plan on doing something repeatedly, create a custom skill for that. Think of this like a handbook to complete a specific task.

10:50And the more you use these skills, the better they'll become. I have a saying that I tell my team, the best way to find a leak in a hose is to run water through it. And it's the same with skills.

10:59The more you use them, the more you'll realize where you need to fix them and where they're really good. Keep running water through it and your system's going to compound over time. Step four is create rules for what the AI can and can't work on.

11:10Depending on the cost of getting something wrong, you need to establish different AI guardrails. So here's how to think of this. Right?

11:16So take the CLAUDE.md file that I mentioned earlier. You could add a line that says don't make up information, but that's a guide, not necessarily a hard rule.

11:25So at the end of the day, AI can still ignore it. So if you have things that are critical not to get wrong, then you need to introduce rule based guardrails to ensure that the AI can't bypass them.

11:35To help you visualize this, imagine you have a folder called important don't edit. You could have a rule in CLAUDE.md that says don't touch anything in the slash important don't edit folder. And that might get you 80% of the way there, but it's essentially a request, not a rule.

11:51Claude can still touch those files. So instead, you add a pre tool use hook before Claude uses the write or edit tool, and it checks to see the file that it's trying to edit. Now Claude literally can't make the edit and it's enforced at the tool level, not the prompt level.

12:07And as a result of this, this is now a concrete rule that the agent can't bypass. So with this in mind, bucket things into three groups. The first is always do.

12:16This is things that AI should run on autopilot. The second is ask first. So this is anything that you wanna double check.

12:22And then the third is never do. These are lines that can't be crossed that are absolutely critical not to get wrong. Here's a prompt that brings all of these four points that I mentioned to help audit your system and create an optimized environment for Claude to interact with.

12:36That's the Karpathy method end to end. The spec, the verifier, and the environment. But there's a question that needs to be answered.

12:42What's the one thing that Karpathy thinks we should focus on in the age of AI? Here's him getting asked this in an interview. What still remains worth learning deeply

12:51when intelligence gets cheap as we move into the next era of AI?

12:55You can outsource your thinking, but you can't outsource your understanding. And the thing with everything we covered here is that the three layers are centered around your understanding of the bigger picture. You need to understand your goals and what's needed to direct AI to start working for you.

13:08Now, if you like this video, you will love this one where I do a deep dive into four Claude projects that you need to build today using these three layers. I'll see you over there. Peace.

The Hook

The bait, then the rug-pull.

Austin Marchese opens by dropping a claim he earned — he attended AI Ascent 2026, heard Karpathy speak, and walked away with one uncomfortable observation: almost everyone prompting Claude is doing it wrong. The hook is not hype but a thesis that pays off for the full 13 minutes.

Frameworks

Named ideas worth stealing.

00:14 · list

Karpathy's Three Layers

Layer 1: The Spec
Layer 2: The Verifier
Layer 3: The Environment

The full stack needed to build 10x faster with AI agents — each layer sits on top of the last.

Steal for: Structuring any complex Claude Code project from scratch

01:28 · list

Three Steps to a Tight Spec

Uncover the goal (Claude interviews you)
Be agile (tight scope, clear checkpoints)
Be precise (surface key decisions explicitly)

How to turn a vague task into a spec Claude can actually execute without drifting.

Steal for: Any multi-step build session or client project kickoff
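
The three steps above can be folded into one reusable prompt. The video shows its final prompt on screen but the transcript does not reproduce it, so the sketch below only stitches together the three instructions that are quoted in the transcript. The `build_spec_prompt` helper and the framing sentence around the steps are assumptions added for illustration.

```python
def build_spec_prompt(task: str) -> str:
    """Assemble a spec-writing prompt from the three instructions in the talk."""
    steps = [
        # Step 1: uncover the goal.
        "Interview me to identify the goal of this project.",
        # Step 2: be agile.
        "Bias towards smaller and more compartmentalized specs.",
        # Step 3: be precise.
        "Make me verify key decisions explicitly to ensure nothing is missed.",
    ]
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    # The framing sentence below is illustrative wording, not from the video.
    return f"Task: {task}\n\nBefore building anything, help me write a spec:\n{numbered}"


prompt = build_spec_prompt("Create an end-of-month report")
print(prompt)
```

Keeping the steps in a list makes it easy to adjust one instruction without rewriting the whole prompt each session.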

06:10 · list

Three Verification Levers

Set evaluation criteria upfront
Use a second AI model as critic
Pull external signal (connect real systems)

How to close the feedback loop so Claude's output meets the standard you actually want.

Steal for: Code review, report generation, any task where quality matters
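
The first lever is easiest to act on when the criteria can be checked mechanically. The transcript's example of a precise criterion is "the report must have three sections, each ends with a recommendation", and the sketch below turns that into a check the agent could run on its own output. The report format is an assumption: sections are taken to start with a `## ` heading line and a recommendation is taken to be a final line starting with `Recommendation:`. The sample reports are made up.

```python
import re


def check_report(text: str) -> list[str]:
    """Return the failed criteria; an empty list means the report passes."""
    failures = []
    # Assumption: each section starts with a line beginning with "## ".
    # Index 0 holds any text before the first heading, so it is dropped.
    sections = re.split(r"^## .*$", text, flags=re.MULTILINE)[1:]
    if len(sections) != 3:
        failures.append(f"expected 3 sections, found {len(sections)}")
    for number, body in enumerate(sections, start=1):
        lines = [line.strip() for line in body.splitlines() if line.strip()]
        # Assumption: a recommendation is a last line starting with this label.
        if not lines or not lines[-1].startswith("Recommendation:"):
            failures.append(f"section {number} does not end with a recommendation")
    return failures


# Made-up sample reports for the demo.
good = """## Revenue
Up on last month.
Recommendation: keep pricing.
## Costs
Flat.
Recommendation: renegotiate hosting.
## Churn
Down.
Recommendation: expand onboarding.
"""
bad = """## Revenue
Up on last month.
## Costs
Flat.
Recommendation: renegotiate hosting.
"""

good_result = check_report(good)
bad_result = check_report(bad)
print(good_result)
print(bad_result)
```

A list of named failures gives the agent something specific to fix, which is the feedback loop the verifier layer is meant to provide.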

08:22 · list

Four Steps to a Compounding Environment

Set up a proper CLAUDE.md
Build your LLM knowledge base
Create reusable skills
Rule-based guardrails (always do / ask first / never do)

How to turn your Claude workspace into an asset that improves with every session.

Steal for: Any developer or builder using Claude Code regularly
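
The always do / ask first / never do buckets can be written down as data, which makes it obvious which actions need tool-level enforcement rather than a line in CLAUDE.md. The following is a rough sketch: the action names and their bucket assignments are hypothetical, and defaulting unknown actions to "ask first" is a design choice of this example, not something stated in the video.

```python
from enum import Enum


class Bucket(Enum):
    ALWAYS_DO = "always do"   # runs on autopilot
    ASK_FIRST = "ask first"   # anything you want to double check
    NEVER_DO = "never do"     # hard lines that cannot be crossed


# Hypothetical policy; replace with the actions that matter in your workspace.
POLICY = {
    "run_tests": Bucket.ALWAYS_DO,
    "format_code": Bucket.ALWAYS_DO,
    "install_dependency": Bucket.ASK_FIRST,
    "deploy_to_production": Bucket.ASK_FIRST,
    "edit_protected_folder": Bucket.NEVER_DO,
}


def classify(action: str) -> Bucket:
    """Look up an action; unknown actions fall back to ASK_FIRST."""
    return POLICY.get(action, Bucket.ASK_FIRST)


def needs_tool_level_block(action: str) -> bool:
    """Only NEVER_DO actions call for a hard guardrail such as a hook."""
    return classify(action) is Bucket.NEVER_DO


for name in ["run_tests", "deploy_to_production", "edit_protected_folder", "rename_branch"]:
    print(name, "->", classify(name).value, "| hard block:", needs_tool_level_block(name))
```

Falling back to "ask first" means a new, unclassified action gets a human check before it ever runs on autopilot.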

02:20 · model

Waterfall vs. Agile Specking

Waterfall = hand everything to the agent at once and wait for the final result. Agile = small scoped chunks, review at each checkpoint. People default to waterfall with AI because they want to offload — the better move is always agile.

Steal for: Any long-running AI agent task

12:20 · list