Big Idea

The argument in one line.

The model inside Claude Code matters less than the harness built around it — five configurable layers determine whether an agent holds up as a codebase grows, and the order in which you build those layers is itself a best practice.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…

You use Claude Code on a project with more than one architecture area or multiple developers.
You have noticed Claude hallucinating modules or editing the wrong file as your codebase scales.
You are deciding whether to rely on the built-in Claude Code harness or build a custom one for your team.
You work with unconventional languages like C++ or custom DSLs where context navigation is unreliable.

SKIP IF…

You are building small greenfield projects where the default Claude Code setup already works well.
You have already read the source Anthropic article and want original analysis rather than a narrated walkthrough.

TL;DR

The full version, fast.

Anthropic published a best-practices guide for running Claude Code on large codebases, and this video walks through each layer. The core argument: the harness, not the model alone, determines output quality. That harness has five ordered extension points: CLAUDE.md files (kept to roughly 300 lines and split per subdirectory in monorepos), hooks (scripts that force deterministic behavior), skills (on-demand expertise that loads only when needed), plugins (distributable bundles for teams), and MCP servers (connections to internal tools). LSP integrations and sub-agents round out the picture: LSP gives the agent symbol-level navigation, and sub-agents protect the main context window by handling delegated tasks in isolation.

Chapters

Where the time goes.

00:00 – 00:31

01 · Problem statement

Agents fail when codebases scale; unconventional languages make it worse.

00:31 – 01:21

02 · RAG vs. filesystem navigation

Why embedding-based retrieval fails at scale and how file-system-based navigation replaced it.

01:21 – 02:47

03 · The harness thesis

The ecosystem built around the model determines performance more than the model alone; five extension points introduced.

02:47 – 04:13

04 · Layer 1: CLAUDE.md

Context file loaded each session; keep under 300 lines; split per subdirectory in monorepos; update as models evolve.

04:13 – 04:58

05 · Sponsor

CleanMyMac sponsor segment.

04:58 – 06:33

06 · Layer 2: Hooks

Scripts that force deterministic agent behavior: SessionStart and PreToolUse hooks, plus the Stop hook for CLAUDE.md self-improvement.

06:33 – 08:22

07 · Layers 3 and 4: Skills and Plugins

Skills load on demand with progressive disclosure; plugins bundle skills, hooks, and MCP configs into distributable team packages.

08:22 – 09:21

08 · LSP integrations

Language Server Protocol gives the agent symbol-level navigation, which is critical for C++ and unconventional languages.

09:21 – 11:05

09 · Layer 5: MCP servers

Connect the agent to internal tools, data sources, and APIs; configure them only after the base app is working.

11:05 – 12:28

10 · Sub-agents

Isolated context windows that handle delegated tasks and return only final output, enabling parallelization and protecting the orchestrator context.

12:28 – 14:06

11 · Practical extras

Per-subdirectory test suites, codebase map file for unconventional languages, .ignore files, and periodic harness review as models evolve.

Atomic Insights

Lines worth screenshotting.

The harness around a model determines output quality more than the model itself — a weak harness wastes a strong model.
RAG-based coding tools fail at scale because embedding pipelines cannot keep pace with thousands of engineers committing new code daily.
CLAUDE.md should stay under 300 lines; longer files distract the agent with context it does not need for the current task.
Monorepos need per-subdirectory CLAUDE.md files that load progressively — one root file cannot serve every architecture area without context pollution.
Instructions written for an older model can actively work against a newer one; CLAUDE.md needs model-aware maintenance, not set-and-forget.
The stop hook is one of the most valuable hook types: it pushes the agent to reflect and propose CLAUDE.md updates while session context is still fresh.
Skills use progressive disclosure: they load only when the task calls for them, keeping session context lean in every other situation.
Plugins bundle skills, hooks, and MCP configs into a single installable package, making team-wide context distribution a one-command operation.
LSP integrations give the agent symbol-level navigation — without them, Claude pattern-matches on text and frequently lands on the wrong symbol.
MCP servers should be configured after the base app is working, not before; premature setup can make the MCP implementation fail.
Sub-agents hold isolated context windows, so delegating exploration tasks to them protects the main orchestrator context from noise.
Parallelizing sub-agents on independent tasks makes large-project workflows faster and smoother than sequential execution.
A codebase map file acts as a table of contents for the agent — critical for unconventional languages where training data cannot fill the gap.
Review and prune CLAUDE.md every few months as models evolve; accumulated legacy instructions are a token tax with no upside.
Per-subdirectory test suites avoid timeout issues and allow tests to be scoped precisely to the area being changed.

Takeaway

Five layers that decide whether Claude Code scales.

WHAT TO LEARN

The model is only one variable — the ordered harness of CLAUDE.md, hooks, skills, plugins, and MCP servers is what determines whether an agent holds up as a codebase grows.

Keep the root CLAUDE.md under 300 lines and create per-subdirectory files in monorepos so the agent loads focused instructions rather than one bloated context file.
Update CLAUDE.md when a new model ships — instructions written for older models can become active constraints on newer ones that no longer need them.
Use the stop hook to let the agent propose CLAUDE.md updates at the end of each session, while context is fresh and failures are still visible.
Scope skills to relevant directory paths so they load only when the task requires them — context saved in one session compounds across hundreds of sessions.
Distribute team context via plugins rather than per-developer setup; one installable package carries the skills, hooks, and MCP configs everyone needs.
Install LSP before writing any project code, not after problems appear — symbol-level navigation is especially critical for C, C++, and unconventional languages.
Build MCP servers only after the base app is working; premature MCP configuration can fail because there is no stable foundation to connect to.
Sub-agents protect the main orchestrator context window — delegate exploration tasks to isolated sub-agents and get only the final result back.
Parallelize sub-agents for independent work streams rather than running everything sequentially; on large projects, parallel delegation is faster and smoother.
Create per-subdirectory test suites instead of one global suite to avoid timeouts and to let tests run scoped to the area being changed.
Add a codebase map file for projects using unconventional languages — it acts as a table of contents so the agent does not waste bash calls navigating blind.
Review the entire harness setup every few months as models evolve, and remove anything that newer models no longer need.

Glossary

Terms worth knowing.

Agent harness: The configurable environment around a coding agent that shapes behavior independently of the underlying model — made up of CLAUDE.md files, hooks, skills, plugins, and MCP servers.
RAG (Retrieval-Augmented Generation): An architecture that embeds a codebase into a vector database and retrieves relevant chunks at query time. Works on small codebases but breaks at scale because the index lags behind commits.
CLAUDE.md: A project context file that Claude Code reads automatically at the start of every session. It carries conventions, architecture notes, and rules for the codebase.
Hooks: Shell commands that Claude Code runs at specific points in the agent lifecycle (for example at session start, before a tool call, or when the agent finishes responding) to enforce deterministic behavior.
Skills: On-demand context files that load only when the agent needs them for a specific task, using progressive disclosure to keep session context lean.
Plugins: A distributable bundle of skills, hooks, and MCP configurations installable as a single package to propagate consistent harness setup across a team.
LSP (Language Server Protocol): A protocol providing IDE-level symbol intelligence — go-to-definition, find-all-references — to any editor or agent, giving Claude precision instead of text-pattern matching.
MCP (Model Context Protocol): A protocol for connecting an agent to external or internal tools, data sources, and APIs as callable tools, extending the agent beyond file-system navigation.
Sub-agents: Isolated Claude instances with their own context windows that handle delegated tasks and return only final output to the parent, preventing noise from polluting the main orchestrator context.
Codebase map file: A supplemental file that maps project directory structure as a table of contents for the agent, reducing bash calls needed for navigation in unfamiliar languages.

Resources

Things they pointed at.

04:58 · product · CleanMyMac by MacPaw

08:10 · tool · Superpowers (open source harness)

13:40 · product · AI Labs Pro

Quotables

Lines you could clip.

02:44

“The ecosystem built around the model — the harness — determines how Claude Code performs more than the model alone.”

standalone thesis, no setup needed → TikTok hook

06:08

“Instructions in CLAUDE.md can get blurred in the agent's attention span due to too many things to focus on, but hooks actually force Claude to act.”

concrete contrast between soft instructions and enforced behavior → IG reel cold open

03:42

“The claude.md should stay short, ideally around 300 lines.”

specific, actionable, quotable number → newsletter pull-quote

The Script

Word for word.

Nowadays, shipping small projects has become really easy, but agents start failing the moment the code base grows large and gets multiple dependencies. The issue gets even worse if you are working with unconventional languages, where errors and issues become even harder to trace.

What people miss is that you need to take proper steps before making the agents work on large code bases, and this is exactly what Anthropic talks about here. They cover how to actually handle projects when they scale. It was really insightful because these are things we ourselves have been using in our own projects and have found pretty helpful.

Before we go into detail on how to set up a project at a large scale, let us first understand how the agents navigate around the code in general. There are two ways they do this. The first is RAG-based.

This works by embedding the entire code base and retrieving the relevant chunks at query time. Based on your query, it runs a semantic search, which matches your query with the code in its database. From the similarity matches, it loads that specific context for the model to analyze and work ahead from.

This might work for small scale apps, but it does not sustain on large scale ones. This is because there is a central database that maintains the data, and if there are a lot of files in the database, the semantic matching might be problematic. This is the reason coding agents hallucinate modules that no longer exist.
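To make the staleness problem concrete, here is a toy Python sketch, not how any real tool indexes code: a bag-of-words vector stands in for a neural embedding, and the file names are invented. The index is built once, a later commit replaces a module, and retrieval still returns the deleted path because nothing rebuilt the index.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count vector standing in for a neural model.
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyIndex:
    """Embeds every file once at build time; later commits are invisible until a rebuild."""

    def __init__(self, files: dict[str, str]):
        self.vectors = {path: embed(src) for path, src in files.items()}

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        q = embed(query)
        ranked = sorted(self.vectors, key=lambda p: cosine(q, self.vectors[p]), reverse=True)
        return ranked[:k]


# Hypothetical repository at commit A; the index is built here.
repo = {
    "billing/legacy_invoice.py": "def create_invoice(order): ...  # legacy invoice module",
    "auth/session.py": "def login(user): ...",
}
index = ToyIndex(repo)

# Commit B replaces the legacy module, but nobody rebuilds the index.
del repo["billing/legacy_invoice.py"]
repo["billing/invoices_v2.py"] = "def build_invoice(order): ..."

hit = index.retrieve("where do we create an invoice")[0]
print(hit, "exists:", hit in repo)  # prints: billing/legacy_invoice.py exists: False
```

Real systems re-index incrementally, but on a large, fast-moving repository there is always some window in which index and code disagree; filesystem navigation sidesteps it by reading the current files directly.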

Exactly because of its issues, the RAG-based approach has been completely replaced. The other type is file system based navigation, which is what Claude Code and most other agents now use.

This is similar to how software developers actually navigate. The agent uses bash tools, finds files with the ls command, then greps and narrows down to the exact code snippet it needs and loads that into context. Bash tools work because they do not pollute the context window with unnecessary snippets.
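The list, grep, narrow loop can be sketched in plain Python. This illustrates the idea only and is not Claude Code's implementation: `find_files`, `grep`, and `snippet` are hypothetical helpers mirroring `ls`/`find`, `grep -n`, and reading a few lines around a hit, and only that final window would enter the model's context.

```python
import os
import re
import tempfile
from pathlib import Path


def find_files(root: str, suffixes: tuple[str, ...]) -> list[Path]:
    # Step 1 (like `ls`/`find`): list candidate files by extension, skipping .git.
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]
        found += [Path(dirpath) / f for f in filenames if f.endswith(suffixes)]
    return sorted(found)


def grep(files: list[Path], pattern: str) -> list[tuple[Path, int, str]]:
    # Step 2 (like `grep -n`): keep only the matching lines, not whole files.
    rx = re.compile(pattern)
    hits = []
    for path in files:
        for n, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if rx.search(line):
                hits.append((path, n, line.strip()))
    return hits


def snippet(path: Path, lineno: int, radius: int = 2) -> str:
    # Step 3: load only a small window around the hit into context.
    lines = path.read_text(errors="ignore").splitlines()
    lo, hi = max(0, lineno - 1 - radius), min(len(lines), lineno + radius)
    return "\n".join(lines[lo:hi])


# Demo on a throwaway two-file repository (contents invented).
with tempfile.TemporaryDirectory() as repo:
    src = Path(repo, "src")
    src.mkdir()
    (src / "cart.py").write_text("from pricing import total\n\ndef checkout(items):\n    return total(items)\n")
    (src / "pricing.py").write_text("def total(items):\n    return sum(i.price for i in items)\n")
    hits = grep(find_files(repo, (".py",)), r"def total\b")
    path, lineno, _ = hits[0]
    context = snippet(path, lineno)
    print(context)
```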

So this mode handles all the ways RAG-based systems were failing, and almost all coding agents now navigate this way. The thing here is that no matter how models are improving on their own, the model alone does not determine how good the code you are able to produce will be. An even more important thing that matters when it comes to working systems is what harness you use for coding.

So whichever tool you use, whether it is Claude Code, Codex, or Gemini CLI, the output you get is not solely defined by their powerful models. It also depends on the harness you combine with the model's capabilities.

If the harness is weak and the model is strong, there is no point in the model being strong on its own. Now we know agents like Claude Code and Codex have strong inherent harnesses, but this does not mean you have to rely on those entirely. You need to set up a harness tailored to your project directly so it fits your project better.

There are also open source harnesses like Superpowers, and you can use any of those when you are building something. But when you are developing a large project, these harnesses might not sustain, and you would need to set up your own anyway.

Every agent harness you build on your own or pull from shared chats contains five pieces centered on how Claude's jobs and agentic loops are configured environmentally. We will go through each. The first piece in the agent harness is the CLAUDE.md file, which is loaded at the start of the session and remains in memory for the entire session.

This file is really important because it gives Claude the knowledge base for the code base. We have already done a separate video on how to write and structure a proper CLAUDE.md, which you can check out on the channel. When your code base grows large, CLAUDE.md becomes critical.

If you do not spend time on it, your project is bound to fail at scale. This file is for project conventions, codebase knowledge, and the do's and don'ts that apply across the entire code base, not just a single aspect. This might be fine if your code base is small, but it becomes a problem the moment you scale into multiple architectures.

So stuffing every aspect of the code into one file is highly inefficient. It distracts the agent with information it does not need at the moment. That's why the CLAUDE.md should stay short, ideally around 300 lines.

And if you are running a monorepo with multiple areas, each subdirectory should have its own CLAUDE.md following the same rules.

The agent progressively loads it when working in that directory. So instead of pulling everything from the root file, it gets more focused instructions from the subrepo files. This file is not something you write once and rely on forever.
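A minimal sketch of the per-subdirectory idea, assuming a simplified model in which the agent combines every CLAUDE.md from the repository root down to the directory it is working in (Claude Code's actual loading rules are documented separately and may differ in detail). It also flags files over the roughly 300-line budget. The monorepo layout is invented.

```python
import tempfile
from pathlib import Path

LINE_BUDGET = 300  # the rough ceiling from the video, not a hard limit


def claude_md_chain(repo_root: Path, working_file: Path) -> list[Path]:
    """CLAUDE.md files from the repo root down to the working file's directory."""
    dirs = [repo_root]
    for part in working_file.relative_to(repo_root).parent.parts:
        dirs.append(dirs[-1] / part)
    return [d / "CLAUDE.md" for d in dirs if (d / "CLAUDE.md").is_file()]


def over_budget(paths: list[Path], budget: int = LINE_BUDGET) -> dict[Path, int]:
    """Files whose line count exceeds the budget, with their counts."""
    counts = {p: len(p.read_text().splitlines()) for p in paths}
    return {p: n for p, n in counts.items() if n > budget}


# Demo: a hypothetical monorepo with a root file and two service-level files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for service in ("billing", "search"):
        (root / "services" / service).mkdir(parents=True)
    (root / "CLAUDE.md").write_text("# Repo-wide conventions\n")
    (root / "services" / "billing" / "CLAUDE.md").write_text("- billing rule\n" * 320)
    (root / "services" / "search" / "CLAUDE.md").write_text("# Search conventions\n")

    chain = claude_md_chain(root, root / "services" / "billing" / "invoice.py")
    loaded = [p.relative_to(root).as_posix() for p in chain]
    too_long = {p.relative_to(root).as_posix(): n for p, n in over_budget(chain).items()}
    print(loaded, too_long)
```

Working in `services/billing/` pulls in the root file and the billing file but never the search file, which is the focused loading the video describes.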

We need to maintain it actively not only as the project evolves, but also as model intelligence evolves. The principles applicable for Sonnet 4.5 will definitely not apply for Opus.

Newer models are trained to overcome patterns that were failing in earlier instructions. So giving the same instructions to every model just wastes tokens. But before we move forward, let's have a word from our sponsor, CleanMyMac.

If you work with AI tools like we do, your Mac quietly piles up junk, old builds, cache, broken downloads, and you don't notice until it starts lagging. I run CleanMyMac every week, and it frees up over 15 gigs in a single scan. That's it.

One click, and my Mac was brand new again. CleanMyMac is built by MacPaw, Apple notarized, and trusted by over 29,000,000 people for seventeen years.

The cleanup feature removes over 20 types of junk so your system stays fast without babysitting it. Space Lens maps your drive visually so you know what's eating up space. It even scans your iCloud, Google Drive, and Dropbox locally for unsynced files wasting cloud storage, and it catches 99% of known malware through Moonlock so your Mac stays clean and secure.

Your Mac should keep up with you, not the other way around. Use code AI labs for 20% off and try CleanMyMac free for seven days. Now hooks are another important thing that helps when working with these large code bases.

They are basically scripts that let the agent take specific actions based on certain conditions. There are many types of hooks you can configure usually written as shell scripts that control the agent's behavior. For example, you can configure a session start hook, which loads the information you want at the start of each session, like which files Claude should load for context.

You can also use a hook with exit code 2 and feed the error message back to Claude so it can iterate on that. PreToolUse hooks are another type. Whenever the agent uses whichever tool you have configured the hook for, it runs your commands.
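As an illustration of a PreToolUse guard, here is a sketch of a hook script in Python. It assumes the behavior described in Claude Code's hooks documentation: the hook receives a JSON payload on stdin with `tool_name` and `tool_input`, and exiting with code 2 blocks the tool call and feeds stderr back to Claude. The protected patterns, the tool-name set, and the `--hook` flag are hypothetical choices for this example.

```python
import fnmatch
import json
import sys

# Hypothetical project policy: paths the agent must never edit.
PROTECTED = ("*/migrations/*", "*.env", "*/vendor/*")
EDIT_TOOLS = {"Edit", "Write", "MultiEdit"}


def decide(payload: dict) -> tuple[int, str]:
    """Return (exit_code, stderr_message) for one PreToolUse event payload."""
    if payload.get("tool_name") not in EDIT_TOOLS:
        return 0, ""
    path = payload.get("tool_input", {}).get("file_path", "")
    normalized = "/" + path.replace("\\", "/").lstrip("/")
    for pattern in PROTECTED:
        if fnmatch.fnmatchcase(normalized, pattern):
            # Exit code 2 blocks the tool call; the message goes back to Claude.
            return 2, f"Blocked: {path} matches protected pattern {pattern!r}. Choose another file or ask the user."
    return 0, ""


def main() -> int:
    code, message = decide(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    return code


# Act as a hook only when invoked with the (hypothetical) --hook flag, so importing has no side effects.
if __name__ == "__main__" and "--hook" in sys.argv:
    sys.exit(main())
```

To wire it up, the script would be registered in `.claude/settings.json` under the `PreToolUse` key of the `hooks` section, with a matcher such as `Edit|Write` and a command like `python3 .claude/hooks/protect_paths.py --hook` (file name hypothetical). Keeping the policy in a pure `decide` function makes it testable without running Claude.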

You can use it to prevent Claude from editing files you do not want it to touch. But one of the most important hooks is the stop hook, which runs after a session ends. This pushes Claude to reflect on what has been done so far.

From that, it can update the CLAUDE.md with the learnings from the session so the same issues do not happen again. You can also configure hooks for linting, running tests, and many other purposes.

All of these strung together help a lot with large-scale code bases. Hooks force the agent to do things it should be careful about where instructions in CLAUDE.md alone may not suffice. Instructions in CLAUDE.md can get blurred in the agent's attention span due to too many things to focus on, but hooks actually force Claude to act.
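A sketch of the stop-hook idea, under assumptions drawn from Claude Code's hooks documentation: the Stop event fires when the agent finishes responding, its stdin payload carries a `stop_hook_active` flag that is true when the agent is already continuing because of a stop hook, and printing `{"decision": "block", "reason": ...}` keeps the agent working with that reason as its next instruction. The reflection prompt and the `--hook` flag are hypothetical.

```python
import json
import sys
from typing import Optional

# Hypothetical reflection prompt; tune the wording to your team's CLAUDE.md process.
REFLECTION_PROMPT = (
    "Before stopping: list any mistakes or surprises from this session and propose "
    "concrete CLAUDE.md additions or removals so they do not recur. Keep CLAUDE.md "
    "around 300 lines and show the proposed diff instead of writing it directly."
)


def stop_decision(payload: dict) -> Optional[dict]:
    """Request one reflection pass; never block twice in a row (avoids an endless loop)."""
    if payload.get("stop_hook_active"):
        return None
    return {"decision": "block", "reason": REFLECTION_PROMPT}


def main() -> int:
    decision = stop_decision(json.load(sys.stdin))
    if decision:
        print(json.dumps(decision))  # JSON on stdout, exit code 0
    return 0


if __name__ == "__main__" and "--hook" in sys.argv:
    sys.exit(main())
```

Because a Stop hook can fire at the end of every response, a practical version would probably add a condition, for example asking for reflection only after turns that edited files.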

The third piece in the workflow is skills. A skill is a SKILL.md file plus other grouped files that load on demand instead of being present in every session and bloating it unnecessarily. Skills are important because they use progressive disclosure and are tailored to perform a specific specialized task needed for the workflow.

They expand the agent's knowledge of something it is already capable of doing. If you put these instructions in CLAUDE.md, they just consume unnecessary tokens. Project-specific instructions should go into skills because they load only when the agent actually needs them.

You can also scope skills to specific paths so they only activate in the relevant part of the code and do not bloat the context outside of that. For example, if you are working in the deployment area, you can specify the path of that directory in the skill description so the skill is never loaded when you are working elsewhere.
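Progressive disclosure with path scoping can be modeled in a few lines. This is a simplified model, not Claude Code's implementation: only each skill's name and description sit in context up front, and a skill's body is expanded only when the working path falls inside its scope. The `scope` field and the skill names are hypothetical; the video describes putting the path in the skill description instead.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class Skill:
    name: str
    description: str  # always in context: all the agent sees up front
    scope: str        # hypothetical field: directory prefix where the skill applies
    body: str         # full instructions, loaded only on demand


def always_loaded(skills: list[Skill]) -> str:
    # The cheap index kept in every session: one line per skill.
    return "\n".join(f"{s.name}: {s.description}" for s in skills)


def load_for(skills: list[Skill], working_path: str) -> list[str]:
    # Expand a skill's body only when the working path is inside its scope.
    path = PurePosixPath(working_path)
    return [s.body for s in skills if path.is_relative_to(s.scope)]


skills = [
    Skill("deploy-runbook", "Rollout and rollback steps for deploy/", "deploy", "step\n" * 200),
    Skill("billing-rules", "Rounding and tax rules for services/billing/", "services/billing", "rule\n" * 150),
]

index = always_loaded(skills)
print(len(index), "chars always loaded")
print(len("".join(load_for(skills, "deploy/k8s/app.yaml"))), "chars loaded in deploy/")
print(len("".join(load_for(skills, "frontend/app.tsx"))), "chars loaded in frontend/")
```

The scope check compares whole path components, so a directory such as `deployments/` does not accidentally activate the `deploy` skill.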

To configure skills, you just invoke the skill creator that now comes built into Claude Code (previously, you had to get it from GitHub as an open-source skill). Then you answer the questions it asks during the discussion session. You will have a skill tailored to your exact needs, which you can access once you restart the session.

Aside from skills, you can also use plugins. Plugins are a bundle of skills, hooks, and MCPs available as a single downloadable and distributable package. So whoever installs this plugin will have the exact same context and configurations made available for their use right away.

So if you are working in a team, creating your own plugins to distribute to teammates becomes really important. If you set up all your configs in one place, that information can be distributed across the organization so your team members have the same context as you. You can do this by creating your own plugins and managing them by either manually uploading them or syncing with a GitHub repository.

You can install any plugin using the plugin command, and you can browse the marketplace and install whichever one you want. You can also add other marketplaces using the add plugin marketplace command. Claude Code also comes bundled with multiple plugins like front end design, code review, code simplifier, playwright, and others, all from the Claude official marketplace.

You can use them directly in your workflow and you can create your own as well. Plugins matter especially for large-scale projects because a lot of people work on the same project and distributing context among them is important. So instead of making each person download skills and other components separately, they can install the plugin directly.
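To make the bundle idea concrete, here is a sketch that scaffolds a team plugin directory. The layout (a `.claude-plugin/plugin.json` manifest next to `skills/`, `hooks/hooks.json`, and `.mcp.json`) and the `${CLAUDE_PLUGIN_ROOT}` variable are assumptions based on Anthropic's plugin documentation at the time of writing; check the current docs before relying on them. The plugin name, skill, and hook script are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def scaffold_plugin(root: Path, name: str, skills: dict[str, str], hooks: dict, mcp_servers: dict) -> list[str]:
    """Write one distributable bundle and return the created paths, relative to root."""
    files = {
        # Assumed layout: manifest in .claude-plugin/, components alongside it.
        ".claude-plugin/plugin.json": json.dumps(
            {"name": name, "version": "0.1.0", "description": f"Shared harness for {name}"}, indent=2
        ),
        "hooks/hooks.json": json.dumps({"hooks": hooks}, indent=2),
        ".mcp.json": json.dumps({"mcpServers": mcp_servers}, indent=2),
    }
    for skill_name, skill_md in skills.items():
        files[f"skills/{skill_name}/SKILL.md"] = skill_md
    for rel, content in files.items():
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)
    return sorted(files)


with tempfile.TemporaryDirectory() as tmp:
    plugin_root = Path(tmp) / "acme-harness"  # hypothetical team plugin
    # Assumed env var; the referenced script itself is not created by this sketch.
    hook_cmd = "python3 ${CLAUDE_PLUGIN_ROOT}/hooks/protect_paths.py --hook"
    created = scaffold_plugin(
        plugin_root,
        "acme-harness",
        skills={"deploy-runbook": "---\nname: deploy-runbook\ndescription: Use when working in deploy/\n---\n1. ...\n"},
        hooks={"PreToolUse": [{"matcher": "Edit|Write", "hooks": [{"type": "command", "command": hook_cmd}]}]},
        mcp_servers={},
    )
    manifest = json.loads((plugin_root / ".claude-plugin" / "plugin.json").read_text())
    print(created)
```

Once a directory like this is published through a marketplace, teammates would install it with Claude Code's `/plugin` command instead of copying skills and hook configs by hand.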

Also, if you are enjoying our content, consider pressing the hype button because it helps us create more content like this and reach more people. Another thing that matters in agent harnesses but is not talked about enough is LSP. Language Server Protocol, or LSP, is basically an integration that gives the agent the same kind of navigation a developer has in an IDE.

There is an LSP for almost any programming language, and it might be unnecessary with popular ones, but it becomes critical with unconventional ones. It gives the agent intelligence about the programming language so it can navigate the code base the way a human does. For example, when a human wants to find a function, they check where that function is imported from, go to that file, and check that file for the function's definition.

That is how they actually find the exact source they need. Without LSP, the agent pattern matches based on text and is likely to land on the wrong symbol.
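The gap between text matching and symbol resolution is easy to see with a small stand-in. The sketch below uses Python's standard `ast` module, not an actual language server, to show the kind of answer go-to-definition gives: a word-boundary grep for `total` matches a comment and a call site as well as the definition, while resolving the symbol returns the single defining line. The sample source is invented.

```python
import ast
import re
from typing import Optional

# Invented sample file.
SOURCE = '''\
# total is recomputed on every checkout
def total(items):
    return sum(i.price for i in items)

def checkout(items):
    total_due = total(items)
    return total_due
'''


def grep_hits(source: str, word: str) -> list[int]:
    # Text matching: every line mentioning the word, whatever role it plays there.
    return [n for n, line in enumerate(source.splitlines(), 1) if re.search(rf"\b{re.escape(word)}\b", line)]


def definition_line(source: str, name: str) -> Optional[int]:
    # Symbol resolution: parse the code and return where `name` is defined as a function.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            return node.lineno
    return None


print(grep_hits(SOURCE, "total"))        # prints: [1, 2, 6]
print(definition_line(SOURCE, "total"))  # prints: 2
```

A real language server answers the same question across files and for languages such as C++ (for example through clangd), which is exactly where text-only navigation degrades most.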

As we mentioned, Claude Code uses the file system based approach with bash commands. So without LSP, it is just pattern matching on file names and text, not navigating with deeper intelligence. Now do not assume LSP is not needed just because your agent has not run into errors yet.

Set up LSP even before you start working on the project. Configure it for all the languages you will use even before writing any code so the agent already has information on how to work with them. Instead of letting the agent guess patterns, installing LSP lets it read and edit code the way a developer thinks about it, not just as text.

Now as you already know, MCP is used to connect the agent to external tools, but you can also connect your MCPs to your project's internal tools, data sources, APIs, or other systems the agent otherwise cannot reach. For that, you need to create your own MCPs and make them available so people on your team can use them easily.

MCPs are basically an extension to the existing setup loaded whenever they are needed and the tools they provide are then available for the agent to use. If you are working on a large code base, you can build MCPs that serve many purposes like acting as a documentation guide, retrieving analytics, or even letting you make changes through them.
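A rough sketch of what an internal documentation tool looks like at the MCP protocol level. MCP messages are JSON-RPC 2.0, and servers advertise tools through `tools/list` and execute them through `tools/call`; this sketch handles only those two methods, omits the initialize handshake and the stdio/HTTP transport, and uses an invented in-memory doc store. A real server would be built on an official MCP SDK rather than hand-rolled like this.

```python
# Hypothetical internal knowledge the agent cannot reach through the filesystem.
INTERNAL_DOCS = {
    "billing-api": "POST /v2/invoices requires an Idempotency-Key header.",
    "deploy": "Deploys ship through the release train every Tuesday.",
}

TOOLS = [{
    "name": "lookup_doc",
    "description": "Return the internal documentation page for a topic.",
    "inputSchema": {
        "type": "object",
        "properties": {"topic": {"type": "string"}},
        "required": ["topic"],
    },
}]


def _error(rid, code: int, message: str) -> dict:
    return {"jsonrpc": "2.0", "id": rid, "error": {"code": code, "message": message}}


def handle(request: dict) -> dict:
    """Answer one JSON-RPC 2.0 request; only tools/list and tools/call are supported."""
    method, rid = request.get("method"), request.get("id")
    if method == "tools/list":
        return {"jsonrpc": "2.0", "id": rid, "result": {"tools": TOOLS}}
    if method == "tools/call":
        params = request.get("params", {})
        if params.get("name") != "lookup_doc":
            return _error(rid, -32602, f"Unknown tool: {params.get('name')}")
        topic = params.get("arguments", {}).get("topic", "")
        text = INTERNAL_DOCS.get(topic, f"No internal doc for {topic!r}.")
        return {"jsonrpc": "2.0", "id": rid, "result": {"content": [{"type": "text", "text": text}]}}
    return _error(rid, -32601, f"Method not found: {method}")
```

The agent only ever sees the tool's name, description, and input schema until it decides to call it, which is why an internal lookup like this is cheaper than pasting the documentation into context.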

These are helpful because if you have your own code base, you can let the agent naturally interact with internal information, call tools, and make changes there instead of fumbling through huge documentation. This gives the agent more direct access to the information and systems it needs. But to configure an MCP, the basic setup of the app needs to already be working.

If you configure your MCP before that, things can go wrong and the MCP implementation may fail. So first, make sure your app is working properly, then create the MCP, and let the agent interact with your project with more intelligence and better information. Another thing you need to create is sub agents.

Sub agents contain isolated context windows of their own and do whichever task is delegated to them by the main orchestrator agent, then return only the final output to the parent. This is a key part of an agent harness because using sub agents properly does not bloat the context window and makes context utilization much better since they do not fill the main agent's context with information it does not need.

Sub agents only run when invoked and then return their findings. Claude spins off sub agents on its own, but you can configure sub agents yourself as well. You can configure whichever tools and models you want for them and provide instructions on how they should operate, creating specific agents for your own workflows.

You can also override Claude's existing agents. For example, you can create your own agent whose instructions override existing ones like explore and provide description on how it should navigate around your directory. Claude's own explore agent is generalized for all kinds of code bases, but if you configure your own, the custom one overrides the default.

This gives the agent more context on how the files in your project are structured, so it does not waste tokens navigating files relying only on the information in CLAUDE.md. So you can make the main agent control the whole project execution and rely on sub agents for the actual work. Sub agents also help because you can parallelize their work through agent delegation, which makes the workflow much smoother and faster than doing everything sequentially.
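The context-isolation and parallelization argument can be modeled in a few lines. In this toy, each hypothetical `explore` sub-agent accumulates its own scratch context (every file it "read") but hands back only a one-line summary, so the orchestrator's context grows by one entry per delegated task. Threads stand in for parallel sub-agents; nothing here calls Claude.

```python
from concurrent.futures import ThreadPoolExecutor


def explore(topic: str, files: list[str]) -> str:
    """Hypothetical sub-agent: works in a private context, returns only a summary."""
    scratch = []                           # isolated context window for this sub-agent
    for path in files:
        scratch.append(f"read {path}")     # noisy intermediate work stays here
    relevant = [p for p in files if topic in p]
    return f"{topic}: {len(scratch)} files scanned, relevant: {relevant}"


def orchestrate(assignments: dict[str, list[str]]) -> list[str]:
    """Run independent sub-agents in parallel; keep only their final outputs."""
    context = []                           # the orchestrator's own context
    with ThreadPoolExecutor(max_workers=max(1, len(assignments))) as pool:
        futures = [pool.submit(explore, topic, files) for topic, files in assignments.items()]
        for future in futures:
            context.append(future.result())
    return context


repo_files = [f"src/module_{i}.py" for i in range(50)] + ["src/auth.py", "src/billing.py"]
context = orchestrate({"auth": repo_files, "billing": repo_files})
print(context)
```

Over a hundred simulated file reads happen, yet the orchestrator holds two short lines. In Claude Code itself, custom sub-agents such as a replacement explorer are defined as Markdown files with YAML frontmatter (name, description, and optionally tools and model) under `.claude/agents/`.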

There are a few more practices you need to follow when navigating around a large code base. This is important because Claude's ability to navigate a large code base is determined by whether it is able to find the right context. So ensuring Claude gets the right context is important.

So the agent does not get too little or too much and stays focused. Aside from separating the CLAUDE.md file, you need to separate tests for each subdirectory instead of having them all in one place. This way they stay segmented, avoid timeout issues when a lot of tests run at once, and can be scoped more effectively.
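Scoping tests to the area being changed can be sketched as a small selector. It assumes a hypothetical layout in which each subdirectory keeps its own `tests/` folder; given the changed files, it returns only the suites that need to run.

```python
from pathlib import PurePosixPath


def suites_for_changes(changed: list[str], test_dirs: set[str]) -> list[str]:
    """Map each changed file to the nearest enclosing directory that owns a tests/ folder."""
    selected = set()
    for path in changed:
        for parent in PurePosixPath(path).parents:
            candidate = str(parent / "tests")  # "." / "tests" collapses to "tests"
            if candidate in test_dirs:
                selected.add(candidate)
                break
    return sorted(selected)


# Hypothetical monorepo layout: one suite per service plus a root suite.
TEST_DIRS = {"services/billing/tests", "services/search/tests", "tests"}

print(suites_for_changes(["services/billing/api.py", "services/billing/models/invoice.py"], TEST_DIRS))
```

The resulting list could be handed to a test runner so that a billing change runs only the billing suite, which is what keeps large suites inside timeout limits.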

You can also create a separate codebase map file that maps your project structure. If you are working with conventional apps like React or Next.js, you can skip this because the agents have been trained extensively on those.

But with unconventional languages like C++, you need a codebase map. It acts as a table of contents for the agent, letting it know where each file lives instead of running a lot of bash commands to narrow down to the right one. Lastly, but most importantly, review your setup every few months as the model evolves.

Remove the instructions, hooks, or anything else that the newer model no longer needs. Use .ignore files like .gitignore and .agentignore so the files you do not want the agent or version control to touch are left alone. This way your setup will be able to sustain on large-scale apps.
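For the codebase map file, a minimal generator might look like this, assuming a C++ project and a hypothetical output name `CODEBASE_MAP.md`. It walks the tree, skips build and vendored directories, and writes one Markdown bullet per directory listing its source files: the "table of contents" the agent can read instead of issuing many `ls` and `grep` calls.

```python
import os
import tempfile
from pathlib import Path

SOURCE_SUFFIXES = (".cpp", ".cc", ".h", ".hpp")
SKIP_DIRS = {".git", "build", "third_party"}  # generated or vendored code the agent should not map


def build_map(root: Path) -> str:
    """One Markdown bullet per directory that contains source files."""
    lines = ["# Codebase map", ""]
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)  # prune and keep a stable order
        sources = sorted(f for f in filenames if f.endswith(SOURCE_SUFFIXES))
        if sources:
            rel = Path(dirpath).relative_to(root).as_posix()
            lines.append(f"- `{rel}/`: {', '.join(sources)}")
    return "\n".join(lines) + "\n"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ["engine/render.cpp", "engine/render.h", "net/socket.cpp", "build/gen.cpp", "README.md"]:
        (root / rel).parent.mkdir(parents=True, exist_ok=True)
        (root / rel).touch()
    codebase_map = build_map(root)
    (root / "CODEBASE_MAP.md").write_text(codebase_map)  # hypothetical file name
    print(codebase_map)
```

A generated skeleton like this is a starting point; adding a one-line purpose for each directory by hand is what turns it from a file listing into a useful table of contents.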

Now the resources for this video can be found in AI Labs Pro, for this video and for all our previous videos, where you can download and use them for your own projects. If you found value in what we do and want to support the channel, this is the best way to do it. The link's in the description.

That brings us to the end of this video. If you'd like to support the channel and help us keep making videos like this, you can do so by using the super thanks button below. As always, thank you for watching, and I'll see you in the next one.

The Hook

The bait, then the rug-pull.

Small projects ship easily. Large ones break agents. That is the premise this video builds from — and it is the right one. The real question is not which model you are using but what surrounds it.

Frameworks

Named ideas worth stealing.

02:47 · list

Five-Layer Agent Harness

CLAUDE.md files
Hooks
Skills
Plugins
MCP servers

The five ordered extension points for building a project-specific harness around Claude Code. Build them in order because each layer depends on what came before.

Steal for: any team onboarding doc or Claude Code project setup checklist

CTA Breakdown