Big Idea

The argument in one line.

AI agents produce better code when you encode strict, process-driven workflows as reusable skills that force them to mirror how senior engineers think through design, testing, and architecture.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…

A developer using Claude Code or similar AI agents who wants to establish repeatable processes that keep autonomous coding assistants on track and improve output quality.
An engineer integrating AI into your development workflow who needs practical patterns for steering agent behavior without relying on memory or context carryover.
A software engineer with 3+ years of experience who's experimenting with AI-assisted coding and wants tactical skills to encode your domain knowledge into agent instructions.

SKIP IF…

You're new to programming or haven't shipped production code — this assumes solid engineering fundamentals and experience debugging complex systems.
You're looking for Claude Code setup tutorials or beginner onboarding — this is advanced applied technique, not foundational tooling instruction.
You work primarily in non-code domains or don't use an AI coding agent such as Claude Code in your daily workflow. The skills are demonstrated in Claude Code and assume an agent-driven process.

TL;DR

The full version, fast.

Process matters more than ever when working with AI agents that have no memory, and structured skills are how you encode that process so agents walk the same path every time. The approach chains five composable skills: a grill-me skill that forces the agent to interview you down every branch of the design tree before writing a plan, a write-a-PRD skill that turns the shared understanding into a GitHub issue describing the destination, a PRD-to-issues skill that slices the destination into vertical tracer-bullet tickets with blocking relationships, a TDD skill that drives a strict red-green-refactor loop, and an architecture skill that spawns parallel sub-agents to propose deeper module designs. Treat agents like humans with weird constraints, refactor toward deep modules with thin interfaces, and code quality compounds.

Chapters

Where the time goes.

00:00 – 01:18

01 · Intro - process is the product

Sets up the argument: agents have no memory, so process documentation is the multiplier. Shows the skills repo README.

01:18 – 03:55

02 · /grill-me

Three-sentence skill forcing Claude to interview you about every branch of a design tree before touching code. Live 16-question session demonstrated.
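
As a rough illustration of what "walking every branch of a design tree" means, the sketch below models the search-page example from the video as a small tree and lists the questions that follow from one chosen answer. The `Decision` class and `walk` function are hypothetical names invented for this sketch; the actual skill is three sentences of prose, not code.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    """One node in a design tree: a question plus the follow-ups each answer opens."""
    question: str
    # Maps each possible answer to the sub-decisions that answer makes necessary.
    branches: dict[str, list["Decision"]] = field(default_factory=dict)


def walk(decision: Decision, answers: dict[str, str]) -> list[str]:
    """Return the questions to ask, depth-first, following only chosen branches."""
    asked = [decision.question]
    chosen = answers.get(decision.question)  # None if not answered yet
    for follow_up in decision.branches.get(chosen, []):
        asked.extend(walk(follow_up, answers))
    return asked


# The search-page example: choosing advanced search opens two more decisions.
search_page = Decision(
    "Advanced search or a single text box?",
    {
        "advanced": [
            Decision("Which filters are needed?"),
            Decision("Which sorting methods are needed?"),
        ],
        "text box": [],
    },
)

questions = walk(search_page, {"Advanced search or a single text box?": "advanced"})
for q in questions:
    print(q)
```

Until the root decision is answered, `walk` returns only the root question, which mirrors the idea that dependencies between decisions are resolved one by one.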

03:55 – 06:00

03 · /write-a-prd

Five-step skill producing a GitHub issue PRD with problem statement, solution, user stories, and implementation decisions.
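
A minimal sketch of the PRD shape described here, assuming the four sections are rendered as a markdown issue body. The `PRD` class, the `gh_command` helper, and the sample user story and implementation decision are hypothetical; the problem and solution text paraphrase the example shown in the video. The command is built with the GitHub CLI's `--title` and `--body` flags but is not executed.

```python
from dataclasses import dataclass


@dataclass
class PRD:
    title: str
    problem: str
    solution: str
    user_stories: list[str]
    implementation_decisions: list[str]

    def to_markdown(self) -> str:
        """Render the four sections as a single issue body."""
        stories = "\n".join(f"{i}. {s}" for i, s in enumerate(self.user_stories, 1))
        decisions = "\n".join(f"- {d}" for d in self.implementation_decisions)
        return (
            f"## Problem Statement\n\n{self.problem}\n\n"
            f"## Solution\n\n{self.solution}\n\n"
            f"## User Stories\n\n{stories}\n\n"
            f"## Implementation Decisions\n\n{decisions}\n"
        )


def gh_command(prd: PRD) -> list[str]:
    """Build (but do not run) the GitHub CLI call that would file the PRD."""
    return ["gh", "issue", "create", "--title", prd.title, "--body", prd.to_markdown()]


prd = PRD(
    title="PRD: split-pane document editing",
    problem="The article writing page regenerates the entire document on every AI interaction.",
    solution="Add a split-pane document editing experience to the article writer.",
    # Hypothetical story and decision, for illustration only.
    user_stories=["As a writer, I want edits applied in place so that my document is not regenerated."],
    implementation_decisions=["Keep the editing engine a pure function."],
)
command = gh_command(prd)
print(command[:3])
```

Keeping the implementation decisions as short, non-prescriptive bullets matches the durability goal: the less the document pins down, the less it drifts from the code.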

06:00 – 08:29

04 · /prd-to-issues

Converts the destination PRD into tracer-bullet GitHub issues with blocking relationships, safe for parallel agents.
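
The blocking relationships can be pictured as a small dependency map. The sketch below uses the numbering from the video's example (issue 3 blocked by 1, issue 4 blocked by 2) and shows which issues an agent could pick up at each point. The `ready_issues` function is a hypothetical helper, and treating issue 1 as unblocked is an assumption.

```python
def ready_issues(blocked_by: dict[int, set[int]], closed: set[int]) -> list[int]:
    """Open issues whose blockers are all closed, so an agent may pick them up now."""
    return sorted(
        issue
        for issue, blockers in blocked_by.items()
        if issue not in closed and blockers <= closed
    )


# 1 and 2 are unblocked; 3 waits on 1; 4 waits on 2.
blocked_by = {1: set(), 2: set(), 3: {1}, 4: {2}}

print(ready_issues(blocked_by, closed=set()))   # [1, 2]
print(ready_issues(blocked_by, closed={1}))     # [2, 3]
print(ready_issues(blocked_by, closed={1, 2}))  # [3, 4]
```

Because issues 1 and 2 are both ready at the start, two agents can work in parallel without stepping on each other.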

08:29 – 12:04

05 · /tdd

Red-green-refactor loop for autonomous agents. Tests verify behavior through public interfaces, not implementation.
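
To make "tests verify behavior through public interfaces" concrete, here is a small sketch in the spirit of the document editing engine mentioned in the video. The `apply_edit` function and its exactly-one-match rule are invented for illustration and are not the author's implementation. In a red-green-refactor loop, each test below would be written first and seen to fail before the code that satisfies it is added.

```python
import unittest


def apply_edit(document: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old` with `new`.

    This is the minimal implementation that turns the tests below green.
    """
    if document.count(old) != 1:
        raise ValueError("edit target must match exactly once")
    return document.replace(old, new)


class ApplyEditBehavior(unittest.TestCase):
    # Each test states a behavior a caller relies on and touches only the
    # public function, so internals can be refactored without rewriting tests.

    def test_replaces_the_matching_text(self):
        self.assertEqual(apply_edit("Hello world", "world", "there"), "Hello there")

    def test_rejects_a_target_that_is_missing(self):
        with self.assertRaises(ValueError):
            apply_edit("Hello world", "planet", "there")

    def test_rejects_an_ambiguous_target(self):
        with self.assertRaises(ValueError):
            apply_edit("a b a", "a", "c")
```

Saved as a module, these tests can be run with `python -m unittest`. None of them mention how the replacement is performed, which is what keeps them stable through the refactor step.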

12:04 – 16:18

06 · /improve-codebase-architecture

John Ousterhout deep-module philosophy. Spawn 3+ sub-agents with radically different interfaces, compare, recommend hybrid, create GitHub RFC.

16:18 – 16:42

07 · Course pitch + outro

Claude Code for Real Engineers - 2-week cohort. Curriculum overview shown.

Atomic Insights

Lines worth screenshotting.

The grill-me skill is only three sentences long, yet it produced sixteen clarifying questions in the demonstrated session: evidence that a skill does not need to be long to be impactful.
The design tree idea, which the speaker first saw in Frederick P. Brooks's The Design of Design, treats a feature design as a branching structure where every decision creates downstream dependencies that must be resolved before coding begins.
Invoking grill-me after giving Claude some research forces it to walk every branch of the design tree and surface decisions the developer had not thought through.
The write-a-PRD skill skips steps already completed by grill-me — it recognizes when a deep interview has already happened and jumps directly to the module sketching phase.
PRDs are submitted as GitHub Issues, which makes them the source of truth for a Ralph loop that processes each issue until it is done.
User stories in the PRD describe desired system behavior in language — the format is agile-inspired but the goal is durability so the PRD stays useful as the code evolves.
Process is more important than ever when the engineers have no memory — strict, well-defined workflows are the only way to steer agents consistently toward useful output.
The skill chain (grill-me → PRD → GitHub issues → Ralph loop) forms an autonomous development pipeline: once the PRD and its issues are approved, the Ralph loop implements, comments on, and closes each issue without further prompting.
Implementation decisions in a PRD should be non-prescriptive to stay durable — over-specifying the implementation creates conflicts when the code diverges from the document.
A grilling session of 30 to 50 questions on a really complex feature is normal and worthwhile — the time invested upfront eliminates far more time lost to misaligned builds.
Skills encode process so the agent has a strict path to walk every single time — this is the mechanism by which code quality improves as a skill library grows.
The five skills together form a complete development loop: understand the problem (grill-me), write the spec (write-a-prd), slice it into issues (prd-to-issues), build test-first (tdd), and deepen the architecture (improve-codebase-architecture).

Takeaway

Steal the skill chain.

Process-first Claude Code playbook

The five skills form a complete, repeatable workflow from vague idea to production-quality code, and each one is a short markdown file, from the three-sentence grill-me to the longer tdd skill.

Build a grill-me skill first: three sentences that force the model to interview you before touching code. Ship nothing before this.
Make write-a-prd output GitHub issues, not local docs. Issues become the memory the agent fetches on every task.
Use prd-to-issues to break work into vertical slices through ALL layers, not horizontal layer-by-layer sprints.
Wire tdd into your autonomous loop so every issue gets red-green-refactor by default.
Run improve-codebase-architecture regularly, for example once a week. Shallow, poorly bounded modules make a codebase harder for agents to navigate and test, so output quality degrades over time.
Skills compound: the chain is the system, any skill in isolation is half the value.

Glossary

Terms worth knowing.

Claude Code: Anthropic's command-line coding agent that runs in a terminal and edits, runs, and reasons about code in a repository. Used here as the primary interface for invoking AI engineering workflows.
Skill: A reusable instruction file that tells an AI coding agent how to perform a specific task or follow a specific process. Skills are invoked by name and inject their contents into the model's context to steer behavior.
Agent: An autonomous AI worker that reads instructions, plans steps, and executes code-related actions like editing files, running commands, or opening pull requests without step-by-step human input.
Plan mode: A Claude Code mode where the agent drafts a written plan before touching any files, letting the human review and adjust the approach before implementation begins.
LLM: Large language model — the underlying AI that generates text and code from prompts. In coding agents, it is the engine doing the reasoning and writing.
Context window: The fixed amount of text a language model can consider at once, including prior messages, instructions, and code. Once it fills up, older information must be summarized or dropped.
Design tree: A branching map of design decisions where each choice spawns further sub-decisions that must be resolved before committing to an implementation. Walking the tree forces every dependency between decisions to be made explicit.
PRD: Product requirements document — a written spec that defines the problem, the desired solution, and the user stories a feature must satisfy. Acts as the durable destination description for a piece of work.
User story: A short statement describing a feature from the user's point of view, typically framed as 'as a user, I want X so that Y.' Comes from agile methodology and is used to capture desired behavior in plain language.
Cucumber language: A Given-When-Then syntax (formally called Gherkin) for writing executable behavior specifications, popularized by the Cucumber testing framework. Suggested in the video as one possible format for expressing user stories in a structured, testable form.
GitHub issue: A tracked task or ticket inside a GitHub repository, used to record bugs, features, or work items. Issues can reference each other to establish parent-child or blocking relationships.
Kanban board: A visual workflow board where tasks move through columns like To Do, In Progress, and Done. Used here as a metaphor for a queue of independently grabbable issues.
Vertical slice: A small piece of work that cuts through every layer of a system — UI, logic, data, integrations — to deliver one end-to-end behavior. Contrasts with a horizontal slice that only builds one layer at a time.
Tracer bullet: A development approach where the team builds a minimal end-to-end path through the whole system first, then iterates. Borrowed from gunnery, where tracer rounds let the shooter see where shots are landing and adjust aim.
Unknown unknowns: Risks or problems you do not yet know exist. Good task ordering attacks these first so they surface early, while there is still time to change the plan.
Blocking relationship: A dependency between tasks where one issue cannot start until another is finished. Tracking these explicitly lets agents pick up only the work that is currently unblocked.
Ralph loop: An autonomous loop that repeatedly picks the next open task, runs an AI agent against it, commits the result, and moves on until the queue is empty. The name comes from the Ralph Wiggum technique, a popular pattern for unattended agent runs.
TDD: Test-driven development — a practice where you write a failing test first, then write just enough code to make it pass, then clean up. Forces design decisions to be made through tests rather than guesswork.
Red green refactor: The three-step rhythm of test-driven development: write a failing test (red), make it pass (green), then improve the code without changing behavior (refactor).
Interface: In software design, the set of functions, types, or endpoints a module exposes to its callers — the contract — separated from the internal implementation that fulfills it.
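
The Ralph loop and blocking relationship entries above can be sketched together as a few lines of control flow. This is a simplified model under stated assumptions, not the author's actual loop: `ralph_loop` is a hypothetical name, and `run_agent` is a stand-in callable for launching a fresh agent session on one issue.

```python
from typing import Callable


def ralph_loop(
    blocked_by: dict[int, set[int]],
    run_agent: Callable[[int], bool],
    max_iterations: int = 100,
) -> list[int]:
    """Hand the next unblocked open issue to an agent until none remain.

    `run_agent` returns True when the issue was completed and can be closed.
    A failed issue is retried on the next pass, up to `max_iterations` passes.
    """
    closed: list[int] = []
    for _ in range(max_iterations):
        ready = [
            issue
            for issue, blockers in sorted(blocked_by.items())
            if issue not in closed and blockers <= set(closed)
        ]
        if not ready:
            break  # queue is empty, or everything left is blocked
        issue = ready[0]
        if run_agent(issue):
            closed.append(issue)  # closing an issue may unblock others
    return closed


# Hypothetical queue: 3 waits on 1, and 2 waits on 3.
order = ralph_loop({1: set(), 2: {3}, 3: {1}}, run_agent=lambda issue: True)
print(order)  # [1, 3, 2]
```

Each pass starts from the issue list rather than from remembered state, which fits agents that carry no memory between runs.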

Resources

Things they pointed at.

01:50 · book · The Design of Design

08:05 · book · A Philosophy of Software Design

00:56 · product · Claude Code for Real Engineers

01:12 · link · Matt Pocock Skills Repo

Quotables

Lines you could clip.

00:59

“Skills don't have to be long to be impactful. You've just gotta choose the right words for the LLM at the right time.”

Tight quotable thesis. → TikTok hook

15:05

“If you have a garbage code base, the AI is gonna produce garbage within that code base.”

One-liner, visceral truth, no setup needed. → IG reel cold open

15:15

“The most successful way to get code quality up from agents is just to treat them like humans. Humans with weird constraints.”

Memorable framing, encapsulates the whole video. → newsletter pull-quote

00:30

“At your fingertips now, you have access to a fleet of middling to good engineers that you can deploy at any time. But the weird thing about these engineers is they have no memory.”

Strong cold-open energy, sets stakes fast. → TikTok hook

The Script

Word for word.

00:00I've been an engineer for nearly a decade. And in all of that time, right now, process has never been more important. At your fingertips now, you have access to a fleet of middling to good engineers that you can deploy at any time.

00:13But the weird thing about these engineers is they have no memory. They do not remember things they've done before, and so you need extremely strict and well defined processes to get those agents to actually do things that are useful. So this means that you as a developer are looking constantly for ways to steer your agents, to keep them on the right track.

00:32And for me, that has resulted in a lot of skill building. Here's the repo of all the skills that I'm using right now, each of which I have gone through and designed. Some of these I use relatively rarely, but some of them I use every single day.

00:44And these skills help me encode my process so that AI has a really strict path it can walk down every single time. And as a result of using all of these skills, the code quality that the AI is producing has shot up. Now if you think that process is important and that real engineering skills are important, then boy, do I have a course for you.

01:02This course is called Claude Code for Real Engineers. It's a two week cohort that starts on March 30. And for seven more days, it is 40% off.

01:12If you feel like you're behind the curve on Claude Code and you want to get way ahead of the curve in just two weeks, then blimey, this is the place for you. But let's start talking about our skills with number one, which is maybe my favorite. This is the grill me skill.

01:25This skill, yes, it is just three sentences long, and let's just read it out in full to describe what it does. Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree resolving dependencies between decisions one by one.

01:40And finally, if a question can be answered by exploring the code base, explore the code base instead. The concept of a design tree comes from this book by Frederick P. Brooks, which is The Design of Design.

01:50Actually, I don't know if it comes from this book, but this book is where I saw it first. The design tree is this idea that as you're coming towards a design, you need to walk down all of the branches of a design tree. For instance, you might be designing a search page and you need to decide whether you want an advanced search or a text box.

02:05If you choose advanced search, then you need to figure out all of the filters and all of the sorting methods that you need on advanced search. And you keep on walking down the tree until you figure out your design kind of in full or as full as you can before committing to code. This grill me skill, when I invoke it, I invoke it when I want to reach a shared understanding with the LLM.

02:25I found that relatively recently, Claude Code will tend to just spit out a plan really early when I go in plan mode, and it tends to just create a document before I feel I've reached a shared understanding with the LLM. But the grill me skill forces that conversation. It forces the LLM to interview me about every single part.

02:42Here's a conversation I had with Claude recently about adding a feature to my course video editor code base. I gave it some research that I'd done in a markdown file and I said, grill me. I'd like to think about adding this to the write page.

02:54It loaded up the skill and the thing I want to show you is just how many questions it asked me. So the first thing it did is it just explored the relevant stuff in the code base, which is good. Then we zoom down, we can see it ask question one, where does the document live?

03:06Question two, what's the UI layout? Question three, which modes get the document panel? Question four, the document life cycle?

03:11Question five, what does the write document tool look like? Question six, the edit tool shape. Question seven.

03:17Question all the way down to question nine, question 10, question 11, question 12, all the way down to question freaking 16 here. And this is a relatively short grilling session in my book.

03:28I've had sessions where I've sat there for nearly half an hour, forty-five minutes, with the AI answering questions on really complex features. You know, that could be 30, 40, 50 questions all from this absolutely tiny skill.

03:41That's one thing I want you to take from this. Skills don't have to be long to be impactful. You've just gotta choose the right words for the LLM at the right time, and this design tree resolving dependencies has just been absolutely great for me.

03:52By the way, if you want these skills, then they will be at a link below. Once I have reached a shared understanding with the LLM, once I have grilled my idea and sort of understood all of its ramifications, if I then decide I want to implement it, then I invoke my next skill, which is a write-a-PRD skill.

04:08I actually did this in the conversation we were just looking at. So it said anything I've missed or got wrong, and I said write a PRD. I was suffixing it with user because I have some that sort of live in the project.

04:18So that's the reason why I did that. Here's what the skill looks like. This will be invoked when the user wants to create a PRD.

04:23You may skip steps if you don't consider them necessary. So for instance, in the previous conversation, it said, we've already done a deep interview. Let's move to step four.

04:29So step one is to ask the user for a long detailed description. Then number two is to explore the repo to verify their assertions. Number three is basically to interview the user relentlessly.

04:38So just a copy of the grill me skill again. Next, we sketch out the major modules you will need to build or modify to complete the implementation. We're gonna look at this later because it links to skills I'm gonna show you in a bit in this video.

04:49And finally, once you have a complete understanding of the problem and the solution, use the template below to write the PRD, and the PRD should be submitted as a GitHub issue. The way that my dev flow works is I take these PRDs in GitHub, I turn them into more GitHub issues that reference the parent PRD, and then I have a Ralph loop that just loops over each issue until it's done.

05:09If we go back to the conversation where we were before, we can see that it created this PRD here. This was four days ago as you can see. We've got a problem statement.

05:17The article writing page currently regenerates the entire document on every AI interaction. And the solution was to add a split pane document editing experience to the article writer. Chat stays on the left, a new document panel blah blah blah.

05:27So this is a big feature. We're adding document editing to a kind of AI chat feature. The important thing here is the user stories.

05:34There are many, many user stories as part of this. And this comes from agile methodology, and we're basically trying to describe the kind of desired behavior of our system in language, which is not an easy thing to do.

05:44I still haven't properly, like, landed on the right format for these. This is just something I sort of like. But you could easily use, like, Cucumber language for these or whatever you're kind of used to do, used to working with.

05:55We then zoom down to the bottom, and we just sort of pass in some implementation decisions. The implementation decisions here, we don't want to be, like, over prescriptive because we want these to be durable. Because if the code ends up getting out of date with the PRD, then we're gonna have issues when we actually go to implement it.

06:11But you can see the theory here. This is the kind of it's a really good description of the destination that we're going to.

06:18But what we don't have from the PRD is the actual journey, is the way we're gonna get to this destination. And if we go back to that conversation, this is where I use my next one, which is PRD to issues. What this does is it takes a PRD, takes the destination, and it turns it into a Kanban board of different issues that can be independently grabbed.

06:38So the first step in here is it locates the PRD. If the PRD is not already in your context window, fetch it with this instruction. Explore the code base if you need to.

06:46And then draft vertical slices. It's not always clear how you should break a PRD down into individual tasks.

06:54This is something that developers have been doing for yonks. Right? And we've developed a kind of intuition for how to do it.

07:00In my opinion, the best way to do it is to break it into tasks that flush out the unknown unknowns really quickly. For instance, if you're integrating with a new kind of service or integrating two things which you haven't integrated before, then you should do that work first because it's gonna give you feedback on whether your approach is even valid.

07:16The right analogy here is the tracer bullet analogy. I won't go into what that means, but basically each issue is a thin vertical slice that cuts through all integration layers, not a horizontal slice of one layer.

07:27In the conversation, it broke down that really complicated PRD into just four slices. It first created a kind of engine with some tests applied to it. This is actually quite a good vertical slice because this was the engine that was going to then power the rest of the kind of setup.

07:42If this engine wasn't working for whatever reason or it wasn't feasible, then we would need to flush that out quickly. And this is what this, um, breakdown does. The PRD-to-issues also establishes blocking relationships between the tasks.

07:54For instance, number two here is not actually blocked by anything, so it can be picked up independently to one. This is really useful if you have a parallel agent setup where you can actually fire two agents at it at once, for instance, in, like, background tasks. And it also means that in the future, you can add other issues to this, like, uh, QA issues that you find or things that need to be improved, and you can then establish blocking relationships between that and the other things.

08:18We can see that number three here is blocked by one, the editing engine, and the number four, the Monaco editor toggle is blocked by number two. So I said yes to all of these, and it created then all of these GitHub issues. These issues reference the parent PRD so that the local agent can fetch it and view it.

08:34And it sort of just breaks down what to build really. And crucially, it references the previous user stories in the PRD.

08:41We can then see a comment actually from Claude Code that ended up implementing this. It said a pure function document editing engine with 28 tests covering all acceptance criteria. And we can then take a look at the commit that references this issue.

08:52So this was basically my Ralph loop came and just implemented this based on the issue, commented on it, closed it, and then the next issue was unblocked. So so far, the grill me skill can help you flesh out an idea. The write a PRD skill can help you take that idea and turn it into a document.

09:08And then the PRD-to-issues skill helps you then turn that destination document into an actual journey. But then how do you actually execute on that skill? How do you make it like how do you make the implementation really rock solid and increase the code quality of what gets produced?

09:25We have got a TDD skill. TDD means test driven development.

09:30And when you invoke this skill, it basically forces the agent or encourages the agent rather to follow a red green refactor loop. Unusually for my skills, there is actually a lot in here. So it's not just the skill itself.

09:42It's also, uh, ideas on refactoring, on mocking, on what deep modules are. Doing really really good TDD has been the most consistent way that I've improved agents outputs. So let's have a look at what's actually in here.

09:54What we can see is I'll just skip over the philosophy stuff. I'll let you guys read that. We are basically looking at this workflow.

10:00Yeah. Now the first one here is really important. Confirm with the user what interface changes are needed.

10:06Now I made a video on interfaces and implementation recently, but let me just give you the précis. When an AI looks at a bad code base, it will look at or it will see something like this where it has a ton of tiny modules here that are kind of undifferentiated. They're not really grouped together.

10:21It doesn't really understand how these things relate. And so it has to do a lot of work kinda working out, okay, what's responsible for what? What are the dependencies?

10:28How does this actually how does the code base even function? Whereas if you restructure this into several larger modules with just kind of thin interfaces on top, the interface being the functions that are actually exported from this, the, uh, things that the callers actually call, then it's a lot easier for AI to navigate this code base, and it's a lot easier to work out how to test these modules because you just test them at their interfaces.

10:52You test them at their boundaries. You can check out the whole video on that below. So what this TDD skill is encouraging here is basically trying to make these interface changes really top of mind for the AI to get it to understand that when it changes an interface, that's an important decision it needs to take time over.

11:08You confirm with the user which behaviors to test. You design the interfaces for testability linking to a doc. And then we have some more stuff around planning here.

11:16It then goes into a lovely loop where it writes one test at a time and it writes the test first. Now I've talked about red green refactor before, so I'll link the video below if you're interested. But I found that red green refactor with agents is incredible.

11:30And it basically does this loop until it's complete. It just writes a failing test, then writes the code to make that test pass. And finally, it goes through and looks for refactor candidates.

11:39I haven't found that this is amazing. It hasn't been brilliant because often LLMs are quite, uh, no. They're quite reluctant to refactor their own code.

11:48If you were to clear the context of the LLM, then it would just sort of wipe its own memory, and it would be a lot less precious about the code that it's just written. But while its own code is sitting in its own context window, it's quite reluctant to change it. So this TDD skill is what I prompt my Ralph loops with in order to get them to do red green refactor.

12:05Now TDD demands a lot of you or rather it demands a lot of your code base. TDD is really hard to do in a badly structured code base because the test boundaries of this are really unclear. Should it just sort of test these modules on their own?

12:19Should it test these modules on their own? What are the boundaries here? Whereas when your code base looks more like this, then it's a lot easier to test because the module boundaries are really clear.

12:28So wouldn't it be great if there was a skill that made your code base look more like this? Well, isn't it nice? We've got an improved code base architecture skill.

12:36The process for this one is that we explore the code base and explore it kind of like naturally as an agent would. We're trying to find confusions. We're not like we're trying to sort of surface naturally what the AI finds confusing so that it can then sort of, like, help it out later.

12:52Where does understanding one concept require bouncing around between many small files? Where have pure functions been extracted just for testability but the real bugs hide in how they're called? Where do tightly coupled modules create integration risk in the seams between them?

13:06All of these are questions that a senior engineer would be asking about your code base. Number two is you present candidates. So you present a numbered list of deepening opportunities.

13:15In other words, opportunities to deepen shallow modules in your code base into deeper ones. The user then picks a candidate, and then you design multiple interfaces.

13:25So it says to spawn three sub agents in parallel, each of which must produce a radically different interface for the deepened module. In other words, we're extracting that code and designing possible ways that it could look in the future.

13:37Designing it in multiple different ways is a really great way that you can then decide on the right idea. I've seen this agent spawn like five different sub agents for a really big refactor. The coolest thing about this is you don't need to know a lot about interface design in order to get this working.

13:51After comparing, give them your recommendation which design you think is strongest and why. And if elements from different designs would combine well, then propose a hybrid. Notice that I've made this really language agnostic, really kind of sort of everything agnostic, really.

14:05You can just run this in any code base and just get a decent answer for how it could be improved. There might be four or five candidates that really could use some work. But really, I think you should only be sort of doing one of these at a time because they really are quite hard to get your head around.

14:20And they require a human in the loop to sit with them and improve the code base because these decisions do require taste. Finally, it creates a GitHub issue. So it creates a refactor RFC as a GitHub issue using gh issue create.

14:33Usually, once this is done, I will then go with my PRD to issues, uh, skill, reference that GitHub issue that's just been created and get it to, you know, this describes the destination. We then need a journey to get there. So just doing this every so often in a code base, you know, once a week just to identify opportunities.

14:49Or if you have a sudden surge of development and you kind of create a whole sort of extra wing of features, then this skill will be really really useful in just making sure it conforms to the rest of the code base, making sure that it's not too sloppy. And as you keep running this, as you keep refining your code base, you're gonna notice the quality of the agent's output goes up.

15:09Because the old adage really does apply. If you have a garbage code base, then the AI is gonna produce garbage within that code base. Because to be honest, if you took all of these skills and just said, okay, this is like a little mini markdown book of processes for humans, then it wouldn't look out of place.

15:24I found that the most successful way to get code quality up from agents is just to treat them like humans. Humans with weird constraints. Sure.

15:32Humans that, uh, have no memory and are just sort of cloned, come out of the birthing pod, and go right to work. But if you like me think these real engineering skills are super important, then this course is absolutely for you. What I noticed while I was creating the course is that I'm really not teaching Claude Code that much.

15:48I'm teaching kind of what are sub agents. I'm talking about the constraints of LLMs, the sort of weird smart zone dumb zone stuff with a context window. We're talking about steering, which is essentially just a way of documenting stuff inside your code base.

16:01How to tackle massive tasks, understanding tracer bullets, and building those into our skills. Understanding how to build really great feedback loops and doing exercises with them and crucially how to hook these up to an autonomous agent.

16:13Every part of this course just sort of like leads onto the other and I'm super happy with how it turned out. So over the course of two weeks, you'll be working through that self paced material with me as your guide in Discord and on live office hours. And if that sounds fun to you, then the link is below.

16:28Thanks for watching folks. I'll be coming back with a lot more stuff this week. What would you like me to cover next?

16:32I find the intersection between this real engineering and AI is like it's such an awesome place to make content about. But anyway, thanks for watching, and I'll see you in the next one.

The Hook

The bait, then the rug-pull.

Process is the product. Matt Pocock opens with a deceptively plain observation: AI agents have no memory, which means the only way to get reliable output is to encode senior engineering decision-making into the prompts themselves.

Frameworks