Modern Creator
AI Engineer · YouTube

Full Walkthrough: Workflow for AI Coding

A 96-minute live workshop on using software engineering fundamentals to get autonomous coding agents to ship real, non-slop features.

Posted: 2 months ago
Format: Tutorial (educational)
Views: 1M · 21.7K likes
Big Idea

The argument in one line.

Software engineering fundamentals — small tasks, tight feedback loops, shared design concepts, and deep modules — are what make autonomous AI coding agents produce high-quality output, and skipping them is why most developers are frustrated with AI code.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…
  • You write code with AI every day and keep hitting walls where the agent drifts into the dumb zone or produces garbage after the first few sessions.
  • You are a solo developer or small-team lead trying to build a reliable night-shift pipeline where agents ship features autonomously while you are away.
  • You want a reproducible workflow that takes a vague brief all the way to a committed, TDD-tested, code-reviewed feature with AI doing the heavy lifting.
  • You are curious how classic software engineering books map directly onto AI agent patterns.
SKIP IF…
  • You want a step-by-step tutorial for a specific tool or CLI — this is framework-level thinking delivered through a live demo, not a beginner setup guide.
  • You already run a mature agentic orchestration stack and are looking for eval benchmarks or model comparisons rather than workflow philosophy.
TL;DR

The full version, fast.

LLMs have a smart zone of roughly 100k tokens, and the entire workflow is designed to stay inside it. A slash-command grill session stress-tests a vague brief and builds a shared design concept between developer and AI before a single line of code is written. That conversation becomes a PRD — the destination document. The PRD is sliced into vertical Kanban issues that each cross all system layers, enabling the agent to get integrated feedback after every issue. An autonomous AFK loop runs TDD against those issues. The ceiling on output quality is the quality of the feedback loops: codebases with shallow modules and no tests produce slop regardless of model size.

Chapters

Where the time goes.

00:00 - 04:20

01 · Introduction & Thesis

Workshop kickoff; audience poll on AI coding experience; core claim that SE fundamentals work for AI.

04:20 - 12:45

02 · Smart Zone & Memento

LLM attention quadratic scaling; the 100k smart zone; compacting vs. clearing; multi-phase plan as precursor to DAG.

12:45 - 22:10

03 · The Grill Me Skill

Live /grill-me demo on a gamification brief; shared design concept vs. plan; sub-agents; 25k tokens of alignment.

22:10 - 35:50

04 · Q&A: Grilling & Alignment

Specs-to-code critique; meta-prompting tool ecosystem; who should run grill sessions; 1M context window reality check.

35:50 - 48:15

05 · Writing the PRD

/write-prd demo; destination doc structure; vertical vs. horizontal slicing; tracer bullet concept; proposed modules in the PRD.

48:15 - 1:05:30

06 · Slicing into Issues & AFK Agent

Kanban board from PRD; DAG blocking relationships; parallelization; Ralph loop prompt walkthrough; TDD red-green-refactor live.

1:05:30 - 1:18:45

07 · QA, Code Review & Human Touch

Human QA as taste mechanism; more code review is unavoidable; team workflow for planning phases; prototype role in front end.

1:18:45 - 1:23:30

08 · Deep vs. Shallow Modules

Ousterhout deep module concept; AI defaults to shallow; /improve-codebase-architecture skill live scan; big integration test boundaries.

1:23:30 - 1:34:06

09 · Parallelization with Sandcastle

Sandcastle TypeScript library; planner-implementer-reviewer-merger pipeline; push vs. pull for coding standards; Opus review / Sonnet implement; final summary.

Atomic Insights

Lines worth screenshotting.

  • LLMs go dumb at roughly 100k tokens regardless of context window size — a 1M window is just more dumb zone, not more smart zone.
  • Compacting preserves sediment and introduces drift; clearing resets to a known state every time.
  • The goal of the grill session is a shared design concept, not a plan — AI in plan mode produces a plan before you have reached alignment.
  • Specs-to-code fails not because specs are bad but because it encourages treating the code as irrelevant; the code is the battleground.
  • AI codes horizontally by default — layer by layer — which delays integrated feedback until the final phase; vertical tracer-bullet slices fix this.
  • A tracer bullet issue must cross schema, service, and UI in a single ticket so the agent can test the entire integrated flow.
  • A sequential multi-phase plan can only be worked by one agent; a DAG of issues with blocking relationships enables genuine parallelization.
  • The ceiling on AI coding quality is the quality of your feedback loops — agents coding without tests and type checks silently ship bad code.
  • AI unaided produces shallow modules — many tiny files with small exports — because it codes by spreading changes across layers.
  • A deep module has a large complex interior behind a small simple interface; wrap a big integration test boundary around it.
  • Leaving a closed PRD as a markdown file in the repo causes doc rot — the agent finds it, trusts it, and drifts from the actual codebase.
  • Push coding standards to the reviewer agent so they are always in context; let the implementer pull them on demand.
  • Human QA is the mechanism for imposing taste — automating it entirely produces apps that technically function but feel like slop.
  • The human role shifts to a day shift: grilling, PRD, Kanban slicing. The agent handles the night shift: implementation, TDD, automated review.
  • Using Opus for code review and Sonnet for implementation is a deliberate cost-quality tradeoff — reviewing requires more reasoning, implementation more throughput.
Takeaway

The agent is only as good as the codebase you hand it.

WHAT TO LEARN

Classic software engineering discipline — shared alignment, tight feedback loops, and deep modules — is the multiplier that separates high-output AI coding from expensive slop generation.

01 · Introduction & Thesis
  • AI is a new paradigm only in tooling — the underlying discipline of writing good software still determines output quality.
02 · Smart Zone & Memento
  • Size every task to stay inside roughly 100k tokens by clearing context between sessions rather than compacting, which accumulates noise.
  • A 1M context window gives you more dumb zone, not more smart zone — the smart ceiling has not risen proportionally.
03 · The Grill Me Skill
  • Run a structured grilling session before writing any plan; the goal is a shared design concept with the AI, not an asset you hand to the AI.
  • Sub-agents that explore the codebase before the grill session add accuracy without bloating the parent context window.
04 · Q&A: Grilling & Alignment
  • Specs-to-code fails because it encourages you to ignore the code; keep the codebase in view throughout planning and use it to sanity-check every proposed module.
  • Own your planning stack rather than delegating it to a third-party framework — when it breaks, you need to know how to fix it.
05 · Writing the PRD
  • The PRD is a destination document and a definition of done. It is not a spec you hand to the AI so that you can stop reading the code.
  • Slice your PRD into vertical tracer-bullet issues so the agent gets integrated feedback after every issue, not only at the end of a horizontal phase.
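The vertical-versus-horizontal distinction can be written down as a simple check. This is a minimal sketch, not anything shown in the workshop: the `touches` field and both example issues are hypothetical, and the three layer names follow the schema, service, and UI split used elsewhere in this breakdown.

```python
# Layers an issue must cross to count as a vertical (tracer-bullet) slice.
LAYERS = {"schema", "service", "ui"}


def is_vertical_slice(issue):
    """True when the issue touches every layer in LAYERS."""
    return LAYERS <= set(issue["touches"])


# Hypothetical issues, for illustration only.
horizontal = {"title": "Create all gamification tables", "touches": ["schema"]}
vertical = {
    "title": "Completing a lesson awards points shown on the dashboard",
    "touches": ["schema", "service", "ui"],
}

print(is_vertical_slice(horizontal), is_vertical_slice(vertical))
```

A horizontal issue like the first one gives the agent nothing end-to-end to test; the second can be exercised through the whole flow as soon as it is done.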
06 · Slicing into Issues & AFK Agent
  • A DAG of Kanban issues with explicit blocking relationships enables parallel agent runs; a numbered sequential plan does not.
  • The ceiling on agent output is your feedback loop quality; agents coding without tests and type checks produce garbage silently.
  • TDD red-green-refactor is harder for the agent to cheat than writing tests after implementation — it instruments the code before writing it.
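Why a DAG parallelizes where a numbered plan does not can be sketched in a few lines. This is an illustration under stated assumptions, not the workshop's tooling: the issue names and the `blocked_by` mapping are hypothetical, and Python's standard-library `graphlib.TopologicalSorter` stands in for whatever a real Kanban tool uses to track blockers.

```python
from graphlib import TopologicalSorter

# Hypothetical issues; each maps to the set of issues that block it.
blocked_by = {
    "award-points-on-lesson-complete": set(),
    "show-points-on-dashboard": {"award-points-on-lesson-complete"},
    "levels-from-points": {"award-points-on-lesson-complete"},
    "level-badge-in-ui": {"levels-from-points"},
}


def parallel_batches(graph):
    """Group issues into waves; every issue in a wave is unblocked."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the graph is not a DAG
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all currently unblocked issues
        batches.append(ready)
        sorter.done(*ready)  # finishing a wave unblocks its successors
    return batches


batches = parallel_batches(blocked_by)
for wave, issues in enumerate(batches, start=1):
    print(f"wave {wave}: {issues}")
```

Each wave could be handed to independent agents at the same time. A sequential plan is the degenerate case where every wave contains exactly one issue.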
07 · QA, Code Review & Human Touch
  • Human QA is not a bottleneck to automate away — it is the mechanism for imposing taste, and removing it produces apps that work but feel like slop.
  • Expect to do more code review than ever before; there is no shortcut for reviewing agent-generated output.
08 · Deep vs. Shallow Modules
  • AI defaults to shallow modules — many small exports with little logic; intentionally design deep modules with simple interfaces and big integration test boundaries.
  • Prefer closing issues over keeping completed PRDs as markdown files in the repo; stale documentation causes agents to drift from the actual codebase.
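A deep module in miniature might look like the sketch below. Everything here is an assumption made for illustration: the `PointsLedger` name, the points-per-lesson value, and the level thresholds do not come from the workshop repo. A real deep module would hide far more logic, but the shape is the same: one small public method, private helpers behind it, and a test written at the boundary.

```python
class PointsLedger:
    """Deep module sketch: one small public method, logic kept inside."""

    def __init__(self, points_per_lesson=10, level_thresholds=(0, 50, 150)):
        # Both defaults are made-up values for illustration.
        self._points_per_lesson = points_per_lesson
        self._level_thresholds = level_thresholds
        self._completed = {}  # student_id -> set of completed lesson_ids

    def complete_lesson(self, student_id, lesson_id):
        """The whole public interface: record a completion, return state."""
        lessons = self._completed.setdefault(student_id, set())
        lessons.add(lesson_id)  # a set makes repeat completions idempotent
        points = self._points_for(lessons)
        return {"points": points, "level": self._level_for(points)}

    def _points_for(self, lessons):
        return len(lessons) * self._points_per_lesson

    def _level_for(self, points):
        # Level is the number of thresholds the student has reached.
        return sum(1 for t in self._level_thresholds if points >= t)


# One test at the boundary exercises the interior as a whole.
ledger = PointsLedger()
for lesson in ["l1", "l2", "l3", "l4", "l5"]:
    state = ledger.complete_lesson("student-1", lesson)
repeat = ledger.complete_lesson("student-1", "l5")
print(state, repeat)
```

Because callers only see `complete_lesson`, the internals can be reworked freely as long as the boundary test keeps passing.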
09 · Parallelization with Sandcastle
  • Push coding standards to the reviewer agent so they are always in context; let the implementer pull them on demand to avoid bloating every implementation session.
  • Classic pre-AI software books — Pragmatic Programmer, Brooks, Ousterhout, Fowler — already codified the principles that make AI agents effective; they are the highest-leverage prompt engineering resource available.
Glossary

Terms worth knowing.

Smart Zone
The portion of an LLM context window — roughly the first 100k tokens — where output quality is highest before attention relationships become strained.
Dumb Zone
The portion of a context window beyond the smart zone where the model makes increasingly poor decisions; the failure mode of long uncleared sessions.
Compacting
Squeezing a long conversation history into a shorter summary to reclaim context space, at the cost of accumulated noise from the summarization.
Grill Me Skill
A Claude Code slash command that interviews the developer relentlessly about a brief — one question at a time with a recommended answer — until AI and human reach a shared design concept.
Design Concept
Frederick Brooks's term for the shared mental model of what is being built, held by all participants; the grill session is explicitly trying to build this between human and AI.
PRD
Product Requirements Document — a destination document summarizing the shared design concept, user stories, implementation decisions, and out-of-scope items used as the AI's definition of done.
Tracer Bullet
From Pragmatic Programmer: a development unit that crosses all system layers end-to-end, providing immediate integrated feedback — named after phosphorescent bullets that show a gunner where they are aimed.
Vertical Slice
A Kanban issue that touches schema, service logic, and UI in a single ticket, enabling the agent to produce testable integrated output after each issue rather than one layer at a time.
Ralph Loop
An autonomous AFK agent loop: the agent picks the next Kanban issue, implements it with TDD, runs feedback loops, commits, and repeats until the backlog is empty.
AFK
Away From Keyboard — tasks the agent can complete without human involvement. Contrasted with HITL (Human In The Loop) tasks like grilling and QA.
Deep Module
From John Ousterhout: a module with a simple public interface and a large amount of logic inside, easy to test with a big integration boundary and easy for agents to reason about.
Shallow Module
A module that exports many small functions with little internal logic; the default AI output, difficult to test meaningfully and hard for agents to navigate.
Sandcastle
A TypeScript library (@ai-hero/sandcastle) for running agent loops in parallel Docker sandbox worktrees, with a planner-implementer-reviewer-merger pipeline.
DAG
Directed Acyclic Graph — a Kanban board structure where issues have explicit blocking relationships, allowing parallel execution of non-blocked branches by independent agents.
Doc Rot
The failure mode where an outdated document remains in the repo, is discovered by the agent, and causes drift from the actual codebase because the agent trusts the stale doc.
Resources

Things they pointed at.

40:50 · book · Pragmatic Programmer
26:20 · book · The Design of Design (Frederick P. Brooks)
1:19:40 · book · A Philosophy of Software Design (John Ousterhout)
34:20 · book · Refactoring (Martin Fowler)
1:29:40 · link · Beads Framework (Steve Yegge)
Quotables

Lines you could clip.

32:00
AI is giving you more dumb zone, not more smart zone.
Punchy reframe of the 1M context window hype; lands immediately with no setup needed.
TikTok hook
34:20
The code is your battleground. You cannot ignore it.
Direct counter to the specs-to-code movement, standalone and quotable.
IG reel cold open
27:20
I needed a shared design concept. I didn't need an asset. I didn't need a plan.
Captures the central philosophical shift of the whole talk in one sentence.
Newsletter pull-quote
1:08:20
If you try to automate the taste, you end up with apps that are just slop.
Visceral one-liner on why human QA cannot be removed from the loop.
TikTok hook
1:03:20
The ceiling is the quality of your feedback loops.
Distills the entire TDD argument in a single transferable rule.
Newsletter pull-quote
The Script

Word for word.

00:14Yeah. We good? Okay, folks.
00:17We're at capacity. Let's kick off.
00:21I don't want you waiting here for twenty five more minutes before we hit some arbitrary deadline. So welcome.
00:28My name is Matt. I'm a teacher, and I suppose now I teach AI.
00:35We have a link up here, if you've not already been to this, which has the exercises for the stuff we're gonna do today. This is gonna be around two hours, so we might just sort of kick off two hours from now. Is that alright, Mike?
00:47Yeah. Perfect. And the theory behind this talk, or at least the thesis under which I've been operating for the last kinda six months or so, is that we all think that AI is a new paradigm.
01:01Right? AI is obviously changing a lot of things. You guys are obviously interested in this, and that's why you've come to this talk.
01:07And I feel that when we talk about AI being a new paradigm, we forget that actually software engineering fundamentals, the stuff that's really crucial to working with humans, also works super well with AI.
01:25And this is what my keynote is on tomorrow, really. I'm gonna sort of be fleshing that out a lot more. And in this workshop, I'm hopefully gonna be able to direct your attention to those things and, uh, hopefully, show you that I'm right, but we'll see.
01:41Can I get a quick heads up first? How many of you guys have ever coded with AI?
01:47Raise your hand if you've ever coded with AI. Perfect. Okay.
01:51Keep your hand raised.
01:54Let's all share those armpits with the world. How many of you code every day with AI?
02:01Cool. Okay. Right, keep your hand raised if you've ever been frustrated with AI.
02:07Okay. Very good. You can put your hands down.
02:11Thank you for that show of obedience. I really appreciate that. And we are also being livestreamed to the Gielgud Room as well.
02:16I've not, uh, did we send someone up to the Gielgud Room to just check they're okay? Don't know.
02:22But I see you, uh, and there is a way that you can participate, which is we have the, um, a Q&A. I have a sort of hatred of Q&As because they're not very democratic.
02:33Mostly the most talkative people get to participate and share.
02:39And so we're gonna be going through this Q&A here. So, "Why do we have to wait till 3:45?
02:44The room is packed. Doors are closed." 100% agree.
02:47And so if you want to, uh, ask a question, I would like you to pile into this async, and then we can vote on each other's questions and hopefully get the best questions surfaced for the entire room to enjoy. So I wanna talk about first the kind of weird constraints that LLMs have.
03:06And those weird constraints are sort of what we have to base a lot of our work around. Now there's a guy called Dex Horthy who runs a company called HumanLayer, and he came up with this idea, which is that when you're working with LLMs, they have a smart zone and a dumb zone.
03:27When you're first kind of like working with an LLM and it's like you just started a new conversation, you start from nothing, that's when the LLM is gonna do its best work. Because in that situation, the attention relationships are the least strained.
03:40Every time you add a token to an LLM, it's kind of like you're adding a team to a football league. You think of the number of matches that get added every time you add a team to a football league.
03:50It just goes it scales quadratically. And that's because you have attention relationships going from essentially each token to the other that are positional and the sort of meaning of the individual token.
04:02And so this means that by around sort of 40%, or, I would say, around 100k is kind of my new marker for this. Because it doesn't matter whether you're using a 1,000,000-token, uh, context window or 200k.
04:15It's always gonna be about this. It starts to just get dumber. So as you continually keep adding stuff to the same context window, it just gets dumber and dumber until it's making kind of stupid decisions.
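The football-league analogy can be checked with a few lines of arithmetic. This sketch only counts distinct pairs, assuming a single round-robin where every team plays every other team once. It illustrates quadratic growth and is not a model of how any particular LLM degrades.

```python
def pairwise_relationships(n):
    """Number of distinct pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Doubling the number of teams roughly quadruples the number of matches.
for teams in (10, 20, 40):
    print(teams, "teams ->", pairwise_relationships(teams), "matches")

# The same growth applied to token counts: 100k versus 200k tokens.
ratio = pairwise_relationships(200_000) / pairwise_relationships(100_000)
print(f"200k tokens has about {ratio:.2f}x the pairs of 100k tokens")
```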
04:27Raise your hand if that feels familiar to you. Yeah? Cool.
04:31So this means that we kind of want to size our tasks in a way that sticks within the smart zone. Right?
04:39We don't want the AI to bite off more than it can chew. And this goes back to old advice, like Martin Fowler in Refactoring. Uh, like, uh, The Pragmatic Programmer talks about this.
04:49Don't bite off more than you can chew. Keep your tasks small so that you, as a developer, a human developer, don't freak out and don't start acting and going into the dumb zone. But how do you tackle big tasks?
05:04How do you take a large task like, I don't know, cloning a company or something or just doing something crazy, and how do you break it into small tasks so they all fit into the smart zone? One way, of course, you could do is I mean, kind of what the AI companies maybe want you to do, or the natural way of doing it is just keep going and going and going.
05:23You end up in the dumb zone, charging you tons of tokens per request. You then compact back down. We'll talk about compacting properly in a minute, and you keep going, keep going, keep going, compact back down, keep going, keep going, keep going.
05:34And I think that doesn't really work very well because of the sediment; I will talk about that in a minute. So the theory here is then, and this is what I was doing for a while, is I would use these kind of multiphase plans where I would say, okay.
05:51We have this sort of number four thing here, this large, large task. Let's break it down into small sections so that we can then kind of chunk it up and do each little bit of work in the smart zone. Raise your hand if you've ever used a multiphase plan before.
06:05Yeah. Really common practice. Right?
06:07And this is kind of how we've been doing it. This is how I was doing it up until December last year, really. And any developer worth their salt will look at this and go, this is a loop.
06:19Right? This is a loop. We just got phase one, phase two, phase three, phase four.
06:24Why don't we just have phase n? Right? Phase n, where we essentially just say, okay.
06:32We have, let's say, a plan operating in the background, and then we just loop over the top of it, and we go through until it's complete. And this is where raise your hand if you've heard of Ralph Wiggum as a software practice.
06:44Okay. Cool. Raise your hand if you've not heard of Ralph Wiggum as a software practice, actually.
06:47That's more like it. Okay. So there's this idea called Ralph Wiggum, uh, which is kind of, um, sort of based on this, which is essentially all you need to do is sort of specify the end of the journey where you just say, okay.
07:01We create a PRD, a product requirements document to say, oh, okay. Let's describe where we're going. And then we just say to the AI, just make a small change.
07:09Make a small change that gets us closer and closer to that. And Ralph works okay, but I prefer a little bit more structure.
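The loop described here can be sketched as follows. This is an assumption-laden outline, not the actual Ralph prompt or any real agent API: `implement` and `checks_pass` are hypothetical callables standing in for an agent run and a test suite, and appending to `done` stands in for a commit.

```python
def ralph_loop(backlog, implement, checks_pass, max_iterations=20):
    """Work one issue per iteration, each from a fresh context."""
    done, failed = [], []
    for _ in range(max_iterations):
        if not backlog:
            break
        issue = backlog.pop(0)       # pick the next issue
        context = [issue]            # fresh context: nothing carried over
        change = implement(context)  # the agent makes one small change
        if checks_pass(change):      # tests, type checks, and so on
            done.append(issue)       # stands in for "commit"
        else:
            failed.append(issue)     # left for a human to look at
    return done, failed


# Stubs standing in for a real agent and a real test run.
def fake_implement(context):
    return f"patch for {context[0]}"


def fake_checks(change):
    return "broken" not in change


done, failed = ralph_loop(
    ["add points table", "broken migration", "show points in UI"],
    fake_implement,
    fake_checks,
)
print("done:", done)
print("failed:", failed)
```

The `max_iterations` cap is a guard added for this sketch so that an unattended loop cannot run forever.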
07:17So that's kind of where we got to in terms of thinking about the smart zone, and that's kind of where I want you to first start thinking about here. Another weird constraint of LLMs is LLMs are kind of like the guy from Memento.
07:31Right? They just continually forget. They could just keep resetting back to the base state.
07:36Let me pull up this diagram. I really should use slides, but I just prefer just like randomly scrolling around an infinite tldraw canvas.
07:46Thank you, Steve.
07:49So let's say another concept I want you to have is that every session with an LLM kind of goes through the same stages. You have, first of all, the system prompt here. This gray box here is essentially the stuff that's always in your context.
08:03You want this to be as small as possible. Because if you have a ton of stuff in here, if you have 250, like I have seen people put in there, then that you're just gonna go straight into the dumb zone without even being able to do anything.
08:17So you want this to be tiny. You then go into a kind of exploratory phase. This blue is sort of where the coding agent is going out and exploring the code base.
08:26Then you go into implementation, and then you go into testing and sort of making sure that it works, running your feedback loops and things like this.
08:34Raise your hand if that feels familiar based on what you've done. Yep. Sort of the, like, the the main cornerstones of any session.
08:42And when you clear the context, you go right back to the system prompt. You go right back there. So you delete everything that's come before.
08:51And raise your hand if you've heard of compacting as well. Yeah.
08:56Okay. There are some people who've not heard of compacting. So let's just quickly show what that means.
09:00For instance, I've just been having a little chat with my LLM.
09:06I wanna make sure we sort of, you know, just cover the basics so we're all sort of on the same wavelength here. I've just been having a chat with my LLM.
09:14I've been talking about a thing that I want to build. How's the font size? Shall I bump it up?
09:18Folks in the back? Bump. Bump.
09:21Bump. Bump. Bump.
09:22I'm using Claude Code for this session, but you don't need to use Claude Code. In fact, it's often nice not to use Claude Code.
09:30So I've been having a chat with the LLM just sort of planning out what I'm gonna do next. It's asking me a bunch of questions, and I highly recommend you do this.
09:40There's this tiny little status line here that tells me how many tokens I'm using, the exact number of tokens I'm using. Um, I have an article on my website, AI Hero, if you wanna copy this. This is oh, wow.
09:53That is that shakes, doesn't it? This is essential information on every coding session because you need to know exactly how many tokens you're using so that you know how close you are to the dumb zone. Absolutely essential.
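A status line of this kind boils down to comparing a token count against a budget. The sketch below is a hypothetical stand-in, not the speaker's actual status line: the 100k figure is the rough marker from the talk, while the 80% warning threshold and the output format are assumptions chosen for illustration.

```python
SMART_ZONE_TOKENS = 100_000  # the talk's rough marker, not a hard limit


def context_status(tokens_used, smart_zone=SMART_ZONE_TOKENS, warn_at=0.8):
    """Return a short status-line string for the current session."""
    fraction = tokens_used / smart_zone
    if fraction >= 1.0:
        zone = "DUMB ZONE: clear context"
    elif fraction >= warn_at:  # warn_at is an assumed threshold
        zone = "approaching limit"
    else:
        zone = "smart zone"
    return f"{tokens_used:,} tokens ({fraction:.0%}) | {zone}"


for used in (25_000, 85_000, 120_000):
    print(context_status(used))
```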
10:06And so let's watch it. So I've got two options. I can either clear, right, and go back to nothing, or I can compact.
10:15And when I compact, then it's going to squeeze all of that conversation, which admittedly isn't very much, into a much smaller space. And this, in diagram terms, kind of looks like this, where you take all of the information from the session and you essentially create a history out of it, a written record of what happened.
10:36And devs love compacting for some reason, but I hate it. I much prefer my AI to behave like, uh, the guy from Memento because this state is always the same, always the same every time you do it.
10:49You clear and you go back to the beginning. And so if you're able to do that and you're able to optimize for that, then you're in a great spot. So that's kind of the two things I want you to think about with LLMs, the two constraints that we're working with.
11:01They have a smart zone and a dumb zone, and they're like the guy from Memento. So let's take a look at the first exercise.
11:09And I'm while I'm doing this, the way I want this to work is I'm gonna sort of show you how, um, I'm gonna be sort of walking through it up here, and I want you folks to be kind of like tapping away and doing things as well. So that was just a little lecture bit. Let's now actually get and do some coding.
11:25For anyone who arrived late or anyone in the Gielgud Room, uh, go to this link. This link up here to see the exercises and clone the repo.
11:38You absolutely do not have to. You can just watch me do it if you fancy it. But let's go there myself, and let's see what exercises await us.
11:44So, essentially, this is from my course.
11:49This is a course management platform, essentially a kind of CMS for instructors, for students, and this is what we're gonna be building a feature in.
11:59So I'm gonna take you from essentially the idea for the feature all the way up to building a PRD for the feature all the way up to implementing the feature. And, hopefully, you can take inspiration from this process and use it in your own work. So let's kick off.
12:16So we're gonna start by using a skill which is very close to my heart. It's the grill me skill.
12:23And this grill me skill is wonderfully small, wonderfully tiny, and it helps prevent one of, I think, the main issues when you're working with an AI, which is misalignment.
12:37The sort of silent idea that I'm talking against here, that I'm arguing against, is the specs to code movement. Anyone heard of the specs to code movement?
12:47Raise your hand. It's not really a movement. I suppose it's just sort of people saying specs to code.
12:53What it is is people say, okay. You can write a program or you want to build an app. The best way to build that app is to take some specifications.
13:02So to write some sort of like document, and then turn that document into code. So just turn it into code.
13:10How do you do that? You pass it to AI. If there's something wrong with the resulting code, you don't look at the code, you look back at the specs.
13:17You change the specs, and you sort of just keep going like this. This is kind of like vibe coding by another name where you're essentially ignoring the code.
13:25You don't need to worry about the code. You just sort of keep editing the specs, and eventually, you just keep going. And I tried this.
13:31I really tried it, and it sucks. It doesn't work. Because you need to keep a handle on the code.
13:37You need to understand what's in it. You need to shape it because the code is your battleground. And so this again is where we're going.
13:45Let's get some exercises. So what I'd like you to do is go to this page, the grill me skill.
13:51And inside the repo here, we have a Slack message from our pal.
13:58Where is it? It's in the root of the repo and it's under.
14:04Where is it?
14:07Client brief dot md. It's a Slack message from Sarah Chen. For some reason, Claude always chooses Sarah Chen as the name.
14:13I don't know why. It's saying that in Cadence, our, um, course platform, our retention numbers are not great.
14:21Students sign up, do a few lessons, then they drop off. I'd love to add some gamification to the platform.
14:26And so when you're presented with an idea like this, you need to find some way of turning it into reality. Let's say Sarah Chen is your client. You're on a tight budget.
14:34You need to get this done fast. How do you go and do it? Raise your hand if you would enter plan mode when you're doing this.
14:43Anyone big user of plan mode? Yep. Let's actually shout out quickly.
14:47Any other ideas about what you would do with this? Or raise your hand if you what what would be your first port of call? Yeah.
14:54Ask Sorry? Just to verify what is the purpose and where are the parents leaving?
15:00Yes. Exactly. Let's imagine that Sarah Chen's gone on holiday.
15:02You have no idea. Right? Uh, she's just posted this thing.
15:05You need to action it before you go. Well, my first port of call is I go for this particular skill. I'm gonna clear my context.
15:15I'm going to get rid of you. You don't need to be there.
15:20And I'm gonna say, I'm gonna invoke a skill, which is the grill me skill.
15:26Let's quickly check. Raise your hands if you don't know what this is. Cool.
15:32Oh, sorry. Sorry. Let me be more specific.
15:34Raise your hands if you don't know what I'm doing here when I do a forward slash and then type something. Anyone everyone kind of understand what that is? I'm invoking a skill.
15:45I'm invoking the grill me skill. And what I'm gonna do is I'm gonna say grill me, and I'm gonna pass in the client brief. So now the LLM really has only a couple of things here.
15:58It just has the skill, and it has the description of what I wanna do.
16:04And this is virtually how I start every piece of work with AI. And while it's exploring the code base, I'm just gonna show you what the grill me skill does.
16:14So this is inside the repo, so you can check it out. It's extremely short. Interview me relentlessly about every aspect of this plan until we reach a shared understanding.
16:23Walk down each branch of the design tree, resolving dependencies one by one. For each question, provide your recommended answer. Ask the questions one at a time, blah blah blah.
16:34What this does and what I noticed when I was working with AI, especially in plan mode actually, is it would really eagerly try to produce a plan for me.
16:45It would say, okay, I think I've got enough. I'm just going, poof, plan. Plan.
16:49And what I found was that I was really trying to find the words for what I wanted instead of that. And Frederick P. Brooks, in The Design of Design, has a great quote, uh, talking about the design concept.
17:05When you're working on something new with someone, when you're all trying to build something together, then there's this shared idea that's shared between all participants, and that is the design concept.
17:18And that's what I realized I needed with Claude. I needed I needed to reach a shared understanding.
17:25I didn't need an asset. I didn't need a plan. I needed to be on the same wavelength as the AI, as my agent.
17:31And this is an extremely effective way of doing it. So hopefully we go.
17:35Nice. It has done its exploration, first of all. It's invoked a sub agent, which spent 93.7k tokens on Opus, and it's asked me the first question.
17:50Cool. We can see that even though the sub agent burned a ton of tokens, I haven't actually, uh, increased my token usage that much.
17:59Raise your hand if you don't know what sub agents are. It's an important question. Everyone kind of clear what sub agents are?
18:06Okay. I'll give a brief definition, which is that this sub agent thing here, this explore sub agent, has essentially gone and called another LLM, which has an isolated context window, and then that LLM has reported a summary back.
18:21So a sub agent is kind of like a delegation. You're delegating a task to a sub agent. It goes off eagerly, does the thing, explores a ton of stuff, and then just drip feeds the important stuff back up to the orchestrator agent, to the parent agent.
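The token accounting behind that delegation can be sketched with a toy model. Nothing here calls a real agent API: `run_subagent` is a hypothetical stand-in, the file contents are filler, and `count_tokens` is a crude word count rather than a real tokenizer. The point is only that the exploration cost stays in the isolated context while the parent grows by the size of the summary.

```python
def count_tokens(text):
    """Crude stand-in for a tokenizer: one token per whitespace-split word."""
    return len(text.split())


def run_subagent(task, files):
    """Explore in an isolated context and return only a short summary."""
    isolated_context = [task] + list(files.values())  # discarded on return
    tokens_burned = sum(count_tokens(chunk) for chunk in isolated_context)
    summary = f"Explored {len(files)} files for: {task}"
    return summary, tokens_burned


parent_context = ["grill me about the client brief"]
# Filler files, 2,500 "tokens" each, standing in for a real codebase.
files = {f"file_{i}.ts": "export const x = 1 " * 500 for i in range(20)}

summary, burned = run_subagent("map the lesson progress code", files)
parent_context.append(summary)  # only the summary reaches the parent

parent_tokens = sum(count_tokens(chunk) for chunk in parent_context)
print(f"sub agent burned {burned:,} tokens; parent holds {parent_tokens}")
```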
18:34So okay. So hopefully, you guys have seen the same thing. It's done an explore, and we now have our first question.
18:41Points economy. What actions earn points and how much? Oh, okay.
18:45At this point, you can ask it, by the way, questions to deepen your understanding of the repo. I obviously know this repo really well because I wrote it, but you might not, um, know what's going on.
18:55So let's say, my recommendation, keep it simple, two point sources to start. What's so nice about this is that not only does it give us a question that kind of aligns us here, we get a recommendation too.
19:07And often what I'll find is the AI's recommendations are really good. And so I'll just say, skip video, watch events, they're noisy and gameable.
19:14I agree. Sarah's asked, well, keep lessons in the bread and butter. Yeah.
19:21Looks good, pal.
19:24Now what I usually do is I usually dictate to the AI. I'm usually actually chatting to the AI instead of typing here. But this is a relatively new laptop, and I couldn't get my dictation software working on it because Windows is crap.
19:38So should points be retroactive? There are existing lesson progress records with completed-at timestamps.
19:45This is a really nasty question. Right?
19:48Should we actually go back and backfill all of the lesson progress events? This is a kind of question that you need to be aligned on if you're gonna fulfill the feature properly. This is not something I considered, and Sarah Chen certainly didn't consider.
20:01Do I want it to be retroactive? Let's actually do a vote inside here. Should we go back and backfill all the records?
20:09Raise your hand if you think we should backfill all the records.
20:13Raise your hand if you think we shouldn't backfill all the records. There are a lot of, uh, fence sitters in the room. I'm gonna say, you know, this is the kind of discussion you're sort of having with AI.
20:24You're getting further aligned. Yes. I'm just gonna go with this recommendation because I'm lazy.
20:31Notice too how I'm able to keep in the loop here with AI. I'm not you know, it's it's pinging me these questions pretty quickly. I'm not having to go off and check Twitter or something.
20:42Levels. What's the progression curve? Yeah.
20:44That looks about right, for instance. Yes. Okay.
20:47So, hopefully, you should be able to go and, um, kind of work through this with the AI and essentially try to reach an alignment.
20:56And this grill me skill, this can last a long time. I've had it ask me 40 questions. I've had it ask me 80 questions.
21:03I've had some people that it's asked 100 questions to. Literally, you're sat there for an hour chatting to the AI. And what you end up with is essentially this conversation history that works really nicely as an asset of the design concept that you're creating.
21:19This can also function like this. You can have a meeting with someone who's maybe a domain expert. Maybe I have a meeting with Sarah.
21:27I feed that meeting transcript into, uh, I don't know, Gemini meetings or whatever you guys are using. You take that, you feed it into a grilling session, and you grill through the assumptions that you didn't have.
21:39So this ends up being a really nice kind of a really nice way of just taking inputs from the world and then just turning and validating them. So okay.
21:51Let's see. I really wanna get to the end of this, but I also don't wanna just, like, be sat here talking to the AI in front of you for a thousand days.
21:58So I'm just gonna say, yes.
22:03Let's see what happens. So I tell you what. Um, while you guys sort of have a little fiddle with this locally, let's start a little q and a session now.
22:12And let's see. How is this gonna work?
22:15Can we keep the door closed? I'll turn up the microphone. It's quite noisy.
22:19Let's see. Mike, can we get the door closed?
22:23Oh, it has been closed. Mark has answered. Beautiful.
22:26So what I'd like you to do, is there any air con? Yeah, there is some air con, I think. There is some air con.
22:34You guys aren't being lit here. I'm being I'm being fried alive here. Uh, so what I'd like you to do is go on to the Slido, which you can join here.
22:42If you're not taking the exercise, go on to the Slido, have a little fiddle, and vote on some good questions. I'm just going to chat to the AI for a second until we reach a stopping point.
22:53So do Streaks earn points?
22:57Streaks as standalone.
23:06Let's see what else it comes up with.
23:12Where does gamification UI live? Let's have it in the dashboard. I'm just gonna scan these and blast through them, basically.
23:21So how are we doing with our Slido? Okay. Have I tried SpecKit, OpenSpec, or Taskmaster instead of the grill me skill?
23:30Do I find them more verbose or a structural alternative? This is a great question. So there are a ton of different frameworks out there that allow you to sort of build up this planning process for you.
23:41I personally believe that at this stage, when there's no clear winner, when there's no kind of, like, one true way and when things are changing all the time, you need to own as much of your planning stack as you possibly can. What I've noticed in a lot of my students is they tend to overuse a certain stack.
24:03They get into trouble, and because they don't own the stack and they don't have observability over the whole thing, they just go, this isn't working.
24:12This sucks. Whereas if you have control over the whole thing, then at least you know how to fix it or potentially know how to fix it.
24:21So even though I'm sort of giving you a stack, basically, I believe in inversion of control, and you should be in control of the stack.
24:32So can I place it at zero, please?
24:38Sorry?
24:40Sorry. That was a lot of sort of mumbling. Can I feedback?
24:43You have four options on the bottom of the bottom, and they want you to hit this next question. Thank you. I'm
24:50so sorry. Well, you didn't wanna give Claude good feedback. Why what's wrong with you?
24:57Okay. Cool. Many of the questions asked by the grill me skill are not necessarily appropriate for a developer, rather a PO.
25:04In larger teams, who should use it? Yeah. Raise your hand if you've ever done pair programming.
25:12Anyone ever done pair programming? Right. Now put your hands down, and raise your hand again if you've ever done a pair programming session with an AI.
25:20Right. How did it go? Was it good?
25:23Enjoy it? I think pair programming sessions with AI are a great idea because you've got a third person in the room who will relentlessly quiz you and ask you questions. If you don't know the answer, it should be you, the domain expert, and the AI in the same room.
25:36If you have a question about implementation, it should be you, a fellow developer, and the AI in the same room. You know? You can be sort of working through these questions in your team.
25:45And I think, actually, we're gonna look at implementation in a bit, and we're gonna see how you can make implementation so much faster. But I think the really crucial decisions, the ones you need humans for, you actually need a lot of humans, and it doesn't really matter how many humans are in there.
26:01You can actually throw a bunch in. It's kind of like mob programming with AI, essentially. What's my favorite meta prompting tool? I think I kind of answered that.
26:11There's no air con. Let's just live with it. How do I use the conversation as an asset after the grill me session?
26:17Well, we're gonna get there. Okay.
26:21So I really want to speed this up sort of artificially. Just what? Ralph loop it?
26:30This is the thing. So someone just said, okay, Ralph loop this.
26:33But this is crucial because I can't loop over this. Right? I think of there as being two types of tasks in the AI age. You have human-in-the-loop tasks, where a human needs to sit there and do it, which is this.
26:50We are the human in the loop. We're multiple humans in the loop, and there are AFK tasks. There are tasks where the human can be away from the keyboard, and it doesn't matter.
26:59Implementation, as we'll see, can be turned into an AFK task. But planning, this alignment phase, has to be human in the loop.
27:07Has to be. So I've gotta do it, unfortunately. I don't know.
27:13Give me a long list of all your recommendations.
27:20I'm running a workshop right now. So I artificially need you to pull more weight.
27:31So let's see what it does. Let's answer a couple more questions while it's doing its thing. What is my opinion on PMs or other non-dev roles vibe coding tasks?
27:45I'm going to return to this later, I think. I'm gonna leave this unanswered. A bit of mystery.
27:53I notice I'm not using the ask user questions UI for grill me. Why? There's a specific UI that you can bring up in Claude Code. I'll answer this just quickly.
28:03Ask me a question using the ask user question tool.
28:10And this UI is just sort of broken in Claude, and I really hate it. You notice I'm using Claude, but I don't like Claude very much. Like, you really are free with this method to choose any system you like.
28:24And this is what the UI looks like. It's very pleasing when you first encounter it, but then you realize it is actually broken in a ton of different ways. Alright.
28:32What did it come back with? Oh, blimey. Oh, no.
28:37So while this is doing its thing, let me do some teaching in the meantime. The plan here is that we take our grill me skill, and we need to essentially find some way of turning it into a destination.
28:53We need to go down to the, uh, we essentially need to we're figuring out the shape of this. That's what we're doing.
29:02We're figuring out the shape of the tasks during the grilling session. And in order to turn it into a bunch of actionable actions for the AI, we essentially need to figure out the destination.
29:14We need to know where we're going. We need to know the shape of this entire thing. So I think of there as being two essential documents that we need.
29:22We need a document that documents the destination. Oh, no.
29:29It's so not bright enough. There we go.
29:33Still not bright enough. There we go. We need something to document the destination, and we need something to document the journey.
29:41In other words, we need a document that's going to figure out what this even looks like in all of its user stories and figure out a definition of done, and then we need to figure out what the split looks like. So that's where we're gonna go to next. So once we finish with the grilling session yeah.
29:59It looks great. Fantastic. I love it.
30:01It answered 22 of its own questions. There you go. That's quite representative of what a grilling session looks like.
30:09So at this point now, I have used 25 k tokens, and all of that or loads of that stuff is gold.
30:18I wanna keep that around. I've got 25k there.
30:24And what I wanna do is kinda summarize it in some kind of destination document. So this is the next exercise, where we're going to write a product requirements document.
30:39And the product requirements document or the PRD is essentially that's its function.
30:45It's the destination document. And it sort of doesn't matter what shape it is. I've got a shape that I prefer and that I quite like, but you can just choose your own shape or whatever your company uses.
31:00So don't be too worried about that.
31:05All we're really doing is summarizing the design concept that we have so far. So let's try this.
31:14So I'm gonna initiate this. I'm gonna say, zoom all the way to the bottom.
31:19All I'm gonna do is just say, write a PRD. And we can take a look at that skill now. Write a PRD.
31:29So this skill, it does a few things. It first asks the user for a long detailed description of the problem.
31:36You can use write a PRD without grilling first, but I just like to grill first and then write the PRD afterwards. Then you can get it to explore the repo, which we've kind of already done.
31:47Then we get it to interview the user relentlessly, so we're kind of in a grilling session again. And then we start putting together a PRD template.
31:56So this is available in the repo if you wanna check it out. And, essentially, this is what it looks like. We've got some problem statements, the problem the user is facing, the solution to the problem, and a set of user stories.
32:07And these user stories sort of define what this is. You guys have probably seen things like this if you've been a developer at all. Um, you know, Gherkin, from Cucumber, is a language you can use to write these in, or we just sort of write them ourselves, essentially.
32:22Then we have a list of implementation decisions that were made and, crucially, a list of testing decisions too. So I'm gonna run this.
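As a rough illustration of the destination document's shape, here is a minimal sketch that renders the five sections named in the talk as markdown. The `PRD` class, the exact heading text, and the sample content are assumptions made for this sketch; the actual template lives in the workshop repo and may look different.

```python
from dataclasses import dataclass


@dataclass
class PRD:
    problem_statement: str
    solution: str
    user_stories: list[str]
    implementation_decisions: list[str]
    testing_decisions: list[str]

    def render(self) -> str:
        """Render the PRD as markdown, one section per field."""
        def bullets(items: list[str]) -> str:
            return "\n".join(f"- {item}" for item in items)

        stories = "\n".join(
            f"{n}. {story}" for n, story in enumerate(self.user_stories, start=1)
        )
        sections = [
            ("Problem Statement", self.problem_statement),
            ("Solution", self.solution),
            ("User Stories", stories),
            ("Implementation Decisions", bullets(self.implementation_decisions)),
            ("Testing Decisions", bullets(self.testing_decisions)),
        ]
        return "\n\n".join(f"## {title}\n\n{body}" for title, body in sections)


# Sample content is invented for illustration only.
prd = PRD(
    problem_statement="Learners sign up for courses but lose momentum.",
    solution="Award points and streaks for completing lessons.",
    user_stories=["As a learner, I want to earn points when I finish a lesson."],
    implementation_decisions=["Points are backfilled for existing progress records."],
    testing_decisions=["Test the gamification service; it holds the meaningful logic."],
)
text = prd.render()
```

The shape matters less than the function: whatever sections you choose, the document records where the grilling session ended up.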
32:32Okay. And so it's finished its thing.
32:37Windows, let me close the thing. Thank you. I don't know why I bought a Windows laptop.
32:42I think I just I like the challenge. So the first thing that it's gonna give me are a set of proposed modules it wants to modify.
32:53Now there's a deep reason why I'm thinking about this. So this is at this stage, we have an idea.
33:01We have sort of specked out the idea. We've reached a sort of understanding of what we're trying to do.
33:07And then we need to start thinking about the code, because at this point this is not specs to code.
33:15This is not where we're ignoring the code. We actually keep the code in mind throughout the whole process. And the way I like to do this is I like to just sort of think about a set of proposed modules to modify.
33:26We're gonna return to this, this idea of continually designing your system and keeping your system in mind. So it's saying: recommend tests for the gamification service. It's the only deep module with meaningful logic.
33:38These modules look right. Yeah. That's good.
33:44And it's going to ping out a PRD. Now for ease of setup, I've got it so that it creates a set of issues locally.
33:54So it's just gonna create essentially a PRD inside this issues directory. But the way I usually do it, and you can check this out yourself, is you can go to my, essentially, what I consider my work repo, which is github.com/mattpocock/coursevideomanager up here.
34:15And in here, this is essentially an app that I created, that I use all the time to record my videos and things like this.
34:22I think I've recorded like I pulled down the set. I think I've recorded like a thousand videos in here or something nuts.
34:28And you can see here that it's got 744 closed issues, and this is essentially all of the PRDs and all of the implementation issues that I've put into here.
34:38So this is how I usually like to do it.
34:42So that's what I'm doing with the there we go. Yeah.
34:45I'm just gonna say yes and and get that issue out. Let's see.
34:51It is inside here. So we got the problem statement. People sign up for courses.
34:57The solution, the user stories, 18 user stories, looks nice. Some implementation decisions, level thresholds, etcetera.
35:04This is enough information. We've kinda clarified where we're going and what we're doing. So that's what we do.
35:10We essentially have a grilling session, and we've created an asset out of it. Now raise your hand. Should I be reviewing this document?
35:19Raise your hand if you think I should be reviewing the document. Yeah. I don't look at these.
35:24I don't look at these. The reason I don't look at these is because what am I testing at this point? What am I like, when I read it, what am I testing?
35:34What am I what are the failure modes I'm trying to test for? I know that LLMs are great at summarization because they are.
35:39They're really good at summarization. I have reached the same wavelength as the LLM, right, using the grill me skill. We have a shared design concept.
35:48So if I have a shared design concept, all I'm doing is I'm just essentially checking the LLM's ability to summarize. So I don't tend to read these.
35:58Let's have a Q&A, 'cause I can feel you guys are itching for it. And then I think we might have, like, I don't know, just a five minute comfort break, just to rest my voice and so you can catch up with the exercises for a minute, if that's all right.
36:11So let's have a little Q and A sesh. If I don't like Claude Code, which one do I actually like?
36:20Have you ever heard the phrase, democracy is the worst way to run a country apart from all the other ways? That's how I feel about Claude Code.
36:30We've answered that one.
36:34What's your thoughts on developers needing to very deeply understand TypeScript now that "fix the TS, make no mistakes" exists? I don't understand the phrasing of this, but I think I understand the meaning, which is that I believe that code is very important, and this is kind of gonna feed through the whole session, and that bad code bases make bad agents.
36:57If you have a garbage code base, you're going to get garbage out of the agent that's working in that code base. We'll talk more about that in a bit. And so I think understanding these tools very deeply, understanding code deeply is gonna make you a much, much better developer and get more out of AI.
37:14And that answers that question too. Sweet.
37:20Get out of it. There you are.
37:24Now that we have 1,000,000 tokens available, do we ever actually want to take advantage of that? I've noticed that the dumb zone has become less dumb lately. Okay.
37:33Great question. This goes back to our kind of initial idea on the dumb zone.
37:44I I recorded my Claude code course using a 200 k context window, and on the day that I launched the course, they announced the 1,000,000 context window. My take on this is that what Claude Code did is they essentially just did this. Wee.
37:58They shipped a lot more dumb zone to you, essentially. Now this is good for tasks where you want to retrieve things from a large context window.
38:07If you want to pass five copies of War and Peace or something to it, and you want to find out all the things that uh, I can't remember a character from War and Peace. Why did I start with that?
38:18It's good for retrieval. It's less good for coding. So I consider that about 100k at the moment is the smart zone.
38:28The smart zone will get bigger, and that will be a really nice improvement. So folks, we're gonna take, like, a five minute comfort break, if that's alright, just for my voice, and so maybe you can have a little move around or something or grab a drink. I've just noticed some sleepy eyes, and I wanna make sure that we're awake for the next bit, if that's alright.
38:45So we'll take five minutes, and I'll see you back here then. Alright?
38:50So we have our PRD, which I'm not gonna read, our kind of destination document.
38:57Let's quickly scan for any good questions before we zoom ahead.
39:05Rediscovering the role of software engineer in today's world, top three disciplines you recommend. Taekwondo is good, I've heard. I've I've no idea how to answer this question.
39:16Thank you for asking it though. Top three disciplines I recommend. I mean, sorry?
39:22Plumbing. Plumbing is a good one. Yeah.
39:24Yeah. Yeah. I don't know if that's a discipline.
39:25The plumbers I've hired are not usually very disciplined. Right.
39:32So okay, we now have our destination. Okay? Perfect.
39:38So how do we actually get to our destination? How do we we have a sort of vague PRD. How do we split it so that we don't put things into the dumb zone?
39:48In other words, we have our number four. How do we split it into this kind of multiphase plan? Well, probably what you would do at this point is you would say, okay, Claude, give me a multi phase plan that gets me to this destination.
39:59Right? That sort of makes sense. This is what we've been doing before.
40:03But I have a sort of better way of doing it now, which is that I like creating a Kanban board out of this.
40:13Raise your hand if you don't know what a Kanban board is. Cool. Okay.
40:18A Kanban board is essentially just a set of tickets that you put on the wall that have blocking relationships to each other. So we're gonna see what it kind of looks like here. This is how we've worked, um, as developers for a long time, really since Agile came around.
40:33And what it does, we can see it here, it has proposed that we split this setup into five different tasks here.
40:43We have the first one, which is the schema and the gamification service. Yeah. That looks pretty good.
40:48This is blocked by nothing. And we can even see here that it's given it a type of AFK too. Remember I talked about human in the loop and AFK earlier?
40:57This is an AFK task. This is something we can just pass off to an agent to do its thing. Streak tracking.
41:02Okay. That looks good.
41:05Then wire points and streaks into lesson and quiz completion. This is blocked by one and two. Retroactive backfill.
41:11This is blocked only by one. And then this one here is blocked by all of the tasks. Cool.
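The board being read out here can be modelled as a small data structure. The following is a minimal sketch, not the skill's actual output format: the `Issue` class and `grabbable` helper are hypothetical, the talk only states the type of issue 1 (AFK), does not say what blocks issue 2 (assumed here to be issue 1), and does not give issue 5 a title.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    number: int
    title: str
    kind: str  # "AFK" or "HITL" (human in the loop)
    blocked_by: list[int] = field(default_factory=list)


ISSUES = [
    Issue(1, "Schema and gamification service", "AFK"),
    # Assumption: the talk does not state what blocks streak tracking.
    Issue(2, "Streak tracking", "AFK", [1]),
    Issue(3, "Wire points and streaks into lesson and quiz completion", "AFK", [1, 2]),
    Issue(4, "Retroactive backfill", "AFK", [1]),
    # Title not given in the talk; blocked by all of the other tasks.
    Issue(5, "Final issue", "AFK", [1, 2, 3, 4]),
]


def grabbable(issues: list[Issue], done: set[int]) -> list[int]:
    """Issues that are not done yet and whose blockers are all done."""
    return [
        issue.number
        for issue in issues
        if issue.number not in done and all(b in done for b in issue.blocked_by)
    ]
```

With nothing done, only issue 1 is grabbable; once it is done, issues 2 and 4 become available at the same time, which is what lets more than one agent pick up work.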
41:20Now I consider this, you could say, why don't we just make this sort of generation of the issues? Why don't we just hand that over to the AI?
41:28Why do I need to be involved here? Right? Because it's given us quite a good selection of tools here.
41:32Why do I need to review this and figure out what's next? Now my take here is that this is really cheap to do, very quick to do once I've done the PRD, and I can immediately see some issues here.
41:47There's a really, really important technique when you're kind of figuring out what the shape of this journey should look like. And it sort of comes to this very classic idea, which comes from The Pragmatic Programmer, called tracer bullets.
42:04Or vertical slices. And tracer bullets, it's really transformed the way I think about actually getting AI to pick its own tasks.
42:14Systems have layers. Right? There are layers in your system.
42:19These might be different deployable units. You might have a database that lives somewhere. You might have an API that lives maybe close to the database but in a separate bit.
42:26You might have a front end that lives somewhere totally different like a CDN. Or within these deployable units, you might have different layers within those.
42:35In, for instance, the code base that we're working in, we have a ton of different services.
42:41We have a quiz service, a team service, user service, coupon service, course service. And these services have dependencies on each other.
42:48So they're kind of like individual layers. Well, what I noticed is that AI loves to code horizontally.
42:57So it loves to code layer by layer. So in other words, in phase one, it will do all of the database stuff, all of the schema, all of the, you know, all the stuff related to that unit, then it will go into phase two and do all of the API stuff, then it will add the front end on top of that.
43:14Does can anyone tell me what's wrong with that picture? Why is that not a good thing to do? Raise your hand if you have an answer.
43:21Yeah. You don't have the whole feedback loop. Exactly.
43:24You don't get feedback on your work until you've really started or completed phase three. So until you get to phase three, you're not actually testing that all the layers work together.
43:41You haven't got an integrated system that you can test against. And so instead, you need to think about vertical slices.
43:48You need to think about thin slices of functionality that cross all of the layers that you need to. And this is a much better way to work, much better way for the AI to work too because it means at the end of phase one or during phase one, it can get feedback on its entire flow.
44:04So what this means to me is inside the PRD to issues skill up here, I have got: break a PRD into independently grabbable issues using vertical slices, tracer bullets.
44:18It's written as local markdown files. We first locate the PRD. Again, explore the code base if this is a fresh session.
44:26We draft vertical slices. So we break the PRD into tracer bullet issues. A tracer bullet, by the way, is essentially when you're like an anti aircraft gunner.
44:36It's quite a violent idea, actually. And you're looking up in the sky, and it's night.
44:41If you're just shooting normal bullets, you have no idea what you're firing at. Right? You could just be you know?
44:46You see the plane, you don't see where your bullets are going. With tracer bullets, they attach a tiny bit of phosphorescence, or phosphorus or something, to make it glow as it goes.
44:55So this means that every sixth bullet or something, you actually see a line in the sky. So you have feedback on where you're aiming. So this is what this is the idea here, is that we increase our level of feedback, and we get near instant feedback on what we're building.
45:09Because without that, the AI is kind of coding blind until it reaches the later phases. We've got some vertical slice rules. We quiz the user, and then we create the issue files.
45:18So what I see here is that even though I've told it to do vertical slices, it's proposing to create the gamification service first on its own.
45:33That's just one slice there, and that to me feels like a horizontal slice. What I want to see in the first vertical slice especially is I want to see the schema changes or some schema changes.
45:44I want to see some new service being created, and I want a minimal representation of that on the front end. So I want it to go through the vertical slices, not just the horizontal. Does that make sense?
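The review being done by eye here can be expressed as a simple check. This is only a sketch of the idea: the three layer names follow what is asked for in the talk (schema, service, front end), while `REQUIRED_LAYERS`, `missing_layers`, and `is_vertical` are hypothetical helpers, and tagging each slice with the layers it touches is an assumption about how you might record a slice.

```python
REQUIRED_LAYERS = frozenset({"schema", "service", "frontend"})


def missing_layers(touched: set[str]) -> set[str]:
    """Layers a slice would need to touch before it crosses the whole stack."""
    return set(REQUIRED_LAYERS - touched)


def is_vertical(touched: set[str]) -> bool:
    return not missing_layers(touched)


# The first proposal: schema plus gamification service, nothing visible yet.
first_proposal = {"schema", "service"}

# The revised slice: points for lesson completion, visible on the dashboard.
revised_slice = {"schema", "service", "frontend"}
```

Here `missing_layers(first_proposal)` reports the front end as missing, which is the "too horizontal" feedback given to the agent, while `is_vertical(revised_slice)` holds.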
45:53Okay. So I'm gonna give the AI a rollicking.
45:59Bad boy. No. I'm not gonna waste tokens just being just memeing.
46:06So the first slice is too horizontal. I'll just start with that and see if it picks it up. Does that make sense as a concept?
46:14And what I really like about going back to those old books is that we are really trying to, in this day and age, like, verbalize best software practices in English.
46:29And these books, twenty year old books, have already done that, and it's an absolute gold mine if you want to throw that into prompts. But even with that, it's not gonna do a perfect job each time. So award points for lesson completion visible on dashboard.
46:43Yes. That's a beautiful vertical slice because it's definitely a big chunk of stuff. It's doing a lot of stories there, but we're gonna see something visible at the end, and the AI will then just be able to add to that.
46:55You see why that's preferable to the first one? Cool. Looks great.
47:01So we're getting closer now. And anyone following at home as well, you're not at home, but you get the idea, will hopefully see the same thing too and start developing the same instincts.
47:11Let's open up for questions just while I'm sort of creating these GitHub issues. Or not GitHub issues, local issues. When will I stop using Windows?
47:21Never. What is your okay. We'll get to that later.
47:25How does AI decide when to stop grilling? Because AI can ask incessantly, can we have a smarter way to decide the stop point?
47:32Yeah. It does tend to really those grilling sessions can be super intense.
47:36And the thing about these skills is you can tune them if you want to. If you feel like the AI is just absolutely hammering you, hammering you, hammering you, then you can just tell it to just pull back a little or get it to do, you know, stop points and that kind of thing. So if that's a failure mode that you run into a lot, then you just, you know, change the skill.
47:55Do I still use "be extremely concise, sacrifice grammar for the sake of concision"? There was a tip that I gave folks five months ago to basically increase the readability of your plans.
48:08So when you're using plan mode, then you can put it in your CLAUDE.md, and you can say, okay.
48:15Yeah. Approve that. Let's open up CLAUDE.md.
48:18Do I have a CLAUDE.md?
48:22Maybe I don't. Really don't use CLAUDE.md very much.
48:25I'm just gonna put a dummy inside here. When no.
48:31When talking to me, sacrifice grammar for the sake of concision.
48:40And this prompt was really useful to me when I was reading the plans because it meant that the plans would come out very concise, really nice, easy to read. But I've since dropped this idea in preference to a grilling session because, well, I just didn't want to read the plans.
49:00I wanted to get on the same wavelength as the LLM. I wanted it to ask aggressive questions to me. And when I stopped reading the plans, I stopped needing them to be concise.
49:08So I think of the plans, really the destination document, as, uh, the end state, and I don't need that end state to be concise. Hopefully, that answers your question.
49:20What do I think will be the outcome of the Mexican standoff of future roles of PMs and other roles converging? I have no idea. I'm not a pundit.
49:27I have no idea. Okay. So we should, after a couple of approvals, end up with a set of issues.
49:39Now, these issues that we're creating, they're designed to be independently grabbable, which means that this Kanban board ends up looking kind of like this, where you have essentially a set of tickets with a whole load of blocking relationships.
49:57So this one needs to be done before this one. This one needs to be done before this one. And this one, let's say, we've got another one over here.
50:04This one needs to be done before this one. This means that you can start to parallelize.
50:10You can start to get agents working at the same time on these tasks because, yeah, this one needs to be done first, and then these two can be grabbed at the same time by independent agents.
50:26Raise your hand if you've done any kind of parallelization work with agents. Okay.
50:30Cool. So this allows you to turn those plans into, kind of like, directed acyclic graphs, essentially, where you essentially have three phases here, where you have phase one.
50:48Let me grab and move that.
50:52Above this line here, you do this one. Then phase two, you do the two below it, and then phase three, do this third one and add it onto there.
51:02And this is a relatively simple plan, but you could have many different plans operating all at once. It means that you can do really nice parallelization, and we'll talk more about that in a bit. But that's why I prefer a Kanban board set up like this to a sequential plan because a sequential plan can really only be picked up by one agent.
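The three phases drawn on the board fall out of the blocking relationships mechanically. As a minimal sketch, Python's standard-library `graphlib.TopologicalSorter` can group tickets into phases where everything inside a phase can be grabbed in parallel. The ticket labels A to D and the `phases` helper are hypothetical; the shape (one ticket, then two, then one) follows the drawing described above.

```python
from graphlib import TopologicalSorter


def phases(blocked_by: dict[str, set[str]]) -> list[list[str]]:
    """Group tickets into phases; each phase only depends on earlier ones."""
    sorter = TopologicalSorter(blocked_by)
    sorter.prepare()  # raises graphlib.CycleError if the board has a cycle
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything unblocked right now
        result.append(ready)
        sorter.done(*ready)  # marking these done unblocks the next phase
    return result


# Each ticket maps to the set of tickets that block it.
board = {
    "A": set(),
    "B": {"A"},
    "C": {"A"},
    "D": {"B", "C"},
}
plan = phases(board)
```

For this board, `plan` has three phases, with B and C sharing the middle one. A numbered sequential plan is the degenerate case where every phase holds a single ticket, which is why only one agent can work it.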
51:24So this where did it go? Over here.
51:29Yeah. This plan here, this is really only one loop.
51:33Right? Only one agent can work on these because we have numbered phases and they're not parallelizable. Does that make sense?
51:40Cool. So we've got our issues. Come on.
51:44Stop asking me for oh, no. It's creating them on GitHub. I really didn't want that.
51:49Oh, no. You fool. Create them in issues instead.
51:57Nope. That's not precise enough. You fool.
52:01Create them in local markdown files instead, referencing the local version.
52:11Sorry about this.
52:15So once we get to this point, we have a bunch of issues locally that we can start looping over and implementing.
52:25And it's at this point that the human leaves the loop. So so far, let me pull up a proper overview of this kind of flow that we're exploring here.
52:37So far, we have taken an idea. I'll zoom this in a bit for the folks at the back.
52:46And we've grilled ourselves about the idea. We can skip over research and prototype, but we turn that into a PRD, into a destination document.
52:56We've then turned that PRD into a Kanban board, and all of those steps are human reviewed. And now the implementation stage, we step back, and we let an agent work through that Kanban board or multiple agents work through the Kanban board.
53:15Now what this means is that, yeah, we spent a lot of time planning here, but it means that we've queued up a lot of work for the agent. We can think of this as kind of like the day shift and the night shift. This is the day shift for the human.
53:27Right? Planning everything, getting all the all the stuff ready. And then once we kick it over to the night shift, the AI can just work AFK.
53:34But what does that look like? Well, so I'm just gonna Oh yeah, just allow it.
53:41It's perfect. So this looks like if we head to the next exercise, which is In fact, the last exercise here, running your AFK agent.
53:55Now, I've called this Ralph really, because it is essentially a Ralph loop.
54:02And this prompt here, I wanna walk through this really closely. The first thing it's doing here is we're essentially going to run Claude, and we're gonna basically try to encourage it to work completely AFK.
54:16I'll show you what the sort of script for this looks like in a minute. But you say, okay, local issue files from issues are provided at the start of context. The way we do that is, if you look inside once.sh here inside the repo, it's essentially just a bash script where we grab all of the issues, which are inside markdown files, and we cat them into a local variable.
54:41So that issues variable contains all of the issues that are in our entire backlog. Then we grab the last five commits. I'll explain why in a minute.
54:52And then we grab the prompt, and we just run Claude Code with permission mode accept edits, and then essentially just pass it all of the information. This is what the implementer looks like.
55:04So that's what a very, very simple version of this sort of loop looks like. And of course, this is not a loop. This is just running it once.
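The run-once script described above is a bash script in the repo. The same steps can be sketched in Python, purely as an illustration. The section headings, function names, and the exact `claude` invocation are assumptions here, so check the flags against your installed CLI before relying on them.

```python
import subprocess
from pathlib import Path


def read_issues(issues_dir: Path) -> list[str]:
    """Read every markdown issue file in the backlog directory."""
    return [p.read_text() for p in sorted(issues_dir.glob("*.md"))]


def last_commits(n: int = 5) -> str:
    """Return the last n commits as one-line summaries."""
    result = subprocess.run(
        ["git", "log", f"-{n}", "--oneline"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def build_context(prompt: str, issues: list[str], commits: str) -> str:
    """Concatenate prompt, backlog and recent history into one input."""
    backlog = "\n\n".join(issues)
    return f"{prompt}\n\n## Issues\n\n{backlog}\n\n## Last commits\n\n{commits}"


def build_command(context: str) -> list[str]:
    # Assumed invocation: print mode with edits auto-accepted.
    return ["claude", "--permission-mode", "acceptEdits", "-p", context]
```

A single run would then be `subprocess.run(build_command(build_context(prompt, read_issues(Path("issues")), last_commits())))`, where the `issues` directory name is also an assumption.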
55:12The loop is in the AFK version up here, which is a fair bit more complicated.
55:18And the crucial part here is we're running it in a Docker sandbox as well. So I don't want you to install Docker on your laptops, because you'd need to download a special image, and we're gonna tank the conference Wi-Fi if we do that. So I am gonna demo this to you, but you won't need to run this yourself, and I'll talk through this in a minute.
55:37But essentially, this once loop here, we're just essentially running one version of the thing that we're going to loop again and again and again.
55:50So this is kinda like the human in the loop version. And this is essential. Running this again and again is essential because you're gonna see what the agent does and see how it ends up working.
55:59And any tuning that you need to add to the prompt, then you can do that. Let's go to the prompt.
56:10Local issue files are being passed in. You're gonna work on the AFK issues only. That makes sense.
56:15If all AFK tasks are complete, output this no more tasks thing. And then the next thing, pick the next task. So what we're doing here is we're essentially running a backlog or curating a backlog that our AFK agent is gonna pick up.
56:33That's the purpose of all of these setups in the beginning. In this all the way to this Kanban board here, we're just essentially creating a backlog of tasks for the night shift to pick up.
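The difference between the run-once script and the AFK loop can be sketched as below. This is a minimal illustration, not the workshop's script: the sentinel text and the iteration cap are assumptions, and `run_once` stands in for whatever launches a single agent session.

```python
from typing import Callable

# Hypothetical sentinel; the real prompt defines its own "no more tasks" marker.
SENTINEL = "NO MORE TASKS"


def afk_loop(run_once: Callable[[], str], max_iterations: int = 10) -> int:
    """Run the agent repeatedly until it reports an empty backlog.

    Returns the number of runs performed. The cap guards against a loop
    that never emits the sentinel.
    """
    for iteration in range(1, max_iterations + 1):
        output = run_once()
        if SENTINEL in output:
            return iteration
    return max_iterations
```

Each call to `run_once` starts from a fresh context, so every issue is picked up with the full backlog and recent commits but none of the previous run's conversation.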
56:46And the night shift, this sort of Ralph prompt here, it's got its own idea about what a good task looks like to pick up next. I did talk about parallelization.
56:57I will show you this later, but this is essentially a sequential loop here. We're just gonna run one coding agent at a time. This is a good way to just sort of get your feet wet, essentially.
57:08So it's prioritizing critical bug fixes, development infrastructure, then tracer bullets, then polishing quick wins, and refactors.
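In the workshop the agent applies this ordering itself from the prompt. As a deterministic illustration of the same ordering, here is a sketch where the `kind`, `afk` and `done` fields are hypothetical labels rather than anything from the repo.

```python
from typing import Optional

# Priority order as described in the prompt, highest first.
PRIORITY = [
    "critical-bug",
    "dev-infrastructure",
    "tracer-bullet",
    "polish-quick-win",
    "refactor",
]


def pick_next(issues: list[dict]) -> Optional[dict]:
    """Pick the highest-priority unfinished AFK issue, or None if none remain."""
    candidates = [i for i in issues if i["afk"] and not i["done"]]
    if not candidates:
        return None
    return min(candidates, key=lambda i: PRIORITY.index(i["kind"]))


backlog = [
    {"id": "01", "kind": "refactor", "afk": True, "done": False},
    {"id": "02", "kind": "tracer-bullet", "afk": True, "done": False},
    {"id": "03", "kind": "critical-bug", "afk": False, "done": False},  # needs a human
]
```

Here issue `02` is chosen: the critical bug outranks it but is not marked AFK, so the agent leaves it alone.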
57:16And then we just have a very simple kind of instruction on how to complete the task. So we explore the repo, use TDD to complete the task.
57:25I'll get to that later. And we then run some feedback loops.
57:30So let's let's just try this, and let's just see what happens. So good, it's created the issue files. We should be good to go.
57:35I'm gonna cancel out of this. I'm gonna clear, and I'm gonna run.
57:41Where is it? Ralph one.
57:44Sh. And you can feel free if you're following along to do the same thing. So we can see it's just running Claude inside here with the prompt and with all of the issues that have been passed in.
57:56And while it's doing its thing, you probably have some questions about this setup and about the decisions that I've made to essentially delegate all of my coding to AI.
58:08Right? So let's let's do a quick q and a while it's getting its feet under.
58:14Okay. I'm gonna just remove those.
58:23How do you retain negative decisions, things that you decided against, and rationales when persisting the results from the grill-me session? Great question. There's a very simple answer, which is that in the PRD, in the write-a-PRD step, there is a section at the bottom for the things that are out of scope.
58:41So the things we're not gonna tackle in this PRD, which is very important for giving a definition of done. Feel free to ping on the Slido if you've got any more questions. What's my front end workflow?
58:53Okay. That's a great question. I'm gonna I'm gonna answer that in a minute, I think.
58:58How to deal with agents producing more code than we can review? How to properly parallelize and use multiple agents in a separate way? Okay.
59:05There's two questions there.
59:09Raise your hand if you feel like you're doing more code review now than you used to. Yeah.
59:16Definitely. I don't think there's a way to avoid this. If we delegate all of our coding to agents, you notice that the implementation here is really the only AFK bit.
59:31We then also need to QA the work and code review the work. Right? And if we are running these loops where it's essentially gonna implement four issues in one, it's hard to pair that with the dictum that you should keep pull requests small and self contained.
59:48Right? Like, self contained pull requests means you're needing to do fewer loops or shorter loops or something, or maybe you do, like, a big stack of PRs, but that seems horrible as well.
1:00:00That's still just more separated code to review. I honestly don't know what the answer to this is yet. I think we just need to be ready to be doing more code review, essentially, which is not fun.
1:00:11That's not a fun thing to say. That's not like I don't know. I don't feel good saying that, but I do think it's probably the way things are going.
1:00:18It's a great question.
1:00:23Can we grab a couple of questions from the room as well? Let's not we won't do the mic, but raise your hand if you've got a question for me immediately. Yeah.
1:00:45And while you're working on something, something else comes in as well. Yep.
1:00:50How do you deal with the messiness? How do you tighten the feedback? Great question.
1:00:53So the question was,
1:00:55if this all looks great if you're a solo developer, but actually, how do you implement this in a team? How do you gather team feedback on this? And my answer to that is that if you have an idea up there, and essentially, the sort of journey from the idea to the destination is something you need to figure out with the team.
1:01:14Right? So all of this stuff up here, this is kind of like team stuff. You know what I mean?
1:01:19This so if you have an idea and you do a grilling session on it and you have a question that you don't know how to answer, then you need to loop in your team as we described before. Then you might need to go, okay. Look.
1:01:30We just need to build a prototype of this. We need to actually hash this out. We need something that the domain experts can fiddle with.
1:01:36All working. We might need to integrate a a third party library into this. We might need to do some research.
1:01:41We might need to actually kind of, like, ping this back and forth and find a third party service that we can get the most out of. We might need to go back with the information that we gathered there to the idea phase.
1:01:51So all the way up to the sort of PRD and the journey, that's something you need to involve your team with. That's something where these assets are gonna be shared and argued over, and you're gonna have requests for comments on them.
1:02:04And that loop is gonna just keep grinding and grinding until you figure out where you're going. Once you figure out where you're going, then you can start doing the Kanban board implementation. But this is essentially super arguable, and you'll be bouncing back and forth between the phases.
1:02:18Does that make sense? Yeah. Would you not need a PRD for your prototype?
1:02:23Say again. Sorry? Would you not want to have a PRD for your prototype?
1:02:26The question was, do you want to go through this whole session just to sort of create a prototype? Do you not need a PRD for your prototype as well? Let's just quickly talk about prototypes for a second.
1:02:35There was a question about how do you make this work for front end? Like, how do you because front end is, like, really sensitive to human eyes. You need human eyes looking at the front end all the time to make sure that it looks good.
1:02:48AI doesn't really have any eyes. It can look at code, but front end is multimodal.
1:02:55And so, my experience with trying to plug AI into, let's say, agent browser or Playwright MCP: you can give it tools to allow it to look through a front end and sort of look at images, but in my experience it's not very good at that yet, and it can't create a nice front end in a mature code base.
1:03:16It can sort of spit one out. But what it can do is you say, okay. Uh, I want some ideas on how, uh, this front end might look.
1:03:24Give me three prototypes that I can click between in a throwaway route, so that I can decide which one looks best, and you take the asset of that prototype and you then feed it back into the grilling session or you get feedback on it, blah blah blah blah blah.
1:03:39That answer your question kind of thing? The prototype is just you know, it's messy. It's supposed to give you feedback early on in the process.
1:03:45So that's a great way of working with front-end code, a great way of looking at software architecture in general. Let's go one more question.
1:03:51Yeah. Yes. In your system, how do you integrate respecting
1:03:55an architecture and design with API contracts and fitting with your larger system, security constraints, all kinds of constraints like that.
1:04:05There's a lot in that question. The question was, how do you conform with existing architecture? How do you do how do you make it conform to the code standards, like, of your code base or Yeah.
1:04:16The architecture
1:04:17design APIs Yeah. Security rules that constrains your design.
1:04:23I'm gonna answer that in a bit, if that's okay. So hopefully, we have started to get some stuff cooking.
1:04:31It's just pinging on the explore phase here.
1:04:37Tempted to just start running at AFK. Maybe I will. Maybe I won't.
1:04:44What it's essentially doing is it's exploring the repo. It's going to then start implementing based on what we wanted. Let's actually have one more question just while it's running.
1:04:51Yeah. Why not an AI
1:04:54QA step after before the human interviews? Yep.
1:04:59So the question was, why do you not get AI to QA? AI to QA.
1:05:06I just got jargon overload for a second. Why do you not get AI to test its own code?
1:05:13Now, of course, you absolutely can. And I think while it's doing while it's cooking here okay.
1:05:18It's got a clear picture of the code base. It's assessing the issues. It's doing issue 02 as the next task.
1:05:24I'm again gonna show you that in a bit, I think. The sort of because you definitely should do an automated review step as part of implementation.
1:05:33So you have your implementation. You should then, because tokens are pretty cheap and AI is actually really good at reviewing stuff, you should get it to review its own code before you then QA it. I found that that catches a ton of different bugs, and the way that works is I will just do a little diagram, is if you have, let's say, an implementation that's sort of like used up a bunch of tokens in the smart zone, if you get it to sort of try to do its reviewing, it's gonna be doing the reviewing in the dumb zone.
1:06:05And so the reviewer will be dumber than the thing that actually implemented it. If we imagine this is the, let's be consistent, that's the review, that's the implementation.
1:06:15Whereas if you clear the context, then you're essentially gonna be able to just review in the smart zone, which is where you wanna be.
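The arithmetic behind that diagram can be sketched as follows. The 100k figure is the rough smart-zone size quoted in this talk's summary, not a measured constant, and the token counts in the example are made up.

```python
# Rough smart-zone size from the talk; treat it as an assumption.
SMART_ZONE_TOKENS = 100_000


def review_starts_in_smart_zone(
    implementation_tokens: int, diff_tokens: int, clear_context: bool
) -> bool:
    """Check whether a review begins inside the smart zone.

    Reviewing in the same session stacks the diff on top of everything
    the implementer already consumed. Clearing the context means the
    reviewer starts with only the diff.
    """
    if clear_context:
        starting_tokens = diff_tokens
    else:
        starting_tokens = implementation_tokens + diff_tokens
    return starting_tokens <= SMART_ZONE_TOKENS
```

With an implementation that used 90k tokens and a 15k-token diff, the same-session review starts at 105k, while a fresh reviewer starts at 15k.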
1:06:27Let's see how our implementation is doing. Okay. Good.
1:06:30It's generating a migration. That looks pretty nice. We're getting some code spitting out.
1:06:37And while I'm sort of like, here we go. TDD.
1:06:43Let's talk about TDD, and then I think we'll have a little another little break. TDD, I've found, is absolutely essential for getting the most out of agents.
1:06:52Raise your hand if you know what TDD is. Cool. Okay.
1:06:57TDD is test driven development. What it's essentially doing is it's doing something called red green refactor.
1:07:04And if you look in the code base, you'll be able to find a skill which really describes how to do red green refactor and teaches the AI how to do it.
1:07:13So what it's doing is it's writing a failing test first. So it's saying, okay. I've broken down the idea of what I'm doing, and I'm just gonna write a single test that fails, and then I need to make the implementation pass.
1:07:27I have found that first of all, this adds tests to the code base, and this this tends to add good tests to the code base. And so we've got this kind of gamification service.
1:07:38It looks like it's using some existing stuff to create a test database. Test fails because the module doesn't exist yet.
1:07:45Okay. We've confirmed red. And then it goes and hopefully runs it, and it passes.
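The red and green steps can be reproduced in miniature. This sketch uses Python's `unittest` rather than the repo's own test runner, and the method names and the 10-point value are hypothetical.

```python
import unittest


# Step 1 (red): the test is written before the service exists.
class AwardPointsTest(unittest.TestCase):
    def test_completing_a_lesson_awards_points(self):
        service = GamificationService()
        service.record_lesson_completed("emma")
        self.assertEqual(service.points_for("emma"), 10)


def run(test_case) -> bool:
    """Run one test case and report whether it passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful()


# Confirm red: the test errors because GamificationService is not defined yet.
red = run(AwardPointsTest)


# Step 2 (green): the smallest implementation that makes the test pass.
class GamificationService:
    def __init__(self):
        self._points = {}

    def record_lesson_completed(self, student: str) -> None:
        self._points[student] = self._points.get(student, 0) + 10

    def points_for(self, student: str) -> int:
        return self._points.get(student, 0)


green = run(AwardPointsTest)
```

Because the test existed and failed first, the implementation has a fixed target to hit, which is what makes it harder to write a test that simply mirrors the code.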
1:07:51I found that, uh, raise your hand if you've ever had AI write bad tests. Yeah. It tends to try to cheat at the tests because it's sort of doing it in layers.
1:08:03It will do the entire implementation, and then it will do the entire test layer just below it. Uh, I'm just gonna say, yes.
1:08:10You're allowed to use npx vitest. And using this technique, it generally is a lot harder to cheat because it's sort of instrumenting the code before it's then writing the code.
1:08:24So I find that TDD is so, so good for places where you can pull it off. And in fact, it's so good that I sort of warp my whole technique around getting TDD to work better.
1:08:35I can see some drooping eyes. It is so hot. Cannot imagine how hot it is up here.
1:08:40Let's take another five minute comfort break. Let's come back at quarter to, I think.
1:08:45Have a nice generous one. And we'll be back in about six, seven minutes, and I'll talk about how, uh, I think about modules, think about constructing a code base to make this possible.
1:08:57I've just been sort of fiddling with the AI here, and we have ended up with some with a commit. So we have something to test. Issue number two is complete.
1:09:05Here's what was done. This is kind of what it looks like when a Ralph loop completes, as you end up with a little summary. And we now have something we can QA: because we did the feedback loops, because we did the tracer bullets, because we said, okay, give us something reviewable at the end of this, we can immediately go and QA it.
1:09:24Now there's nothing less exciting than watching someone else QA something, but hopefully we can have a little play. Let's just check that it works at all.
1:09:34In fact, before I go there, I just want to sort of work through what just happened, which is we see that it's created some stuff on the dashboard, and it then ran the feedback loops.
1:09:47So it then ran the tests and the types. Now TDD is obviously really important, and it's really important because these feedback loops are essential to AI, essential to get AI to produce anything reasonable.
1:10:02Because without this, AI is totally coding blind. Right? You have to have to.
1:10:09If your code base doesn't have feedback loops, you're never ever ever gonna get decent output out of AI. And often what you'll find is that the quality of your feedback loops influences how well your AI can code, essentially.
1:10:24That is the ceiling. So if you're getting bad outputs from your AI, you often need to increase the quality of your feedback loops. We'll talk about how to do that in a minute.
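A feedback loop in this sense is just a command whose failure output goes back to the agent. The sketch below is a minimal runner under that assumption. The demo commands use the Python interpreter so the example is self-contained; in a repo like the one in the demo they would be the test and type-check scripts.

```python
import subprocess
import sys


def run_feedback_loops(commands: dict[str, list[str]]) -> list[str]:
    """Run each check and collect the output of the ones that fail.

    The returned strings are what you would hand back to the agent.
    """
    failures = []
    for name, argv in commands.items():
        proc = subprocess.run(argv, capture_output=True, text=True)
        if proc.returncode != 0:
            failures.append(f"{name} failed:\n{proc.stdout}{proc.stderr}")
    return failures


# Stand-in checks: one passes, one exits non-zero with a message on stderr.
demo = {
    "passing": [sys.executable, "-c", "print('ok')"],
    "failing": [sys.executable, "-c", "import sys; sys.exit('type error')"],
}
failures = run_feedback_loops(demo)
```

An empty list means every loop is green. Anything else is concrete, machine-readable evidence the agent can act on instead of coding blind.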
1:10:35Now, so it ran npm run test, npm run type check. It got one type error, and it needed to fix it with a nice bit of TypeScript magic.
1:10:44Very good. Yeah, typeof levelThresholds[number]. Okay.
1:10:48You see why I stopped teaching TypeScript because just AI knows everything now.
1:10:54So and it ran the tests, it passed, and it's looking good. So we now end up with 284 tests in this repo.
1:11:01Pretty good. I I do find front end really hard to test here. We're essentially just testing the service.
1:11:09So we've created a gamification service if we look up here, and then we have a test for that service. You can see the the service and the test itself.
1:11:17Now if I was doing code review here, I would then go to would first go to review the tests, make sure the tests were testing reasonable things, and then go and kind of review the code itself just to make sure that it's it's not doing anything too crazy. Right?
1:11:32The essential thing is I need to actually look at the dashboard. I'm going to log in as a student.
1:11:40Oh, if it'll let me. Maybe it won't let me. Come on, son.
1:11:44There we go. So log in as Emma Wilson. Head into courses.
1:11:49Let's say I've got an introduction to TypeScript. Continue learning. Yes, I completed this lesson.
1:11:57Something went wrong. I imagine it's because I don't have SQLite error.
1:12:04I don't have the right table. So I need a table point events. Point events is a strange table name.
1:12:09I'm not sure quite what it was thinking there. Let's suspend. Let's run NPM DB migrate or push, I think.
1:12:19Can't remember which one it was. But you kinda get the idea. Right?
1:12:23I I'm not gonna subject you to watching me do QA because it's so dull. But at this point, I would essentially go back in.
1:12:30Let me open the project back up. And this is a crucial moment, and it's so important to QA it manually here, because QA... oh, dear.
1:12:45Oh, dear. What's going wrong? There we go.
1:12:47QA is how I then impose my opinions back onto the code base, how I impose my taste.
1:12:56What you'll often find is that there are teams out there who are trying to automate everything, like every part of this process, and they will tend to if you try to, like, automate the sort of creation of the idea, automate the QA, automate the research, automate the prototype, you end up with apps that I feel just lack taste and are bad.
1:13:22Maybe they just don't work or they they don't even work as intended or there's just no AI. You need a human touch when you're building this stuff because without that, you just end up with slop. And we are not producing slop here.
1:13:33We're trying to produce high quality stuff, and so that's what the QA is for. So I'm gonna do two things in this final section, which is I'm gonna first tell you how to.
1:13:46There's probably a question in your mind here, which is, let's say I have a code base that I'm working on, and it's a bad code base. It's a code base that's like really complicated.
1:13:57Uh, the AI just never does good work in, and maybe actually most humans that go into that code base don't do good work. How what how do I improve that code base?
1:14:06And the second thing is I'll show you my setup for parallelization. So let's go with bad code first.
1:14:14Now where is it? Where's the diagram?
1:14:17Here it is. In his book, A Philosophy of Software Design, John Ousterhout talks about the ideal type of module.
1:14:29And let's imagine that you have a code base that looks like this. Each of these blocks here are individual files, and these files export things from them.
1:14:38You know, they have things that you pull from the files that you then use in other things. And so you might have these weird dependencies where this file over here might rely on this file or might rely on that file, for instance. Now if these files are small and they don't export many things, then John Ousterhout would call these shallow modules, essentially, where they kind of look like this, if I... actually, no.
1:15:05I can't can't make a good diagram of it. There are essentially lots and lots of small chunks. Now this is hard for the AI to navigate because it doesn't really understand the dependencies between everything.
1:15:15It can't work out where everything is. You know, it has to sort of manually track through the entire graph and go, okay. This relies on this.
1:15:22This one relies on this one. This one relies on this one. And it's then also hard to test this as well because where do you draw your test boundaries here?
1:15:31Do you test each module individually? Like, just literally draw a test boundary? No.
1:15:37Don't do that. Around this one, and then maybe another test boundary around the next one, and then the next one.
1:15:46Or should you sort of do big groups of it? Should you say, okay.
1:15:49We're gonna test all of these related modules together and just sort of, you know, hope and pray that they work. Now, I think that bad tests mostly look like that, where the AI essentially tries to sort of wrap every tiny function in its own test boundary and then just sort of tests that those individually work.
1:16:12But what that does is it means that when, let's say, this module over here calls those two, so it depends on both of these, then this module might misorder the functions, or there might be sort of stuff inside that poor module that's worth testing on its own. And if you then wrap this in a test boundary, what do you do?
1:16:31Do you mock the other two modules? How does that work?
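The misordering problem can be shown in a few lines. In this sketch all function names are hypothetical: the caller validates before stripping, a test that mocks both collaborators passes anyway, and a test drawn around the real functions exposes the bug.

```python
from unittest.mock import Mock


def strip_name(name: str) -> str:
    return name.strip()


def is_valid_name(name: str) -> bool:
    return len(name) > 0


def register(name: str, strip=strip_name, is_valid=is_valid_name) -> str:
    # Bug: validation runs before stripping, so "   " is accepted.
    if not is_valid(name):
        raise ValueError("empty name")
    return strip(name)


# Narrow test with both collaborators mocked: canned answers hide the ordering.
strip_mock = Mock(return_value="emma")
valid_mock = Mock(return_value=True)
mocked_result = register(" emma ", strip=strip_mock, is_valid=valid_mock)
mock_test_passes = (
    mocked_result == "emma" and strip_mock.called and valid_mock.called
)

# Boundary test with the real collaborators: a blank name should be rejected.
try:
    register("   ")
    boundary_test_passes = False  # no error raised, so the bug is exposed
except ValueError:
    boundary_test_passes = True
```

The mocked test is green while the behavior is wrong, which is the weak feedback loop the speaker is warning about.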
1:16:36So actually figuring out how to build a code base that is easy to test is essential here.
1:16:45Because if our code base is easy to test, then our feedback loops are gonna be better and the AI is gonna do better work in our code base. Does that make sense?
1:16:54So what does a good code base look like? Well, not like that. It looks like this, where you have what John Ousterhout calls deep modules.
1:17:07Modules that have a little interface on there that expose a small simple interface that have a lot of functionality inside them.
1:17:16Now, what this means is that these are easy to test because you just let's say that there's a dependency between this one and this one. My arrow working?
1:17:26Yeah. There we go. Then what you do is you just wrap a big test boundary around that one module, around this one up here, and you're gonna catch a lot of good stuff because there's lots of functionality that you're testing, and really the caller, the person calling the module, is gonna have a simple interface to work from.
1:17:47So it's not not too tricky. That makes sense. Deep modules versus shallow modules.
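As an illustration of the shape being described, here is a sketch of a deep module: two public methods on the outside, with points, levels and de-duplication hidden inside. The rules and thresholds are invented for the example and are not the workshop's gamification service.

```python
class GamificationService:
    """Deep module sketch: a small interface over hidden functionality."""

    _POINTS_PER_LESSON = 10          # hypothetical rule
    _LEVEL_THRESHOLDS = (0, 20, 50)  # hypothetical thresholds

    def __init__(self) -> None:
        self._events: list[tuple[str, str]] = []

    # Public interface
    def record_lesson_completed(self, student: str, lesson: str) -> None:
        if (student, lesson) not in self._events:  # repeats are ignored
            self._events.append((student, lesson))

    def summary(self, student: str) -> dict:
        points = self._points(student)
        return {"points": points, "level": self._level(points)}

    # Private detail that callers and tests never touch directly
    def _points(self, student: str) -> int:
        return sum(self._POINTS_PER_LESSON for s, _ in self._events if s == student)

    def _level(self, points: int) -> int:
        return sum(1 for threshold in self._LEVEL_THRESHOLDS if points >= threshold)


service = GamificationService()
service.record_lesson_completed("emma", "intro")
service.record_lesson_completed("emma", "generics")
service.record_lesson_completed("emma", "intro")  # duplicate completion
```

A single test around `summary` exercises the de-duplication, the points rule and the level calculation together, so the test boundary matches the module boundary.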
1:17:52This is good. This shallow version is bad. And what I find is that unaided, or if you don't you don't watch AI carefully, it's gonna produce a code base that looks like this.
1:18:08So you need to be really, really careful when you're directing it. And that's why too, is that if we look inside the PRD where is the PRD gone?
1:18:17It's inside the issues. It's inside the gamification system. Not found.
1:18:22Of course, it's not. Here it is. Then I have inside here data model, the modules.
1:18:32So it's specifically saying, okay. This gamification service is a new deep module which we're going to test around. It's gonna have this particular interface, and it's going to have okay.
1:18:44We're modifying the progress service too. We're modifying the lesson route. We're modifying the dashboard route, etcetera.
1:18:49So it I'm being really specific about the modules that I'm editing, and I'm making sure that I keep that module map in my mind at all times throughout the planning and then throughout the implementation. That make sense? Very, very useful.
1:19:03It's useful for one other reason too. Not only does it make your app more testable, but you get to do a little mental trick.
1:19:11And I'm gonna refill my water while you wait for what that is.
1:19:17Let me get a question from you guys. So raise your hands if you feel like you're working harder than ever before with AI.
1:19:32Yeah. Raise your hands if you feel like you know your code base less well than you used to. Yeah.
1:19:43This is a real thing. Because we're moving fast, because we're delegating more things, we end up losing a sense of our code base. And if we lose the sense of our code base, we're not going to be able to improve it, and we're essentially delegating the shape of it to AI.
1:19:59I don't think that's good. But then how do we how do we make it so that we can move fast while still keeping enough space in our brains?
1:20:08I think that this is a way to do it. Because what you're doing here is not only are you thinking about creating big shapes in your code base, big services.
1:20:19What I think you should do is design the interface for these modules, but then delegate the implementation.
1:20:27In other words, these modules can become like gray boxes where you just need to know the shape of them. You need to know what they do and sort of how they behave, but you can delegate the implementation of those modules.
1:20:38I found this is really nice. I don't necessarily need to code review everything inside that module. I don't necessarily need to know everything of what it's doing.
1:20:45I just need to know that it behaves a certain way under certain conditions and that it does its thing. So it's kind of like, okay, I've got a big overview of my code base and I understand kind of the shapes inside it, understand what the interfaces will do, but I can delegate what's inside.
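One way to read "design the interface, delegate the implementation" is sketched below. The human owns a typed interface and a behavioral check, and any implementation that satisfies the check can be treated as a gray box. The names and the 10-point rule are hypothetical.

```python
from typing import Callable, Protocol


class Gamification(Protocol):
    """The part the human designs and keeps in their head."""

    def record_lesson_completed(self, student: str, lesson: str) -> None: ...
    def points_for(self, student: str) -> int: ...


def check_contract(make: Callable[[], Gamification]) -> None:
    """Behavior any implementation must have, whoever wrote it."""
    service = make()
    assert service.points_for("emma") == 0
    service.record_lesson_completed("emma", "intro")
    service.record_lesson_completed("emma", "intro")
    assert service.points_for("emma") == 10  # a repeat does not double count


class DelegatedGamification:
    """Stand-in for an agent-written implementation (the gray box)."""

    def __init__(self) -> None:
        self._completed: set[tuple[str, str]] = set()

    def record_lesson_completed(self, student: str, lesson: str) -> None:
        self._completed.add((student, lesson))

    def points_for(self, student: str) -> int:
        return 10 * sum(1 for s, _ in self._completed if s == student)
```

Reviewing then means reading the `Protocol` and `check_contract` closely and skimming the implementation, rather than the other way around.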
1:21:01I found that has been a really nice way to retain my sense of the code base while preserving my sanity. Make sense?
1:21:12And so you might ask, how do I take a code base that looks like this and then turn it into a code base that looks like this.
1:21:20How do I deepen the modules? Well, we have hopefully, it's in here. Pretty sure it is.
1:21:26We have a skill, and that skill is called improve code base architecture. Nice and direct.
1:21:35Let's run it. What this skill is gonna do is it's essentially just gonna do a scan of our code base and looking for what's available here. And feel free to run this yourself if you're running the exercises.
1:21:48And it's exploring the architecture, exploring essentially how to work within this code base, and it's going to attempt to find places to deepen the modules.
1:22:00Pretty simple. One really cool thing that it found here is part of my part of my course video manager app is a video editor.
1:22:10A video editor built in the browser, which is really hardcore. It's a decent bit of engineering. And I wanted a way that I could wrap the entire front end all the way to the back end in a single big module so that I could test the fact that I press something on the front end, and it goes all the way to the back end.
1:22:27And so I found a way, essentially, by using a kind of discriminated union between the two types here by sort of I was able to use this skill to essentially have a huge, great, big module that just tested from the outside or was testable from the outside this video editor infrastructure.
1:22:45And it meant that AI could see the entire flow, could act on the entire flow, and test on the entire flow. And honestly, it was just night and day in terms of the, uh, ability of AI to actually make changes because AI working on a video editor is pretty brutal if you don't give it good tests. So that is honestly, I if you take one thing away from today, just try running this skill on your repo and see what happens.
1:23:10Let's go to Slido. Let's ask a check a couple of questions just while this is running. So let's see.
1:23:16Have you tried Claude's auto mode with Claude enabled auto mode? That way, can avoid many of the obvious permission checks. We'll talk about permission checks in a second.
1:23:23Do I keep the markdown plans and issues for later reference? Okay. It's a great question.
1:23:31So let's say that you, uh, have a great idea.
1:23:38You turn it into a PRD, and you then implement that PRD, and the PRD is essentially done. Raise your hand if you keep that information in the repo.
1:23:48So you turn it into a markdown file. Raise your hand if you want to keep that around. Cool.
1:23:54Okay. And raise your hand if you if you don't want to keep it around, If you want to get rid of it as soon as possible. Yeah.
1:23:59This is, I think, a question that doesn't have a clear answer. What I'm really scared of with any documentation decision is that, let's say that we have a PRD for this gamification system.
1:24:14We keep it in the repo. We go on, go on, go on. Let's say a month later, we want some edits to the gamification system.
1:24:21And we go in with Claude, and it finds this old PRD and says, yes. I found the original documentation for the PRD system. Well, it turns out that the actual code has changed so much from the original PRD that it's almost unrecognizable.
1:24:33The names of things have changed. The, um, file structure has changed. Even the requirements may have changed.
1:24:38We might have actually tested it with users. This is doc rot, where the documentation for something is rotting away in your repo and influencing Claude, or any agent, badly.
1:24:50So I tend to not keep it around. I tend to get rid of it. And for me, because my setup uses GitHub issues, I just mark it as closed.
1:24:59It can fetch it if it wants to, but it's got a visual indicator that it's done. So I tend to prefer ditching these.
1:25:07Thoughts on the beads framework from Steve. I've not tested it, but it seems like sort of another way to manage Kanban boards and issues. Seems very good, but I've not tried it.
1:25:22Let me just quickly check the setup here. Let's take a couple of questions from the room.
1:25:28Anybody got any questions at this point about anything that we've covered so far, especially this last bit? Yes.
1:25:47Like, database migrations? Yeah. I don't know.
1:25:53I hope that answers your question. I'm so sorry. No.
1:25:55No. I think database migrations are a different thing because you have a sort of running record of exactly what changed, and it's more deterministic. And I think yeah.
1:26:04It's an interesting analogy. I'm not sure. Let's talk about it afterwards.
1:26:08That's a good way of saying I have no idea.
1:26:11Yeah. Yeah. So you mentioned that we don't review the PRD.
1:26:14You mentioned you don't review the PRD.
1:26:16Sorry, guys. Um, I'm just trying to listen to this guy's question.
1:26:30Yeah. The question here is, should I, at the sort of early planning stage, be trying to optimize the plan?
1:26:40This is something I actually see a lot of people doing, and it's a really good idea. So when you Let's go back to the phases.
1:26:51So let's say that you have all of these phases here, and you get to the point where you've sort of figured out everything with the LLM, you understand where you're going, you've created this sort of journey destination documents here, how do you then like, should you then try to optimize and optimize and optimize that PRD until it's the perfect PRD you can possibly imagine?
1:27:14I don't think there's a lot of value in that because I think the journey is really just sort of a hint of where you wanna go, and the place that you need to be putting the work is in QA.
1:27:26And you can sort of do that AFK, I suppose, but in my experience, you're not gonna get a lot of juice out of it. Like, the thing that really matters is getting alignment with the AI, which you do in the grilling session initially.
1:27:40Let's have one more question. Anyone got any more? Yeah.
1:27:42How do you, in your workflow,
1:27:45get it to code the way you want it to code? So that by the time you get to code review, it's at least familiar and uses the libraries you want it to use. Yeah.
1:27:53We had this question before actually, which was like, how do you enforce your coding standards on the agents, essentially? How do you get it to code how you want it to code?
1:28:02Now there's essentially two different ways of doing it. You've got come on.
1:28:10Push, and you've got pull. What do I mean by push and pull?
1:28:18Push is where you push instructions to the LLM. So, okay, if you put something in CLAUDE.md, like "talk like a pirate", that instruction is always going to be sent to the agent.
1:28:30Right? So that is a push action. You're pushing tokens to it.
1:28:33Pull is where you give the agent an opportunity to pull more information, and that's, for instance, like skills.
1:28:44So a skill is something that can sit in the repo and has a little description header that says, okay, agent, you may pull this when you want to. My thinking, my current thinking about code review and about coding standards looks like this.
1:28:59When you have an implementer, what's going on? There we go.
1:29:04Implementer. I'm gonna make this less red in a second. Um, then you want the coding standards to be available via pull.
1:29:14If it has a question, you want it to be able to sort of answer it. But if you then have an automated reviewer afterwards, then you want it to push.
1:29:24You wanna push that information to the reviewer. You wanna say, these are our coding standards. Make sure that this code, um, follows them.
1:29:31So if you have skills, for instance, then you want to push that stuff to the reviewer so the reviewer has both the code that's written and the coding standards to compare to. Hopefully, that answers your question. I can show you an automated version of this as well, actually.
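The push/pull split described here can be sketched in a few lines of TypeScript. This is a minimal illustration, not how Claude Code or Sandcastle implement it: the `Skill` shape and the functions `buildImplementerPrompt`, `pullSkill`, and `buildReviewerPrompt` are all hypothetical names introduced for the example.

```typescript
// Hypothetical shape for a skill: a short description header plus a full body.
interface Skill {
  name: string;
  description: string;
  body: string;
}

// Pull: the implementer only sees the index of skills, not their contents.
function buildImplementerPrompt(task: string, skills: Skill[]): string {
  const index = skills.map((s) => `- ${s.name}: ${s.description}`).join("\n");
  return `${task}\n\nSkills you may pull when you need them:\n${index}`;
}

// Called when the implementer asks for a skill by name.
function pullSkill(name: string, skills: Skill[]): string | undefined {
  return skills.find((s) => s.name === name)?.body;
}

// Push: the reviewer always receives the full standards alongside the diff.
function buildReviewerPrompt(diff: string, skills: Skill[]): string {
  const standards = skills.map((s) => `## ${s.name}\n${s.body}`).join("\n\n");
  return `Review this diff against our coding standards.\n\n${standards}\n\nDiff:\n${diff}`;
}
```

The implementer's context stays lean because only the one-line descriptions are always present, while the reviewer pays the token cost of the full standards so it can compare them against the code.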
1:29:47Yeah. Let's do that now just while it's fresh in my mind. I recently spent maybe a week or so, uh, building this thing called Sandcastle.
1:29:58And Sandcastle is... I was sort of unhappy with the options out there for, um, running agents AFK.
1:30:07And what this does is it's essentially a TypeScript library for running these loops. So you have, uh, a run function that creates a worktree, um, sandboxes it in a Docker container, and then allows you to run a prompt inside there.
1:30:23And in that worktree then, it's just a git branch, and you have that code, and you can then merge it later. If I open up, there are some really, really nice ways of viewing this, and it essentially allows you to run these kind of automated loops and allows you to parallelize across multiple different agents really simply.
1:30:45So I'll go into my sandcastle file, go into main.ts here, and let's just walk through this. So this is kind of like... I showed you a sort of version of the Ralph loop earlier.
1:30:57This is where we take it from sequential into parallel. We have here, first of all, a planner that has a plan prompt here that looks at the backlog and chooses a certain number of issues to work on in parallel.
1:31:12Remember I showed you that Kanban board where it had all the blocking relationships? It works out all of the phases. So this one will say, okay.
1:31:19Uh, let's say we have... you can ignore all this glue code here. This is essentially just a set of issues, GitHub issues, with a title and with a branch for you to work on.
1:31:32And then for each issue, we create a sandbox, and then we run an implementer in that sandbox, passing in the issue number, issue title, the branch.
1:31:43This is like the loop that we ran just before. Then if it created some commits, we then review those commits.
1:31:51This is essentially the loop. What do we do with those commits? We pass those into a merger agent, which takes in a merge prompt, takes in the branches that were created, takes in the issues, and it just merges them in.
1:32:06If there are any issues with the merge, you know, with the types and tests and that kind of thing, it solves them. And this has been my flow for quite a while now for working on most projects. It works super, super well.
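The plan, implement, review, merge flow walked through above can be sketched as follows. This is not Sandcastle's actual API: the `Agents` interface, `PlannedIssue`, and `runIteration` are hypothetical names, and each agent is passed in as a plain async function so the orchestration logic can be read on its own.

```typescript
// Hypothetical issue shape: a GitHub issue number, a title, and a branch to work on.
interface PlannedIssue {
  number: number;
  title: string;
  branch: string;
}

// Hypothetical agent hooks. In a real setup each would run a prompt in a sandbox.
interface Agents {
  plan: () => Promise<PlannedIssue[]>; // picks unblocked issues from the backlog
  implement: (issue: PlannedIssue) => Promise<string[]>; // returns commit hashes
  review: (issue: PlannedIssue, commits: string[]) => Promise<void>;
  merge: (branches: string[], issues: PlannedIssue[]) => Promise<void>;
}

async function runIteration(agents: Agents): Promise<string[]> {
  const issues = await agents.plan();

  // One implementer per issue, all running in parallel.
  const results = await Promise.all(
    issues.map(async (issue) => {
      const commits = await agents.implement(issue);
      // Only review when the implementer actually produced commits.
      if (commits.length > 0) await agents.review(issue, commits);
      return { issue, commits };
    }),
  );

  // Hand every branch that has work on it to a single merger agent.
  const withWork = results.filter((r) => r.commits.length > 0);
  if (withWork.length > 0) {
    await agents.merge(
      withWork.map((r) => r.issue.branch),
      withWork.map((r) => r.issue),
    );
  }
  return withWork.map((r) => r.issue.branch);
}
```

Keeping the merger as a single step after all implementers finish means conflicts between parallel branches are resolved in one place rather than by each implementer.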
1:32:18And I recommend you check out Sandcastle if you want to sort of learn more. And to answer your question properly is that in the reviewer, I would push the coding standards.
1:32:30In the implementer, I would allow it to pull. And I'm actually using Sonnet for implementation and Opus for reviewing because I consider reviewing sort of... I need the smarts then.
1:32:44Any ques... actually, before we do more questions, let's go back here. Okay. Where are we at?
1:32:51Okay. We're sort of zooming everywhere in this talk because I'm kind of having to run things in parallel. So let's go back to the improved code base architecture.
1:33:00It has finally finished running, and it's found a bunch of architectural improvement candidates. So it's got essentially a cluster of different modules that are all kind of related that could probably be tested as a unit. Got number one, the quiz scoring service.
1:33:15There's some reordering logic extraction as well. It has arguments for why they're coupled, and it has a dependency category as well.
1:33:23So local-substitutable: SQLite with an in-memory test DB. Quiz scoring service currently has zero tests.
1:33:30This is the biggest gap. So this is what it looks like when we come back from improved code base architecture.
1:33:37Okay.
1:33:40So we have nominally kind of seventeen minutes left. I don't know about you, but I'm knackered. I want to let me kind of sum up for you, because I think we're sort of reaching the end of our stamina.
1:33:55I'm gonna be available for the full time if you wanna come and ask me questions. I might do one more check of the Slido, but let's kind of sum up where we've got to. So this is essentially the flow, where throughout this whole process, we're bearing in mind the shape of our code base.
1:34:15This is not a spec-to-code compiler. This is not an AI that's sort of just like churning out code. We are being very intentional with the kind of modules and the shape of the code base that we want.
1:34:25We are making sure that we are as aligned as possible by using the grilling session, by really hammering out our idea. We're not over-indexing on the PRD. We're not trying to read every part of it.
1:34:36We're not thinking too much about it even. We are then just turning that into a set of parallelizable issues, which can be worked on by agents in parallel. We implement it, and we QA and code review the hell out of it and then keep going back to that implementation.
1:34:49One thing I didn't really mention is that in the QA phase, what the QA phase is for is creating more issues for that Kanban board. So even while it's implementing, you can be QA-ing the stuff and going back, adding more issues, and the Kanban board just allows you to add blocking issues kind of, um, sort of infinitely, really.
1:35:07And then once that's all done, once you've got code that you're happy with, once you've got work that you're happy with, then you can share it with your team and you can get a full review. So this is kind of like once you get here, this is kind of one developer or maybe a couple of developers sort of managing this, and then it's kind of up to you to figure out how to merge it back in.
1:35:27Of course, all of this can be customized by you. This is just something that I have found works.
1:35:33I'm not trying to, like, sell you on a kind of approach here. What I recommend, if you take one thing away from this session, is that you should head to Amazon and just buy a ton of those old books because, I mean, I just found it so enlightening reading them.
1:35:48You know, pre-AI writing is always really fun to read anyway. And on every single page, I found that there was something useful and something interesting to read.
1:36:02So thank you so much. Thank you for putting up with the heat. Hopefully, body temperatures will reset soon.
1:36:08Thank you very much.
The Hook

The bait, then the rug-pull.

In a packed conference room in Europe, a TypeScript educator poses a counterintuitive claim: the reason most developers are frustrated with AI code has nothing to do with the models, and everything to do with ignoring a canon of software engineering knowledge that predates AI by decades.

Frameworks

Named ideas worth stealing.

04:40 · concept

Smart Zone / Dumb Zone

LLMs perform best in roughly the first 100k tokens. Design every task to fit inside the smart zone; clear context between tasks.

Steal for: Any AI coding session planning or task sizing decision
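One way to make the smart zone actionable is a rough pre-flight check on task size. The sketch below assumes about four characters per token, which is only a rule of thumb since real tokenizers vary by model and content, and the names `TaskContext` and `fitsInSmartZone` plus the 20k output reserve are hypothetical. Only the 100k figure comes from the talk.

```typescript
// Rough budget for the "smart zone"; the 100k figure is the talk's estimate.
const SMART_ZONE_TOKENS = 100_000;

// Assumption: roughly 4 characters per token. Use a real tokenizer for accuracy.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Hypothetical description of everything a task will load into context.
interface TaskContext {
  systemPrompt: string;
  issueBody: string;
  files: string[]; // contents of the files the agent is expected to read
}

// Returns true when the estimated input plus an output reserve fits the budget.
function fitsInSmartZone(ctx: TaskContext, reserveForOutput = 20_000): boolean {
  const input =
    estimateTokens(ctx.systemPrompt) +
    estimateTokens(ctx.issueBody) +
    ctx.files.reduce((sum, file) => sum + estimateTokens(file), 0);
  return input + reserveForOutput <= SMART_ZONE_TOKENS;
}
```

If a task fails the check, the remedy suggested by the talk is to slice it into smaller issues rather than to push on with a larger context.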
14:30 · model

Grill Me Skill

  1. Interview relentlessly
  2. One question at a time
  3. AI provides its recommendation
  4. Reach shared design concept before any plan

Slash command that stress-tests a brief through relentless Q&A, building shared understanding before a plan or PRD is written.

Steal for: Any project kickoff or feature brief
41:40 · concept

Tracer Bullet Issues

Each Kanban issue crosses all system layers to give the agent integrated feedback after every issue, not just at the end of a horizontal phase.

Steal for: Breaking any PRD into agent-ready issues
45:00 · model

DAG Kanban Board

Issues with explicit blocking relationships forming a directed acyclic graph — non-blocked branches can be grabbed by parallel agents.

Steal for: Any multi-feature or multi-agent project planning
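Picking the issues that parallel agents can grab is a small graph query: an issue is ready when it is open and every issue blocking it is closed. A minimal sketch, assuming a hypothetical `Issue` shape with a `blockedBy` list (GitHub issues do not expose exactly this structure):

```typescript
type Status = "open" | "closed";

// Hypothetical issue record; blockedBy holds the ids of issues that must close first.
interface Issue {
  id: number;
  title: string;
  status: Status;
  blockedBy: number[];
}

// Returns every open issue whose blockers are all closed.
function readyIssues(issues: Issue[]): Issue[] {
  const closed = new Set(
    issues.filter((issue) => issue.status === "closed").map((issue) => issue.id),
  );
  return issues.filter(
    (issue) =>
      issue.status === "open" && issue.blockedBy.every((id) => closed.has(id)),
  );
}

const board: Issue[] = [
  { id: 1, title: "Schema for quiz scores", status: "closed", blockedBy: [] },
  { id: 2, title: "Scoring endpoint", status: "open", blockedBy: [1] },
  { id: 3, title: "Score badge in UI", status: "open", blockedBy: [1] },
  { id: 4, title: "Leaderboard", status: "open", blockedBy: [2, 3] },
];

// Issues 2 and 3 can be worked on in parallel; issue 4 waits for both.
const ready = readyIssues(board).map((issue) => issue.id);
```

Closing an issue is what unblocks its dependents, so rerunning `readyIssues` after each merge yields the next wave of parallel work.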
50:00 · model

Ralph Loop (AFK Agent)

  1. Pass all issues in context
  2. Pick next AFK-tagged issue
  3. Implement with TDD
  4. Run feedback loops
  5. Commit
  6. Repeat until no more tasks

A prompt pattern for autonomous coding agents that picks, implements, tests, and commits issues sequentially until the backlog is empty.

Steal for: Overnight or background feature implementation
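The six steps above can be sketched as a plain loop. This is an illustration of the control flow only: in the talk the loop is a prompt given to the agent, whereas here the agent, the feedback loops, and the commit step are hypothetical injected functions (`implement`, `feedbackPasses`, `commit`), and the `maxIterations` cap is an added safety assumption.

```typescript
// Hypothetical backlog entry; "AFK" marks issues safe to run unattended.
interface BacklogIssue {
  id: number;
  labels: string[];
  done: boolean;
}

// Hypothetical hooks standing in for the agent and the project's tooling.
interface LoopDeps {
  implement: (issue: BacklogIssue, backlog: BacklogIssue[]) => Promise<void>;
  feedbackPasses: () => Promise<boolean>; // e.g. type checks and tests
  commit: (issue: BacklogIssue) => Promise<void>;
}

async function ralphLoop(
  backlog: BacklogIssue[],
  deps: LoopDeps,
  maxIterations = 50,
): Promise<number[]> {
  const committed: number[] = [];
  for (let i = 0; i < maxIterations; i++) {
    // Pick the next unfinished AFK-tagged issue; stop when none remain.
    const next = backlog.find(
      (issue) => !issue.done && issue.labels.some((label) => label === "AFK"),
    );
    if (!next) break;

    // The whole backlog is passed along so the agent sees every issue in context.
    await deps.implement(next, backlog);

    // Failing feedback means the same issue is retried on the next iteration.
    if (!(await deps.feedbackPasses())) continue;

    await deps.commit(next);
    next.done = true;
    committed.push(next.id);
  }
  return committed;
}
```

Issues without the AFK tag are never picked up, which leaves them for a human to handle.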
1:20:00 · concept

Deep vs. Shallow Modules

Deep modules have simple interfaces with large internal logic. AI defaults to shallow; you have to be intentional about pushing toward deep.

Steal for: Code architecture review before onboarding AI agents
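As an illustration of a deep module, the sketch below hides normalization, counting, and pass/fail logic behind a single `score` method. It is a hypothetical example loosely inspired by the quiz scoring service mentioned in the workshop, not code from the workshop repo; `createQuizScorer`, `QuizResult`, and the default pass mark of 70 are all invented for the example.

```typescript
interface QuizResult {
  correct: number;
  total: number;
  percent: number;
  passed: boolean;
}

// The whole public surface: one method.
interface QuizScorer {
  score(answers: Record<string, string>): QuizResult;
}

function createQuizScorer(
  answerKey: Record<string, string>,
  passMark = 70,
): QuizScorer {
  // Hidden detail: answers match case-insensitively, ignoring outer whitespace.
  const normalize = (s: string): string => s.trim().toLowerCase();
  const questionIds = Object.keys(answerKey);

  return {
    score(answers) {
      // Unanswered questions count as incorrect.
      const correct = questionIds.filter(
        (id) =>
          answers[id] !== undefined &&
          normalize(answers[id]) === normalize(answerKey[id]),
      ).length;
      const total = questionIds.length;
      const percent = total === 0 ? 0 : Math.round((correct / total) * 100);
      return { correct, total, percent, passed: percent >= passMark };
    },
  };
}
```

Tests written against `score` keep passing when the internals change, which is what makes a module like this a stable target for an agent's feedback loop.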
1:28:20 · model

Push vs. Pull for Standards

  1. Push: always-in-context for reviewer agent
  2. Pull: available on demand for implementer agent

Coding standards pushed to reviewer agents and pullable for implementers, keeping implementation context lean while ensuring review catches violations.

Steal for: Any multi-agent coding pipeline with code quality gates
CTA Breakdown

How they asked for the click.

VERBAL ASK
1:32:40 · next-video
Head to Amazon and buy a ton of those old books — they are an absolute gold mine.

Soft close recommending classic SE books; no explicit subscribe or product CTA. Speaker references Sandcastle and AI Hero implicitly throughout.

Storyboard

Visual structure at a glance.

hook · AI Engineer Europe title card · 00:00
promise · Smart zone diagram on screen · 04:20
value · Cadence app (the workshop codebase) · 12:45
value · Grill Me session live in Claude Code · 26:20
value · PRD modules in VS Code · 35:50
value · Research and Prototype phase diagram · 48:15
value · TLDraw Kanban boxes · 55:00
value · Full flow diagram on slide · 1:05:30
value · Agent running gamification service code · 1:18:45
value · Shallow module grid visualization · 1:23:30
cta · Sandcastle README on GitHub · 1:26:40
cta · Final summary diagram · 1:33:00