Modern Creator
David Ondrej · YouTube

Matt Pocock's Agentic Engineering Workflow (just copy him)

A senior developer's real AI-agent setup, and the argument that the harness — not the model — is where the leverage lives.

Posted
3 days ago
Duration
Format
Interview
educational
Views
102K
3.3K likes
Big Idea

The argument in one line.

The leverage in AI engineering sits in the harness you control — prompts, skills, codebase architecture, and review systems — not in chasing whichever model is newest, because good software fundamentals survive every model generation.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…
  • A working developer who already uses an AI coding agent daily and wants a senior practitioner's concrete setup to copy rather than another model hype thread.
  • A solo builder or small-team lead deciding how to run agents in parallel without babysitting every permission prompt.
  • A vibe coder who can read code and use a terminal but knows their software fundamentals are the ceiling on what their agent can do.
  • An engineering manager weighing whether to hire experienced seniors or AI-native juniors for an agent-heavy workflow.
  • Anyone building or distributing agent skills who needs the distinction between skills the model invokes and skills you invoke yourself.
SKIP IF…
  • You want a step-by-step tutorial with copy-paste commands — this is a strategic conversation, not a setup guide.
  • You are looking for benchmarks comparing specific models, since the explicit thesis here is to stop fixating on the model.
TL;DR

The full version, fast.

A senior developer argues that everyone over-indexes on the model when the real, controllable leverage is the harness around it: prompts, skills, codebase architecture, tests, and review systems. AI has eaten tactical day-to-day coding, so your edge is strategic programming — scoping tasks, designing interfaces, and writing just enough documentation to delegate well. His concrete setup is Claude Code on Opus 4.8 for planning, plus away-from-keyboard agents run inside sandboxes (his Sandcastle tool) over GitHub Actions, organized as a task queue rather than an infinite loop. Prefer procedure skills you invoke yourself over abilities the model triggers, build self-improving review systems instead of trusting a fancier model to catch bugs, and keep yourself in the loop as the one who judges whether the work was actually good.

Voices

Who's talking.

00:00 · host · David Ondrej
00:00 · guest · Matt Pocock
Chapters

Where the time goes.

00:00–01:50

01 · Cold open: model vs. harness

The thesis stated up front — focus on the harness, not the model — and the framing of who gets ahead with AI.

01:50–05:00

02 · Strategic vs. tactical programming

AI has eaten tactical coding; your edge is strategic programming, scoping, and interfaces. Your skills are the ceiling on AI.

05:00–06:00

03 · SerpApi sponsor read

Sponsor break for SerpApi (structured search results for AI agents).

06:00–13:40

04 · The teach skill, demoed live

Matt runs his stateful teach skill in an empty dir: mission file, cheat sheet, HTML lessons, spaced-repetition quizzes, learning record — built on real teaching principles.

13:40–18:20

05 · Procedures vs. abilities

What makes a good skill: skills you invoke (procedures) vs. skills the model invokes (abilities); context-window leakage; grill me; superpowers.

18:20–21:40

06 · Knowledge, skills, wisdom

The three-part model; you can bundle knowledge and skills into reusable skills, but wisdom needs real-world context.

21:40–27:00

07 · The actual stack: Claude Code, Opus 4.8, Sandcastle

Matt's real setup — Opus 4.8 medium effort, AFK agents in sandboxes via Sandcastle on GitHub Actions, parallelization.

27:00–33:50

08 · The 50/50 debate and the bitter lesson

Model vs. harness as roughly fifty-fifty; David pushes for chasing better models; the bitter-lesson trap; waiting a month before adopting a new model.

33:50–42:00

09 · Self-improving systems and review

The Fable security-bug story; build a lock instead of admiring the thief; cheap models on a cron; review as observability into your own system.

42:00–49:10

10 · Queues, not loops

The agentic-loop hype reframed as a task queue over GitHub issues and Actions; the medieval-king analogy; pushing human checkpoints toward production.

49:10–55:00

11 · Building a business in the age of AI

Fundamentals unchanged — talk to customers; AI gives no edge on the right idea, only on implementation; ask what to remove, not what to add.

55:00–1:00:00

12 · Senior vs. AI-native junior

Enthusiasm beats experience in raw output; DX overlaps with AX; the tactical-only code monkey is gone.

1:00:00–1:02:24

13 · Closing action steps

Nuke every skill/plugin/MCP back to a blank slate and layer back only procedures you choose; delegate implementation to AFK agents; where to find Matt.

Atomic Insights

Lines worth screenshotting.

  • AI has eaten tactical programming — the day-to-day writing of code — so your remaining edge is strategic programming: architecture, scoping, and interfaces.
  • You have far more control over the harness (prompts, skills, codebase, tests) than over the model, yet most people obsess over the model.
  • The cheapest way to cut token spend is a codebase that is easier to change, because better guardrails let a dumber, cheaper model do the same work.
  • Your own skills are the ceiling on what AI can do for you — low skills cap the output no matter how good the model is.
  • AI makes senior developers roughly 10x better but only marginally helps juniors, which weakens the case for hiring many juniors.
  • Run agents inside sandboxes — an un-sandboxed agent can delete your home directory or exfiltrate your environment variables.
  • Think in queues, not loops: development is a backlog of scoped tasks that multiple agents pick off, not a single while-loop running forever.
  • Half the agentic-loop hype is research labs selling more tokens by telling you to prompt your agent endlessly.
  • Every model-invocable skill leaks its description into the context window, so 100 abilities means 100 descriptions of bloat.
  • Prefer procedure skills you invoke yourself over ability skills the model triggers, so you stay the driver and keep your thinking.
  • When a smarter model finds a deep bug, the real lesson is that you lacked a system to catch it — buy a lock instead of admiring the thief.
  • You can catch most deep bugs with a cheap model run on a daily cron against a fresh slice of the repo, given the right prompt.
  • Review is not only a gate — it is observability into your own system, so you are reviewing the process that produces code, not just the code.
  • Knowledge and skills can be bundled into reusable skills, but wisdom — knowing when to act — usually requires having done the thing in context.
  • AI gives no edge at having the right product idea; that still requires talking to real customers — the edge is only in implementation speed.
  • Ask AI what to remove from your product, not what to add, to avoid becoming a thousand-feature app nobody can navigate.
  • Don't adopt a new model the week it ships — wait about a month for the hype and latency to settle, the way you would with any release.
  • Away-from-keyboard agents are the unlock: removing yourself from the permission loop lets you run several agents in parallel and just review the output.
  • Dictation is an overpowered developer skill — being able to translate your brain into words quickly is a force multiplier on agent work.
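
One way to picture the cheap-model-on-a-cron idea from the list above is to rotate through the repository so each daily run reviews a different slice. The sketch below is an illustration under stated assumptions, not Matt's implementation: `daily_slice`, `build_review_prompt`, the seven-day rotation, and the prompt wording are all hypothetical, and the actual model call is deliberately left out.

```python
from datetime import date


def daily_slice(files: list[str], today: date, slice_count: int = 7) -> list[str]:
    """Return the files to review today.

    Files are sorted and dealt round-robin into `slice_count` buckets.
    The bucket is chosen from the calendar day, so any `slice_count`
    consecutive days cover every file exactly once.
    """
    ordered = sorted(files)
    bucket = today.toordinal() % slice_count
    return [path for index, path in enumerate(ordered) if index % slice_count == bucket]


def build_review_prompt(paths: list[str]) -> str:
    """Assemble a (hypothetical) review prompt for whichever cheap model you use."""
    listing = "\n".join(f"- {path}" for path in paths)
    return (
        "Review these files for security and correctness bugs. "
        "Report only issues you can point to a specific line for.\n"
        f"{listing}"
    )


if __name__ == "__main__":
    repo_files = [f"src/module_{n}.py" for n in range(10)]  # stand-in file list
    todays_files = daily_slice(repo_files, date.today())
    print(build_review_prompt(todays_files))
```

A scheduler (cron, or a scheduled GitHub Actions workflow) would run this once a day and hand the prompt to the model; the rotation is what gives each run a fresh part of the codebase.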
Takeaway

The harness, not the model, is your real lever.

WHAT TO LEARN

AI has absorbed the day-to-day coding, so your edge is now the strategic scaffolding around the agent — the architecture, the skills you control, and the review systems that keep improving themselves.

01 · Cold open: model vs. harness
  • Treat the harness as roughly half the value: prompts, skills, codebase architecture, and tests are things you control far more than the model itself.
02 · Strategic vs. tactical programming
  • Lean into strategic programming — scoping tasks, designing interfaces, and writing just-enough documentation — because AI already does tactical coding cheaper than you.
07 · The actual stack: Claude Code, Opus 4.8, Sandcastle
  • Run agents away-from-keyboard inside sandboxes so they can't damage your machine, which lets you parallelize several at once and only review the output.
10 · Queues, not loops
  • Organize agent work as a queue of scoped tasks over GitHub issues and Actions rather than an infinite loop re-prompting one agent forever.
05 · Procedures vs. abilities
  • Prefer procedure skills you invoke yourself over abilities the model triggers, since every model-invocable skill leaks its description into the context window.
  • Use an adversarial interview skill before you implement, so you reach shared understanding and surface bad assumptions before any code is written.
09 · Self-improving systems and review
  • When a smarter model finds a deep bug, build a system that catches that class of bug — a cheap model on a daily cron often suffices with the right prompt.
  • Keep yourself in the loop as the judge of whether work is good, and treat review as observability into the system that produces your code, not just the code.
08 · The 50/50 debate and the bitter lesson
  • Wait about a month before adopting a brand-new model so the hype, latency, and cost shake out before you reorganize around it.
11 · Building a business in the age of AI
  • Build a product by talking to real customers and choosing what to remove, because AI gives you implementation speed but no edge on having the right idea.
Glossary

Terms worth knowing.

The harness
Everything around the model that you control: the prompts, skills, codebase architecture, tests, documentation, and review systems that shape how well the agent performs.
Tactical vs. strategic programming
From Ousterhout's A Philosophy of Software Design: tactical is the on-the-ground daily coding (syntax, bugs, commits); strategic is the longer-term thinking about architecture, interfaces, and velocity. AI has largely taken over the tactical.
AFK (away-from-keyboard) work
Handing an agent a scoped task and letting it run without you in the loop handling permissions, so you can parallelize multiple agents and review their output afterward.
Sandcastle
Matt Pocock's tool for running coding agents inside sandboxes (via Docker, Podman, or Vercel sandboxes) so they can't damage your machine or leak secrets, enabling many agents to run in parallel.
Queue vs. loop
A reframe of agentic automation: instead of an infinite while-loop re-prompting an agent, treat work as a backlog of scoped tasks that get triaged, explored, implemented, and merged — multiple agents picking items off the queue.
Ralph loop
An early agentic-coding pattern (attributed to Geoffrey Huntley) that wraps an agent in a while-loop, passing the same prompt to Claude Code repeatedly until the task is done.
Procedure vs. ability (skills)
A procedure is a skill you invoke yourself to steer the model a certain way; an ability is a skill the model invokes on its own (e.g. coding standards). Abilities leak their description into context; procedures don't if model-invocation is disabled.
DX vs. AX
Developer experience vs. agent experience — how pleasant a codebase is to work in for humans vs. for AI agents. The two overlap heavily, so good software fundamentals improve both.
The bitter lesson
A machine-learning idea that raw compute scaling eventually beats hand-crafted optimizations, since the underlying model keeps improving. Here it's the worry that tuning your harness may be wasted if you should just wait for better models — countered by the value of model-agnostic fundamentals.
Zone of proximal development
An education concept about teaching at the edge of what a learner can do with guidance — encoded into the teach skill to scope each lesson just beyond the learner's current ability.
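
The queue-versus-loop entry above can be sketched in a few lines. This is a toy model under stated assumptions: the stage names come from the glossary wording, while `Task`, `run_agent`, and `drain_queue` are hypothetical names, and `run_agent` only records what a real agent would do instead of calling one.

```python
from collections import deque
from dataclasses import dataclass, field

# Stage names taken from the "Queue vs. loop" glossary entry.
STAGES = ["triage", "explore", "implement", "merge"]


@dataclass
class Task:
    title: str
    stage_index: int = 0
    history: list[str] = field(default_factory=list)

    @property
    def done(self) -> bool:
        return self.stage_index >= len(STAGES)


def run_agent(agent_name: str, task: Task) -> None:
    """Hypothetical stand-in for an agent completing one stage of one task."""
    stage = STAGES[task.stage_index]
    task.history.append(f"{agent_name}:{stage}")
    task.stage_index += 1


def drain_queue(tasks: list[Task], agents: list[str]) -> list[Task]:
    """Let agents take turns pulling scoped tasks off a backlog until it is empty."""
    backlog = deque(tasks)
    finished: list[Task] = []
    turn = 0
    while backlog:  # terminates: every pass advances some task by one stage
        task = backlog.popleft()
        agent = agents[turn % len(agents)]
        turn += 1
        run_agent(agent, task)
        if task.done:
            finished.append(task)
        else:
            backlog.append(task)  # back of the queue for its next stage
    return finished


if __name__ == "__main__":
    done = drain_queue([Task("fix login bug"), Task("add export")], ["agent-1", "agent-2"])
    for task in done:
        print(task.title, task.history)
```

The contrast with a Ralph-style loop is the exit condition: the work stops when the backlog is empty, not when someone decides to stop re-prompting a single agent.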
Resources

Things they pointed at.

02:00 · book · A Philosophy of Software Design (John Ousterhout)
12:20 · tool · Wispr Flow (dictation)
16:20 · tool · superpowers skills repo (obra)
24:40 · tool · Vercel sandboxes
25:00 · tool · GitHub Actions
33:30 · tool · Sentry (observability example)
42:20 · link · Ralph loop (Geoffrey Huntley / gHuntley)
Quotables

Lines you could clip.

00:00
Everyone's obsessed with the model, and I think they should be more interested in the harness.
The whole thesis in one line; works as a standalone cold open. (IG reel cold open)
01:13
How do you optimize for token spend? Have a codebase that's easier to make changes in.
Counterintuitive, concrete, and reframes a question everyone asks. (TikTok hook)
02:30
AI is just better at tactical programming than you are, because it can do it for cheaper.
Blunt claim that lands the strategic-vs-tactical point. (newsletter pull-quote)
05:00
Your skills are the ceiling on what AI can do.
Tight, quotable, no setup needed. (TikTok hook)
23:50
The moment I discovered AFK was the moment I really got into AI coding.
Personal, names the unlock directly. (IG reel cold open)
27:00
Everyone's obsessed with the engine of the Formula One car. In fact, the engine is only a part of the whole system.
The central analogy, vivid and shareable. (TikTok hook)
39:40
If someone keeps stealing your bike, maybe buy a lock.
Memorable analogy for building self-improving systems instead of buying a fancier model. (newsletter pull-quote)
45:00
Queues, not loops. That's all development is, really — a queue of tasks that you need to get done.
Reframes the viral agentic-loop trend in three words. (TikTok hook)
59:50
Delete every single skill, every plugin, every MCP server. Go back to a blank slate, and see what the agent does.
Concrete, contrarian action step to close on. (IG reel cold open)
54:20
You should be asking AI what thing you can remove from your app — how do I make this simpler?
Inverts the default product instinct; strong standalone advice. (newsletter pull-quote)
Topic Map

Where the conversation goes.

00:00–01:50 · dense · Model vs. harness thesis
01:50–05:00 · dense · Strategic vs. tactical programming
05:00–06:00 · sparse · Sponsor (SerpApi)
06:00–13:40 · dense · The teach skill demo
13:40–18:20 · dense · Procedures vs. abilities; skill design
18:20–21:40 · steady · Knowledge, skills, wisdom
21:40–27:00 · dense · Concrete stack: Claude Code, Opus 4.8, Sandcastle, AFK
27:00–33:50 · dense · 50/50 debate and the bitter lesson
33:50–42:00 · dense · Self-improving systems and review
42:00–49:10 · dense · Queues not loops
49:10–55:00 · steady · Building a business in the age of AI
55:00–1:00:00 · steady · Senior vs. AI-native junior hiring
1:00:00–1:02:24 · dense · Closing action steps
The Script

Word for word.

00:00Everyone's obsessed with the model, and I think they should be more interested in the harness. What you can do to get the most out of the harness, giving it the right prompts, giving it the right skills to work with, and improving the environment in which the model runs.
00:13As I sort of said with Fable, like, the model is useful, but I think the harness has an equal amount of work, and you have much more control of the harness than you do the model.
00:24People are focused on the wrong thing. They're looking at the big shiny new thing when in fact, just focus on the stuff that's been working for thirty, forty years. You know?
00:33And it really does work. Like, people ask me all the time, how do you optimize for token spend? Have a code base that's easier to make changes in.
00:42Right, Matt. So what's gonna be the main difference between people who use AI to get insanely ahead and the majority of people who only get a small boost from it?
00:51So in his book, A Philosophy of Software Design, John Ousterhout talks about the difference between tactical and strategic programming. So I find this distinction so useful when thinking about AI because tactical programming is all about the on the ground day to day stuff, the actual writing of the code, the actual messing about with the syntax, figuring out bugs as they come up, and actually creating the code, creating the commits.
01:15Strategic programming is winning the war, not the battle. It's longer term thinking. It's the general sitting right at the top.
01:22How does the code base need to look? What strategies can I use to improve our velocity?
01:27And for me, strategic programming has always been the most interesting, the most exciting. That's how I was thinking even when I was a junior.
01:35How can we increase our velocity? How can we do more with less? And AI has basically eaten tactical programming.
01:41Yeah. It's gone. Right?
01:42It's all gone. For sure. So AI is just better at doing tactical programming than you are because it can do it for cheaper.
01:48Right? And so
01:51you need to be great at strategic programming in order to get the most out of this infinite fleet of tactical programmers that you now have access to. So does that mean knowing how to orchestrate these agents plus some, like, fundamentals of software design, codebase architecture? Like, how would you break that down into, like, these specific skills that people can learn?
02:10Yeah. Great question. So
02:12strategic programming really hasn't changed in AI. Right? AI is just, um, all we're doing is instead of delegating to junior or mid level programmers, we're delegating to AI instead.
02:24So the things that you need to do good delegation are still the same.
02:29You need to design the hard parts up front. You need to make sure those tasks are really, really well scoped. You need to be thinking about the interfaces between all of the modules in your code base.
02:39You know? You need to be thinking about test seams and good tests. You need to essentially design a code base that's easy to work in and have just enough documentation that can point AI to the right places where it's gonna make those changes and make them effectively.
02:52I think everybody at this point agrees that the AI progress is very fast, if not speeding up. So I think a lot of people also miss the part of upskilling themselves. Right?
03:01Because the
03:02like, yeah, you can pay for subscriptions. You know, you can get the latest tools, but, ultimately, anybody can do that. But there is still gonna be people who use these tools to massively grow their business, you know, to ship more and better software than ever before.
03:14And there's gonna be people who, like, tried a bit and, you know, maybe used the free version or the cheaper model. So how would you advise people to start teaching themselves
03:23to be better? Yeah. People ask me all the time, like because, you know, I sell developer courses.
03:27Right? So I'm sort of you know, you can take my advice here with a pinch of salt, but I personally feel that my skills are a multiplier for AI.
03:37Right? If I'm able to oversee a code base and think about how, like, things should be built and just tell AI how to do it, then AI just has so much richer context to work with. And I think of this I mean, I see this everywhere, and people, like, CTOs and, like, people I talk to at conferences tell me this all the time, is that AI makes senior developers just 10 times better.
04:01And it sort of doesn't make sense to hire that many juniors anymore because juniors get a little boost from AI, but seniors just get this ridiculous, huge boost from it.
04:11And they can do so much more with it. So your skills are the ceiling on what AI can do. And if your skills are low, then AI is not gonna be able to go past that.
04:20You know? So getting good with AI is really about getting good at your domain, getting good at what AI is going to be doing for you.
04:29So a better teacher can use AI to teach people better than a random can. You know? So I think skills are more important now than they used to be because, again, you just have this multiplier available to you, and you can delegate more.
04:45So you recently shipped the new teach skill. Right? Can you tell us more about that?
04:48If you've ever tried pulling search data from the web at scale,
04:52you know it's a nightmare. You write a scraper. It works for a week, but then the layout changes.
04:57Then you hit captchas. Your proxies get blocked. Rate limits everywhere.
05:01And suddenly, you find yourself maintaining scraping infrastructure instead of building the actual project. This is where SerpApi comes in.
05:08It gives you clean, structured search results from Google, Bing, Yahoo, and more through a single API call. You send a request, and you get back a clean JSON object with exactly the data you want. No captcha solving, no rotating proxies, no broken HTML.
05:25They handle all of it. And for AI work, this is huge. Say you're building an agent that needs live information.
05:31Just use their Google search API. Or maybe you're training an AI model that needs a dataset. Their Google images API gives you pre classified titles, URLs, and thumbnails ready to go.
05:41A ton of production agents already use SerpApi as one of their core tools. And you can get started with 250 free credits, no credit card required.
05:51Just scan the QR code on screen or click the first link below the video. Oh, and a huge thank you to SerpApi for sponsoring this video. Yeah.
05:58I know a lot about teaching. I've been a teacher for ten years, actually. So I was teaching
06:03singing and voice when I was just straight out of university. Then I became a developer, and now I teach developers. I've been doing that for the last four years.
06:11So I know a lot about teaching, and I I thought, okay. What if I take some of the teaching principles that I know about, such as the zone of proximal development, such as the difference between knowledge, skills, and wisdom, encode that into a skill, and essentially use it to create a course on the fly about any topic.
06:28And that's what I've done, and it's extremely effective. I've actually been learning I'm teaching myself Rubik's cube from this. I can solve a Rubik's cube now from memory, thanks to this skill.
06:39And I've been using this for all sorts of stuff. So yesterday, I was I was messing about with what it might look like to ask the teach skill how to become a senior developer.
06:51And it basically went on this big journey looking at a bunch of trusted resources, getting a big sort of curriculum together, and just produce something that was gorgeous. And so it that?
07:01Absolutely.
07:02Let's give it a go. Because, yeah, I think people would love to see that. Definitely.
07:06Okay.
07:07So, David, what do you wanna learn?
07:10Let's do, like, systems design. I tell you what. I mean, I I I I've got an idea here, which is that a lot of people come to me.
07:17I I teach courses for engineers, really, people who already know how to be an engineer. I'm intrigued by if this skill can teach you basically the basics of engineering.
07:28You know what I mean? To fill in the gaps that you might have if you're a vibe coder. You know?
07:33So I'm gonna pretend that I'm a vibe coder. I'm gonna invoke the teach skill. I'm inside an empty directory here, and I'm just gonna let it roll.
07:41So I'm gonna dictate something out, and let's see how it goes. I am a vibe coder, and I want to fill in my knowledge gaps so that I can ship better software. I know some very, very basic CLI commands, and I know just about enough to read some code and use the terminal, but that's about it.
08:01What do you think I should learn to develop my skills next?
08:07I'm gonna put that in simple prompt, plain English, anybody can ask for this. Exactly. And it's just a a very, very simple request.
08:13And I'm not really talking about the subject that I'm trying to learn. I'm talking about my mission, the thing I want to get out of this, the reason I'm coming to this session with the teacher today. And you can think of this really as a collaborative effort, basically.
08:26I am talking to a teacher, and the agent is my teacher. And it should know how best to teach me.
08:33It's good that you you brought it up and you created the skill recently because, literally, I was thinking about this yesterday. Right?
08:39Especially when Fable came out, I was like, how can I upskill myself to get the most out of it? Because, like, I I I know I'm not where I can be. You know, there's people much better than me, much more skilled than me.
08:49So, like, it's it's great timing that you have the skill now.
08:52Good. Well, this so it's basically saying, okay. I've checked the workspace.
08:56It's a blank slate. The teach skill, you need to run it in a workspace because it saves a bunch of information in that workspace. Okay.
09:03And so for a Vibe coder who can read code and use a basic terminal, the highest leverage gap is almost never more syntax. It's the stuff around the code that lets you ship without fear. That's 100% true.
09:13Yeah. So git, reading errors, debugging, how software actually ships, testing, totally. Git is where we start.
09:19Okay. But first, the mission. I don't want to guess.
09:22Three quick questions. The first thing the skill does is it basically aligns with what you want to do. I think of teaching and learning as not getting information into your head, but orienting you in the world, putting you in a new place in the world.
09:39And this is kind of a bit abstract and a bit, you know, spiritual almost, but it's not really. I mean, essentially, you need to learn these skills in order to do something in the world.
09:49Yeah. And that's your mission. Right?
09:50So it's like, what are you building? What does "ship better software" mean to you right now? What's a concrete project you're working on?
09:56So let's imagine I answer this. Let's imagine I was a I'm a voice coach wanting to learn how to be a better coder. You know, that's actually who I was back then.
10:04So I'm a voice teacher and a singing teacher. I want to build a scheduling app to help me schedule my students, to help me retain notes on my students, to teach them better, and to, you know, build something that they can help practice with. That's the kind of app that I'm looking at.
10:18So probably a full stack application with a database with some kind of authentication,
10:24but that's way beyond my abilities right now. By the way, I'm using Wispr Flow for dictation. It's really good.
10:28Yeah. So this is gonna be also the game, like, how fast you can output your tokens from your brain and input them back into your brain.
10:35Totally. Yeah. I mean, like, dictation is if we do a little sidebar on dictation, anyone who's not doing dictation is just so much faster.
10:44Right? It's very fast for me because I'm, like, quite a fluid speaker, so I can translate my brain into words quite effectively. But it's a skill.
10:52It's a skill at the end of the day, and people can learn to verbalize thoughts better and faster. Exactly. And it's a skill that is actually overpowered if you're a developer.
11:02It really, really is. Like, I found that being able to communicate and being able to speak was something was just ridiculously overpowered in the development world, and so it has proved.
11:12So it has created a mission.md here. So it's basically saying, okay, who is this person?
11:19What do they want to build? Why it matters? What success looks like?
11:22Being able to ship that app, not break it, get it live, and trust that it works for real students. This is now going to orient everything about this skill and what it does next. So you can see it's doing some searches here.
11:34So it's searching for some trusted resources. How does a full stack web app, front end, back end, blah blah blah blah blah. Let me set up your resources, a learning record, a reference cheat sheet, and your first lesson.
11:44So it's gonna start churning out some material that's running locally. And the idea of this is, I think of there being stateless skills that don't need any state on the local system or any kind of, like, memory about what was done before.
12:02And then there are stateful skills. So skills that rely on information running locally. And this teach skill is a stateful skill, because if you think about working with a great teacher, a teacher remembers what you've done before.
12:17A teacher knows about where you need to go next, knows what your mission is, all that stuff. And so it's saving a bunch of state locally, so it can remember everything.
12:27And it's, first of all, created a reference. So we got a reference cheat sheet. It's now gonna create the first lesson.
12:33And these are created as HTML. This means we can open it in a browser and have, like, a really rich thing to look at, because learning stuff in the terminal is just brutal. Yeah.
12:42So you're using Claude Code with Fable. Right?
12:45I'm using Claude Code with Opus 4.8 with medium effort. So I'm not using Fable. Not quite yet.
12:51I haven't decided whether I want to get into Fable yet or not. Really? Well, I mean, yeah.
12:57I I I don't I don't really believe in all the kind of like, yes was it yesterday when it was released? It's just so much unbelievable amount of noise.
13:07People saying they've one-shotted this, one-shotted that. And, yes, it does seem to be a step change. It does seem to be slightly better.
13:13But then you've got to weigh that against the cost of the tokens and That's true. How available it is, the latency of it. I prefer to essentially not try a new model when it comes out, and wait about a month just to see how things shake out.
13:27That's what I did with Opus 4.5, which was the last time I really had a massive new feeling about a model, and it worked fine.
13:36You know? You're not you're not losing that much by just waiting a little while to see how things shake out. Alright.
13:42So this is the file it created. This is lesson one. Git: your project's undo button.
13:47So we can see that it's using a more rich like, actually seeing this in the HTML is a lot richer and Yeah. Nicer than doing it in the terminal.
13:56This is saved locally, so you can always go back and reference this. And it's giving you actual things you can do in the terminal, giving you proper exercises to go and do it. So you make a folder, go into it, start git in it, create a file, check the status, stage it, save the snapshot.
14:13Because, of course, it's running, like, on my system, it knows what my setup is. It's probably already checked whether I've got git installed, that kind of thing. So it's, you know, it's perfectly Personalized education.
14:25You know? Exactly. Totally personalized.
14:27So which command saves a snapshot of your staged changes? David, do you reckon you can answer this one for me?
14:33To save a snapshot of your staged changes? Git commit.
14:37Git commit. Bam.
14:39So again, it's using techniques that are well known in education for increasing Mhmm. Storage strength.
14:46Right? So quizzes are such an awkward thing.
14:49Like, I sort of hate quizzes, but quizzes are just unreasonably effective for increasing the strength that something is stored in.
14:58What command shows what has changed right now? David? Git status.
15:02Git status. Yeah. Ouch.
15:04Bugger. I pressed the wrong thing. What does git add do to a change?
15:09Stages changes.
15:11Stage changes. A commit is best pictured as a Save point.
15:16Save points. You broke a file but haven't committed. To restore it, you run?
15:23Git restore?
15:25Yeah. I think it's git restore, isn't it? Yes.
15:27There it is. Very good. Okay.
15:30And so it then sends you off to read a primary source, if you fancy it. So the pro git book. And then invites you to ask your teacher follow-up questions and create the next lesson.
15:42And so the idea of this is you I think of knowledge as like a graph. Right? It's like a big forest through which you're exploring.
15:49And what this is doing is it's creating a linear path through that graph. It's basically going, okay. You've learned this.
15:55Now that's I I know that you've learned it. It's in your learning record. We can see it's retaining a list of learning records in the top right here, which is your mission and your starting point.
16:05So it's captured your mission, a decision to start with git, zone of proximal development, current estimates. You get the idea. So it's great.
16:14I freaking love it. And that's what I would recommend to anyone starting with, especially, development, because it's sort of I mean, I'm a developer.
16:23I know what developer education is, and so I've sort of put that into this teach skill. And I think I've always thought coding was quite easy to learn, but I didn't have that much trouble when I was learning it myself. And I I think this is a great way to do it.
16:39So is this live on GitHub somewhere? Where can people find this? GitHub, mattpocock/skills.
16:44And if you head there, you just run this CLI command, npx skills@latest add mattpocock/skills.
16:52You can choose the teach skill, and it will just save to your local setup. So whether you're using Claude Code,
16:58whether you're using Codex, it will work, and you'll be able to then just invoke teach inside a fresh workspace. So you have, you know, perhaps the most at least one of the most famous and popular skills repos. What separates a good agent skill from a bad one?
17:13It's
17:14such a deep question. It's such a deep question, because it depends what you want. You can think of there as being two types of skills.
17:24There are skills that are procedures, skills that you intend to run yourself. And then there are skills that are more like abilities.
17:34Those are ability like, things that you intend the model to invoke itself. And so a good ability, for instance, might be your coding standards, let's say.
17:45So let's say your agent is sort of doing its own thing, kind of working along, and it needs to check how you like your React code written. So it's going to write some React code. It pulls in the ability, great React coding standards, let's say.
17:58And then it reads it, and it understands, okay. I shouldn't use useEffect. I should use something different.
18:03A procedure is more like, and this is how I prefer my skills written, something that you invoke yourself to get the model to behave a certain way.
18:14Something I love is my grill me skill.
18:18That's one of my most popular skills. What it essentially does, it turns the model into an adversarial interviewer.
18:25So this is under productivity, under grill me. It's incredibly short.
18:30And you can see it's literally just four sentences, I think, this skill. Maybe five sentences. And it's unreasonably effective because it just turns the agent into an adversarial interviewer asking you questions, interviewing you, and popping up with ideas that you might not have considered until you reach a shared understanding.
18:49I've been using this for coding, first of all, just like as a replacement for plan mode.
18:55So before you actually go and implement some code, you go, okay. Here's my idea. Interview me about it.
19:00Let's reach a shared understanding. Let's flush out any weirdness or any unexpected stuff before we get in as much as you can. And it's just unreasonably effective.
19:10And this is a procedure. This is not an ability. I tend to prefer my skills as procedures.
19:16I like to be the one in control. I like to go, okay. We'll do grill me, and then we'll go let's write a product requirements document.
19:23So we use to-prd, for instance. Then let's take that PRD and turn it into individual issues so that we can work through them. That's just personally how I like to do it.
19:33But other skills such as Superpowers from obra, which is probably the most popular skills repo out there, take the opposite approach, and prefer things to be more like the model is in control. But I've always preferred to personally be in control because I know my skills, I know my abilities.
19:49I don't want to delegate my thinking to the model. Yeah. I mean, that, I think, is one of the see, I'm, like, playing with this idea of the list.
19:57It's like a list of abilities, you know, knowledge.
20:02Basically, something that, like, if you could take the average, you know, 100 x developer that uses AI versus, you know, one x developer or whatever, would be the list of the differences. Right?
20:14You can say, like, okay. Some of these are, like, raw intelligence, you know, blah blah blah.
20:18But most of them are probably teachable. Most of them are some skills, some knowledge, something like that. So I'm obsessed with this idea, and I think one of them is kinda knowing when to have the AI ask you.
20:29Right? Like, kinda this grill me style of skill. Because, personally, I found out, like, the biggest difference is, instead of, like, saying one shot this app, I describe my vision for this app and say, like, list out the 10 most consequential decisions.
20:42Right? The software design decisions, architectural decisions, product decisions that will shape this project and ask interview me until you understand 98% about it.
20:51Right? So kinda that is, like, one of the things I'll put on the list.
20:55What what are some of the things you think are on the list?
21:00Well, can we can I challenge the idea that this is possible?
21:05Is it alright if I take this question in a different way? Sure. Because skills are really hard to write, especially because every single skill that you write, it leaks description, this description here, into the context window.
21:20Right? Yeah. And you can disable this.
21:23So there are some skills in here, I think zoom out under engineering, which have disable-model-invocation set to true.
21:32So this one, this skill can only be invoked by the user, and this means its description is not leaked into context. Every single ability let's say we have the list.
21:44Let's say we have a 100 different skills.
21:46You're gonna be leaking 100 descriptions into the context window. Right? Okay.
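The point about descriptions leaking into context can be sketched in a few lines. This is an illustrative model only, not how any particular agent loads skills: the `Skill` class, the field name `disable_model_invocation`, and the example skill descriptions are all assumptions made up for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    description: str
    # Hypothetical field mirroring the "disable model invocation" flag:
    # when True, only the user can invoke the skill.
    disable_model_invocation: bool = False


def descriptions_in_context(skills):
    """Return the descriptions the model would see up front.

    Skills the model may invoke itself must advertise a description;
    user-only skills stay out of the context window.
    """
    return [
        f"{s.name}: {s.description}"
        for s in skills
        if not s.disable_model_invocation
    ]


# Made-up skills for illustration.
skills = [
    Skill("react-standards", "How we write React components."),
    Skill("grill-me", "Interview the user adversarially.", disable_model_invocation=True),
    Skill("zoom-out", "Step back and review the plan.", disable_model_invocation=True),
]

visible = descriptions_in_context(skills)
print(f"{len(visible)} of {len(skills)} descriptions enter the context window")
```

With 100 model-invocable skills, the same function would return 100 lines of description before any work starts, which is the cost being described here.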
21:51Maybe let me rephrase. I didn't mean it for the AI. I meant the list is the person.
21:56Right? Like, if you had to say, like I know it's difficult to, like it's maybe a reductionist to take someone who's, like, really insanely productive.
22:05You know? Maybe, like, some of the top people at OpenAI or Anthropic who are, like, worth hundreds of typical developers. Right?
22:11What would be the list of their abilities, skills, knowledge that separate them from an average developer?
22:19Yep. Got you. Well, this I mean, the you're kind of heading in my direction, I think, which is I prefer to hide most of these descriptions from the AI itself and keep all of that knowledge inside the human, right, inside the developer.
22:35And so I prefer that's how I prefer my skills to be used is you essentially are the driver. You know? You take the steering wheel.
22:42And so I do think that this is such an exciting time to be a senior dev and to, like, be able to share and, like, proceduralize, maybe, your work into reusable chunks.
23:00Right? Like, in in a code base, you have a function that's repeated three times.
23:05You take that function, and you pull it out into a shared function that is then, you know you reduce the duplication, basically. And we're able to do that now with our own procedures, with how we build software.
23:18We're able to take these like, okay. I've, you know, made this plan a 100 times. I know how to make good plans.
23:24I can turn that into a skill, distribute that to my team, and everyone can be planning in the same way, contributing back to that same skill, making everyone on the team better. So you're raising the floor, really, on what engineers can do. It's such an exciting time.
23:39And what I would say, though, is that skills, like there's like I'm gonna sort of confuse our terminology a bit.
23:48I think of there as being three things that you need to be good at anything, which is you need knowledge. You need the fundamental sort of, uh, what is that thing?
23:56Like, understanding it in your head. You need the skills. You need to be able to have done it a bunch of times to, like, um, you know, in muscle memory.
24:03And then you need wisdom. You need to know when to do it. You need to know how it fits in in the real world.
24:11And wisdom is almost impossible to obtain without actually having done the thing in the exact context where you need to do it.
24:20So if you want to be like someone at Anthropic, sure, you can gain the knowledge, you can gain the skills, but then how are you gonna gain the wisdom?
24:27Right? Like, you need to probably go to Anthropic to gain the wisdom to actually understand how to do the thing. You know?
24:33But I think it's, like, being able to bundle the first two, knowledge and skills, into something that's reusable is such a fascinating outcome of this weird age we're living in.
24:46So currently, we talked about skills. What's your agentic engineering setup?
24:51Like, what tools do you use? What models? How many agents?
24:56Yep. So my setup is, I use Claude Code essentially for planning and for some implementation locally.
25:06So I'm using Opus 4.8 with medium effort. It's kind of what I've landed on, and it works fine. I do most of my development and a lot of my work now AFK.
25:17So with me away from the keyboard. And the way I do that is with something I built, which is a tool called Sandcastle.
25:26And Sandcastle is essentially a way to run agents inside sandboxes. Okay. So, like, if you don't run an agent in a sandbox, then it's gonna do weird stuff.
25:38So it might, you know, randomly delete your home directory or, you know, exfiltrate your environment variables out to bad sites, etcetera.
25:48With Sandcastle, you're essentially able to plug in things like Docker or Podman and run agents, run either this is what it looks like.
25:57Run Claude code inside some sandbox, which is extremely cool, extremely effective.
26:03And it means that you can parallelize a bunch of agents at once, either on your own machine, or you can use, like, Vercel sandboxes, for instance, to just ping up a remote agent and then pull the commits back into your local workspace.
26:18I've been doing that, and I've been combining it actually with GitHub Actions. So we can see inside, for instance, here, inside the Actions tab of mattpocock/sandcastle.
26:27This one, this was an agent review action, which happened a little while ago, which checks out the branch.
26:36This runs on a PR. It runs the review agent, which is just a prompt I have locally.
26:42We can see all of the things the agent did. It's checking various things, blah blah blah blah blah. Type check ran clean. And then it replies saying, cool.
26:51It all looks good to me. So that's mostly how I've been doing things: running agents using Sandcastle on GitHub Actions and essentially just telling them to do things.
27:04And that has been unreasonably effective, because you just get to parallelize as much as you want. You're not worried about constraining the resources on your local machine. And, yeah, it's just very, very quick to just spin up an agent and get it to do something.
27:19So in terms of models, are these 5.5 extra high? Are these other Claude Codes? What do you prefer?
27:24These are, I think, again, just Claude Code Opus 4.8 medium, I think.
27:31I don't think I've varied it too much, to be honest. I mostly don't worry about models that much. I mostly just use I think yeah.
27:42This is my sort of hot take, I suppose, which is that everyone is obsessed with the model.
27:47Everyone's obsessed with the engine of the Formula one car. Whereas, in fact, the engine is really only a part of the whole system. Right?
27:56You've got the entire chassis. You've got how it how it moves through the air. Everyone's obsessed with the model, and I think they should be more interested in the harness.
28:08What you can do to get the most out of the harness, giving it the right prompts, giving it the right skills to work with, and improving the environments in which the model runs, improving the code base and all that stuff. So, yeah, as I sort of said with Fable, like, I've the model is useful, but I think the harness
28:26has an equal amount of work, and you have much more control of the harness than you do the model. That's true. I would maybe challenge you a bit on this because I don't see why you cannot do both.
28:37Because, like, obviously, I I agree that you need the right skills, you need the right setup. All of that matters. But then if you swap in a better engine, all of that is instantly better.
28:47Yep. It totally is. Um, but I think they you need to think of them as fifty fifty.
28:52Right? So instead of the model being 90% and the harness being the 10% optimization, like, everyone's so focused on the model, people are not so intrigued by so okay.
29:06Let's go back one step. There's a famous idea in ML, which is the bitter lesson. You know the bitter lesson?
29:12Yes. Yes. Lesson.
29:14Yes. The bitter lesson is the idea that whatever you do in machine learning research, compute raw compute will just beat you every time.
29:24Because compute is increasing at such a high rate that you can just essentially trust that the underlying thing will get better, and that will beat any optimizations you put on top of it. And there's a a sort of idea here that maybe I'm falling into the bitter lesson. That instead of, like like, optimizing my setup, optimizing my harness, I should just wait for the models to get better.
29:45Wait for the engine to get better, and then my car will be faster. I don't know. I still think there's a lot to be gained by just optimizing the harness and focusing on creating, like, good code bases that the agent can do well in instead of hamstringing the agent before it even gets started.
30:02I would say probably I I agree that you shouldn't wait. Like, that's that's that was a very stupid idea.
30:07People just waiting around for AGI or not doing anything. Obviously, I completely agree with you there. I would say I was I'm somewhere in the middle.
30:14I would say, like, I'm actively trying to improve my setup every single day, trying to, you know, get faster at using these agents, figure out, okay, should I be using Cmax here? Should I be using should I put this on VPS?
30:25Should I be using Tailscale here? Trying to, like, actively improve everything else except for the model, but also trying to use the best model possible.
30:33Because fundamentally, like you said, you might be falling into that. I would say maybe if it's fifty fifty now for the simplicity of this argument, what if, like, the model really becomes a lot better?
30:44Right? Like, let's let's assume the next generation. Right?
30:46Like, Opus six, Fable six, GPT six, whatever, seven. Like, don't you think these models will require less steering and, like, less hand holding as they become more competent or or no?
31:00I'm not a pundit. Right? This is what I say to every single one of these questions.
31:04I'm trying to do the best with what I have right now, and I don't have the insight to know whether these things will get better. I don't really want to make predictions about the future. I think that if I try to keep my workspace and my harness agent-agnostic as much as possible,
31:25if I try to apply good software fundamentals to what I'm doing, if I do stuff that's always worked, then it will probably continue to work in the future. You know what I mean?
31:34Yeah. So if I try to over optimize around the model, if I get too focused on the model, I will lose focus on
31:41Yeah. No. I the fundamentals.
31:43That's that's that's my point of view. Yeah. So, basically, you're focused on, like, okay.
31:46What has been true for the last ten, twenty, thirty years? You know, the the the really best principles of great software.
31:54And it's likely gonna hold up with the next model rather than people going from the model first and, like, okay. This model maybe requires shorter prompts. This model, you know, sucks at that part.
32:02Let me patch that part. Like, building up, you know, properly proper foundation rather than, like, starting with the model, maybe. Exactly.
32:10People are focused on the wrong thing. They're looking at the big shiny new thing when in fact, just focus on the stuff that's been working for thirty, forty years. You know?
32:19Yes. And it really does work. You know?
32:22If you have a code base that's easy to change
32:24like, people like, people ask me all the time, like, how do you optimize for for token spend? Right? How do you optimize for token spend?
32:31Have a code base that's easier to make changes in, because then you can employ a stupider model. If your code base architecture is better, then you can get a cheaper model to do the same work, because it your guardrails are better. It's easier to explore.
32:46It needs to spend fewer tokens banging its head against the wall. If you're hamstringing your model from day one, then you will need a smart model to get the most out of it. But yeah.
32:56So I think thinking from the model first is the wrong way to do it.
33:01Yeah. So, basically, I would say, like, the exact opposite of you is, like, the quintessential vibe coder who, like, is switching tools every single week.
33:08Right? Like, there is a new Replit update, so they go to Replit Agent, switch to Lovable, switch to this and that, constantly switching, and never learning any programming principles, anything about software engineering.
33:19Nothing. You're like your approach is basically, the difference is in approach. It's not like you don't believe in AI.
33:25Obviously, right now, you you're heavily trying to be at the cutting edge of AI and educating people how to use it. It's more about the difference of approach.
33:32It's like, listen, guys. Learn the fundamentals. Learn how code works, what good software looks like, and this is gonna be valuable no matter what.
33:40No matter if OpenAI is ahead, Anthropic is ahead, Gemini is ahead versus the exact opposite approach, which which unfortunately, I think most of the people who are new to AI take, is like jumping on the latest trend and, like, switching everything the moment, you know, some new update or tool comes out.
33:55Totally. And I think, you know, that's you know, you can do that, and that's exciting. But you're not really increasing your skills that way.
34:03And it's your skills, I firmly believe, that are the ceiling to what AI can do. You should be focused on yourself, you know, upskilling yourself for this new world instead of thinking, right, how do I delegate my thinking?
34:16How do I delegate more? No. You should be pulling more into your own domain and delegating only the tactical stuff.
34:23Keep the strategic mindset. Keep thinking about, you know, the next months and weeks ahead, the road map of where you're going in your code, instead of just trying to delegate that too. You know?
34:34People are obsessed by the idea that, you know, you can just delegate everything to AI, and you can't. You really can't. And I don't see I mean, again, I'm not a pundit.
34:42You know? I'm just looking at what we have right now. Yeah.
34:44And it doesn't yeah. I am the person in the real world that's driving this stuff.
34:50I need to be the one making product decisions. I know where I'm going. And I think me as a developer, I should be in control, and I need the skills to be able to do that.
35:00I agree. One note I'm gonna share on Fable, something that happened yesterday, which is a bit scary, and it definitely doesn't follow security practices, is that I was setting up a new, like, a new agent for, like, Twitter.
35:13And, basically, the Twitter API was bugged. The developer console wasn't loading some buttons. And I tried it on a different browser.
35:19It still didn't work. I disabled all extensions. It still didn't work.
35:22So I gave it, like, a few solid minutes to try to debug it, and I failed. I mean, I didn't it wasn't the main thing I needed to get done, so I didn't really try as hard as I could. But I gave it to Cursor powered by Fable.
35:34It used the built in browser inside of Cursor. You know, I had to log in, obviously, to the console. But apart from that, it started clicking.
35:42It created API keys, copied them. Again, I do not recommend this for production apps. This is just a simple thing for me.
35:49And then it figured out when it did the testing that those API keys were in a different, like, app in the console, and they actually weren't using the credits I charged up. So then it moved the app, again, using the built in browser inside of Cursor.
36:01And, like, for me, I really felt like, what am I doing here? Like, obviously, I described what we're building, why we're building it, some of the, you know, kind of my version of grill me at the start. But then I felt like, okay.
36:12I just logged into the console, and I just charged up a few dollars, but, like, everything else the AI was doing. Right?
36:19So, like, I felt like my value in this project was a lot lower than with previous models. So what what's your thoughts on this?
36:28I mean, if you think about the AI's output, right, what it was doing at the end there, like, how does the AI know at the end that it's done a good job?
36:40Right? Is the theory here that you can disappear from the project completely? No.
36:44You're still needed. Right? Like, all we're doing here is we've just given the AI a set of tools, and, you know, we've given it a scoped task, and it's performing that task.
36:54Right? You know, it we've given it a goal, and we said, you know, do blah blah blah blah blah. I don't think of that as that particularly magical.
37:00You know? That's something that agents can do now. You just give them the tools, and they go and do it.
37:05But to decide whether that's the right thing to do to security test that at the end of that, that's something that you're needed for.
37:13Right? You, David, are needed for that to know whether it's done a good job. And so, yeah, we can delegate more, but I don't think that's a reason to start thinking, you know, or have AI psychosis or anything.
37:25It's just, yeah, it's a reasonable thing that the AI can do with computer use.
37:29I've also seen a lot of people report, like, they they were, you know, maybe looking for optimizations or doing some feature.
37:37And then, again, I'm talking about Fable because it just came out. It's topical.
37:40Right? So it's on top of my mind. But a lot of people reported that it found, like, deeper bugs that they didn't notice at all, whereas other models completely missed those.
37:49And that, again, I would challenge you slightly, that that's a sign of, like, AI being able to do more. I'm not saying we need to be completely removed from the loop, but, like, if the AI is, you know, redesigning the front end and it finds an issue in one of the, like, back end API endpoints, like a major security issue, I would argue that that's, like, AI being more involved.
38:10It's not a fifty fifty at that point.
38:14Yep. So you're saying that the better the engine is, the more value you can bring to the business just by having the engine, and those effects are emergent.
38:25You don't know what you're gonna get by increasing the power of the Yeah. It will still know the vision. Right?
38:29It will still know what you're doing here. Like, this is an educational repository
38:33for my students in my paid community, or this is something just for my team. It will be used by roughly five people.
38:38The purpose of this is x y zed. It will still know Yeah. The core idea, the initiative that comes from you.
38:44But in terms of the actions and, like, what happens, my argument would be that as the models get more powerful, more and more of this is gonna be done by the AI. But not only that, the AI will spot what needs to be done, such as the example with the, you know, bugs that the user wasn't even looking for.
39:00Totally. But I I think that's we think that the model is the only way to get there. Right?
39:05What you could be doing, in your repository, is you could run a cron job that runs every single day, let's say, and does a security review.
39:14And every day, it checks a new part of the repo. Right? And you could use a relatively simple model for that, and you'd probably get some decent results.
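The "check a new part of the repo every day" idea can be sketched as a small rotation function that a daily scheduled job would call. This is a minimal sketch under assumptions: the directory names, the prompt wording, and the function names are hypothetical, and the step that actually hands the prompt to an agent is left out.

```python
from datetime import date, timedelta


def part_for_day(parts, today):
    """Pick one part of the repo per day, cycling through all parts in order."""
    if not parts:
        raise ValueError("no repo parts to review")
    # Consecutive days have consecutive ordinals, so the index advances by one each day.
    return parts[today.toordinal() % len(parts)]


def build_review_prompt(part):
    """Build the prompt a (possibly cheap) model would receive for today's slice."""
    return f"Do a security review of {part}. Report findings with file and line."


# Hypothetical repo layout for illustration.
parts = ["src/api", "src/auth", "src/web", "scripts"]

start = date(2025, 1, 1)
for offset in range(len(parts)):
    day = start + timedelta(days=offset)
    print(day.isoformat(), "->", build_review_prompt(part_for_day(parts, day)))
```

Over any run of `len(parts)` consecutive days every part gets reviewed once, so coverage comes from the schedule rather than from the model noticing something by chance.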
39:20I mean, this idea that there are deep bugs that you know, or deep sort of security things inside your application that the model could spot and others cannot.
39:30You know? Like, sure. That's like, it sounds attractive, but you could probably also uncover those bugs with cheaper models if you just looked in the right places, you know, and you gave it the right prompt, let's say, for want of a better word, or the right harness.
39:44So I don't think there's something that's necessarily special about the model that does those things. Or, you know and I think that's, again, fifty fifty.
39:52If you had a harness that sort of was looking specifically for those things, then you would find them. And I think we're lagging behind in our practices and expecting the model to just pick up the slack.
40:02You can absolutely just run Opus and get it to do that stuff. You know, people were talking about this, like Yeah.
40:08Yeah. Yeah. When Opus 4.5 came out.
40:10Woah. All these security things that Opus it's just, like, sure. It's found them, and you can just get that with a harness and just get it to do it again and again.
40:19I'd like
40:20yeah. I I I understand. I understand.
40:22Like, you're basically pushing against the hype wave. You know? You're you're trying to, like, implement some sense, some wisdom into this.
40:28Say, like, guys, okay. The models are getting better. Yes.
40:31But at the same time, let's not lose the obvious, you know, optimizations, the obvious things that have always been true. Maybe, like, if you had a better harness, you could spot it even with the previous generation model, or maybe you didn't have to spend $2,000 on API tokens, maybe only 200, you know, stuff like that.
40:47So, yeah, I I completely agree with you there. You're trying to be like voice of One thing just to to finish there, which is that what
40:54is this what is this thing that you've learned from Fable looking at your code and spotting a security issue? What you've actually learned sure. You've learned that Fable is good, definitely.
41:03But you've also learned that there are security issues in your code. Right? Yeah.
41:06And you should probably have something that runs and checks for more security issues in the future. We need to build loops into our, into our systems. I mean, we can talk about that as well, if we can talk about some opinions there.
41:21You need to build these systems that just check your like, you need what am I trying to say?
41:30You need to figure out why it happened. Like, why it even got to this place? You know, it's like if someone keeps stealing your bike, maybe buy a lock.
41:38Yes. Exactly. Maybe we need to be designing systems that are self improving over time.
41:46Right? And this is something that we've been doing as software engineers for a long time. We write test suites so that we can test our own code.
41:53We do human reviews so that we can make sure things are looking the way they need to. We refactor so that we can change code better in the future.
42:00And sure, a model has uncovered that we need to do a bit more of that, so let's do a bit more of it. But we don't need to use the fancy model in order to get those insights. See, that's what one of the things I would put on the list.
42:11Is, like, the the thing that really separates the people who are gonna go super fast with the AI and build better and more software versus people who are not. Like, most people in that situation, they would just say, oh, yeah. Fable is great.
42:22Fix the bug. It fixes the bug. But, like, the the people I don't know if it's, like, 10x developer.
42:27It's almost like 10x AI builder, you know, because everybody's becoming more of a builder, whether it's a designer background, a developer background.
42:35It's like that person would look at the underlying issues. Like, how did that even happen? How did I have this bug for so long that I didn't notice it and try to patch the underlying issue, you know, whether it's a new skill, a new system, better staging process, whatever?
42:48That I think I would put as one of the things on the list of your human capabilities or things you should have to get the most out of AI.
42:56Totally agree.
42:58Alright. So you mentioned loops. This was super viral on Twitter.
43:01Maybe it still is, but, like, you know, a week ago. I think it started with Peter Steinberger, if I'm not mistaken.
43:07But, basically, people are, like, obsessing over agentic loops. Half of it, I would say, is like the research labs selling more tokens. You know, basically, you should be running loops to pay us more endless tokens, stop prompting your agents, figure out what loops it can run forever permanently.
43:21Half of it could be useful. What's your thoughts?
43:25So what we're essentially talking about here is the difference between human in the loop work and AFK work. Right?
43:32Human in the loop work being the human, you are there with the agent, talking together, and figuring out something.
43:39So really useful for planning, really useful for some kind of more complicated implementations, really useful for unscoped work, you know, stuff that you just need to figure it out locally with the agent. And then we're talking about AFK stuff.
43:52So AFK away from keyboard, you ping off the agent, and it goes and does something. Now, I think that I mean, the moment that I discovered AFK was the moment I really got into AI coding, and the moment I was really able to massively increase my output.
44:09Because then instead of me having to sit in the loop, handle all the permissions requests, handle all of the, you know, anything the agent needs to ask me, the moment I can just remove myself from the equation, I've parallelized myself. Suddenly, there are two of me, you know, three of me, four of me, five of me, able to go and produce so much more code that I then go and review.
44:28This idea that loops are the only way to do it is crazy. You know? Like, the history of this goes back to Geoffrey Huntley.
44:36Where is it? ghuntley. Ralph.
44:39It goes back to Ralph. Remember Ralph? Yep.
44:42I was talking about Ralph in January, I think. The original article comes from July 14.
44:48Interesting. And essentially, it's a loop. So this is the idea where you have a while loop that says, okay, pass this prompt to Claude Code, and then eventually, you'll be done.
44:59Now, it's essentially just running Claude Code again and again and again.
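The loop being described can be sketched as follows. This is an illustration of the idea only: `run_agent` is a hypothetical stand-in for whatever launches the coding agent, the fake agent exists purely so the sketch runs, and the `max_iterations` cap is an addition here, not part of the loop as described.

```python
def ralph_loop(prompt, run_agent, max_iterations=10):
    """Feed the same prompt to the agent repeatedly until it reports done.

    Returns the number of iterations used, or None if the cap was hit.
    The cap is an assumption added for safety in this sketch.
    """
    for iteration in range(1, max_iterations + 1):
        if run_agent(prompt):
            return iteration
    return None


def make_fake_agent(finishes_on):
    """Hypothetical agent that reports 'done' on its Nth call."""
    calls = {"n": 0}

    def run_agent(prompt):
        calls["n"] += 1
        return calls["n"] >= finishes_on

    return run_agent


iterations = ralph_loop("Work through the plan until done.", make_fake_agent(3))
print("finished after", iterations, "iterations")
```

Note that nothing in the loop knows what the individual tasks are; it only knows one prompt and a stop condition.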
45:06That's the idea of the Ralph loop that I was talking about for a while. And what I realized is I don't really need to run this as a loop. Right?
45:14The only thing I need out of this is the AFK agent to take on a specific task and do that task. The way I mostly think about these things is as queues. Okay?
45:24Okay. Queues, not loops. The queue is really the backlog of tasks that I need to complete.
45:29I'm looking at the Sandcastle issues right now. These are bug reports coming in about Sandcastle, feature requests, things like that.
45:36I need to scope the item. Let's say it's this, for instance. So I've done a bit of triage here.
45:42It's sort of explored. Okay. Is this trivial?
45:44Is this possible? This was done AFK. Right?
45:48So this item has been picked off the queue. It's been explored, been put back on the queue. I might then need to go and actually implement this.
45:56Looks like, yeah, this looks pretty good. I'll actually add the agent implement label, and I'll go and implement this in my GitHub Actions Sandcastle setup that I was talking about earlier.
46:06Now, this isn't a loop, really. Like, it's a queue that eventually gets resolved.
46:12This will come off the queue once the pull request gets merged. And that's all development is, really.
46:20You just have a queue of tasks that you need to get done. Project managers add more stuff to the queue. You complete the tasks in the queue.
46:27Like, that's how we've always done it, and there are multiple nodes picking stuff off the queue, multiple developers. And so an idea that there's a single loop that just sort of goes and completes all the tasks doesn't really match with how developer teams generally work.
46:43When it's all sort of inside GitHub Actions like this, anyone, any developer can add one of these labels, trigger something, and can just get work going.
46:52So yeah. I think the idea of the loop is useful, but it's not the whole picture.
46:58And I think an idea of a queue where you're picking items off is better. But mostly, it's just sort of nonsensical, really. Like, when people talk about how you need a loop prompting your agent, we're really just talking about AFK agents.
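The queue framing can be sketched as a shared backlog that any worker claims from by label. This is an illustration only: the label names `agent-explore`, `explored`, and `agent-implement` and the `Backlog` class are hypothetical, and the in-memory deque stands in for a real issue tracker such as GitHub issues.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional, Set


@dataclass
class Issue:
    number: int
    title: str
    labels: Set[str] = field(default_factory=set)
    merged: bool = False


class Backlog:
    """A shared backlog: any worker claims the oldest issue carrying a label."""

    def __init__(self) -> None:
        self._items: Deque[Issue] = deque()

    def add(self, issue: Issue) -> None:
        self._items.append(issue)

    def claim(self, label: str) -> Optional[Issue]:
        for issue in self._items:
            if label in issue.labels:
                self._items.remove(issue)  # safe: we return before iterating further
                return issue
        return None

    def __len__(self) -> int:
        return len(self._items)


def explore(backlog: Backlog) -> Optional[Issue]:
    """AFK triage: claim an issue, mark it explored, put it back on the queue."""
    issue = backlog.claim("agent-explore")
    if issue is not None:
        issue.labels.discard("agent-explore")
        issue.labels.add("explored")
        backlog.add(issue)
    return issue


def implement(backlog: Backlog) -> Optional[Issue]:
    """AFK implementation: the issue leaves the queue for good once merged."""
    issue = backlog.claim("agent-implement")
    if issue is not None:
        issue.merged = True  # stands in for "the pull request got merged"
    return issue
```

Nothing here runs forever. Work starts only when a label appears, and an item stays off the queue once its pull request is merged.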
47:10Yeah. I guess, when you talked, I don't know why, but the image that came to my head is like a medieval king managing a kingdom with, like, some ministers or whatever.
47:20And, basically, assuming, you know, the king knows best and holds the most context, not like a king that just randomly inherited an empire.
47:29Right? So if you deployed a minister into some region, far region, and you never heard from him, never gave him commands, he would be running on a loop.
47:38And that could go wrong or could go right depending on, you know, how complex the issues are in that region, how smart the minister is, whatever. But ultimately, as the king in that medieval kingdom, you wanna do the queue approach. You wanna have people come to you and say, like, we have a problem.
47:53Upcoming invasion. You know? Or there's a famine in this region.
47:56And, like, you have this queue of problems and you are still in charge, so that would be the equivalent of a human here with, you know, a bunch of agents, bunch of AIs. Still you would be prioritizing. Okay.
48:07We have these 50 bug reports. Only three of them are critical. Let's fix those first.
48:12Okay. We have these resources. This brand deal, this company wants to work with us.
48:17Check their reputation first. Is that a good way to think about it? Totally.
48:22And what we're doing here is, like,
48:25you're still able to build tons of automation into here. Let's say that I had some kind of telemetry set up for Sandcastle, or an observability tool like Sentry or something. I could get a bug report from a live application, create an issue from it, immediately tag that issue as, like, explore the issue.
48:42Maybe the agent could return some structured data from the exploration saying, can we fix this immediately, or does this need a human in the loop? It goes and implements it. It goes and reviews it.
48:50And then maybe it has a little tag on it saying, can we automatically merge this, or does it finally ping the user to go and do it? Like, I see these systems as: you need human-in-the-loop checkpoints, and you need to push those further and further right, as far towards the final output as you can.
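The "structured data from the exploration" idea can be sketched as a small routing function. Everything named here is an assumption for illustration: the `Exploration` fields and the three `NextStep` outcomes are hypothetical, and a real setup would choose its own criteria for what counts as safe to merge.

```python
from dataclasses import dataclass
from enum import Enum


class NextStep(Enum):
    NEEDS_HUMAN = "needs-human"                  # checkpoint before any code is written
    IMPLEMENT_THEN_PING = "implement-then-ping"  # checkpoint at the finished pull request
    IMPLEMENT_AND_MERGE = "implement-and-merge"  # no checkpoint at all


@dataclass(frozen=True)
class Exploration:
    """Hypothetical structured result returned by the exploring agent."""
    fixable_without_human: bool
    changes_behavior: bool
    touches_sensitive_code: bool


def route(result: Exploration) -> NextStep:
    """Decide how far right the human checkpoint can sit for this issue."""
    if not result.fixable_without_human:
        return NextStep.NEEDS_HUMAN
    if result.changes_behavior or result.touches_sensitive_code:
        return NextStep.IMPLEMENT_THEN_PING
    return NextStep.IMPLEMENT_AND_MERGE
```

Pushing the checkpoint "further right" then amounts to loosening the conditions in `route` as confidence in the system grows.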
49:11So instead of just seeing the bug reports, you would see the bug reports, you would see the exploration of the code base, you would see the fix, and you'd see, can we, yeah.
49:25That's what you get as the human instead of just seeing the bug reports. And it's just so much richer. Yeah.
49:31And it means it's one button click away instead of a whole debugging session away. So that's
49:37I mean So then the question is, like idea here. Where Yeah.
49:40So the question in that situation becomes, because it's not a loop, right? It only runs when the bug comes.
49:45There's no point in it running infinitely, which is just paying OpenAI infinitely. But my question was, like, you know, again, as the AI gets more powerful, because you mentioned you push yourself further and further to the right, to where, like, the last step is pushing to production.
50:02What are the, like, when does it cross the threshold where these types of things, whether it's like a small UI change, you know, a user requests a new color scheme, whatever, could be approved automatically? Right?
50:13And then maybe we go more and more. So what does that look like? Do you see what I'm getting at?
50:17Oh, how do you remove human-in-the-loop checkpoints is what you're... Where do you decide, basically, where it's trivial enough for you to not even look at? Right? Like, maybe all the agents you have, which, again, you set up the harnesses, they have your skills, you use a good model, and all the agents are like, okay.
50:35This is a small bug. It was just a misaligned UI element. There is no, you know, harmful intent from the user.
50:42The user isn't trying to hack the application. We're just gonna merge it into prod right away. That will presumably grow, like, the scope of things that could be merged to prod right away.
50:51So how would you think about
50:53Well, what I'd say is, like, what do you gain from review? Right? Sure.
50:57You gain, okay, like, you gain the ability to gate things, gate dangerous things from going into production.
51:04So prevent security issues, bad stuff happening.
51:07You know? Yeah. Prevent, you know, let's say, Claude Code source code being leaked to the world.
51:13You know? You prevent that bad stuff.
51:18But you also gain insight into your own system, into the plumbing. Right?
51:23So you're watching the thing do its work, and you're assessing, did it do a good job? And so that second one, you don't wanna lose that because you like, again, we're talking about the harness.
51:34Right? You want to improve your harness over time, and you want some observability into it. Now, you could remove some human in the loop checkpoints.
51:41So you could say, okay, this this PR is just an internal refactor. It just moves some code around.
51:47It doesn't actually change any behavior. And you could have an AI that kind of says, okay. You don't really need to review that one.
51:53But then who reviews the AI that's doing that? Right? How do you give feedback to that over time?
51:59You probably do need to check some of the PRs that the agent says are fine to skip, to check if they are actually fine to skip, and then you improve that over time. And so we need to think about this.
52:08We're not just reviewing the code. We're also reviewing the system that produces the code, and that is important and useful.
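One way to review the system rather than every pull request is a periodic spot-check of what the agent waved through. The sketch below is an assumption-laden illustration: `human_agrees` is a hypothetical callback standing in for a person looking at a PR, and the seeded sampling is just one way to pick which ones to look at.

```python
import random
from typing import Callable, Sequence


def spot_check(
    auto_approved: Sequence[int],
    human_agrees: Callable[[int], bool],
    sample_size: int,
    seed: int = 0,
) -> float:
    """Sample auto-approved PR numbers, ask a human about each, return the agreement rate."""
    if not auto_approved:
        raise ValueError("nothing to audit")
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    rng = random.Random(seed)  # seeded so an audit run can be reproduced
    sample = rng.sample(list(auto_approved), min(sample_size, len(auto_approved)))
    agreed = sum(1 for pr in sample if human_agrees(pr))
    return agreed / len(sample)
```

A falling agreement rate is the feedback signal: it says the "no review needed" judgment itself needs work, which is separate from any single PR being wrong.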
52:15But I agree. The goal is to remove human-in-the-loop checkpoints where possible. Definitely.
52:20So maybe the better way rather than, like, okay. Let's say, in an
52:24average day for this application, AI autonomously fixes 20 things and pushes to production right away because they were super small.
52:31At the end of the day, instead of you, like, reviewing all these, because that'll be boring and slow, maybe you get a custom, you know, teach skill HTML file that says, like, okay. These are the common patterns in the bugs that were fixed. Right?
52:42So, like, instead of you having to go through all of the GitHub comments, PRs, whatever, which is not really optimized for this agentic era. I mean, again, GitHub was created a long time ago. It would be a custom software, a custom HTML file, whatever, that's you know, knows you, your learning style, your common mistakes.
53:01It has a history of the bugs in the past, you know, whatever. And it will be more optimized to helping you improve yourself and the system.
53:09Totally. I mean, one really cool, like, what we're talking about here is making review seamless and taking the human effort out of review.
53:20One thing that I've seen people do, which is crazy, is on any front end change.
53:25It gets the AI to record a video of itself walking through the code, and, like, the the thing that changed. It then calls a text to speech API and overlays some speech on top.
53:36So it's like the AI is talking to you while it walks through the code, and you just have a video on the PR of the thing working. Like, that sort of richness is something that we should be building into everything that we do and trying to optimize for human review and make human review faster. Because everyone's sort of moaning about, you know, like, oh, man.
53:55We've got so much code to review. But probably you could be using AI to help you review the code.
54:01Right? Like, in all sorts of interesting ways that I think we're just scratching the surface of.
54:06Absolutely. So a lot of people wanna build something with AI.
54:10Right? Whether, like, you could start with some personal tools, some, you know, something for your team, but a lot of people wanna build a, like, business, whether it's AI startup, whether it's some other business.
54:19How would you think about that? Like, you know, a lot of people there's there's a group of people who say, like, oh, yeah.
54:25SaaS, you know, subscriptions. They're gonna be more valuable than ever because you're gonna be adding more seats for the agents. There's a group of people who say, like, SaaS is dead.
54:33How are you thinking about building a business, building software in the age of AI?
54:39Well, I don't think that much has changed about it, to be honest. Like, again, I'm not a pundit.
54:44I don't really watch markets. I don't really, like, care whether SaaS dies or thrives. Like, if you're building a business, what you need to do is the fundamental stuff.
54:53You need to go and talk to customers. You need to figure out what they need, and then you need to build prototypes that look like what they need and solve their actual problem. I don't think anything has changed there.
55:03And I think you can learn to do that and be better with it, but I don't think AI gives you any particular advantage there. Because what you need to do is go out in the real world and have conversations and figure out what it actually is people need. So I think all of the classic product design books will still make sense here.
55:20It's just you have a massive leg up when it comes to actually implementing it. And the procedures they talk about, you can start delegating them to AI too. So mostly, though, it's just about having the right idea and building the right thing, and that's not something that AI can help with if you're not also talking to actual people and figuring out what they want.
55:40As soon as you figure out what people want,
55:42you're good to go. Yeah. I think that's actually the thing that AI is notoriously bad at.
55:47It's like the original ideas out of the box. And, yeah, like, that that would be probably one of the main pieces of advice I would give to people is, like, you need to be choosing the features that get added. Right?
55:58If you see somebody who's, like, delegating all of that, it's like, what's the next thing we should add? It's like, no. You should be in charge of the product.
56:04You can yeah. Obviously, you don't have to, like, learn the exact syntax or whatever. You don't have to read every file, but, like, you cannot be asking the AI to build your app.
56:13Like, you need to have the vision. You need to know why you're building it and, like, what problem it's solving. Absolutely.
56:18You should be asking AI what thing you can remove from your app, basically. You should be asking, how do I make this simpler? How do I improve the UX?
56:25How do I actually focus in on what people want instead of ending up, like, you know, one of those dreadful
56:31VC funded apps that we've all seen where there's a thousand features and you can't find the thing that you want to do. So, again, this is just product design fundamentals.
56:40We mentioned that senior devs get, like, 10 x improvement and, you know, speed up. How do you because from my experience, that's true, but only if they actually use the AI tools.
56:51There's still a group that are kinda refusing to believe it or, you know, think AI is not that good. They tried it a year ago, two years ago.
56:58They were, you know, disappointed. But, obviously, tools, harnesses, models are much better. But my counter argument, or maybe it's not a counter argument, is like, what about, if you were hiring, just hiring young people who are true believers in AI, who, like, know these tools inside and out?
57:14They use them all the time. They know what's the best model, what's the best skill, what's the best, you know, agent in each situation. And, obviously, they need to have some technical fundamentals.
57:25But, like, how would you reconcile this tension of, like, these are seniors who have ten, fifteen, twenty years of experience and they get a 10x versus these are, like, true AI believers who might not have as much experience as the seniors, but, like, are better operators at using the AI?
57:43Well, hiring great juniors has always been the goal of any company, basically. Because if you find a great junior, then anyone who's enthusiastic will do a better job than someone who's more experienced, basically.
57:54Like, enthusiasm beats experience just in pure output and because they develop so much faster and they learn so much faster. And so people who are really excited about this new age and know a lot about this stuff, if you can just pair that with a little bit of software fundamentals... Because what we're talking about here is, I think of there as being a difference between DX, developer experience, and AX.
58:17Right? Agent experience. And so agent experience is the experience that the agent has working in the code base.
58:25And anything you can do, whether that's better skills, you know, increasing the power of the model works, of course, you know, improving the harness, and improving the code base as well, that's amazing.
58:37Often, people forget about improving the code base, actually, for better AX. You know, they forget about all the edges you can get with, like, good software fundamentals. And so that's where the senior will be useful, because the senior knows how to build good DX.
58:53Right? They know how to, if they're a good senior, they know how to build a code base that can work well with humans.
59:00And there's a huge overlap between good DX and good AX. But the junior who's great at AI is just coming at the problem from a different point of view from the senior.
59:12What was your original question? How do they get hired, or, like... Well, how would you hire out of both of them? Sure.
59:18But not, like, who would you hire, but, like, who will maybe get more alpha? You know, who'll be more valuable? Like, is it the senior who has a lot of this experience, you know, the right way of thinking about software, but maybe isn't as true an AI believer, versus somebody who's fully embracing AI to the maximum and knows how to use it to the fullest?
59:37I think if you have an experimental mindset and you're excited about AI, then you're gonna get a ton out of it, whether you're junior or senior. And I think, again, if you're intrigued by the harness, first of all, and intrigued by improving AX everywhere that you can, then you're gonna thrive and love it. Now, there's obviously a lot of, like, good reasons that people have for not wanting to get on the AI train.
1:00:00You know, they might just be a bit, you know, squeamish with the ethical stuff, you know, like, Anthropic stealing everyone's novels and just pumping them into Claude. But it is here, and that's how the job is now.
1:00:18If you're just a tactical programmer, just plumbing away, doing your work, you're gone.
1:00:24Right? Like, that's how, you know, you can't be a code monkey anymore. You need to think strategically.
1:00:28And so seniors can absolutely make the most of that, but juniors can learn that too.
1:00:33Alright. My closing question is gonna be practical for the people watching. If you could take the average AI enthusiast and give him, like, one or two action steps to do today to either improve his setup, improve his harness, learn something, what would those one or two things be?
1:00:49First thing I would do is I would delete every single skill, every single plugin, every single MCP server. I would go back. I'd delete your CLAUDE.md, delete your AGENTS.md, go back to absolutely nothing, and then observe the agent and see what it does.
1:01:02In my experience, everyone bloats up their context window with too much stuff, with too many instructions. Go back to a blank slate, and see what the agent does.
1:01:13Once you're seeing what the agent does in that basic sort of mode, then layer things on top of it, and make sure those things are procedure skills, not ability skills.
1:01:26Layer things on that you yourself decide. And my skills repo is a great place to start there. If you really miss something, if you really miss, like, brainstorming from Superpowers, then bring that back.
1:01:36If you miss this, if you miss that, and make sure that you install them in a way that you can customize them, you can play around with them, and experiment. You know? If you're noticing problems, then try to find solutions to fix those problems.
1:01:48And try as much as you can to delegate the implementation to an AFK agent. AFK is just an incredible way to work. It just takes a little bit of setup, but once it's set up, it just goes crazy.
1:02:02Alright, bud. Appreciate your time. Where should people find you?
1:02:06Find me on Twitter. Find me at aihero.dev, and I've got a newsletter where I post about all this stuff.
1:02:10So aihero.dev. Especially if you want to learn about my skills and learn about updates to them, then go to aihero.dev/skills.
1:02:18Alright. I'm gonna link all of that below. Once again, thank you for your time, Matt, and have a great day.
1:02:23No worries, David.
The Hook

The bait, then the rug-pull.

The video opens cold on the punchline of the entire conversation: everyone is obsessed with the model when they should be obsessed with the harness. Over the next hour Matt Pocock — who runs the most-used Claude and Codex skills repo — defends that claim against a host who keeps trying to talk him into chasing the newest engine, and walks through the exact setup he uses instead.
