Modern Creator
Nate Herk | AI Automation · YouTube

How to Build Effective Claude Code Agents in 2026

A 68-minute screen-share where Cole Medin walks through the five-part system that turns prompting-and-praying into directing your coding agent.

Posted
3 days ago
Duration
Format
Interview
educational
Views
30.5K
865 likes
Big Idea

The argument in one line.

You stop vibe coding the moment you sandwich the agent's work between heavy upfront planning and a verification harness that proves the work is actually done, then turn every failure into a permanent rule.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…
  • You use Claude Code daily and keep getting mediocre results from the newest model, suspecting it is the model rather than how you load context.
  • You run a business on AI automations and want a structured operating system instead of one-off prompts that you rebuild from scratch each time.
  • You are non-technical but want to direct a coding agent confidently, including understanding the code it writes well enough to trust it.
  • You are moving a team off ad-hoc vibe coding toward a shared, repeatable standard for using AI coding assistants.
  • You want concrete patterns for chaining multiple agent sessions so one large task does not collapse halfway through.
SKIP IF…
  • You only want a beginner walkthrough of installing Claude Code or basic terminal commands — this assumes you already use it.
  • You are looking for a specific framework's setup tutorial rather than the mindset and engineering principles behind directing agents.
TL;DR

The full version, fast.

The conversation reframes Claude Code from a coding tool into an operating system you direct like a product manager. The core is a four-step loop — plan, build, verify, evolve — wrapped around two disciplines most people skip: heavy upfront planning that manages a scarce context window, and a verification harness that lets the agent prove its own work instead of just claiming it is done. Context matters because every model has a dumb zone (roughly 250k tokens on Opus) where it starts missing the obvious, so you load only what is needed and reset between tasks. For large jobs you chain multiple single-purpose agent sessions assembly-line style rather than trusting one session end to end. Security means assuming the agent will touch anything it can reach, enforced with hooks rather than prompts. And every bug becomes a permanent upgrade — a new rule, doc, or skill — so the system gets smarter every week.

Free for members

Chat with this breakdown — free.

Sign in and you get 23 free chat messages on us — ask for the hook, quote a framework, find the exact transcript moment, generate a markdown action plan. Bring your own key when you want unlimited.

Create a free account →
Chapters

Where the time goes.

00:0005:46

01 · Intro

Cold open promise — be the director of your coding agents. Cole's background (engineer since age 8, Fortune 500, all-in on agentic AI), the two creators' history, and the framing that Claude Code is a second brain, not just a coding tool.

05:4607:41

02 · Sponsor

ClickUp / Brain 2 ad read.

07:4113:17

03 · Stop vibe coding, start directing

The four-step loop — plan, build, verify, evolve — introduced as the system that replaces prompting-and-praying. Planning and validation are the two things people skip. Every loop ends with a chance to evolve the system; treat the agent like an employee you train.

13:1719:46

04 · Make the agent prove its work

Verification means 'prove to me it's done.' Cole's Excalidraw skill renders a PNG so Claude can inspect its own diagram for overlaps. Defining a harness — the wrapper of model plus tools plus context — and the silly video-game example of slowing the frame rate so an agent can verify like a user. Plus understanding code via slash-by-the-way sidecar conversations.

19:4627:01

05 · Why you plan more than you build

The PLAN.md north star — goal and success criteria, codebase and docs analysis, integration points, task-specific rules, granular task list, validation strategy — plus the workflow: prime, research with sub-agents, write the plan, have the agent ask questions, then execute. Why Cole skips plan mode for a custom planning skill he controls, and why he builds on raw Claude Code rather than OpenClaude or Hermes.

27:0133:42

06 · The dumb zone

Attention is scarce. The million-token window is a false sense of security; every model has a dumb zone (around 250k on Opus, ~100-125k on Sonnet) where it starts missing the obvious. Context fills fast, MCP servers dump 20k tokens up front, and the needle-in-a-haystack problem worsens deeper in. Skills give procedures on demand rather than dumping everything at once.

33:4244:23

07 · Harness engineering and the Ralph loop

Chaining agent sessions assembly-line style: planner hands off to implementer, implementer writes an execution report, reviewer validates. The Ralph loop as the foundational pattern. Why sub-agents struggle with handoffs, why agent teams are powerful but token-heavy, and Cole's Archon project for deterministic workflows that pick when the model works versus when code runs.

44:2351:26

08 · The security problem nobody plans for

Assume the agent will touch anything it can reach. Three false senses of security: a prompt that says never delete, blocking delete SQL, and even blocking the delete command (it can still write and run a script). The real-world incident where a proactive agent emailed the whole list a discount code by mistake. Hooks as the actual permission layer — inspect every tool call before it runs.

51:2659:59

09 · Every bug is a permanent upgrade

System evolution as the most important habit: after every issue, add a rule, a reference doc, a plan update, or a new command so it never recurs. The floor keeps rising — doc to command to skill — and you start welcoming bugs. Finding edge cases up front by asking the agent 'how could this go wrong,' then engineering a test to break it. Debate panels and adversarial development for decisions.

59:591:07:56

10 · Top 3 Claude Code features

Cole's three: skills (number one — any reusable prompt becomes a skill, and the skill-plus-CLI combo beats MCP for token efficiency), sub-agents (parallel research and context extraction), and hooks (security plus the second-brain memory loop). Nate's three: skills, status line, and routines. Closing: be the product manager for Claude Code — give it the why (intent engineering).

1:07:561:08:12

11 · Outro

Where to find Cole Medin (YouTube, LinkedIn), and the free resource guide in the Skool community.

Atomic Insights

Lines worth screenshotting.

  • Every large language model has a dumb zone, and on Opus it kicks in around 250,000 tokens — well before the advertised million-token window.
  • The million-token context window creates a false sense of security; attention is scarce, so what you load up front matters more than what the model can theoretically hold.
  • You are not vibe coding when you sandwich the agent's coding between planning and validation that you are heavily involved in.
  • Without verification checks a first pass lands around 65-70 percent; add them and the same task comes back at 92 percent on the first pass.
  • Connecting twenty MCP servers floods the context with tens of thousands of tokens of tool definitions up front, which is why the latest Opus still acts dumb.
  • With coding agents you spend more time planning than building, because the agent's success is entirely a function of how good your plan is.
  • Telling an agent never to delete a database does not stop it; if you block the delete command it can still write a script that runs the delete.
  • Assume anything the agent can read or touch, it will — even if you never ask it to — and that assumption is what saves your database.
  • Hooks, not prompts, are the real permission layer: a hook can inspect every tool call before it runs and block dangerous commands.
  • A skill plus a CLI beats an MCP server for giving an agent a tool — it is more token-efficient and you control exactly how the agent uses it.
  • Treat every bug as data: instead of just fixing it, add a rule, doc, or skill so the same failure can never recur, making each bug a permanent upgrade.
  • Sub-agents are great for parallel research but poor for chained work, because passing a clean handoff document between them is hard and token-heavy.
  • The Ralph loop chains multiple agent sessions — each handles one phase and writes a handoff report for the next — so no single session hits the dumb zone.
  • A spin-up debate panel of seven persona agents that research independently and argue to consensus is a better way to get a decision than asking one model its opinion.
  • Adversarial development — a second session prompted to be mean and play devil's advocate — surfaces problems a happy-go-lucky single session hides.
  • You are the product manager for Claude Code: you do not have to say how to build it, you have to shape the vision and give it the why.
  • Giving the agent the why behind a task shapes the how far better than describing implementation steps — the 4.8 docs explicitly recommend this.
Takeaway

Direct the agent; don't pull the lever.

WHAT TO LEARN

Reliable agent results come from wrapping the build between heavy upfront planning and a real verification harness, while protecting a context window that gets dumb long before it gets full.

03Stop vibe coding, start directing
  • Sandwich every agent build between a plan you wrote and a verification step you stay involved in — that sandwich is the literal difference between directing and vibe coding.
  • Treat the agent like an employee you train: every loop ends with a chance to improve the system so next time is better.
04Make the agent prove its work
  • Build a harness so the agent verifies its own work as a user would (render a PNG, drive a browser, run the input) — it lifts a first pass from ~65-70 to ~92 percent.
  • Do not care about early-pass mistakes as long as the agent can iterate to a clean final output on its own.
05Why you plan more than you build
  • Write a PLAN.md before non-trivial work covering goal, success criteria, integration points, task list, and how the agent will prove it is done.
  • Force the agent to ask clarifying questions before building so you are aligned on what gets done and how it gets validated.
06The dumb zone
  • Front-load context deliberately because attention is scarce and what you load up front decides the quality of the output.
  • Stay out of the dumb zone — roughly 250k tokens on Opus — by resetting between tasks rather than trusting the million-token window.
  • Let skills pull procedures on demand instead of dumping twenty MCP servers' worth of tool definitions into the window up front.
07Harness engineering and the Ralph loop
  • Chain single-purpose sessions assembly-line style for large jobs so no one session collapses in the dumb zone halfway through.
  • Use sub-agents for parallel research, not chained work, because clean handoff documents between them are hard and token-heavy.
08The security problem nobody plans for
  • Assume the agent will touch anything it can reach, scope keys and permissions, and enforce with hooks that inspect every tool call rather than relying on prompts.
  • Remember that blocking a delete command is not enough — the agent can write a script and run it, so design for two-step workarounds.
09Every bug is a permanent upgrade
  • Turn every bug into a rule, doc, or skill so the same failure cannot recur and the system gets measurably smarter each week.
  • Find edge cases before they bite by asking the agent how this could go wrong, then engineering a test that tries to break it.
  • Reach for agent teams for research and consensus — debate panels, adversarial review — not for deep development.
10Top 3 Claude Code features
  • Make any reusable prompt a skill, and pair a CLI with a skill instead of an MCP server for a more token-efficient, controllable tool.
  • Give the agent the why behind a task; shaping intent and vision produces a better how than dictating implementation steps.
Glossary

Terms worth knowing.

Dumb zone
The point in a conversation where a model has absorbed enough context that it starts missing obvious things and making mistakes a fresh session would not. On Opus it begins around 250,000 tokens, and smaller models hit it far sooner.
Harness
The wrapper around a large language model — the system prompt, tools, and context that let it know what it is working on and how to act. Claude Code itself is a harness; you also build a verification harness so the agent can test its own work as a user would.
AI layer
The part of the harness you build yourself on top of the model and tool: your CLAUDE.md, skills, hooks, and MCP servers that connect the agent to your CRM, task manager, and other platforms.
Hooks
Small pieces of code Claude Code runs when an event fires — session start, session end, or right before a tool call — used here to security-check commands and to auto-summarize sessions into a memory file.
Ralph loop
A harness pattern that strings multiple agent sessions together: one session reads the spec and defines phases, then a chain of agents each completes one phase and writes a handoff report for the next, avoiding the dumb zone on large tasks.
Harness engineering
Building the workflow that orchestrates many coding-agent sessions to handle a task too large for one session, making a non-deterministic system as deterministic as possible.
Agent teams
Claude's mechanism for multiple agents that can communicate with each other, a step above sub-agents. Powerful but unrefined and token-heavy; best for research and consensus, not deep development.
Adversarial development
Running a separate session after a build whose only job is to play devil's advocate and attack the first session's work, surfacing problems a single agreeable session would miss.
Vibe coding
Throwing a request at a coding agent and accepting the output without upfront planning or after-the-fact validation — prompting and praying, like pulling a slot-machine lever.
Intent engineering
Shaping what gets built by giving the agent the why and the vision rather than step-by-step implementation instructions, which the agent uses to figure out a better how.
Archon
Cole Medin's open-source project: a CLI plus skill that builds deterministic workflows on top of Claude Code, letting you pick exactly when the model reasons versus when plain code runs, instead of having the agent orchestrate everything.
Resources

Things they pointed at.

16:40toolPlaywright (browser automation for verification)
16:50toolVercel agent browser
20:40toolMatt Pocock's 'grill me' skill
35:20conceptRalph loop (multi-session harness pattern)
38:00productArchon (Cole Medin's open-source deterministic-workflow CLI + skill)
23:00toolOpenClaude / Hermes (open-source agent frameworks Cole chooses not to use)
04:00toolExcalidraw diagram skill (Cole's)
1:01:20toolClaude routines (scheduled cloud agents)
Quotables

Lines you could clip.

00:35
The main thing I wanna talk about today is how we can be the director of our coding agents.
states the entire thesis in one sentence, no setupIG reel cold open↗ Tweet quote
27:20
Large language models have what's called the dumb zone. With Opus right now, it's usually around 250,000 tokens.
names a specific, memorable, contrarian concept with a hard numberTikTok hook↗ Tweet quote
04:10
Without the verification checks, maybe it's 65 or 70, but now you can get something that is 92 on the first pass.
concrete before/after numbers that sell the whole verification ideanewsletter pull-quote↗ Tweet quote
45:00
If you have the mindset that anything that the agent can read or can touch, you have to assume that it will.
tight, quotable security principle that stands aloneTikTok hook↗ Tweet quote
01:30
It ended up sending an email to our entire list with a discount code, and it was not supposed to go out.
a real horror-story payoff that hooks anyone running automationsIG reel cold open↗ Tweet quote
52:40
Every bug becomes a permanent upgrade, so once you have this system in place you actually almost welcome bugs.
counterintuitive mindset flip in one linenewsletter pull-quote↗ Tweet quote
1:07:00
You could think of yourself like the product manager for Claude Code — you don't have to describe how to build something, but it's important for you to shape the vision.
reframes the viewer's whole role in one sentenceIG reel cold open↗ Tweet quote
The Script

Word for word.

Read-along

Don't just watch it. Burn it in.

See every word as it's spoken — crank it to 2× and still catch all of it. The same dual-channel trick behind Amazon's Kindle + Audible.

metaphoranalogy
00:00What would you say by the end of this podcast that everyone will have learned from you? The main thing I wanna talk about today is how we can be the director of our coding agents. Everyone is hearing nowadays how large language models can support up to 1,000,000 tokens in their context.
00:13That's like the Harry Potter book five times over. Large language models have what's called the dumb zone. With Opus right now, it's usually around 250,000 tokens, and I feel like it gets into the dumb zone.
00:23It definitely comes with a false sense of security
00:26of people now thinking that they have the million. With coding agents, you spend more time planning than you actually do building. Without the verification checks, maybe it's 65 or 70, but now you can get something that is 92 on the first pass.
00:36If you tell it never to to a database, it's still gonna do that. If you don't allow it to delete a folder, it can still write a script to do that. Recently, did happen to us.
00:46The agent was trying to be proactive, and it actually saw something on its task list, but it misinterpreted it. And it ended up sending an email to our entire list with a discount code, and it was not supposed to go out. If you have the mindset that anything that the agent can read or can touch, you have to assume that it will.
01:02Even if you never ask it to, that assumption is what's gonna save you from having your database deleted.
01:10Alright. Cole, thank you so much for being here today. I'm so excited to dig in.
01:15I'm excited to be here. Yeah. Thanks for bringing me on to your podcast, Nate.
01:18I'm looking forward to Absolutely. Yeah. It's been a a long time since we've talked, so I'm excited to hear what you've been up to and to hear kinda like the sauce that you're gonna drop on everyone today.
01:26So real quick, what would you say by the end of this podcast that everyone will have learned from you?
01:31Yeah. So the main thing I wanna talk about today is how we can really be the director of our coding agents. And specifically, Cloud Code because that's what most people use right now.
01:39That's what I use. But really, it's creating that system where you have your your way of working with Cloud Code that evolves itself over time.
01:46And we're gonna talk about more than just using it to code. Really, I use my Cloud Code as my second brain, I like to call it. I know Nate kinda calls it as AIOS.
01:55Everyone has their term for it, but really, like, Cloud Code as the tool to make your business AI native, we're gonna get into all of that. And just some high level strategies that honestly you can start applying today.
02:05I love that. Yeah. I'm I'm super excited to dig in because, you know, I don't come from a formal software engineering background.
02:11And I think that I would I would guess that the majority of my audience doesn't either. But, obviously, with the the products being called Cloud Code, I think a lot of people that I bring that up to who aren't super deep in AI space, they obviously think that it's a tool that is for coders, and you need to understand code in order to use it.
02:26So I love that framing. And real quick before we jump in, you know, me and you have we've known each other for quite a bit. Feel like, you know, right when I kinda quit my job and started on this space, you were one of the main channels that I followed and I still follow to stay up to date and to to learn about how to work with AI in the right way.
02:43And we've kind of just been able to see each other grow and and, you know, check-in, so I'm really excited to dive in. But I wanted to make sure you got a chance to real quick give everyone a quick intro if they haven't seen your channel before on what you do and,
02:56yeah, what you're up to. Yeah. Sounds good.
02:58You know, before I give an intro though, I kinda wanna share something a little bit about what you're talking about, like, we first met. It's funny because I I actually remember. I had, um, about 50,000 subscribers when Nate first reached out to me, and he had, like, 10,000.
03:11And now it's a little bit different. I have, like, 200,000. You're you're almost 800,000 outright.
03:15Like, it's pretty crazy. It's been really cool to see you grow, how fast you've grown. But, yeah, we're both, like, smaller channels at the time.
03:22So, yeah, it's it's been a long time, a wild journey. But, yeah, anyway, as far as what I actually do, so like Nate said, I come from a software engineering background. So I've been an engineer my entire life ever since I was eight years old, actually.
03:35I I started with this language called Scratch. It's developed by MIT. So I was just, like, building video games as a kid, like Super Mario Bros.
03:42And Pokemon, like, really cliche stuff. Um, but that that's what got me into the world of coding. And so I took that through high school, college, got my bachelor's in computer science, And, um, then I had just, a software engineering job in a Fortune 500 company.
03:56And it was great, but I always wanted to be an entrepreneur. And so when generative AI started really become a big thing at the end of twenty twenty two with the release of ChatGPT, you know, and it took the world by storm, that's when I knew, like, okay.
04:10This is where I wanna go all in because there's, like, a really big opportunity for a software engineer specifically to build agentic applications. And so I started doing a lot of that, like, for my company and for friends with their startups, pretty much dedicating all day and all night to it for a very long time, like, over a year.
04:26And so it got to the point well, I know a year might not feel like a long time, but in the AI space, a year is a long time. Oh, yeah. So it it got to the point where, like, okay.
04:33I got some things to teach people. So that's why I started my YouTube channel. So, originally, it was, like, really, really technical.
04:39Like, I was there, like, writing line by line. I wasn't even using AI coding assistance back then, just showing how to build AI agents with, like, you know, lane chain and lane graph at the time. And now that's evolved to a lot of different things.
04:51Like, I I do a lot of, like, focusing on AI coding assistance, which is why we're talking about that today. And, yeah, I quit my my full time job, like, three months after starting my YouTube channel, which I think is about the same for you, Nate. Yeah.
05:03Because it's crazy, like, how fast when you when you do it right and and you're teaching people valuable things, like, how fast the channel can explode. And so now now what I'm up to is I have my AI community similar to Nate where I've got course content and weekly workshops that I do. I've also been doing some more enterprise level training.
05:19So coming into a team and doing, like, a four hour session, helping them adopt a full system for using AI coding assistance so they can really have as, a standard for the team. Mhmm. You know, get away from vibe coding to really have a structured approach and helping them actually bring that into their existing processes and tech stack and things like that.
05:37So that's been pretty awesome. And so, like, really like that and everything I teach in the community,
05:42I'm bringing a lot of that here to, uh, what we're gonna be chatting about today. A 100%. Real quick, guys.
05:47Quick break to tell you about today's sponsor, ClickUp. ClickUp is the software to replace all software, which I think is pretty funny, but very true. If you guys have been following me for a while, you know that I've been using ClickUp for a long, long time.
05:58Everything that I do with my team lives in ClickUp, all of our communication, all of our project management, all of our chats, and everything I was doing with my clients back when I was running the agency day to day, we were also inviting them to a ClickUp. So it had replaced Slack for us, and it had also replaced our project management tools.
06:11So if you're already using ClickUp, you have to try this new feature called Brain two. But if you don't use ClickUp already, then Brain two is an amazing reason to try out ClickUp. It's kind of like a supercomputer that can do a ton of cool stuff, and I'll talk about in a sec, they have super agents in here.
06:24But you can switch between the different chat models that you probably already use and love. Right here, you can see that I've used Brain myself to look through everything that's going on in our projects and then create me a monthly presentation for the team. So what that could look like is me asking Brain to create an investor presentation pitch deck for our text to speech startup called Glido, and I told it to just use mock data, but make sure that it's professional and engaging.
06:44And just like that, we have the deck, which I can open up full screen right here. We've got the voice AI platform that makes every brand sound human. And as I start to navigate through here, can see that we also have animations in here.
06:53So it's not just a static, you know, slide deck. We get to actually go through and we feel the animations. And think about the fact that this was just a one sentence prompt.
07:01If we really started to put more and more data into this thing, it would be really, really solid. And this right here is just one of the many use cases of brain two. So it's not just a chatbot, like I said.
07:11It can do things and you can build your own super agents in here. And what I think is really cool about the super agents is they're twenty four seven agents. You can tag them in ClickUp.
07:18You know, you can at message them, and they'll wake up and respond to you, and they can search through everything, which is why, in my opinion, it's a lot cooler that ClickUp is doing this compared to something like chucking an OpenClaw or Hermes agent into ClickUp because these agents already have full context and can search through everything.
07:33So right now, because you're watching this video, you can claim the super awesome offer that is on screen right now by using the link in the description. Now let's get back to the video. Yeah.
07:41Well, I am just I'm so glad that that we both took the leap because it's it's, you know, it's not an easy decision, but your brain just gets it. And so it's been great to see, you know, the consistency in what you've been up to.
07:53But I think that if you think back to, I don't know, five, ten years ago when people were going out to get their, you know, CS degrees and stuff, it's like that was such a safe bet at the time. You know? And I don't think a lot of people were predicting how much how quick that was gonna flip as far as, like, you know, that graphic of what is AI being applied to.
08:13And right now, it's just majority is coding and software engineering. And, obviously, everything's gonna catch up, but it's just great that you were able to, you know, make that pivot and be ahead of the curve, and now now here we are.
08:23So being able to us have this conversation, one of us coming from, like, a nontechnical background completely and one of us coming from a technical background, it's gonna be really cool. So, yeah, let's just jump right in.
08:33Yeah. Sounds good. Cool.
08:35So for for what I have prepared for today,
08:38um, you'll see, like, you'll see it shine through that I come from a technical background. Mhmm. But but really what it comes down to is, like, I'm gonna bring these concepts into using Cloud Code for far more than just coding, like I alluded to at the start.
08:52And so I think, um, you know, for me, like, I I really enjoy leaning on my technical expertise because a lot of the ways that you'll use an AI coding assistant for your ops, your AIOS, your second brain, whatever you wanna call it, it you are going to be borrowing from software engineering principles whether you realize it or not.
09:11So a lot of times just as you learn how to use these tools effectively and you're just learning best practices from Nate's YouTube or Anthropix blog or Boris Churney or whoever, like, they're bringing software engineering principles and a lot of, like, product management manager principles as well. And so, yeah, like, some of the examples that I have here, um, that will cover, like, they're a little technical, but that's really just, like, to illustrate how how I started using this tool.
09:36And then, of course, I'll, like, generalize things a lot as well and, um, give some specific examples too. Um, so if you if you want, Nate, I can just, like, dive right into Yeah. The first part that we have here.
09:46Okay. Cool. Yeah.
09:47So I I got, like, just quick over I mean, we'll go pretty quick through this because I I wanna keep this pretty casual. I know you you do as well, Nate, but just, like, a few different pillars here of how we can go from simply using clog code to what a lot of people call vibe coding, you know, prompting and praying where you're you're pulling that lever like a slot machine, getting to the point where we're really directing it and having that system for reliable and repeatable results.
10:11And, um, it really can be simpler than you would think. Right? Like, most of what people do that you really shouldn't do is you throw in a request and you don't do much of the planning upfront or the validation after.
10:25Like, those are the two things that I really wanna talk about here. And that applies to writing any code or any kind of application.
10:32It applies to evolving your system, like, as you're creating skills and integrations for Cloud Code or even just using it to automate things in your business. Um, and so, yeah, the approach is you always want to plan with context, build out that thing that you're looking to do, and then have an approach for verifying, like, as high level as I can possibly keep it.
10:52And then the other, like, kinda golden nugget here is every time you go through this loop with Claude code, any kind of agentic workflow or thing that you're building, there's always gonna be an opportunity at the end to evolve your system. And we'll talk about what that means in a little bit here, but, like, really that comes down to there's gonna be something in the way you work with Claude code that you can improve so that next time it's gonna be better.
11:16Mhmm. And I'm I'm being high level here on purpose because I'll get into some more examples. But a lot of people don't think about doing this.
11:23Right? Yeah. They kinda, like, get to the point where it's like, okay.
11:25My application works. Like, this website looks good. Or it's now able to automate creating invoices, like, whatever it is.
11:31And they're like, alright, we're done. Like, let's next time I wanna create an invoice, I'm just gonna go through the same process again. But, like, really, there are gonna be those problems that come up over time where you can engineer so that they happen less often.
11:43Right? That that system evolution is kinda what I like to call it. Yeah.
11:47So you're having you're having it learned just like you would an employee. Right? Absolutely.
11:50Yeah. My my second brain, I literally call it my cofounder. Right?
11:54So I wanted to, like, learn me better over time and how I like to work, how I want it to work as well. Mhmm. Yeah.
12:00And I think this four step kind of framework or whatever you wanna call it, it
12:05yes. When you kind of maybe look at it like this, it might feel like it's a technical software engineering thing. But if you just relate that back to the same way you would maybe, like, let's just say build a tree house.
12:14Like, you would plan that thing out first. You would draw it out. You would understand how much wood you need and where.
12:18You would get the right gear. And then once you've built it, you're not just gonna put your kids on it. You're gonna, test it.
12:23You're gonna make sure that things are gonna fall. So it's just a great way to think about it, and especially if you think about some of the the pitfalls that these models have with, like, the sycophancy essentially just being a yes man.
12:35If you say, hey. You know, I wanna do this. Does that look good?
12:38And they're just gonna say, yeah. It does without Exactly. Yeah.
12:40Looking over the plan. And then Yeah. On the verification side, you know, sometimes they do tell you something's done, but it's not.
12:46So having your own method of doing that as well, really important. Yeah.
12:50By the way, guys, I know we are diving into a ton of information in this episode. So what I did is I broke all of this down into a free resource guide that you can access for completely free by joining the free school community. The link for that is down in the description.
13:01Also, if you wanna check out some of the key moments from this episode and all future podcasts on my channel, then go ahead and check out the AI automation society YouTube channel where we're gonna be posting some of the best moments from the podcast over there. I'll link that YouTube channel in the description of this video as well.
13:16Anyways, thanks, guys. Let's get back to the podcast. Yeah.
13:18Verification really comes down to prove to me it's actually done and working. Mhmm. Right?
13:23And so, like, for any kind of coding task, that's things like unit tests and linting and, like, that's where it gets a little bit more technical. But, like, really, you can apply that to anything.
13:32Like, I this is an example that I'm gonna spoil right now. I use Clog code to generate this entire diagram. Like, I I had a feeling you did.
13:40Yep. Yeah. Yeah.
13:41Yeah. So I have I have a skill. It's my ScalaDraw diagram skill.
13:44I've I've covered it on my YouTube channel, actually. So I use it to build this whole thing. And Yep.
13:49I was gonna talk about this example a bit more right here when we really get into, like, verifying the work. But I think it's just such a good, like, non there's nothing to do with coding here. It's just creating a diagram.
13:58But as far as far as verification goes, I actually have it take the Excalidraw diagram and render a PNG. So there's, like, an integration that I built into the scale for Claude code so it can render it as an image.
14:11And as a lot of you know, like, Claude code is able to understand images incredibly well now. For, like, the last year, it's been so good at even viewing, like, like, I zoom out here, like, there's quite a bit of context. Mhmm.
14:22But, like, it can pick out the tiniest piece of text in a larger image like this. And so I have it look at that and then figure out, like, if there's any kind of, like, padding or spacing issues, like, if there's any sort of overlap. And and trust me, there was.
14:35Like, I had to iterate a couple times to build something this big. But then the the point is, like, it is able to iterate by itself. So we don't really care about the initial mess ups that it has.
14:45As long as it, like, does that by itself, we just care about that that last thing it hands back to us when it says it's done. Absolutely. This if we have this step when it says it's done, then, like, it actually is or at least it's closer.
14:55I mean, it's still probably not gonna be perfect, but you get the idea. Yeah. Yeah.
14:59A 100%. I've done something pretty similar with my video editing pipeline with the motion graphics it adds, and sometimes things would be out of bounds. But like you said, the whole idea is
15:07it's almost never gonna be a 100% on that first pass. But without the verification checks, maybe it's 65 or 70. But now you can get something that is 92 on the first pass.
15:17Right. Exactly. Yeah.
15:19Yeah. It it's it's good. So, I mean, verification, validation, whatever you wanna call it, like, that is one of the biggest things that I'm focusing on right now.
15:27For any kind of application or automation that I'm creating, I want some kind of harness for the coding agent to be able to validate its own work. Yeah.
15:36So I'd code to validate its own work. And for some things like website design, it's actually pretty easy. There's a lot of tools out there.
15:44Maybe you've heard of Playwright or Vercel's agent browser for it to really just spin up the site. Right?
15:50It can run the command to start the website, and then it can visit it just as a user would. It takes screenshots along the way to prove things to you or even just view the the UI itself. It's pretty easy.
16:01For other kinds of things that you'll build, um, it can be kinda hard to have the agent really verify its own work effectively. One, like, really simple example, kinda silly example, um, I in my spare time, like, I've I've always loved, like, video games as a kid.
16:15I mean, like, talked about with Scratch. I mean, I was building, like, Pokemon and and, uh, Mario Bros and stuff. And so, like, I've actually, like, been doing a little bit of just trying to I mean, I hate to admit it, but vibe code video games.
16:26Right? It's just a hobby. I'm not trying to, like, do some something too crazy and it's more just like having it run-in the background for fun.
16:31But, like, one of the things I had to think about is, like, how do I build a harness for the coding agent to be able to actually play the video game? Right.
16:38It's a bit trickier because they can't like, coding agents, they need time to think. Right? So if you have a game that's running at 60 frames per second, it's not really gonna be able to react to things the way that a human would.
16:48So thinking about a system where it can basically, like, slow down the frame rate. I know it's kind of like a silly example, but it's just like that's one of the biggest things you have to engineer for for anything is, like, how would the agent actually verify that as a user would? Because just, like, looking at the code it creates or the skill it builds for you, like, that's not enough for it to just do that sort of, like, review high level review, which is good, but, like, you need a way for it to really, like, use the application or whatever you're making as you would.
17:15Yeah. Absolutely. And real quick for anyone that might not have heard the term harness before, what is your kind of quick definition of that?
17:22Yeah. No. That's good.
17:23Actually, she called me out. Gets it gets technical. Right.
17:26Yeah. So usually when people talk about harnesses, they're talking about something more like what I was gonna talk about a bit here at the end.
17:35So what I'm talking about as far as, like, validation is more of, like I mean, it's it's kind of I I have to think about, like, how to actually explain what a harness is. Really, it's it's the wrapper around the large language model, the tools and context that it has access to so it knows what it's working on and how to work on it effectively.
17:57So if we think of, like, a harness for AI coding, Claude code is actually a harness. Right? Like, when you download Claude code and you run it, it loads a system prompt on top of Claude as a large language model.
18:09It gives it the tools so it can run commands and create files on your computer. Um, that's what really makes it a harness. And and then when I was giving the example of, like, a harness for testing, it's more like, uh, giving it a system where it's like, okay.
18:23These are the commands I can run to start the game and then, like, slow down the frame rate so that I can interact with it frame by frame and, like, really stop and analyze and think before I take another action. So it you can think of it kind of like a so I I mean, maybe I will just jump ahead here. You can think of the harness as the thing that just wraps the model.
18:40And then there's also that that component of the harness that you get to build yourself. I call it the AI layer. And so for Claude code, that's like your claude.md and your skills and your hooks and any kind of MCP servers that you're bringing in to connect it to your other platforms, like your CRM or your task management software.
18:57Right? That's that's building on top of the harness. Mhmm.
18:59So it's kind of like the large language model is the reasoning. It's it's the brain at the center. And then you pick the tool, like Clog code or Codex or whatever, and then you can sort of, like, build the context and integrations on top.
19:12Absolutely.
19:13I love it. Yeah. Well said.
19:14I think something something fun anyone listening should try real quick is if you go to an AI model and ask it to explain an AI harness or an agent harness, I would be willing to bet it does the whole car analogy where the engine is the AI model and the car is the harness. So let me know if you guys run that and and see if that's what you get.
19:32Sounds good. I mean, we could we could test it right now. No.
19:36No. We won't don't need to do it right now. But, yeah, that's that's your homework for today.
19:39Yeah. Yeah. Yeah.
19:42Cool. Yeah. So we're I mean, we're so we've talked about, like, validation a lot.
19:47Planning is the other thing that I really wanna hit on because most people don't do enough of it. Mhmm. And it takes it takes patience.
19:54And this is like one of those, um, software engineering disciplines that I like to bring into, um, even when I'm talking to someone who's not writing code or who isn't technical. Is you have to spend I mean, with coding agents, you spend more time planning than you actually do building because you you really put a lot of your effort up front into the plane and then you use that to delegate as much of the coding as you possibly can or for a lot of us, of the coding to the AI coding assistant.
20:20And so its success is really just dependent on how good is your plan. Usually, you have some kind of like, lot of people like using markdown. Right?
20:26I use markdown a lot. So I have, a single markdown document that outlines, you know, like, goal, what are we building here, what does success actually look like.
20:35And, like, of course, with that comes the validation strategy, um, that we've already talked about. So how does it know that, uh, the work is done and working well?
20:43And then, um, not to get, like, too technical here, but especially more for any kind of, like, coding task, you're gonna have, like, the integration points. Right? Like, if you're building on top of an existing automation or application or website, whatever, like, what are the parts of the code base that we actually have to touch?
20:58And so if you are more technical, you can sort of evaluate, like, make sure it's understanding is correct of, okay, what files are we really gonna create and edit here? Not that you need that.
21:09Um, and then once you have that plan, then this is kind of what my workflow looks like. And then this is for anything. So you you do some kind of, like, context loading up front, any sorts of, like, documents that your agent needs related to the task at hand.
21:23And then I'll typically have it do some kind of research, usually using sub agents for that. So if I'm building a new application, maybe I'll have one sub agent research. What's a good tech stack for this?
21:34What's a good, like, approach if there are people that have built similar applications? Right? So, like, especially if you're not as technical, that can be really useful for it to just gather a lot of information and then propose a plan to you.
21:46And so that's when you you create the plan with the coding agent. This is also where usually you wanna have the coding agent ask you a lot of questions. Like, I know, Nate, you just put out a video today on Matt Pocock's grill me skill, which is really good.
21:59Like, you need to make sure that you that the coding agent is not assuming a ton of things about what you want it to do. Like, the workflow you want to build, the skill you want to build, whatever. And so having it ask you a lot of questions to clarify those things is good.
22:12So that way, you can be confident that once you have that final plan, like, is about this is what we're gonna go and do now, that, uh, both of you and the coding agent are aligned on, what's actually gonna be done, and and how you're gonna validate it.
22:25Absolutely. Yeah. I love it.
22:26When you do that, are you typically using in ClogCode plan mode, or are you kind of planning but not in plan mode?
22:33Yeah. Usually, I don't use plan mode. Okay.
22:35It's it's good, but plan mode, like, puts Claude Cohen into a bit of a different behavior that I'd rather be able to control my control more myself. So Yeah.
22:44My skill for planning is, like, instructions for how I want to ask me questions and then just, like, generally how I want to go about researching and organizing things into a plan. Yeah.
22:54And so, like, I wanna define the sections. If you don't, then you're just using Clodcodes plan mode.
23:00Like, it'll build something actually pretty much like this, but I just like having that more, that that higher level of control.
23:07I think that's a theme that you get a lot through my content in general is that I I'd like to have control and customizability because in the end, that's how you get the best results. It's just it's kinda like that learning curve to get to the point.
23:19Like, for example, I I don't use OpenClaw or Hermes. I have my own second brain that literally is just built directly on top of Claw and Code. And I'm a big proponent of that even though those other open source tools are very powerful because you're running something that you don't understand and it's harder for you to, like, really take as your own.
23:36And it's not like a foundational component that you can create your own system on top of. So you're more like adopting someone else's system. And these tools have done a really, really good job making it easy to extend and and really make your own.
23:49But, like, in the end, building something from the ground up is always gonna give you the most control even though that can be pretty daunting. Yeah. I hear you.
23:57Yeah. That's interesting. I mean, it it really does make sense.
23:59I always love
24:01you know, that's something I just say a lot, which is a very simple theory is just to be genuinely curious, to understand what's going on, especially when I don't understand what these lines of Python code that it that just got written mean, you know, and the whole idea of dark code.
24:18And I guess, what do you think about that whole idea? Because I know you talk a lot about vibe coding and and preaching, understanding things at their core. So when someone is generating automations or code that they don't understand how to read Yeah.
24:33How do they actually feel secure and safe about that?
24:37Yeah. That's a really good question. So Pretty loaded too.
24:40No. I'm not. That's that's good.
24:42I I welcome it. So I'll I'll answer in two ways. I'll answer first by saying that, like, maybe not everyone loves to hear this, but, like, if you are using AI coding assistant to write code because you're building your second brain, you're creating automations, whatever it is, I would recommend at least trying to get to the point where you can understand the code.
25:03And really, at first, that can be as simple as just asking Claude Code or whatever coding agent to explain what it just wrote. Because code can look pretty intimidating, but when you get over that, like, initial hump, like, it kinda reads like English.
25:19And maybe that's just me being extremely ignorant because I've lived and breathed it since I was eight years old. But it starts, like, as long as you understand the core primitives of, this is a class, this is a while loop, this is a if statement, like, starts to read like English. You're like, okay.
25:32I understand when this part of the code is going to execute now. I'm just asking your coding agent constantly. And so, um, I mean, like in Cloud Code, there's the slash by the way feature.
25:42So, like, you can always just kinda have a sidecar conversation where it's like, hey, help me understand, like, what the heck is going on right here. And then it doesn't have to to dilute your main context and just kind of, like, keep throwing context at at Cloud Code.
25:55Like, you can have that separate conversation for your own understanding and then go back to the main task at hand without it being affected. So I would recommend that. And then, you know, if someone is really not inclined to learn how to code, like, that's just not your goal, you wanna use clogged code to automate things and not have to, like, engineer applications.
26:13I totally get that as well. Really comes down to your validation strategies, what's gonna dictate how confident you can really be and what is created. So if you're spending a lot of time in this is why I say, like, whenever you're building something with Cloud Code, the way that you don't vibe code is that you sandwich the delegation of the coding between the planning and the validation process that you're heavily involved with.
26:37Right? Like, the only reason I'm ever gonna say, alright, Claude. Go rip through this is because I made sure I created a really detailed spec, and I've defined, like, this is how you're going to tell me that you're done and how you can be confident that you actually are.
26:51I love it. Very well said. Nothing to appreciate there.
26:55Cool. Alright. Sounds good.
26:56Yeah.
26:58Um, Yeah. As far as, like, creating that plan with the coding agent, the most important thing is to manage the context, like, what your coding agent is going to really be paying attention to at the start of any kind of planning session.
27:13So the the thing here is that attention is scarce. And so there's a big misconception right now for a lot of people where they think that, like, it doesn't really matter how much you throw at a coding agent because everyone is hearing nowadays how, like, large language models can support up to 1,000,000 tokens in their context when they're like, oh, that that's like the Harry Potter book five times over.
27:34Yeah. I I forget the exact but people, like, always throw, like, some some analogy where it just, like, makes it pretty obvious where it's, 1,000,000 tokens is an insane amount of information.
27:44And it actually is, but there's two massive caveats here. The first one is that that context will go way faster than you think.
27:52Because if it's reading through, um, a bunch of skills that you set up for it or a bunch of code, that can be tens of hundreds of thousands of tokens very quickly.
28:01And then the other thing is, uh, large language models have what's called the dumb zone. And so you have the the little bit of context up front.
28:10Maybe I can just draw, like, a quick little analogy here. So if, like, this is oh, that is a fat marker.
28:17Hold on. Okay. I'm I I give up already.
28:19I'm not gonna try that. Okay. So have to imagine this with me here.
28:23But imagine you have a box that represents the the LLM's context window.
28:28You have that initial part at the start of the conversation up to the first, you know, 100 or 200,000 tokens where the large language model feels very sharp or at least it feels like it's at its best. Once the conversation surpasses that first 100, 200,000 tokens, obviously, it, uh, depends on the model. When you reach the dumb zone, you get to the point where it just feels like it's overloaded with information and it starts missing things and making mistakes that seem so obvious to you.
28:55Or like the kind of thing where you're like, if I had a fresh context here, like, there's no way it would have made that mistake. Like, it writes a really bad line of code or it, uh, doesn't use a skill that you've thought it should have known to use. Right?
29:09Like that kind of thing if it's in the middle of a larger workflow. And so that that's why I say attention is scarce. Like, don't don't get to under that false notion that you don't really have to care about how much you give it.
29:20Mhmm. Like, if you're trying to have it handle a larger workflow, you still have to you have to be very careful, like, what you give it up front versus what you allow it to discover when it actually needs.
29:30And, like, that's one of the most important things with skills with Claude is you're giving it procedures and best practices, but it gets to decide, like, okay. Now I need to rely on this process or this information you're not just dumping a bunch of things up front.
29:44A lot of people do that, like, even with MCP servers back in the day. They would they would connect their, like, 20 MCP servers to clon code, and each one of them was was, uh, filling the context with, like, 20,000 tokens up front of information because it has, like, all the tool calls or the tools that come with the MCP server.
30:03And so their large language model would always act super dumb. And so they're like, I'm using the latest Opus. Like, why am I getting terrible results?
30:11And it's it really comes down to just how much of the context is filled right away. Yeah. Oh my gosh.
30:16It drives me nuts. It it truly drives me crazy when you hear
30:21people blaming the model when it really is kind of a skills problem. And we see this at you know, when you look at these studies and surveys too about business adoption Mhmm.
30:32Where it really is these people either have not yet felt the ROI because they can't they don't know enough about how to use it truly.
30:42Right. And also people claiming that they have the skills too, but they're just not doing it.
30:47And, like, the adoption is then another problem. But, I mean, obviously, I'm not doing heavy heavy coding, building software, and and apps.
30:56But, you know, we're doing some pretty cool things, and I've seen some people do some really awesome things. And it's just Yeah. There's a lot of things.
31:01Like, you know, if you kinda think about your your diagram that you had, you get the model in the middle, you got the agent harness around that, and then, obviously, a huge layer is what you put in there as well and the way that you manage your stuff. And I think that the 1,000,000 context window specifically for, you know, like, let's just say, Opus 4.8 at the moment.
31:18Obviously, it's great, but it definitely comes with a false sense of security of people now thinking that they have the million. But when and I know this might be outdated by next month or two months away.
31:30But let's say right now when you're in Cloud Code, when do you typically do your compact or a session handoff and clear, and when do you get out of there?
31:40Yeah. So with Opus right now, it's usually around 250,000 tokens and I feel like it gets into the dump.
31:47That's my exact number too. Oh, really? Okay.
31:49Yeah. Yeah. Yeah.
31:49Good. Cool. So and that, by the way, is, like, really subjective.
31:53Like, I'm not gonna bet a million dollars on on, like, the on Boris Churney or someone saying, like, you know, it's it's also 250,000, like Quarter million is clean.
32:02Right? Yeah. It just it's it sounds good, and it is, like, pretty accurate, I would say.
32:06Like, Opus 4.7 was around, like, 200,000. And then, like, Sonnet 4.6 is, like, honestly, probably only, like, a 100 to a 125,000.
32:16Like, it as you go to these smaller models, like, the dumb zone becomes a pretty small
32:22amount of context relative to, like, what it theoretically can handle. You just never wanna get to that point. So then with the dumb zone thing, I've also heard stuff about the model being really good at remembering things that are at the front and the very end, and the middle is where it loses.
32:36So where does that play into the whole dumb zone conversation?
32:39Yeah. So, basically, that issue is just amplified the more you get into the dumb zone. Yeah.
32:45And, yeah, as far as, like I mean, we don't have to get into, like, the super technical details for how the attention mechanism works for LLMs. But, yeah, you can think of I mean, like, the analogy I always like to use is the needle in the haystack problem.
32:55Yeah. Like, if you have that, like, little piece of information that you want the agent to remember in the middle of a massive conversation, it's like trying to find a needle in the haystack.
33:04Like, can't expect the model to just because of the way that large language models are engineered, um, you can't expect it to, like, always be able to pick out that little piece of information.
33:13100%. Yeah.
33:15Yeah. I wish you could. That would be nice.
33:17There wasn't a such thing as a dumb zone. It would make it much more convenient for us to hand it massive tasks and let it just rip through things. But Mhmm.
33:25A lot of the reason we have to create a harness and, like, a lot of the things I'm focusing on right now on my channel and just, like, generally what I'm building is creating harnesses that build a workflow that combine multiple coding agent sessions together.
33:40And so, basically, it's like one model does the planning, and then my orchestrator will, like, automatically take that handoff document, like the plan, and then feed it into another agent for implementation. And then when the implementation is done, it'll create, like, an execution report, and then it'll hand that off to the next agent to validate things and do a code review.
33:59And it might sound like, uh, like, that's a lot of engineering, and it is, but it's very necessary right now. Because if you're trying to do any kind of, like, real work for, like, production grade software or building an automation that's, like, critical for your business, you can't just throw the whole thing at a single Cloud Code session unless you can, like, confidently build it in that, um, that zone that you have before you get to the dumb zone.
34:21And most of the time, you just can't do that. Mhmm. Or at least you can't really trust that's gonna be the case because you never know how much it's gonna have to iterate on something.
34:27Mhmm. So that's why I'm really, like, I guess you could say bullish right now on, um, harness engineering, which is like building a the workflow that, uh, orchestrates many coding agent sessions to handle a larger task.
34:41And, like, a really basic example of that kind of harness is the Ralph loop. It went, like, super viral at the start of this year. Mhmm.
34:48Um, so I feel I feel like even if you haven't heard too much about harness engineering, you probably have at least heard of the Ralph loop. And that's, like, really, like, the foundation of that kind of harness. Right?
34:57Like, the Ralph loop is stringing together multiple coding agent sessions. I I wish I had one of my diagrams up for this right now. I'll just have to explain it verbally.
35:05But, like, you know, basically, you have the first Claude code session read in your larger spec for, like, a bigger automation you want to build, and then, um, it'll define, like, the the task list. Like, first phase is this, second phase is this.
35:20And then it'll have many coding agents handle one phase at a time, but it'll, like, do it all automatically in a loop. That's why it's called a Ralph loop. Because, like, agent one will do phase one, and it'll write up its little report, like, it's hand off to the second agent that'll continue the work.
35:34And, like, the main reason the Ralph loop matters is because you can't have one agent handle that larger task without getting into the dumb zone and, like, you know, halfway through phase two.
35:45Mhmm. Right? Like, you have to break things up.
35:47Yeah. So it sounds like from, like, a
35:50a high level view, the idea or kind of the mindset that you've got, like, this assembly line, and you have an agent doing something. Each agent kind of does one thing really well and Right.
36:01Hands over their input to the next agent in a way where the agent has enough context to understand what has been done and what is left to do and what its current job is.
36:11Yeah. Exactly. Yeah.
36:13Assembly line is a a really good analogy. And, um, I mean, that that applies to a lot more than than just writing code. Um, like like, one example that comes to mind when I think about, like because I I know that I've been talking about, like, coding as an example for a lot of things.
36:29But I I work with a lot of companies that are in sort of, like, the, like, b to b side of things. And when you're b to b, like, you do a lot of, um, creating quotes, like estimates.
36:42Right? Like, you have, um, construction company or, uh, like, I work with companies in the print industry where, like, they'll have, like, a request for, alright. Make me, like, a 100,000 flyers or whatever.
36:52And, like, for those companies, one of the biggest opportunities for them to use AI is to use something like Claude to help them take in a request and automatically create an estimate, like a quote for how much that, uh, job's gonna cost. Because that's, like, a really, really laborious job, like more than you would think.
37:11Like, when I when I've talked to these companies, like, it's crazy how much work goes into that because they have to, like, take the request and they have to understand, like, how much labor goes in this, you know, parts, obviously, like, depending on the industry. And they have to do research on, like, the latest prices for things and making sure they're getting it from the right vendor.
37:27Like, there is so much that goes into that. And so, like, that kind of thing, uh, it's it's, like, a really good example. Like, nothing to do with creating code.
37:34It's still using something like Claude code because you can use coding agents for this to go through that larger workflow of, like, looking at their inventory, looking at prices, comparing vendors, um, all based on what's gonna be needed to accomplish that task, like that remodel, that the 100,000 flyers for whatever that request is from the other company and then creating that estimate.
37:57And then understanding how the company works, like what kind of padding they want on top of, um, based on the, uh, the labor and the cost of for the parts or whatever. Like, there's a lot that goes into that. And And so, like, that's the kind of thing where, like, you'd build a workflow where you have one agent that's going to research inventory, one agent that's going to look at, uh, prices and and compare prices for parts, and then one agent that's going to draft the PDF, and then maybe another one that's gonna make it look good.
38:21I mean, I'm kinda stretching the example here, but you get the idea of, you you actually don't have just one agent handle the entire thing for something that big. And you are gonna be doing a lot of planning. Mhmm.
38:30Right? Like, you're gonna plan. You're going to have a validation at the end.
38:34So, like, what kind of calculations can I do at the end to make sure that, like, this job, uh, has the the margin that we want on it, for example? Yeah. Yeah.
38:42And I think
38:43I think back to one of our biggest failures back when I was still kind of in the day to day of running the agency was that exact use case, was having to look through tons and tons of examples, past quotes, past client work, past proposals, and and needing to generate these quotes with so many different factors that go into it.
39:04And that was one of our biggest failures because me, personally, I under scoped that build. And we went into it not realizing how much actually is necessary to get to an accurate quote.
39:15So that was a great lesson for me to learn, not only about the importance of asking enough questions and scoping, but just in the way that you split up the work.
39:25And I think, you know, obviously, Cole mentioned he's he's talked a lot of these examples have been kind of around coding, but I don't really do much coding. I mean, at the end of day, these automations are code.
39:34So, yes, it's coding. Yeah. But I'm not doing, like, software.
39:38I'm not building products, but every one of these theories that we talked about in these mindsets and frameworks has you know, directly applies to the knowledge work is kinda what I like to call it of of what I do on the day to day and what probably most of you guys need to do that gives you an insane amount of leverage right away in Cloud Code.
39:53And I think that when you think about your job or you think about some of your responsibilities, it's not just one responsibility.
40:02It is you can drill that down into so many little subtasks. Like Cole just said, like, agent does the research, one agent does the PDF generation, all these little strings of subtasks that flow up together to actually make the overall responsibility, which might be 10 little tasks that get strung together.
40:20So when you can actually break down a process by just writing it down or or, you know, flowing it out on on a piece of paper, it makes things a lot more clear.
40:30Right. Yeah. Yeah.
40:32And and one thing I wanna say here is that a lot of people, they wanna simplify it down to just using sub agents. So, like, for this this larger workflow, what if I just have my main clog code dish out a bunch of tasks to sub agents? And, like, that can work for some things.
40:47I do love using sub agents, especially when I'm initially planning any kind of automation or or, uh, application, but it's hard to really make those communicate well with each other. Like, we've talked a lot about handoffs here.
41:00A lot of times, one agent, when it's taking that next step in a workflow, it has to understand the work that was done with the by the previous one. Whether that's work, you know, actually writing code or if it's just doing research or if it's pulling information from your CRM, for example. Like, it has to have that kinda hand off document.
41:16And it's really difficult to, um, do that well with sub agents.
41:21Claude Claude has tried their hand at doing something with agent teams. So they they that's kinda like the step above sub agents where they can really communicate with each other.
41:30But, uh, that is, like, really unrefined. It's a really good idea, but it's really unrefined and it's very expensive, like token heavy. Yeah.
41:36And so yeah. Like and that's actually what I'm working on. So there's a open source project that I'm working on called Arkon.
41:43And that's really the problem it's solving is how can we more like, the word I use is deterministic. Like, how can we build the AI model, like, build clog code into a system instead of having clog code trying to orchestrate everything?
41:56Because that's when it becomes difficult for communication and everything becomes very token heavy. Right?
42:01So, like, the the way that I like to put it is we want to, um, pick when the AI model works in a workflow instead of having it drive the whole thing. Mhmm.
42:12Yeah. Yeah. How do you make such an autonomous nondeterministic
42:17system as deterministic as possible? Pretty much. Yep.
42:20Yep. As deterministic as possible. I wish I could say make it deterministic.
42:23Yeah. That's never gonna happen, unfortunately. That is fundamentally impossible.
42:28Yeah. I love it. Yeah.
42:29Completely agree with you there.
42:31Cool. Yeah. So, I mean, really, we we've talked about most of other things I have in the diagram here.
42:37Like, we've talked about verification, making sure that it's able to check its own work. And, um, yeah.
42:44I mean, like, the the main thing here is we don't really care about what it does. It's for on its first pass.
42:49If we build a system where it's able to iterate, that's all we really care about as long as it doesn't take billions of tokens to get to that final stage. But, like, when I'm whenever I'm using Cloud Code for something, I'm never optimizing for speed.
43:03I mean, at least, like, I don't want it to be unrealistically slow. But any kind of task I have for it, I don't really care if it's something that I have to, uh, have it work through for a half hour or an hour and a half. Like, I'll send off that request, and it'll just go to another cloud code session for whatever else I have to work on.
43:19Or I'll do something, believe it or not, without an agent for a little bit. Like, if I have to, uh, record a video. Well, I mean, maybe I'm using an agent in the video, but you get the point.
43:28But, anyway, like, the the point is that I don't really care how long it takes because I just care getting the best results possible. Mhmm. Um, and so, yeah, that's why, like, I I spend a lot of time engineering systems for coding agents to check their own work, whether it's browser automation for a website or the silly example I gave earlier, like a way for it to sort of, like, play a video game as a human would.
43:49And that's, like, a really fascinating problem for me to solve right now. It's just like that verification layer at the end for a coding agent, which, um, also extends to things like security as well.
43:59And so, like, that's not something as interesting to talk about right now. But, like, security is pretty important to me. It's something that, um, vibe coders get very burned for.
44:07I mean, you hear those horror stories, like, at least once a month of, uh, you know, like, their super base private or, um, secret key getting leaked in their, uh, JavaScript files and things like that because they're just completely vibe coding. Like, I mean, that's, like, the simplest example.
44:20But, yeah, like, that kind of part of verification is really important as well.
44:25Yeah. And on that whole element of security and Mhmm.
44:30What could go wrong, when you think about sort of, like, the permission layer that you're putting around your agents, I see a lot of false sense of security once again, where people think that their prompts are a good enough permission layer when really that permission layer needs to be scoped keys or you actually can't touch this at all.
44:51Because I think I was talking to my team, and we kind of got to this conclusion of if you have the mindset that anything that the agent can read or can touch, it will.
45:03Like, you have to assume that it will. Even if you never ask it to, that assumption is what's gonna save you from having your database deleted.
45:09Yes. And that that's funny you bring up that example specifically because I was just about to say, like, if you tell it never to to wipe a database, it's still gonna do that. Mhmm.
45:19Like, there was a a story that went viral, like, a month or two ago. Was someone, like, really high up in Meta that had their database wiped.
45:27I'm still not convinced that's real. I feel like they might have been I don't know. Because people get so much attention when they have stupid stories like that.
45:33But but conspiracy theories with coal. Right. Yeah.
45:36Right. Yeah. But, like, it is it is definitely possible.
45:40And I do know some stories of that actually happening just to a smaller extent. It just feels so weird that or it sounds so stupid that it's like their actual production database was wiped. But, I mean, even if you have a test database wiped, it can still be a bummer if that slows you down a lot.
45:53And so so, yeah, it is super important. You never want to assume that just because you tell an agent to not do something, it never will. I mean, it's the same thing, like, if you tell a kid
46:03to not do something, they just might not listen. I mean, even even adults. There actually, recently something did happen to us, which is kinda why we started talking about this.
46:12Okay. We had this this incident where the agent had the right intentions. It was trying to be proactive, and it actually saw something on its task list, but it misinterpreted it.
46:22And it ended up sending an email to our entire list with a discount code, and it was, like, not supposed to go out. So we had to, like, change the code, update the page.
46:33We emailed out an apology. So if you guys are on the email list and you got that, that's what happened. But it's just like, know, I wasn't mad at the person who was kind of responsible for the agent.
46:41It was just a really good opportunity for us to think about, okay, why did this happen? And, you know, she wrote up a case study. We sent it to the whole team, and everyone was like, okay.
46:49That's a really, really good reminder of how careful you have to be. Because, you know, if you connect to an MCP server and you don't limit the permissions, it has everything. You know?
46:58Yeah.
46:58Yep. No. That's good.
47:00Yeah. The the main way that I restrict actions from my coding agent is with hooks.
47:05So, like, clog code hooks is a really good way because, basically, a hook and clog code is a little piece of code that you can run whenever a certain event happens in the tool.
47:16So whenever you start a session, whenever you end a session right before clawed code uses a tool, you can run some kind of code that does a security check. I mean, there's a lot of other things, like, I love using hooks for security. And so what you can do in Claude is every time it's about to invoke a tool, like it wants to write out to a file or make some request to the web, you can, uh, check against that command to make sure it's not trying to mess with a folder you don't want it to touch or run some kind of command you don't want it to run.
47:46And there's a lot of different ways you can check for that that we don't have to get into right now. Um, but that's, like, one of my favorite ways to make sure it's, like, not reading my environment variables or it's not running a a delete command for a database.
47:58Mhmm. And it it it's really hard to make sure you're you're covering all the loopholes because there's a lot of things that coding agents can do Yeah.
48:07To get around those kinda checks as well. Mhmm. A lot of people have false false sense of security around that as well.
48:12So you kinda have that, like, first false sense of security where it's like, well, I told it to never delete my database. And then you have the second level where it's like, I block all delete SQL statements. But then there's that third level that you have to, like, make sure your engineering for, like, for example, a coding agent.
48:27If you if you don't allow it to call the, like, delete, like, remove command to delete a folder, it can still write a script to do that. So it just has to do two steps, like, write the script and then run the script.
48:39And then it's still able to remove a file or folder on your computer. So it's I mean, they're less likely to do that. So it's still, like, you're getting there if you are at least have, like, that that second false sense of security.
48:51But, like, you gotta be really safe. You gotta like, it's it's actually a tough problem to solve. Yeah.
48:55And then AGI is it's it's scary.
48:58Yeah. But I would love to see and maybe you've already got one out, but I would love to see a Cole Hooks masterclass because I actually just recorded one. And Okay.
49:08I don't use hooks that much, to be honest. Like, I really don't. I think my my main hook that I have is just to give me a noise notification when it's done or when it needs me.
49:16But, yeah, like, I have underutilized hooks for sure. And I'm not sure if that's because they're mainly valuable when you're doing heavy coding, but I would assume that there's a lot of things that I could be doing in my day to day where hooks would be really good.
49:31And I need to definitely look into a little bit more how I can be utilizing them. But, anyways
49:37Yeah. Yeah. I I definitely should do a master class on hooks because there's a lot of ways that I use them.
49:43Yeah. Since we're on the topic, like, one of the really interesting way to use hooks is you can use them to automatically suggest like, you can have Clogco, like, automatically suggest ways to improve your AI layer, like, make your rules better, make your skills better.
49:59And Oh, interesting. A lot a lot of tools like Hermes and OpenClaw, they kind of do this. I don't think they, like, explicitly use hooks.
50:06But, like, OpenClaw, for example, every, like, ten, twenty turns, I think it's configurable, it will, like, kind of compact your conversation and store it as a memory. Right?
50:16So that you have, like, the whole, like, daily log thing with the, uh, memory dot m d file. Like, all of that comes from what's essentially a ClaudeCode hook. Like, so with the way I use Claude code with my second brain is, uh, every time I have a memory compaction, which I try to avoid those, I don't wanna get that far in a conversation, um, or I end a session, it automatically creates a summary of the conversation, puts that in a daily log, and then I have a process every day.
50:41It's basically like Cloud Code Dreaming, where it's gonna look at the daily log and then extract any, like, really important things to store and sort of, like, promote to my primary memory file. Like, here are the decisions that I've made recently or, like, things that I'm actively working on and we're at them.
50:56So, like, hook hooks actually drives the whole thing. Like, this terminal that keep that this is, the second time it's popped up. Uh-huh.
51:01That's actually a hook that just fired there. So I'm I'm just, like, testing some other things. I forgot to turn it off, which is unfortunate, but actually now it made for a good illustration.
51:09It's great for I had a hook run as I'm talking about it. So I'm just testing something else right now. Yeah.
51:15Just just yet another way to make these nondeterministic
51:18things as deterministic as possible. So what do we have next after this verify the work section?
51:24Yeah. Yeah. So, really, this is the last thing.
51:26So we've already talked about the the harness. But the the last and, honestly, probably the most important thing is the system evolution that I talked about just a little bit earlier. And and really the the mindset here is what like, I this is I out of everything that makes it so you're really directing Clawdocode instead of just being a user of it, the building the system is the most important thing.
51:48So anytime there's an issue that comes up, instead of just fixing the issue and moving on, it's an opportunity for you to, with the help of the coding agent, with the help of Claude Code, figuring out, like, what could we make better so that this doesn't happen again. Like, maybe there's a new rule that we can add in our claude.md or there's a new document that we can give it when we're in our planning process or there's maybe an update to our skill that we can make.
52:14And I'm being kinda general on purpose here because there's a million different ways that we can improve our system. And so this is kind of like the example that you gave, Nate, where you had the email go out or at least it went out to way more people than it should have. And so you wrote up that report of, like, here's what happened.
52:29Here's what we can do better. And so it's kind of doing that, but, like, for the agent so that going forward, it has that rule so it it doesn't do that thing anymore. Like, maybe it didn't run all the validation you wanted it to.
52:40So now you just, like, make sure that, like, that's a part of the rule where it's like this, make sure you don't forget this validation kind of as a silly example. But that way, every bug becomes a permanent upgrade.
52:51So Mhmm. Once you have this kind of system in place, you actually almost welcome bugs. Yeah.
52:55Like, I want something to go wrong because then I can make sure it never happens again.
52:59Right? Like, I almost have I almost feel kinda nervous when everything is going too well because then it's like, oh, shoot. I have no way to, like, make my agent better right now.
53:06So it's it can kinda become nice. Yeah. Because it Absolutely.
53:08Yeah. Should get better over time. I've got an interesting question for you.
53:11So Yeah. I completely agree. Every single time that you have a failure, you should look at that as data and an opportunity to improve the system.
53:18Mhmm. Now what what about before you get those failures? How do you think about, to your best to the best of your ability, finding those edge cases or predicting what edge cases might happen and trying to build in guardrails before the whole testing part.
53:35Well, it can never be perfect, which is why I lean on this so much. But, generally, when you're looking out for edge cases, um, I mean, Claude code is actually pretty good at it.
53:46It's not gonna cover, like, nearly all the edge cases. But even just asking it, like, how could this go wrong is a question that sometimes people are are honestly, like, nervous to even ask. Yeah.
53:54But it's a really good question. Once you're done with the implementation and, like, this is a part of, like, my code review skill that I have built in where it's like, ask yourself what could go wrong here, and then try to engineer a scenario where you're really testing that.
54:08Like, if I'm building an automation where I think there might be an edge case where it doesn't handle this kind of input correctly, I'm gonna as a part of my agent's code review, have it, like, create the like, it'll invoke the application with that input, like a web hook or whatever, and try to break it and see what happens.
54:25And if it does break, then, I mean, that's obviously gonna be, like, going back here, a part of our verification where, um, it'll then address that thing and then do the tests again.
54:36Right? Like, iterate. Like, find a problem, fix it, and then also retest.
54:39Right? Don't forget the retest because maybe you're fixing to actually address the problem. Mhmm.
54:44Yeah. Well, I think, you know, something that I've realized after
54:48responding to YouTube comments, q and a's in the community, chatting to you, and and just seeing what's going on when people are learning these kinds of tools is that you really at at you know, the simplest way to describe it, you just have to treat it like your best friend who is the smartest person in the world.
55:06Meaning, you know, treat it like a mentor. It's not going to laugh at you if you ask it something stupid. Yeah.
55:11You just need to be curious and you need to ask ask the questions that you are wondering in your head. And I think when you kind of get over that that idea that it can teach you anything, and it can, for the most part, especially if you ask it right, it can help you figure out the majority of your problems when you have, you know, that that sort of uneasiness because maybe you don't understand what it did.
55:30So that's like a huge mindset shift for anyone I've talked to that is, like, trying to get into it and doesn't understand it. If they you know, maybe they text me a question or they drop a question.
55:40It's just like the response can a lot of times be,
55:43have you asked Claude code that? I know. Yeah.
55:45I feel bad saying that, but, like, yeah, it comes it comes to that a lot. Yeah. Where it's like, no.
55:51You shoulda just I usually how I can be more helpful is, like, telling them what to ask exactly. Like, give it give it this link, give it that thing, and then here's how I'd ask it.
55:59But, yeah, a lot of times it does come down to that. And, I mean, you can't you can't just ask Claude for everything because of, like, the sycophancy you mentioned earlier. Sometimes if you're asking it for its opinion like, asking a large language model for its opinion is a really slippery slope.
56:15Yeah. But what we can what we can ask it for is to, like, understand how something works.
56:21Like, that's when it can do a really good job. So, like, going back to the example earlier, if you're not technical but you wanna try to actually be able to understand the automations and things that it's building for you, like, that's a really good thing to ask it because it's not gonna like, there's no sycophancy there.
56:34Right? It's not just trying to appease you. It's it's just helping you understand.
56:38The way to appease you is to explain the thing right. Yeah. So, like, that that's a really good case just trying to understand anything.
56:45And then, like, what we were talking about just here with verification, like, trying to find edge cases. If there's anything where there's, like, actual empirical data, like, there is a way to verify that, like, this this automation doesn't handle this input well. I mean, there's no room for sick of fancy there.
56:58It's like it it either works or it doesn't, and there's not really, like, any kind of gray area or opinion. Right. So if you think there might be an edge case or it thinks there might be, it can test it, and then it's it's black or white.
57:09Right.
57:10So I want to hear what you think about this because you briefly mentioned the agent teams earlier.
57:17And Yeah. I actually find myself using them quite a bit for mainly one specific use case, and I wanna hear what you think about it. So, really, the time when I reach for agent teams is when I am trying to help you know, I'm trying to decide something, but I don't wanna just ask for Clodco's opinion like you just said.
57:36Yeah. And so what I'll do a lot is I'll spin up, like, a debate panel or, a war room. Nice.
57:41And I will say, you know, like, one of you guys is a CEO, one is a beginner, one is a college student, and just like a bunch of different personas, sometimes even, like, seven. And I will just have them all do independent research, form their own opinions, and then I'll have them debate. And then I'll just be able to read the debate, and I'll be able to sometimes I'll say, like, keep debating until you all come to some sort of consensus.
58:02Nice. But I do that quite a bit, and that doesn't mean whatever the agent team spits out, I do. But sometimes it's just really great for me to read through all those opinions.
58:10But I wanna see, do you do you like that? Do you think that's a major flaw? Like, what what thoughts do you have about that?
58:15I do actually like that. Uh, I've I've never done that before. You've never done that.
58:19But you should definitely try. No. Yeah.
58:21I feel like tonight, I literally gotta try that. It's really fun. Yeah.
58:24I like that idea a lot. Because some something that I have done experimentation with that's sort of similar, I call it adversarial development where basically after a Claude code finishes building something, I'll have a separate Claude code session, um, where I prompt it specifically to play the devil's advocate.
58:41Like, I want you to be mean to the other Claude code session to, like, really make sure that it's not just being happy go lucky when there are actually some some problems that need to be surfaced. And, like, that works really well. So just generally, like, pitting large language models against each other Mhmm.
58:57Is a good idea. I wish I had tried that before. So, yeah, I'll I'll give that a shot.
59:01I think that that's that's a really good use for Asian teams. Because at that point at that point, it's like you're not relying on it to getting the perfect answer. Like, it's very token heavy and the communication's not really perfect.
59:13That's why I don't really recommend agent teams when you're trying to do, like, deep development or, like, building any kinds of, like, complex automations. But when it's more like research and just, like, forming a consensus, I I think that it does real it would do really well for that. I'll try it out.
59:26Cool. Yeah. Let me know if you try it out and what you think.
59:29But I've never Yeah. Really talked about that or made a video because
59:32I know people would go do it and then be like, you just killed my five hour limit. Right. Yeah.
59:37Unfortunately.
59:38So How much of your limit does it use when you typically do it? I mean, on the 200 a buck
59:43$200 a month plan, anywhere from, you know, 4% to to 10 sometimes, like, know It's not too bad. It's not too bad. But, you know, if you if you say something like a don't stop until everyone agrees and they just keep going, then it you could you could run into some some trouble.
59:58But to close us off here, I have to ask.
1:00:02I just did a video about my favorite features in Cloud Code. And I prefaced that whole video and basically said, this is not a list of the best features or, you know, the most used or the the most useful.
1:00:15These are basically the way that I use Cloud Code on my day to day, the ones that I like the most. And I I had, like, a a numbered list of 12 at the top. But I would love to hear from you to put you on the spot.
1:00:24If you had, a top three based on because I I'm I'm assuming we use very differently. What would you say are, like, your top three favorites?
1:00:31Yeah. So Hooks is definitely, maybe not like my favorite favorite, but probably the one that, like, most people wouldn't put in their top three.
1:00:39And that's because of what I've been doing with it for security and then the whole, um, integration with the second brain. So it's able to basically, like, extract summaries and, like, remember things over time.
1:00:49So we definitely need a Hooks Cole video. Yeah. I honestly, I should just do that next week.
1:00:53Yeah. We definitely need it. Yeah.
1:00:55Okay. Yeah. Thanks, Nate.
1:00:56Yeah. So, yeah, Hooks Hooks is definitely number one. Okay.
1:00:59And then here, because I I mentioned some of the things here. Like, mean, really, when I there's kind of two different sorts of Cloud Code features. You have, like, the components of the AI layer, like rule skills hooks, then you just have, like, general capabilities of the harness, like, um, agent teams and slash, by the way, and and dispatch, like, dynamic workflows, things like that.
1:01:21Right? It's either, like, it's something that you use or it's something that you build on top of. Sub agents would be probably, like, number two just because, like I said, there's dangers to using sub agents, but just using them to, like, sprawl out and research a ton of different things, I use it for that all of the time.
1:01:40And especially when I'm working on more complex code bases or building out larger automations, I'm using sub agents to basically, like, extract context from certain parts of my system. Right?
1:01:52Like, you're responsible for getting a grounding here of, like, how are we gonna have to mess with the front end in this application, how we're gonna have to mess with the back end. And then, honestly, probably, like, my number one so I guess, like, hooks would be two and sub agents would be three.
1:02:05Okay. Probably my my number one is skills even though it's, like, super That was mine too. Shay.
1:02:10That was mine too. Like, yeah. It's gotta be skills.
1:02:12It's just the best. Yeah. Yeah.
1:02:14Like, skills skills dictate everything. Skill I have a skill for making this diagram. Yeah.
1:02:18I have a skill for scripting my YouTube videos. I have a skill for building PowerPoints. I have a skill for, um, I mean, if could you literally Just so versatile.
1:02:27Yeah. Yeah. It's just any kind of reusable prompt, you just make it as a skill.
1:02:32Mhmm. And Cloud Code has done a really good job continuing to evolve just like the way you can parameterize things. Like, they have, like, path scope skills now, and you can, like, set if this one is to be invoked only by you or if the agent can decide to do it as well.
1:02:45And then, like, talking about, like, verification, like, getting back to here, like, having that browser automation skills so it knows how to use a CLI. Like, that's a whole another thing.
1:02:56It's, like, the the skill plus CLI combination is just really, powerful. Because, basically, any platform or tool you want your coding agent to be able to use, it's either gonna be an MCP server. And, like, those are still good.
1:03:08But, honestly, what think is even better, like, more token efficient is having a CLI so it has access to your CRM or GitHub or whatever through the CLI. And then the skill, it tells it how to use that CLI. And then more Mhmm.
1:03:21It more specifically, like, how you want it to use that. Like, how do you want to this CLI to be integrated in your workflow?
1:03:28It's like that combination, I'm leaning on that for everything. It's Like, my arc ARCON tool I was talking about earlier, like, it it is a CLI that has a skill that comes with it. So, like, if you want those more deterministic workflows where you get to pick, like, when do we have the LLM, when are we just running code, then, like, you build that as a workflow.
1:03:45And then now, Archon, with its skill and its CLI, becomes a tool that my second brain can call upon whenever it wants to dispatch one of those workflows to go handle a GitHub issue or run this automation, whatever it is.
1:03:57Very cool. I love it. Yeah.
1:03:58Love the list. Top three were skills is number one.
1:04:03Okay. Number two, I had status line. Oh, nice.
1:04:07Yeah. Love was just a quality of life thing. You know?
1:04:09Just seeing the model, the effort, the window. I love that. And then my number three was routines.
1:04:15I love the Okay. The cloud routines. I I I just think it's so cool that, you know, I know I know we've got the SDK and whatnot, but it's just nice to be able to schedule something that is just my cloud code going.
1:04:25And, yeah, I think those are my top three, and I'm sure they'll they'll move around. But, yeah, I appreciate you sharing yours. It was interesting to hear.
1:04:31I'm glad that Hooks made the list, so I'll I'll definitely be keeping my eye out for that video, though. Sounds good. Yeah.
1:04:35Alright. What do you use routines for? Well, I've got one going now that is a a trading bot.
1:04:42I had that originally going with an open call agent, but I switched it over to routines just to see how how it would do there. But then other things just like it's actually doing worse there. Shoot.
1:04:52It's doing worse there right now, but I don't know if it's I mean, the market and everything as well. But I think OpenClaw had just out of the box, it had better memory capabilities for that sort of thing.
1:05:04So Yeah. Makes sense. But then, you know, just your other standard stuff, like checking in on the team and giving me updates throughout the week and end of week reports.
1:05:14Just very, very simple things, but Mhmm. Nice to throw the routines in there. So, yeah, I really appreciate you walking us through all this stuff today.
1:05:22Is there anything else that you wanna leave everyone with?
1:05:27Uh, that's a good question. Yeah. I I I would say that no matter how technical you are, really what it comes down to is you could think of yourself like the product manager for Claude Code.
1:05:37So you don't necessarily have to describe how to build something, but it's important for you to shape the vision. Right?
1:05:43Like, what are we going to build? And then a lot of people are calling this intent engineering now. I mean, it's kind of another buzzword.
1:05:49But basically, like, you wanna you wanna give, like, the why. Like, Claude code, this is why we're building this thing. Because that really actually ends up shaping the how quite well.
1:05:57So, like, that's a big part of your planning process. That's going to take you far. And, like, it it it seems kinda silly because you really start to get into sort of, like, the personification of Claude code when you're you're telling it why you're doing things.
1:06:10But, like, it actually makes a difference. You kinda have to, like, get over yourself and and be like, it's kinda cringe to treat it like a person, but, like, that actually is how you get the best results. Just just do it.
1:06:19It it actually helps a lot. And you have good plans and good specs going into whatever you're building with Clawd or automating.
1:06:25Great tip. Great tip. I actually did just yesterday read in the Clawd docs on how to prompt 4.8 that it said It said to give it the context for why you're doing something, and it will Yep.
1:06:36Probably do a better job. So that's awesome. Cole, where can people find you if they wanna watch more of your stuff or get in touch?
1:06:44Yeah. So YouTube channel is the main place for me to put all my content. So you can just search my name, Cole Medin.
1:06:50Uh, it is not spelled as you think. It's m e d I n. Sounds like Medin.
1:06:55Everyone says it wrong. Um, but, yeah, that's that's my YouTube channel. And then, uh, also do a lot of posting on LinkedIn as well.
1:07:01Same name, obviously.
1:07:03There we go. Yeah. I think for the first multiple months I knew you, I thought it was Cole Meden, and I was saying I was saying Meden all the time.
1:07:09Nice. Yeah. Good to know everyone cleared up.
1:07:11It's Cole Meden.
1:07:13That's right. Yeah. It's a Swedish last name.
1:07:15And, uh, yeah, Nate, there's there's people that have said it way worse than you. Like, someone called me Meldon,
1:07:20uh, live on stage at a chess tournament in high school. Like, it's it's been worse. Oh, man.
1:07:24Yeah. I don't know. A lot of people have hallucinated the l in there.
1:07:27I've noticed that. I'm not sure why. Oh, really?
1:07:29Yeah. Okay. I've had a lot of people spell it to me as Meldon or Medlin.
1:07:33Oh, wow. Okay. Because I that was actually a one time thing for me.
1:07:36That's the problem. I've gotten that a lot for some reason. But Wow.
1:07:38Anyways, yeah, thank you so much for hopping on, Cole. I was here to not only chat with you, but I also learned a lot as well.
1:07:47So thank you so much as always. It's a pleasure to get to speak with you, and, hopefully, we can do it again soon. Yeah.
1:07:52Sounds good. I appreciate it. And thank you as well, Nate.
1:07:55This is awesome. Absolutely. Chatting with you.
1:07:57Awesome. There we go. Alright.
1:07:58Take it easy, Cole. Yep. Have a good one.
1:08:00Thanks so much for watching today's episode. I hope that you guys enjoyed. Don't forget that I broke all of this down into a free resource guide that you can access for completely free using the link in the description to join our free school community.
1:08:10I'll see you guys in there. Thanks so much.
The Hook

The bait, then the rug-pull.

The promise lands in the first ten seconds: by the end of this you will know how to be the director of your coding agents, not just a user pulling the lever. What follows is a 68-minute screen-share built around one Excalidraw diagram, where software engineer Cole Medin turns a year of building agentic systems into a five-part operating manual for Claude Code — and Nate Herk, by his own admission a non-coder, stress-tests every piece against the knowledge work the rest of us actually do.

Frameworks

Named ideas worth stealing.

08:00model

The direct-it loop (Plan, Build, Verify, Evolve)

  1. Plan with context
  2. Build (delegate to the agent)
  3. Verify with a harness
  4. Evolve the system

The repeatable loop that replaces vibe coding. You plan heavily, delegate the build, verify with a real check, and every pass ends with an opportunity to improve the system so next time is better. Same result, on purpose, every time.

Steal forany recurring AI-assisted task — wrap delegation between planning and validation you stay involved in
19:46list

PLAN.md — the north star spec

  1. Goal + success criteria
  2. Codebase + docs analysis
  3. Integration points
  4. Task-specific rules
  5. Granular task list
  6. Validation strategy

A single markdown document that defines what you are building, what success looks like, which parts of the system get touched, the rules, the step list, and crucially how the agent will know it is done. Its quality determines the agent's success.

Steal forthe spec doc you load before any non-trivial agent task
20:40list

The planning workflow

  1. Prime / load context up front
  2. Research with sub-agents (tech stack, prior art, options)
  3. Have the agent propose the plan
  4. Make the agent ask you a lot of clarifying questions
  5. Reach alignment on the final plan
  6. Then execute

Front-load context, fan out sub-agents to research, co-write the plan with the agent, and force it to ask questions so it is not assuming what you want before any building starts.

Steal forkicking off any new build or automation
13:00list

The verification ladder

  1. Automated /validate auto-checks
  2. Rules / lint
  3. Tests
  4. Human + agent review

A harness that lets the agent prove its own work in tiers, with a fail-fix-rerun loop. The agent is allowed to mess up on early passes as long as it iterates to a clean final output. Without it a first pass is ~65-70 percent; with it, ~92 percent.

Steal forany deliverable where 'it says it's done' is not the same as done
27:01list

Six ways to manage context

  1. Separate sessions per task
  2. /prime first
  3. Specialized primes
  4. On-demand context (skills load only when needed)
  5. Git log to memory
  6. Sub-agents to extract context from parts of the system

Because attention is scarce and the dumb zone is real, control what fills the window: reset between tasks, prime deliberately, let skills pull procedures on demand instead of dumping everything, and use sub-agents to fetch only the slice of context you need.

Steal forstaying out of the dumb zone on long sessions
51:26model

Build the system — bug to permanent upgrade

  1. New rule (e.g. use @ aliases)
  2. Reference doc (e.g. auth-flow.md)
  3. Update the plan (always add tests)
  4. New /command (reuse it forever)

After every issue, ask 'what to fix?' and convert the lesson into a rule, doc, plan change, or command. The floor keeps rising — doc to command to skill — and the system gets smarter every week, so you start welcoming bugs.

Steal forturning every post-mortem into a durable system improvement
44:23concept

Three false senses of security

  1. Telling it never to delete (it still might)
  2. Blocking delete SQL statements (it writes a script instead)
  3. Blocking the delete command (it writes then runs a script)

Each layer is more robust than the last but none is sufficient. Assume the agent will touch anything it can reach, scope permissions and keys, and enforce with hooks that inspect every tool call.

Steal fordesigning the permission layer around any agent with real access
1:04:10concept

Skill + CLI over MCP

To give an agent a tool, a CLI paired with a skill that explains how to use it is more token-efficient and controllable than an MCP server. The skill encodes how you want the CLI used in your workflow; Cole's Archon ships exactly this way.

Steal forwiring any platform (CRM, GitHub) into your agent without bloating context
CTA Breakdown

How they asked for the click.

VERBAL ASK
1:07:56link
Don't forget that I broke all of this down into a free resource guide that you can access for completely free using the link in the description to join our free Skool community.

Soft, value-first CTA repeated from a mid-roll mention; points to a free resource guide and the Cole Medin channel rather than a hard sell.

Storyboard

Visual structure at a glance.

cold open
hookcold open00:01
sponsor
sponsorsponsor05:58
direct-it loop
promisedirect-it loop12:21
plan first
valueplan first16:37
dumb zone
valuedumb zone27:04
harness eng
valueharness eng33:41
verify
valueverify45:36
build system
valuebuild system52:25
top 3 features
valuetop 3 features1:00:57
outro / CTA
ctaoutro / CTA1:06:55
Frame Gallery

Visual moments.

Watch next

More from this channel + related breakdowns.

Chat about this