Big Idea

The argument in one line.

Replacing manual prompting with an automated loop only works when the loop is constrained by a deterministic harness that routes work to the right model at each step and keeps all state in an external database, not in any agent session.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…

You use Claude Code daily and have hit runaway token costs or context-window exhaustion on longer tasks.
You want to understand what Boris Cherny and Peter Steinberger actually mean by loop engineering before the term becomes pure cargo-cult.
You are building multi-agent systems and need a concrete pattern for orchestrator-worker architecture with durability and cost control.
You are evaluating Archon, Neon, or Retool for AI development infrastructure and want a practitioner demo, not a marketing page.

SKIP IF…

You are new to AI coding tools — the video assumes daily Claude Code use and familiarity with GitHub workflows.
You want a deep dive into any single framework; this is a survey with live demos across several.

TL;DR

The full version, fast.

The viral claim that the best AI engineers no longer prompt their agents contains a real insight but glosses over brutal economics. Running an LLM orchestrator that decides everything uses a million tokens on a simple app, and a single-session loop bloats context until the agent collapses. The practical answer is a two-layer system: deterministic YAML workflows (Archon) handle the process logic so agents only reason when they must, different models are assigned per step by cost, and an external Postgres database keeps all state so any run can be resumed. The video demos this end-to-end with four parallel GitHub-issue fixers, Neon database branches for isolation, and an open-source dashboard that makes every agent decision visible.

Chapters

Where the time goes.

00:00 – 01:42

01 · The Loop Engineering Buzzword

Peter Steinberger's 8.3M-view tweet plus Boris Cherny's Fortune article set up the premise. Cole signals honest skepticism and promises a practical three-part breakdown.

01:42 – 05:52

02 · The Core Concept of Loops

Live demo of /loop, /goal, and /routines in Claude Code. Shows the agent writing its own /loop prompt and working through a PLAN.md task list one item per cycle.

05:52 – 08:34

03 · Downsides and Token Costs

Three problems: quality ceiling, token cost explosion (1M+ tokens for a simple app), and context bloat in same-session loops.

08:34 – 13:18

04 · Deterministic Workflows with Archon

Archon YAML DSL: each step runs in its own isolated agent session. Bash steps are deterministic. Different models per node — Haiku for classify, Claude for implement, Codex for review. Human-in-the-loop gates.

13:18 – 17:29

05 · Orchestrating Parallel Coding Agents

One orchestrator Claude Code session dispatches four Archon workflows in parallel, each in a git worktree plus Neon branch. Validates PRs then launches four review workflows.

17:29 – 21:11

06 · My Pi Loop Engineering Dashboard

Agent Control Plane demo: TypeScript app, Kimi K2 orchestrator, Neon Postgres state store. Excalidraw diagram shows state-outside-model architecture. Live Kanban board built over 16 rounds.

21:11 – 23:58

07 · Deploying Control Systems to Production

Retool (sponsored) used to deploy the React dashboard to a cloud URL with permission groups, audit trails, and human-in-the-loop approval gates.

23:58 – 24:39

08 · Outro

Closes with Agents Prompting Agents diagram and the reframe: call it harness engineering, not loop engineering.

Atomic Insights

Lines worth screenshotting.

Loop engineering is not a new discipline — it is just orchestration with a rebranding; the hard part has always been the harness, not the loop.
Running a single LLM orchestrator to manage everything costs over one million tokens for a simple single-page app — cost scales with reasoning surface, not just output.
Using the same frontier model for every step is the primary reason loop engineering budgets explode; classification and exploration need a small cheap model, not Opus.
/loop in Claude Code runs in the same session indefinitely, which means long loops bloat the context until the agent is overwhelmed.
Deterministic bash steps inside a workflow are cheaper and more reliable than asking an LLM to decide what the next action should be.
Git worktrees plus database branches are the two infrastructure primitives that make parallel coding agents actually safe — without them they step on each other's work.
Storing all agent state in Postgres instead of inside a session means any run can be resumed after a crash, cancel, or machine restart with no lost work.
An orchestrator that reads from a database every round starts each cycle with a clean context window — the loop never degrades regardless of how many rounds it runs.
Human-in-the-loop gates are the mechanism that makes loop engineering worth running on real production work rather than just demos.
The Boris Cherny workflow is genuinely powerful for proof-of-concept exploration but should not be the default for production code where quality matters more than throughput.
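The cost argument behind per-step model routing can be made concrete with a small sketch. The tier names, prices, and routing table below are hypothetical placeholders, not real pricing and not figures from the video; the only point is that sending cheap steps to a cheap tier changes the total.

```python
# Hypothetical price tiers in dollars per million tokens; not real pricing.
PRICE_PER_MILLION = {"small": 1.0, "mid": 5.0, "frontier": 25.0}

# Hypothetical routing table: which tier handles which workflow step.
ROUTES = {
    "classify": "small",
    "explore": "small",
    "implement": "frontier",
    "review": "mid",
}


def step_cost(step, tokens, routes=ROUTES):
    """Dollar cost of one step under the given routing table."""
    tier = routes[step]
    return tokens / 1_000_000 * PRICE_PER_MILLION[tier]


def workflow_cost(usage, routes=ROUTES):
    """Total cost for a dict of {step: tokens}."""
    return sum(step_cost(step, tokens, routes) for step, tokens in usage.items())


def all_frontier(usage):
    """Cost if every step ran on the most expensive tier."""
    routes = {step: "frontier" for step in usage}
    return workflow_cost(usage, routes)
```

With an invented split of 200k, 400k, 300k, and 100k tokens across the four steps, the routed total comes to about 8.60 against 25.00 for the all-frontier run under these made-up prices.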

Takeaway

Build the harness, not just the loop.

WHAT TO LEARN

Autonomous agent loops only pay off when the process logic is deterministic, the state is stored outside the model, and different models handle different cost tiers of work.

Running a single frontier model as orchestrator for every decision is the primary driver of loop engineering cost — classify and explore steps only need a small cheap model.
Keeping all agent state in an external database instead of inside a session means loops can run indefinitely without context degradation and can resume after any interruption.
Deterministic bash steps inside a workflow are cheaper and more reliable than asking an LLM to decide what the next action should be — remove LLM decision-making wherever the answer is already known.
Git worktrees and database branches are the two infrastructure primitives that make parallel coding agents safe — without isolation, agents step on each other's changes.
Human-in-the-loop gates at key checkpoints prevent a bad orchestrator decision from propagating through an entire long-running job before anyone notices.
Loop engineering and harness engineering describe the same thing — the value is in the harness, and the term loop engineering undersells that complexity.
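A minimal sketch of the harness idea, assuming a fixed step order, one deterministic check, and a human gate before the pull request. This is not Archon's implementation or its workflow format; `Run`, `execute`, `agent`, and `approve` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """State handed between steps; each value stands for a Markdown hand-off."""
    issue: str
    artifacts: dict = field(default_factory=dict)
    log: list = field(default_factory=list)


def classify(run, agent):
    # The only routing decision delegated to a model.
    run.artifacts["kind"] = agent("classify", run.issue)


def implement(run, agent):
    prompt = f"{run.artifacts['kind']}: {run.issue}"
    run.artifacts["patch"] = agent("implement", prompt)


def run_tests(run, agent):
    # Deterministic step: no model is asked which check to run.
    run.artifacts["tests_passed"] = bool(run.artifacts.get("patch"))


def execute(run, agent, approve):
    """Run fixed steps in order, pausing at a human gate before the PR.

    `agent` stands in for a fresh agent session per call and `approve`
    for a human reviewer; both are hypothetical callbacks.
    """
    for step in (classify, implement, run_tests):
        step(run, agent)
        run.log.append(step.__name__)
    if not run.artifacts["tests_passed"] or not approve(run):
        run.log.append("stopped")
        return run
    run.log.append("open_pr")
    return run
```

The process lives in `execute`, not in a prompt, so the model never chooses the step order and a rejected gate stops the run before anything is published.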

Glossary

Terms worth knowing.

Loop engineering: The practice of building automated loops that prompt AI coding agents on a schedule or trigger, rather than having a human write each prompt manually. Popularized by the Claude Code team in mid-2026.
/loop: A Claude Code slash command that re-runs a prompt on a recurring interval, allowing the agent to poll an external system and act autonomously without human input each cycle.
/goal: A Claude Code slash command that sets a completion criterion and forces the agent to keep iterating until that criterion is met, similar to the Ralph loop pattern.
/routines: A Claude Code slash command for scheduling recurring background jobs — e.g., waking up every hour to read a spec file and execute the next unchecked task.
Archon: An open-source AI coding workflow engine by Cole Medin that packages multi-step agent pipelines as YAML files, runs each step in its own isolated Claude Code session, and supports model mixing, deterministic bash steps, and human-in-the-loop gates.
Agent Control Plane: Cole Medin's open-source TypeScript dashboard for running orchestrator-worker loops. The orchestrator reads task state from Postgres, writes worker prompts, and workers report results back to the database — keeping all state outside the model context.
Worktree: A Git feature that creates an isolated working copy of a repository on a separate branch. Used in parallel agent systems so multiple agents can modify code simultaneously without conflicting.
Neon branch: A Neon Postgres feature that creates an instant copy-on-write database snapshot for each branch, giving parallel agents their own database state without collisions.
Orchestrator agent: The top-level agent in a multi-agent system that reads current state, decides which tasks to dispatch next, writes prompts for worker agents, and monitors outputs. Does not do implementation work itself.
Worker agent: An agent spawned by the orchestrator to execute a specific bounded task. Runs in its own isolated session and reports results back to shared state.
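To make the worktree entry concrete, here is a small sketch that builds one `git worktree add` command per issue without running anything. The branch and directory naming scheme is a hypothetical convention, and the matching database branch is left out because the source does not show how it is created.

```python
from pathlib import PurePosixPath


def worktree_command(repo_root, issue_number):
    """Build the `git worktree add` command for one agent.

    Branch and directory names follow a made-up convention.
    """
    branch = f"agent/issue-{issue_number}"
    path = PurePosixPath(repo_root).parent / f"wt-issue-{issue_number}"
    return ["git", "-C", str(repo_root), "worktree", "add", "-b", branch, str(path)]


def plan_worktrees(repo_root, issue_numbers):
    """One command per issue, so each agent gets its own checkout and branch."""
    return [worktree_command(repo_root, number) for number in issue_numbers]
```

Each returned list can be handed to `subprocess.run` as is; keeping the planning step pure makes it easy to inspect what would be created before any agent starts.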

Resources

Things they pointed at.

08:34 · tool · Archon

17:29 · tool · Agent Control Plane (open source)

21:11 · product · Retool

12:42 · tool · Neon Postgres

08:34 · product · Dynamous Agentic Coding Course

Quotables

Lines you could clip.

01:33

“I don't prompt Claude anymore. I write loops, and the loops do the work. My job is to write loops.”

Provocative declarative quote from the head of Claude Code; stands alone with zero context. Suggested use: TikTok hook.

05:48

“Loop engineering is really not that complicated. I don't even know if it deserves its own term.”

Hype deflation from someone who just built the thing; a satisfying counterpoint to viral posts. Suggested use: IG reel cold open.

11:52

“You don't always need to spend the most per token for every step of your workflow.”

Concrete, actionable money-saving principle that clips clean. Suggested use: newsletter pull-quote.

24:28

“I would just fold loop engineering into harness engineering. It doesn't quite deserve its own buzzword.”

Clean reframe of the whole video thesis; works as a standalone conclusion. Suggested use: IG reel cold open.

The Script

Word for word.

Apparently, we're not even supposed to be prompting our AI coding assistants anymore. The real skill is designing loops that prompt your agents so they work for you 24/7. And I gotta say, I am not sold on this idea right now.

It feels like some of the bigger players in the AI space, like Peter Steinberger, the creator of OpenClaw, and Boris Cherny, the lead at Claude Code, who is doing this as well, are pushing this new fad, whether they like it or not, of loop engineering.

It's becoming the next buzzword, and I promise I'm not gonna be hyping up loop engineering here. There are some good lessons to be learned from what's surfacing, but also with loops, and you've probably seen this with dynamic workflows in Claude Code, for example, they're not always the most reliable and they are extremely token hungry.

So unless you have an infinite budget like Peter pretty much, then you have to be really careful with these kinds of systems. They're not always practical.

And so that's what I wanna cover with you in this video. I just wanna get really honest and really practical with you. We're gonna cover three things.

We're gonna cover loops in a really simple sense. It's not actually that complicated. So I wanna show you how you can run these, and then I wanna talk about the trade offs and then solutions to that.

So really nice and structured here. And so as far as some of the solutions we'll get into towards the end of the video, I wanna show you how we can build a system where we can really observe the loops, the orchestrators and the workers, how we can optimize for cost with the workflows we build and using different providers like Pi.

And so really it's like, here's how you can run loops. Here are the downsides. Here's how we can solve for them.

And really get to the point where we're building harnesses for these longer running tasks because it is really powerful for certain things, but then also covering the honest trade offs with it. Okay.

So we saw what Peter said. Now let's take a look at what Boris, the creator of Claude Code said, and it is really similar. He said, I don't prompt Claude anymore.

I write loops, and the loops do the work. My job is to write loops. Okay, Boris.

I think we get it. And like I said, loop engineering is kind of a buzzword, but also there are some really good takeaways when you dive into this. So, like, Boris through a lot of, like, interviews and podcasts has shared his workflow.

We can get glimpses into how it works. A lot of it is built around the newer features in Claude Code. Like slash loop is the most basic example.

And I told you, we're gonna simplify things here. Loop engineering is really not that complicated. I don't even know if it deserves its own term.

And so with slash loop, we set an interval for running a prompt. So like for example, every five minutes, I'm going to check for new GitHub issues in this repo and handle any that come in. So it's pretty neat.

We set up Claude to basically wake itself up every five minutes, and of course, you can adjust this. And it's going to look for input in an external system like GitHub, for example. And so as long as our terminal is up and running with Claude Code, it's able to autonomously handle this.

So basically, it's an every-five-minute loop looking at GitHub issues. There's also slash goal that we have in Claude Code and Codex. So we set some criteria like here is how you know you are done, and then we're forcing the coding agent to work until it is done, kind of like Ralph loops that went viral a few months ago.

And then last, we have slash routines. And so these are the scheduled jobs. Like every hour, I want you to wake up, look at some larger spec document, and then handle the next task.

And so really loop engineering is combining or creating a system around all of these things, routines, slash loop, so that we can give a larger scope of work as input to an AI coding assistant and have it work through it incrementally. Right?

Because we never want to have a coding agent try to handle too much at once or it will get completely overwhelmed. And the main idea with loop engineering is we wanna have some main orchestrator agent that we talk to. We do minimal prompting, just telling it what we want at a high level, and it figures out how to set up the loop and the entire system.

And it's really easy to do this in Claude Code. This is really cool. You just tell it to use the loop skill.

So there's a capability built right into the tool where it knows how to set up these loop systems based on what we ask it to do. So I passed in some kind of simple spec document here, like I just have this as an example. These are the tasks that we want it to go through incrementally.

And so my prompt is telling it to load the skill, so it knows how to set up the loop. And then every cycle, it's just gonna do the first unchecked task, do the validation, and then that loop is done. And then on the next loop, it'll go through and do the next task.

And so eventually, all the tasks will be complete. And then our primary Claude Code session here that set up everything is going to report back to us.

Right? So like right here Claude Code is sort of the orchestrator, but then also the workers because it sets up the loop itself.
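The one-task-per-cycle behavior described here can be sketched in a few lines. This is an illustration, not what the loop skill actually runs: the `- [ ]` checklist format for the plan file and the `do_task` callback (standing in for the agent's work plus its validation) are assumptions.

```python
import re

# Matches a Markdown checklist item such as "- [ ] Add login form".
UNCHECKED = re.compile(r"^(\s*[-*] )\[ \] (.+)$")


def next_unchecked(plan_text):
    """Return (line_index, task_text) for the first unchecked task, or None."""
    for index, line in enumerate(plan_text.splitlines()):
        match = UNCHECKED.match(line)
        if match:
            return index, match.group(2)
    return None


def mark_done(plan_text, line_index):
    """Return the plan with the checkbox on the given line ticked."""
    lines = plan_text.splitlines()
    lines[line_index] = lines[line_index].replace("[ ]", "[x]", 1)
    return "\n".join(lines)


def run_cycle(plan_text, do_task):
    """One loop cycle: do the first unchecked task and tick it off if valid.

    `do_task` is a hypothetical callback that returns True when the
    task's validation passed.
    """
    found = next_unchecked(plan_text)
    if found is None:
        return plan_text, None  # nothing left: the loop can stop
    line_index, task = found
    if do_task(task):
        plan_text = mark_done(plan_text, line_index)
    return plan_text, task
```

A failed validation leaves the box unticked, so the next wake-up retries the same task instead of silently moving on.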

But it is really cool to watch this run. So I'll send off a request here and I'll wait for it to run a little bit. I'll come back and show you kinda how it works.

But you can see that it loads the loop skill as the very first thing. So it knows how to orchestrate things and it'll do the slash loop by itself that I just showed how you can do manually. Okay.

So I came back a couple minutes later and it's already done with the first two tasks. So it's gone through two iterations of the loop already. And so if we go up to the top, we can see that it says this is a sequential task list.

It's gonna do the first task and then it's going to schedule a quick wake up. So it sets up the slash loop by itself. And if we scroll down a little bit after it does and validates the first task, we can see that it's resuming with a slash loop wake up.

And look at that. I didn't write this prompt myself at all. I know this is a very, very basic example, but I wanna stay simple on purpose.

But it wrote the prompt: slash loop, work through PLAN.md one task at a time. And so the kinds of systems that Boris is building are obviously gonna be a lot more elaborate with how we're telling it to run the looping and building in routines and describing how we want it to prompt and work through our context. But at the basic sense, this is really all it takes.

And so now it's just gonna keep knocking things out one at a time. So that is loop engineering in the most basic form possible. But now I wanna get into some of the downsides here, which some of them are definitely pretty obvious to you already.

Problem number one, there is no way you're gonna convince me that loop engineering is the way to get the best results possible with AI coding assistants. I mean, come on. This has to be hyperbole here.

Boris Cherny says that there are days he manages tens of thousands of AI agents at once. Like, really? Is that actually practical?

Is that going to scale? Like, are you really building Claude Code with tens of thousands of agents per day? I mean, maybe that does explain some of the bugs we have in Claude Code.

I feel like there's constantly a couple annoying ones. But, yeah, overall, like, building these kinds of loops, if we make it a very tight controlled system, that's what we'll talk about in a little bit, I think it's good. And for building proof of concepts and exploring ideas, like, I think it's really good.

But it's not like I want to drive all my AI coding with them. And then the second big problem is cost. Because with loop engineering, we're relying on some kind of orchestrator to set up the system and really determine how to get to the end goal.

So it figures out how many workers to spin off, how many loops to do, and that gets super expensive. So the dashboard that I built that I'll show you at the end of this video, I built cost tracking into it. And so for a single run, like here are all the loops that the orchestrator went through, it cost me over a million tokens just to build a relatively simple application.

And yes, I'm sure there are a lot of optimizations that I can do here. But I think you can see just by looking at this, I mean, this is part of why I built the dashboard. You can see why it would be so expensive because we send in our initial spec to the orchestrator and it has to reason about that and then figure out how many workers to spin off.

Then it has to prompt them all. They each spend tokens and then the results come back. The orchestrator has to then reason about that again and then send off the next wave.

No matter how you design the system, there's a lot of context passing and reasoning to make everything work here in a distributed way. So it's a really powerful system and it's cool how far you can take this kind of stuff with the self validation, but man, does it get so expensive.
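A rough accounting sketch shows why the total climbs so quickly. Every number fed into it is a hypothetical input, not a measurement from the dashboard; the model simply assumes the orchestrator re-reads the spec and the previous wave's reports each round before prompting the next wave.

```python
def run_tokens(rounds, workers_per_round, spec_tokens,
               orchestrator_reasoning, worker_tokens, report_tokens):
    """Estimate total tokens for an orchestrator-worker run.

    Every argument is a hypothetical input. Each round the orchestrator
    re-reads the spec plus the previous round's worker reports, reasons,
    and then each worker spends its own tokens.
    """
    total = 0
    pending_reports = 0  # report tokens carried into the next round
    for _ in range(rounds):
        orchestrator = spec_tokens + pending_reports + orchestrator_reasoning
        workers = workers_per_round * worker_tokens
        total += orchestrator + workers
        pending_reports = workers_per_round * report_tokens
    return total
```

With invented inputs of 16 rounds, 3 workers per round, a 2,000-token spec, 4,000 tokens of orchestrator reasoning, 20,000 tokens per worker, and 1,000-token reports, the function returns 1,101,000. Most of that is worker spend multiplied by rounds, which is why cheaper worker models and fewer rounds matter more than trimming the spec.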

And then really quick, the third problem with loop engineering is at least for a lot of setups, you're not really working between different coding agent sessions. Like when you're just using slash loop in Claude Code like Boris talks about a lot, it's really just continuing in the same coding agent session.

So if you loop for a while, you're gonna completely bloat your context for your LLM and overwhelm it. And so we need a system where we can distribute the work actually between different coding agent sessions and make it so they can all communicate to each other and have an idea of like where they fit in the larger goal.

And so that's what I wanna cover for the rest of the video here. So I wanna talk about how I actually work on a day to day basis because this will cover how we can solve for a lot of these problems we have with loop engineering. So I use my tool, Archon, but I'm not just trying to like push Archon on you here.

I just wanna talk about how I use it in a way that solves for the problems of cost, reliability, and how do we actually orchestrate many different coding agent sessions. And so go through this with me here.

So Archon is my harness builder. It allows us to build workflows that orchestrate many coding agent sessions to handle larger tasks.

And so for example, a really classic AI coding workflow is you do your planning, you do your implementation, and then you do your code review or your testing. And so we can build this as a single Archon workflow.

There's a ton of content that I have on Archon on my channel. I'll link to a video right here to help you get started if you're interested in this. But again, I just wanna focus on like how I use this on a day to day basis to kind of do loops.

Like you can do loop engineering with Archon. You can build the Ralph loop with Archon. So I'll show you an example of a workflow here.

If I go into the default workflows, there's a ton that we have that ship with Archon. Let's take a look at fix GitHub issue, for example. And so I don't wanna get too in the weeds here, but I just wanna show you really quickly at a high level how this workflow works.

And another really important thing with Archon is that we're not having the agent drive the entire thing. It's more deterministic because we set up the process in this workflow file and then we even have certain steps that are deterministic.

Like the agent is not driving it. We are guaranteeing that it's going to happen. So when we're building these loops and larger tasks that we have our coding agent knock out, we want to actually take the decision away from the coding agent as much as we can, only applying the reasoning of the LLM when we actually need it to write the code, for example.

Like, we might want our agent to write the code but not actually decide the test to run because we know what it looks like for our test to pass. And so for this workflow, first we extract the issue number. So the input here is some GitHub issue that we want to fix or address.

So we extract the context, we fetch the issue context, and then we classify it. So we have a large language model decide at first, are we addressing a bug or are we implementing a new feature?

And then the workflow is going to be dynamic based on that decision. So the kind of thing that your orchestrator would usually decide, we're more enforcing a specific process here. Like this is the kind of thing that I want to sort of layer on top of loop engineering.

Like let me be in the loop. Let me determine how the workflow can progress. And so then we research the issue, investigate it, and then we go do the implementation and the validation and we create the pull request.

Right? Step by step, each one of the steps, we're using markdown documents as context. So we're handing things off between the steps, but then each step is running in its own coding agent session.

So if we're handling a larger GitHub issue, it's not like this entire thing is running with slash loop in Claude Code, getting totally overwhelmed with each of the tasks that we're doing as we're planning, implementing, and validating. And the way that I can manage cost here is every single node in this Archon workflow, I can actually decide what model am I going to use.

And so, for example, with the classify step here at the top when we're figuring out, you know, what kind of issue do we need to address in the rest of the workflow, so it's kind of like the orchestrator decision. We can use a small model like maybe using Haiku or MiniMax M3, Kimi K2.7 for example.

Right? Like what we can do in Archon is even mix providers. So we can use Claude Code for the implementation and then we can use Codex for the review.

And then for all of our context loading and exploration upfront, we can use a smaller model like Kimi K2.7. And so that's one of the other big issues I see with using slash goal or routines or loops in Claude Code is you're just using one model for pretty much everything. That's part of the problem of why it's so expensive.

Because when we're doing larger amounts of work like this, of course, you're gonna have to spend more tokens. But you don't always need to spend the most per token for every step of your workflow. And I know I'm really really driving this into the ground right now, but yet another reason you want some kind of harness like what you can build with Archon is we have durability.

So this is my Neon database. I'm storing all of my logs and runs in Postgres so that I can resume a workflow even if my machine goes down or I cancel things like whatever I do, I'm always able to resume on exactly the step that I was in that larger loop or that larger workflow.

So I have all my conversations, the code bases that I'm operating on with Archon, everything is durable and super easy to resume any work that I'm doing.

Okay. Cool. So now I wanna show you how I actually use Archon on a day to day basis.

There are a lot of ties that we can draw to loop engineering and things that I've really fixed with it. Right? And so at a very, very basic sense, one of the most classic workflows that I use with Archon is fixing GitHub issues.

Most of the input for my day to day work is issues in a repo. Either I'll create them or someone else will. And so we can use Archon to send off workflows to run in parallel handling multiple GitHub issues at the exact same time.

And this is very much like loop engineering because we have our primary Claude Code here as our orchestrator and it's figuring out based on my higher level request, I'm going to create the prompts and dispatch the workflows. Worktrees are also a really important part of loop engineering. Boris talks about this as well.

If we're having many different agents handling tasks in a loop, we need to make sure they're running in isolation so they're not stepping on each other's toes. That is how we scale our output with AI coding assistants. And so we have our Claude Code here kicking off four workflows to handle GitHub issues.

It's going to validate the PRs after, like, make sure that they're actually created, and this is where we can come in with human in the loop as well. And then it'll run four more workflows to validate, like perform a code review on each of the issues as well.

So very comprehensive, kind of a loop in a sense where it's like handle the issues, validate, and then do a code review. And, uh, another thing as far as like making this more reliable is with Archon workflows, we can also build human in the loop within any individual node in the workflow. So we can always have it pause for us to validate something before it continues, which is one of the biggest problems with loop engineering right now in general is that a lot of times people set up these systems to just go go go go.

And then you have it run for a day and by the time it comes back, you just have crap. Like I've had that myself as I've tested a lot of things within Claude Code like routines and slash loop. And so I'll send this off here and I'll just pause and come back once it's done so we can walk through everything that it accomplished here.

And the best part about all of this is we actually have nine coding agent sessions for this entire loop or whatever you wanna call this entire harness. Right? Like one per GitHub issue fix, one per review, and then we have our primary orchestrator.

So we're doing a ton of work, but at the same time, we actually are pretty lean for each individual session because, actually, I kind of have to correct myself. It's more than just nine sessions because even within each individual Archon workflow, we're running separate coding agent sessions where we can have different models.

We can optimize for cost. There is a lot of engineering that goes on behind the scenes here. Alright.

So I'm back after the entire thing ran. I just wanna show you how comprehensive we can be here. And so we have the four workflow runs for actually fixing the issues, and then Claude Code here is really monitoring and orchestrating everything.

Right? So, like, as the different tasks are done, it's coming in and checking on them. And then finally, we have everything done together.

So all four fix workflows are done, then it launches the code reviews because it confirms that all of the pull requests are ready to be reviewed. And you can even ask for a status update. So, like, while the Archon workflows are running, if we wanna see where we're at, we can, of course, check the logs in the Archon web UI.

I have that as well. But then also we can just ask our orchestrator. Right?

Because it really is in control of our entire situation here. And then finally, all the reviews are done and it gives us the things that need our attention now.

We can really come in and direct things from here. It's the harness driving everything, but we still can be in the loop wherever we want. I know there's a lot that goes into effectively orchestrating parallel coding agents.

There's a lot of content on my channel where I cover this kind of thing. Like, for example, one thing that you have to do a lot is branches in your database. Right?

Like, if each coding agent is working on something in parallel, you don't want them to be stepping on each other's toes, not just with code changes, but also database changes. So worktrees and Neon is a super powerful thing. A lot of different things like port conflicts that we want to solve for as well.

So I'll link to a video right here where I cover that stuff and just generally how we can make parallel AI coding more reliable. So assuming you take care of all of that, you can really let Archon rip on as many GitHub issues or whatever in parallel. Very cool how far we can take our output here.

Alright. So we have covered a lot in this video already. Loop engineering basics, the downsides of it, how I'm using Archon to extract the good parts out into more deterministic workflows.

But last, I wanna cover a system that I built for loop engineering in its purest form because I presented these issues to you but I I do see a lot of promise with this. I want to try to build a system that solves for these problems. And so I built this dashboard that I'm really excited to show you right now.

I actually have it open sourced on GitHub, link to this in the description. And I have built this to solve for a lot of the problems that we have with loop engineering right now. So first of all, we have durability.

Just like with Archon, all of the loops that we run and the different events and logs, I'm storing this here so we can always resume a workflow later on.

So we're managing all of our state in an external database, so we're not relying on that staying in any coding agent session. And so our main orchestrator, it is going to read through this state here and then figure out like, okay, what is the next thing that we need to do?

And so then it's going to call upon the workers to accomplish all of that, like build a new feature or do some kind of validation, whatever it needs to do. And then those workers are gonna go back and they're going to update the state that we have in our database. Like, again, I'm using Neon for Postgres here.

And so this is our loop. Right? Because then the next time the orchestrator runs, it's going to get that updated state from the workers and then figure out the next workers to invoke.
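The read-state, dispatch, write-back cycle can be sketched as follows. This is not the dashboard's code: SQLite stands in for Neon Postgres so the example runs anywhere, and the `tasks` table, its columns, and the `run_worker` callback are hypothetical.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create the task table. SQLite stands in for Postgres in this sketch."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'todo', result TEXT)"
    )
    return db


def orchestrator_round(db, run_worker, wave_size=2):
    """One round: read state, dispatch a wave of workers, write results back.

    Nothing is remembered between calls; the database is the only state.
    `run_worker` is a hypothetical stand-in for a worker agent session.
    Returns the number of tasks handled (0 means the run is finished).
    """
    rows = db.execute(
        "SELECT id, title FROM tasks WHERE status = 'todo' ORDER BY id LIMIT ?",
        (wave_size,),
    ).fetchall()
    for task_id, title in rows:
        result = run_worker(title)
        db.execute(
            "UPDATE tasks SET status = 'done', result = ? WHERE id = ?",
            (result, task_id),
        )
    db.commit()
    return len(rows)


def run_until_done(db, run_worker, wave_size=2):
    """Keep running rounds until no 'todo' rows remain; return round count."""
    rounds = 0
    while orchestrator_round(db, run_worker, wave_size):
        rounds += 1
    return rounds
```

Because each round starts from a query rather than from conversation history, stopping after any round and calling `run_until_done` later picks up exactly where the table says the work stands.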

And there are a couple of problems that I'm solving by building something like this. And and I wanna start by saying, like, this is more experimental. I'm just showing you something that I'm working on and kinda building into my own second brain.

But first of all, I'm driving everything with Pi. So I'm actually using my Kimi subscription with Kimi K2.6, now Kimi K2.7, to drive all of these workflows.

So yes, it is a lot of tokens, but I'm not using Opus for everything. But I'm still getting really good results because of the harness that I built here that elevates the model. And then I have a lot of observability built into this dashboard.

I mean, obviously, it being a dashboard, it solves part of that reliability problem, which obviously I'm still working on. But just being able to see exactly the decisions that are going on here means that it's easier for me to look at this, even have my coding agent analyze the runs in the database, and then figure out how to improve the loop, how to improve the harness here.
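As an illustration of what analyzing the runs in the database could look like, here is a small sketch that totals token spend per role for one run. The `decisions` table, its columns, and the worker token counts are made up for the example; only the 6,000-token orchestrator figure echoes the demo. `sqlite3` again stands in for Postgres.

```python
import sqlite3

# Hypothetical log of agent decisions; the real dashboard's schema may differ.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE decisions (run_id TEXT, round_no INTEGER, role TEXT, tokens INTEGER)"
)
db.executemany(
    "INSERT INTO decisions VALUES (?, ?, ?, ?)",
    [
        ("run-1", 1, "orchestrator", 6000),  # planning plus prompting the workers
        ("run-1", 1, "worker", 1500),        # illustrative worker figures
        ("run-1", 1, "worker", 1500),
        ("run-1", 1, "worker", 1500),
    ],
)
db.commit()

def tokens_by_role(db, run_id):
    """Total tokens per role for one run, sorted by role name."""
    return db.execute(
        "SELECT role, SUM(tokens) FROM decisions "
        "WHERE run_id = ? GROUP BY role ORDER BY role",
        (run_id,),
    ).fetchall()
```

A query like this is the kind of thing a coding agent could run against the stored runs to spot where a loop is spending its budget.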

And so I've just been going through a lot of really simple examples, but nontrivial enough where it does have to go through quite a few rounds to build it. So like building a single-page Kanban board as a static web app, I just take this prompt and I'll show you it running live right now. Like I'll just send this in and I will start a loop.

And it's really cool. We can see that the orchestrator is deciding how to split up the work right now. And then we also have, like, the full run history here.

It's pretty neat. Like, it's super easy to get this up and running if you just wanna check out the GitHub repo linked in the description.

But after a little bit, the orchestrator will decide, here is how I'm going to create that first wave, and then we'll see the workers dispatched. So there we go. The orchestrator spent 6,000 tokens with that initial planning and then prompting our first three workers in round number one.

And so we don't have to watch paint dry seeing this go to completion here, but you get the idea. We saw the full run in the logs earlier of how it'll go round by round doing validation each time, and we can even have human in the loop so that we get to actually take a look at what has happened in the first round before the orchestrator moves on to the next.
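The human-in-the-loop checkpoint between rounds can be sketched roughly as follows. `GatedLoop`, `RoundResult`, and the `approve` callback are hypothetical names for illustration, and each "round" is reduced to a string summary instead of real agent work.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class RoundResult:
    number: int
    summary: str

@dataclass
class GatedLoop:
    # Called after each round; returning False pauses the loop for review.
    approve: Callable[[RoundResult], bool]
    history: List[RoundResult] = field(default_factory=list)
    paused: bool = False

    def run(self, plan: List[str]) -> List[RoundResult]:
        """Run rounds in order, stopping after any round the reviewer rejects.

        Calling run() again resumes after the last recorded round, which
        treats the resume itself as approval of the work done so far.
        """
        for i in range(len(self.history), len(plan)):
            result = RoundResult(number=i + 1, summary=plan[i])
            self.history.append(result)
            if not self.approve(result):
                self.paused = True
                return self.history
        self.paused = False
        return self.history
```

In a real setup the `approve` callback would block on a button in the dashboard; here it is just a function so the control flow is easy to follow.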

That is the kind of reliability that I feel like we really need to have right now in order to build anything more than simple demos with this kind of loop engineering setup. And so, yeah, I would encourage you to just play around with this kind of idea. Like building a dashboard to manage more autonomous tasks in something like your second brain is a big thing that I'm focusing on right now.

And we can even take this kind of dashboard and deploy it to the cloud as well so we can access it from anywhere, maybe even start to share our loop setup with our teammates. And these days, it's just so easy to take applications that you build locally for these kinds of control systems and deploy them to production so you can use it remotely or have a team use it.

Retool is a tool specifically I've been leaning on a lot for these kinds of deployments. And so it's just so easy to create an app here, and then we can import React code. So I just had Claude Code build the entire dashboard in React with the idea of I'm gonna deploy this here.

It's so incredibly easy. So I just go in and I take the zip file of the front end that I just showed you and then its agent is gonna go through wiring everything up. So it'll connect to the back end with the API that I have running with Pi.

It'll get everything deployed to a real URL that I can use. It's really neat. So for example, here connecting to my Neon database where I'm storing all of the runs for durability, it asks me to set up a connection here.

So I can create a new resource. I can select Postgres because that's what Neon is running under the hood and then set up all of my connection information here. So really easy to make that connection.

I'm just deploying the front end here and then connecting it to wherever I'm hosting my app, with Pi running behind the scenes. So I'll get all this hooked up off camera and then I'll show you the final result here. And there we go.

Everything is deployed. We can see our app hosted in the cloud just like it was running locally. Very cool.

So now we have a URL where we can share this. There's also a lot of other cool things you can do in Retool. Like you can set up permission groups, so there are certain actions that you can gate with an API endpoint.

So you have to approve it and have the right permissions to do so. For example, being able to pause the workflow and then resume it.

If I click this right here, you can see that approve and resume, and you can see the identity that I have through Retool. It's giving me permission to actually do that. And then it's also very easy to edit this application.

I can continue to make changes with it here in the cloud as I need, adding new features to the front end, whatever I need as I'm evolving my dashboard. So, yeah, I'm just spending a lot of time with, like, deploying dashboards for observability and helping me with all my systems for my second brain and my AI coding. Very powerful stuff.

And a quick shout out to the Retool team. Ever since I've been using their platform, I've been working with them, and I even collabed with them to bring this integration into the video today. It's a great platform because you get to build your applications directly in Retool or you can import it like I showed earlier.

But then your team, regardless, has a single governed path to production with audit trails, really easy to make your changes just with chat like I showed here, and the review system with human in the loop, all of it that you need to ship your apps to production. And I'll have a link in the description. If you go now, you get free app imports through July 1 and bonus AI credits on all paid plans.

So that's everything I have to cover for loop engineering. The basics, the problems with it, how I'm solving for it, because I really do want to incorporate loop engineering. Like, I like the concept of it and I want to drive how autonomous my coding agents can be, but you gotta have the right system.

Otherwise, things are gonna completely fall apart like we've already talked about. And so I hope I've inspired some ideas for you, even like how to use Archon or start to build this sort of harness for yourself. Really, I would just fold loop engineering into harness engineering.

It doesn't quite deserve its own buzzword. Right? But like there are some good ideas here.

So I hope you found this useful. If you did, I would really appreciate a like and a subscribe. And with that, I will see you in the next video.

The Hook

The bait, then the rug-pull.

A viral tweet from OpenClaw's creator and a Fortune headline about the head of Claude Code both landed the same week: the best AI engineers have stopped prompting their agents. They write loops instead. Cole Medin watched both, built the thing himself, and came back with a more honest report than either headline delivered.

Frameworks

Named ideas worth stealing.

01:42 list

Claude Code Loop Primitives

/loop — recurring timed prompt
/goal — run until criterion met
/routines — scheduled background jobs

The three built-in Claude Code slash commands that implement loop engineering natively.

Steal for: Any background automation or recurring task management workflow

05:52 list

Three Problems with Loop Engineering

Quality ceiling — loops don't produce best results
Token cost explosion — orchestrator reasoning is expensive
Context bloat — same-session loops degrade over time

The three structural problems that make naive loop engineering impractical for real production work without a harness.

Steal for: Evaluating any agentic system pitch. Ask which of these three it solves for.

11:42 concept

Model Tiering by Step

Assign cheap small models to classify/explore steps and frontier models only to implementation and review. Reduces per-workflow cost without sacrificing output quality.

Steal for: Any multi-step agent workflow to cut token spend by 3-5x
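To make the tiering idea concrete, here is a rough cost sketch. The tier names, the per-million-token prices, and the token counts are invented for illustration and do not correspond to any provider's real pricing; the actual saving depends entirely on how tokens are distributed across steps.

```python
# Hypothetical prices in USD per million tokens; not real provider pricing.
PRICE_PER_MTOK = {"small": 0.50, "frontier": 10.00}

# Which tier handles which workflow step (cheap models classify and explore,
# frontier models implement and review).
STEP_TIER = {
    "classify": "small",
    "explore": "small",
    "implement": "frontier",
    "review": "frontier",
}

def workflow_cost(token_usage, step_tier):
    """Sum the cost of a workflow given tokens used per step and a tier map."""
    total = 0.0
    for step, tokens in token_usage.items():
        total += tokens / 1_000_000 * PRICE_PER_MTOK[step_tier[step]]
    return total

# Made-up token counts for one run of a four-step workflow.
usage = {"classify": 20_000, "explore": 200_000, "implement": 100_000, "review": 50_000}

tiered = workflow_cost(usage, STEP_TIER)
all_frontier = workflow_cost(usage, {step: "frontier" for step in usage})
```

With these invented numbers the tiered run costs less than half of the all-frontier run, because the token-heavy exploration step moves to the cheap tier.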

17:29 model

State-Outside-Model Architecture

All task state lives in an external database. Each agent round starts by reading fresh state, so context windows never fill up regardless of loop length. Enables resume-after-crash and team observability.

Steal for: Any long-running agentic system that needs to run for hours or days
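Resume-after-crash follows from the same design: if every round writes events to the database, a restarted process can work out where to pick up. The sketch below assumes a hypothetical `events` table and event names, simulates the crash with an exception, and uses `sqlite3` in place of Postgres.

```python
import sqlite3

def open_log(path=":memory:"):
    """Open the event log (sqlite3 here as a stand-in for Postgres)."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "id INTEGER PRIMARY KEY, run_id TEXT, round_no INTEGER, kind TEXT)"
    )
    db.commit()
    return db

def record(db, run_id, round_no, kind):
    """Append one event and commit immediately so it survives a crash."""
    db.execute(
        "INSERT INTO events (run_id, round_no, kind) VALUES (?, ?, ?)",
        (run_id, round_no, kind),
    )
    db.commit()

def next_round(db, run_id):
    """Return the first round that has no 'round_completed' event yet."""
    row = db.execute(
        "SELECT MAX(round_no) FROM events "
        "WHERE run_id = ? AND kind = 'round_completed'",
        (run_id,),
    ).fetchone()
    return (row[0] or 0) + 1

def run(db, run_id, total_rounds, crash_after=None):
    """Run the remaining rounds; optionally simulate a crash mid-round."""
    for r in range(next_round(db, run_id), total_rounds + 1):
        record(db, run_id, r, "round_started")
        if crash_after is not None and r > crash_after:
            raise RuntimeError("simulated crash")
        record(db, run_id, r, "round_completed")
```

A round that started but never completed is simply run again on resume, so in this sketch the work inside a round needs to be safe to repeat.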

CTA Breakdown