Big Idea

The argument in one line.

An AI coding loop needs only two things — a trigger and a goal — and whether that goal is verifiable or left to the LLM to judge is the one decision that determines how reliably the agent actually stops.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…

You use AI coding agents (Codex, Claude Code, Cursor) and want them to keep iterating on a task until a bar is met, without you manually re-prompting.
You run recurring code-quality work — performance audits, doc syncs, refactors, log coverage, SEO sweeps — and want to automate the loop structure.
You want concrete, copy-paste prompts you can drop into Codex or Claude Code today, not theoretical explanations.

SKIP IF…

You need the agent to build net-new features from scratch — loops are unreliable for open-ended creative building, and the video explicitly says so.
You are on a strict per-session token budget — these loops can run for hours or days and will burn tokens proportionally.

TL;DR

The full version, fast.

A loop is an AI agent instruction that runs autonomously until a defined goal is met. The trigger starts it (manual, schedule, or action like a PR open); the goal stops it. Verifiable goals (measurable thresholds like every page under 50ms) produce cleaner, more reliable loops than LLM-as-judge goals (subjective satisfaction). The video walks through 7 ready-to-copy loops covering performance, docs, architecture, logging, production errors, SEO/GEO, and full product evaluation — plus honest caveats: loops are expensive, not suited for feature-building, and can run for days if unchecked.

Chapters

Where the time goes.

00:00 – 02:22

01 · What is a loop?

Defines loops as trigger + goal structures that remove the human from the iteration cycle. Explains three trigger types (manual, schedule, action) and two goal types (verifiable vs LLM-as-judge) with concrete examples.

02:23 – 05:12

02 · Loop 1 — Sub-50ms page-load

The favorite loop: optimize every page in the app until all load under 50ms. Demonstrates the /goal slash command in Codex. The live example ran for nearly 50 minutes.
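A minimal sketch of what a verifiable stop condition looks like in code, assuming page-load timings have already been collected into a dict. The names `goal_met`, `run_until_goal`, `measure`, and `optimize` are hypothetical, and the `max_rounds` cap is an added assumption; in the video the agent itself does the measuring and optimizing inside Codex.

```python
from typing import Callable, Dict

THRESHOLD_MS = 50.0  # the bar from the video: every page under 50 ms


def goal_met(timings_ms: Dict[str, float], threshold_ms: float = THRESHOLD_MS) -> bool:
    """Verifiable goal: true only when every measured page is under the threshold."""
    return all(ms < threshold_ms for ms in timings_ms.values())


def run_until_goal(
    measure: Callable[[], Dict[str, float]],  # hypothetical: returns page -> load time in ms
    optimize: Callable[[str], None],          # hypothetical: one optimization pass on a page
    max_rounds: int = 20,                     # assumption: safety cap, not from the video
) -> bool:
    """Measure, optimize the slow pages, and repeat until the goal holds."""
    for _ in range(max_rounds):
        timings = measure()
        if goal_met(timings):
            return True
        for page, ms in timings.items():
            if ms >= THRESHOLD_MS:
                optimize(page)
    # Cap reached: report whether the last round of fixes got us there.
    return goal_met(measure())
```

The point of the sketch is that `goal_met` needs no judgment call: the same timings always give the same answer, which is what makes the stop condition auditable.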

05:13 – 05:38

03 · Sponsor — DigitalOcean

Mid-roll DigitalOcean ad covering inference infrastructure.

05:39 – 07:16

04 · Loop 2 — Overnight docs sweep

Nightly scheduled loop that reviews the full codebase and opens a PR when documentation drifts from the implementation. LLM-as-judge stop condition.
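In the video, Codex automations handle the schedule. Outside that tool, a scheduled trigger reduces to working out when the next run is due. A small sketch, assuming naive local datetimes and a hypothetical 02:00 run time (neither comes from the video):

```python
from datetime import datetime, time, timedelta


def next_nightly_run(now: datetime, run_at: time = time(2, 0)) -> datetime:
    """Return the next moment the nightly loop should fire.

    Assumes `now` is a naive local datetime; 02:00 is an illustrative default.
    """
    candidate = datetime.combine(now.date(), run_at)
    if candidate <= now:
        # Today's slot has already passed, so schedule tomorrow's.
        candidate += timedelta(days=1)
    return candidate
```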

07:17 – 08:25

05 · Loop 3 — Architecture satisfaction

LLM-as-judge loop: refactor until the model is happy with the architecture. Can run every night after daily deploys to keep the codebase clean.
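A rough sketch of the LLM-as-judge shape, where a model's verdict stands in for a measurable threshold. `refactor_step` and `judge` are hypothetical callables (in practice both would be model calls), and `max_rounds` is an added assumption because the judge's endpoint is hard to predict. The progress list mirrors the markdown progress file the prompt in the video asks for.

```python
from typing import Callable, List, Tuple


def refactor_until_satisfied(
    refactor_step: Callable[[], str],     # hypothetical: does one refactor, returns a note
    judge: Callable[[List[str]], bool],   # hypothetical: model decides "happy" or not
    max_rounds: int = 10,                 # assumption: cap added for safety
) -> Tuple[bool, List[str]]:
    """Run refactor steps until the judge is satisfied or the cap is hit.

    Returns (satisfied, progress_log).
    """
    progress: List[str] = []
    for _ in range(max_rounds):
        progress.append(refactor_step())
        if judge(progress):
            return True, progress
    return False, progress
```

Unlike a numeric threshold, nothing here guarantees the judge ever says yes, which is why the sketch returns an explicit `satisfied` flag instead of assuming success.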

08:26 – 10:25

06 · Loop 4 — Logging coverage

Adds missing log coverage until every important path emits tested logs. LLM-as-judge. Works well chained with the production error sweep.

10:26 – 11:15

07 · Loop 5 — Production error sweep

Nightly loop that reviews production logs, traces each actionable error to its root cause, fixes it, verifies the fix, opens a PR, and pings in Slack.
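The reporting half of this loop can be sketched without any agent at all. This example assumes a simple `LEVEL message` log format and treats every `ERROR` line as actionable; both are illustrative assumptions, and the function names are hypothetical. It only builds the summary text and does not call Slack.

```python
from collections import Counter
from typing import Iterable


def actionable_errors(log_lines: Iterable[str]) -> Counter:
    """Count ERROR lines by message; assumes each line looks like 'LEVEL message'."""
    counts: Counter = Counter()
    for line in log_lines:
        level, _, message = line.strip().partition(" ")
        if level == "ERROR" and message:
            counts[message] += 1
    return counts


def sweep_summary(log_lines: Iterable[str]) -> str:
    """Build the message the loop would send: findings, or an explicit all-clear."""
    errors = actionable_errors(log_lines)
    if not errors:
        return "No actionable errors found in last night's logs."
    lines = [f"{count}x {message}" for message, count in errors.most_common()]
    return "Actionable errors:\n" + "\n".join(lines)
```

Sending a message in the empty case matches the prompt in the video: silence and "nothing to fix" should not look the same.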

11:16 – 12:04

08 · Loop 6 — SEO/GEO visibility

Weekly audit across crawlability, indexation, page intent, internal links, structured data, and answer-first content. Runs until no critical technical issues remain.

12:05 – 13:57

09 · Loop 7 — Full product evaluation

Most ambitious loop: generate N realistic scenarios covering every capability, test each one, fix failures, rerun, repeat until every scenario meets the quality bar. Can run for 12+ hours.
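The run, fix, rerun cycle described here can be sketched as a small driver. `run` and `fix` are hypothetical callables standing in for the agent's work, `max_passes` is an added assumption, and the sketch folds the "rerun the affected scenarios" step into the next full pass for brevity.

```python
from typing import Callable, Dict, List


def evaluate_until_passing(
    scenarios: List[str],
    run: Callable[[str], bool],    # hypothetical: True if the scenario meets the bar
    fix: Callable[[str], None],    # hypothetical: repair the underlying cause
    max_passes: int = 5,           # assumption: cap on full reruns
) -> Dict[str, bool]:
    """Run every scenario, fix failures, and rerun the full set until all pass.

    Returns the results of the last full run.
    """
    results: Dict[str, bool] = {}
    for _ in range(max_passes):
        results = {name: run(name) for name in scenarios}
        failed = [name for name, ok in results.items() if not ok]
        if not failed:
            break
        for name in failed:
            fix(name)
    return results
```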

13:58 – 16:11

10 · Caveats — cost and limits

Loops are not good for feature-building (direction is unpredictable). They are expensive — tokens burn autonomously until the goal is hit, potentially for days. Excel clone example: ran for days before being manually stopped.

Atomic Insights

Lines worth screenshotting.

A loop needs only two things: a trigger to start it and a goal to stop it — everything else is optional.
Verifiable goals (every page under 50ms) produce more reliable loops than LLM-as-judge goals (refactor until satisfied) because the stop condition is unambiguous.
The /goal slash command in Codex and Claude Code is what tells the agent to keep running until the condition is met; without it, the agent stops after one pass.
Scheduling a loop nightly turns a one-time audit into a permanent quality guarantee on your codebase.
Loops are not good for feature-building: you cannot say 'loop until we build a full permissioning system' because you do not know which direction the AI will go.
The production error sweep loop traces every error to its root cause, fixes it, verifies the fix, opens a PR, and pings you in Slack — all autonomously.
LLM-as-judge goals are brittle because taste and judgment are left to the model, which makes the stop condition unpredictable.
The full product evaluation loop creates N realistic test scenarios and iterates on the product until every scenario passes — it can take 12+ hours.
An agent that cloned Excel with computer use ran for days before being manually stopped — an example of a loop without a realistic stop condition.
Loops that build on each other multiply in value: logging coverage plus production error sweep together give full autonomous quality coverage.

Takeaway

Two questions that decide if a loop will work.

WHAT TO LEARN

Before handing a task to an autonomous agent loop, you need to know what starts it and — more importantly — what stops it.

Every autonomous loop needs exactly two components: a trigger (what starts the agent) and a goal (what tells it to stop). Everything else is optional.
A verifiable goal — a concrete, measurable threshold like every page loading under 50ms — produces a more reliable loop than a subjective one because the agent can test it deterministically.
When the stop condition is subjective (refactor until satisfied, documentation is complete), the model decides when it is done, which makes the endpoint harder to predict and the result harder to audit.
Scheduling loops to run nightly converts a one-time audit into a permanent quality floor — the agent checks every night and only opens a PR when something needs fixing.
Loops are not suited for building features from scratch: when the direction is open-ended, you cannot know what the agent will build, how it will prioritize, or when it will stop.
Two loops that chain well: logging coverage (adds missing log coverage) followed by production error sweep (finds and fixes errors in those logs), giving full automated quality coverage overnight.
Token cost scales directly with run time — a loop that runs for days will consume tokens for days. Know your budget before triggering a long-running autonomous job.
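One way to act on the budget point is a hard cap the loop checks on every iteration. A minimal sketch with hypothetical names; the linear cost model and any prices passed into it are illustrative assumptions, not figures from the video.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Tracks spend against a hard cap so a loop can stop itself."""

    limit_tokens: int
    used_tokens: int = 0

    def charge(self, tokens: int) -> None:
        """Record tokens consumed by one iteration."""
        self.used_tokens += tokens

    @property
    def exhausted(self) -> bool:
        return self.used_tokens >= self.limit_tokens


def estimated_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Linear cost model: cost grows in direct proportion to tokens consumed."""
    return tokens / 1_000_000 * usd_per_million_tokens
```

With a 1,000,000-token limit and iterations that each consume 200,000 tokens, `exhausted` becomes true after the fifth iteration, so the loop stops on budget even if the goal is still unmet.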

Glossary

Terms worth knowing.

Loop: An AI agent instruction that runs autonomously, repeating its task until a defined goal is met, without requiring human re-prompting between iterations.
Trigger: What kicks off a loop — either a manual command, a scheduled time, or an event like opening a pull request.
Verifiable goal: A loop stop condition defined by a concrete, measurable threshold that can be checked deterministically, such as every page loading under 50 milliseconds.
LLM-as-judge: A loop stop condition where the model itself decides when the goal is met, used for subjective outcomes like architectural quality or documentation completeness.
/goal: A slash command in Codex and Claude Code that tells the agent to continue working autonomously until the specified condition is satisfied.
Loop Library: A free, publicly accessible collection of ready-to-copy AI agent loop prompts, each with a defined trigger, prompt body, and verifiable or LLM-judged stop condition.
GEO: Generative Engine Optimization — structuring web content so it appears in AI-generated answers and answer boxes, as distinct from traditional search engine rankings.

Resources

Things they pointed at.

02:23 · link · Loop Library

05:32 · product · DigitalOcean

08:06 · product · hear.now

08:25 · tool · Codex

10:26 · link · Free consulting sessions (Matthew Berman team)

Quotables

Lines you could clip.

00:31

“The most important thing about loops is that it removes humans.”

Punchy one-liner that frames the entire value proposition of autonomous agents → TikTok hook

05:23

“Loops are the frontier of AI workloads.”

Declarative claim, works as a standalone pull-quote → newsletter pull-quote

15:41

“Loops are very expensive. They are churning through tokens autonomously until they hit the goal.”

Honest counter-point; pairs well with the hype claims earlier → IG reel cold open

14:42

“I told the model to clone Excel, feature parity. And it was running for days and days and days until I finally stopped it.”

Vivid concrete story that illustrates the loop cost caveat without abstract explanation → TikTok hook

The Script

Word for word.

00:00Loops are emerging as the single biggest unlock for people building software with artificial intelligence right now. But most people don't even know what loops are. And so today, I'm gonna tell you what loops are.

00:12I'm gonna show you why they're valuable, and then I'm actually gonna give you many specific use cases that you can use loops for today. So what is a loop?

00:23A loop is a way to allow your AI coding agent to work autonomously towards a specified goal.

00:31The most important thing about loops is that it removes humans. That allows the agent to work much more quickly towards this defined goal. And if it sounds very theoretical, I am going to break it down.

00:45So what is a loop more specifically? Well, you need two things. You need a trigger, and you need a goal.

00:52With those two things, you can complete the loop. A trigger is what kicks off the loop, and there are three ways to kick off a loop. One, you can do so manually.

01:03You literally tell the agent, go do this loop. Two is schedule. You can schedule a loop to happen at a certain time of day or on a repeating schedule.

01:13And then three, you have actions. You can have the loop kick off based on some kind of action like opening a PR. Now to fully remove the human, we wouldn't wanna kick everything off manually, but sometimes it is required.

01:27Alright. And for the goal, the goal can be basically one of two things. It can be verifiable, or we can use LLM as a judge.

01:36So if it's verifiable, it is something concrete, some specific number or some way to test it deterministically. If it is LLM as a judge, that means we're giving the model the ability to determine when it has reached the goal. Let me give you two examples.

01:53So for verifiable, 100% test coverage in our code base is an example.

01:58That is something that we know for sure, and we have a nice way to test against when it is true. And for LLM as a judge, one example would be refactor until satisfied.

02:10And the satisfaction just means you as the LLM get to determine when we are satisfactorily refactored enough.

02:20Alright. Enough of the theoretical. Let me actually show you some examples.

02:24So a lot of people talk about loops, but they don't actually give concrete use cases, and I wanted to fix this. That is why I am launching the Loop Library. It is a free library.

02:35I'm basically taking all the loops that I use and the ones that I see other people use and putting them in a single place so you can see them, you can be inspired by them to create your own loops, or you can simply copy them straight from here. It's free. I'm gonna drop the link down below.

02:50So let's go over it. This is definitely my favorite loop, and it's going to show you exactly how loops work. This is the sub 50 ms page load loop.

03:01Let me click into it, and here we are. So the objective of this loop is to get every single page load in my app under 50. And so that is the goal.

03:13It is a very concrete, well defined goal, which really makes building a loop easier. So what I tell it is: continue optimizing the code for speed. After each significant change, measure page load performance across every page under the same repeatable test conditions.

03:32Continue until... that's the loop.

03:35Continue until every page loads in under fifty milliseconds. So it is literally gonna go through my entire application, every window, every page, every modal.

03:46Load it. If it's above fifty milliseconds, it's going to continuously optimize it until it gets it under fifty milliseconds.

03:54Once it's done with one, it moves on to the next. That's the loop. That's the goal.

03:59But how do I actually do that? How do I actually kick it off? Well, the trigger in this case is me.

04:06I am the human, and I'm going to manually kick off this loop. You can certainly set it on a schedule, and you can even trigger it on, let's say, a PR open.

04:17So every time you open a new PR, you also want to make sure that that new PR doesn't make the page load over fifty milliseconds. So let's kick it off. So we're going to click copy right here.

04:27All you have to do is paste it in. So I have the prompt right there. And then at the end or at the beginning, it doesn't matter, type slash goal.

04:36And this is a feature in Codex. Claude Code also has a slash goal feature. But as soon as you have this slash goal, it's telling Codex to continue working until the condition is met, the condition of every page loads under fifty milliseconds.

04:51That's it. You just hit go, and it might run for ten minutes. It might run for ten hours.

04:57It will just continue to run until it meets the goal. And so you do have to keep a close eye on it if you're under a token budget constraint. So here it is in action.

05:06I sent this as a goal. Look for more optimizations to make sure every page loads in under fifty milliseconds on production. It worked for nearly fifty minutes, so I'm treating this as a production performance goal.

05:17I'll first measure the real Teams page request path, and it basically, as you can see here, went through every single page and optimized it to load under fifty milliseconds. Loops are the frontier of AI workloads.

05:31And if you wanna power them reliably and at production scale, use the sponsor of today's video, DigitalOcean.

05:39If you're running production inference, you're probably running into some of these problems. Your inference stack is too complex to operate. Costs are unpredictable, and I'm spending more time managing the infrastructure than actually building the things to be on the infrastructure.

05:52And most teams find out the hard way that the hard part of building AI applications is not using the model. It's actually everything around the model. The operational overhead, the fine tuning inference complexity, the costs that become harder to predict as you scale, and that's why I wanna tell you about DigitalOcean, the partner of this video.

06:12DigitalOcean is designed to minimize the total cost of ownership by giving teams a simpler path to production AI. They provide infrastructure that is optimized for inference and a vertically integrated core cloud that provides efficiency at scale.

06:29Vertically integrated is the keyword. And with transparent usage based pricing that makes cost easy to predict.

06:37So if you wanna spend less time managing your infrastructure and actually building the thing you're excited about, DigitalOcean is the way to go. So go check it out. They've been a fantastic partner.

06:45I've actually been using DigitalOcean for well over a decade at previous companies, so I can vouch for them. Go check them out. Link down below.

06:53Now back to the video. Here's another loop that I really like. This is called the overnight docs sweep.

06:59Each night, review the code base in full and make sure all documentation reflects the latest changes from the previous day. Update the documentation as needed, then open a pull request with those changes. So what I am doing is I'm making sure we have complete documentation based on any changes we may have made.

07:17This is an example of LLM as a judge. There's no verifiable way to know if we have complete documentation coverage. There may be some ways that we can say, okay, as long as a piece of documentation covers this section of the code.

07:31But ultimately, what we're doing is saying, okay, LLM, you decide. So how do we actually use this? Well, once again, just hit the copy button.

07:39We're gonna come into Codex. We're gonna click this automations tab. We're going to create via chat.

07:44We're gonna delete this portion. I don't know why they put that in there, but I wanna set up an automation. Then we paste in what we just copied, and then each night review the code base in full, hit go, and let it run.

07:55And hopefully, it will set up an automation just like this. So there we go. I'll set this up as a recurring automation.

08:00So first, I'm loading the automation tool rather than writing a one off note. Perfect. So this is a way to keep your documentation always up to date.

08:08It is awesome. And by the way, I created this website with hear.now.

08:13So shout out to hear.now, the partner on the loop library. I created it, and I simply said, deploy to hear.now,

08:21and it was done. It's so easy. Next is the architecture satisfaction loop.

08:26This is one that Peter Steinberger himself says he uses often. Here we go. Refactor until you are happy with the architecture.

08:36Here is the trigger and the goal all in one sentence. Refactor, which is what the loop is going to do, until you are happy with the architecture.

08:44Happy with the architecture is the goal. This is another example of LLM as a judge.

08:50We can even give it more guidance on what happy with the architecture means. We can say, be very strict about simplicity or make sure every single line of code is dry.

09:01Then after each significant step, live test the system, run auto review, and commit. Track progress in, and then we give it a markdown file to track the progress.

09:11This is fantastic. So it's tracking its loop as it's actually looping. Now you can kick this off manually or you can run it every night.

09:19So let's say during the day, you're deploying a bunch of code, and then every night, you're just making sure that it's refactored, it's dry, and it looks really solid. So very good way to keep your code base very clean. Next, another one of my favorites, the logging coverage loop.

09:36So let's click into it. Basically, what this loop is gonna do is make sure that we have thorough logging throughout our app. And there's another loop that builds off of this that I'm gonna show you in a minute, which these two loops together, you can start to see how loops can become so powerful.

09:52So this says review the system's logging and add missing coverage until every important path produces useful tested logs.

10:00And, again, this just makes sure that we have logging for everything. And this is gonna be manually kicked off, and this is going to be LLM as a judge because it says every important path and important is nondeterministic.

10:15It just means the LLM gets to decide what's important and what isn't. And by the way, if you want hands on help with loops and other AI topics at your company, my team is offering free consulting sessions. I'm gonna drop a link down below.

10:28We're only doing a few of these, so go apply if you're interested. Would love to talk to you. Alright.

10:33So now imagine this. You have full logging coverage, but what do you actually do with those logs? Well, I have another loop for you.

10:40This is called the production error sweep. Every single night, we're going to review our production logs for errors. If you find an actionable issue, trace it to its root cause, fix it, verify the fix, and open a pull request.

10:55Then ping me in Slack with the findings and PR link. If no actionable errors are present, ping me with that result instead. So we are kicking off a loop every night, and the loop is looking for every error in the logs, and we'll fix them one by one with the end goal being no more unaddressed errors in the logs.

11:16So that is a very concrete goal for this loop. Alright. Here's another loop.

11:21Something incredibly important to any website owner, any app owner is SEO. And not only SEO, now GEO.

11:29So here's the SEO, GEO visibility loop. Run an SEO, GEO audit across crawlability, indexation, page intent, titles, internal links, structured data, source citations, and answer first content.

11:45Rank the gaps. I'm not going to read the whole thing. Fix the highest leverage issues.

11:49Rerun the same crawl. And here's the loop. Repeat until no critical technical issues remain.

11:56Again, you might have one issue. You might have 50 issues. The point is we've now kicked off a loop that fixes all of them until no more issues are present.

12:08So this is a really cool one to run, let's say, once a week. Alright. Here's one of my favorite and one of the most hand wavy loops that I have, but listen to this.

12:16This is called the full product evaluation loop. Create n realistic scenarios covering every major capability. Before testing, define clear success criteria and choose a consistent evaluation method such as pass fail checks or a scoring rubric. Run every scenario under the same conditions and record evidence for each outcome.

12:35Fix the underlying cause of anything that does not meet the criteria. Rerun the affected scenarios, and then rerun the complete test.

12:44Continue until every scenario meets the original quality bar. Now a lot of you might be thinking, wow. That just sounds like tests.

12:51Right? It's just like a test suite. Well, kind of, but this is actually nondeterministic.

12:57This is allowing the model to go through every single use case in your application, in your product, figure out if it's good enough, determined by the LLM, and update it if necessary.

13:09This one really does work. It takes, like, twelve hours at times or more, but it really does come up with very good optimizations. Now you can also customize this for your specific app.

13:22So for example, I'm building something right now that requires me asking a question of an LLM and it providing a really accurate response with sources. So I tell it, come up with 100 different use cases, wide ranging use cases for asking the LLM questions and judge whether the response is good enough.

13:41If it's not, iterate and improve it. So I can keep going, but if you wanna find all of the loops and any new ones that I discover, go check out the loop library. I'm gonna drop a link down below.

13:51And once again, shout out to hear.now for hosting the loop library. Okay.

13:56So there are two major caveats with loops that I have to tell you about. Number one is it's not for every problem yet.

14:05Designing a loop isn't always easy. Specifically, coming up with the goal for the loop is not easy.

14:12If something can be verified, like every page loads under fifty milliseconds, that is perfect for a loop.

14:20When we have to have the AI judge, LLM as a judge, whether a goal is met or not, that's when it becomes a little more brittle because we are leaving taste and judgment up to the model.

14:34This becomes even more difficult when we're talking about building features. I've not really found a way to build features with loops. You cannot say loop until we build a full permissioning system.

14:47I mean, you technically can, but I'm not doing it because I don't know which direction the AI is gonna go. I don't know what features it's gonna build.

14:56I don't know when or how it's going to decide which features are worthwhile versus which are not. So that makes it not great for day zero feature building. Now one example of building a product from scratch using a loop is something I did where I told the model, as a goal, to clone Excel, feature parity.

15:19And it was running for days and days and days until I finally stopped it. It actually opened up Excel on my computer, used computer use, and literally clicked through and made sure that it had feature parity. And, yes, it was running for days before I finally stopped it.

15:35So I do not recommend doing that. And that brings me to the second big caveat. Loops are very expensive.

15:42They are churning through tokens autonomously until they hit the goal.

15:48Some of these agents might run for ten minutes. Some of them can run for days. So for you token maxers out there, loops are fantastic.

15:56But for those of you who don't have an unlimited token budget, this might not work for you today. And by the way, if you like coding with loops, you might also like these four open source projects that I reviewed that you can use right

The Hook

The bait, then the rug-pull.

Loops are the single biggest unlock for AI software builders right now — and most people still don't know what they are. In 16 minutes, this video defines the pattern, demonstrates it live in Codex, and hands you seven copy-paste prompt templates covering performance, docs, architecture, logging, error sweeps, SEO, and full product evaluation.

Frameworks

Named ideas worth stealing.

00:43 · model

Trigger + Goal loop anatomy

Trigger (manual / schedule / action)
Goal (verifiable / LLM-as-judge)

Every autonomous agent loop needs exactly these two components. The trigger starts it; the goal stops it. Verifiable goals produce more reliable loops.
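The anatomy maps cleanly onto a small data model. A sketch with hypothetical type names, encoding three of the loops as the video classifies them (the architecture loop is listed as manual here, though the video notes it can also run nightly):

```python
from dataclasses import dataclass
from enum import Enum


class Trigger(Enum):
    MANUAL = "manual"
    SCHEDULE = "schedule"
    ACTION = "action"  # e.g. a PR being opened


class GoalKind(Enum):
    VERIFIABLE = "verifiable"
    LLM_AS_JUDGE = "llm-as-judge"


@dataclass(frozen=True)
class LoopSpec:
    """One loop: what starts it and what stops it."""

    name: str
    trigger: Trigger
    goal_kind: GoalKind
    goal: str


LOOPS = [
    LoopSpec("sub-50ms page load", Trigger.MANUAL, GoalKind.VERIFIABLE,
             "every page loads in under 50 ms"),
    LoopSpec("overnight docs sweep", Trigger.SCHEDULE, GoalKind.LLM_AS_JUDGE,
             "documentation reflects the latest changes"),
    LoopSpec("architecture satisfaction", Trigger.MANUAL, GoalKind.LLM_AS_JUDGE,
             "happy with the architecture"),
]

# Filter for the loops whose stop condition can be checked deterministically.
verifiable = [loop.name for loop in LOOPS if loop.goal_kind is GoalKind.VERIFIABLE]
```

Writing the goal kind down next to each loop makes it obvious which ones need extra guardrails, such as an iteration cap or a token budget.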

Steal for: any time you want an AI agent to keep iterating on a task without manual re-prompting

CTA Breakdown

How they asked for the click.

VERBAL ASK

13:58 · link

“Go check out the loop library. I am gonna drop a link down below.”

Mentioned multiple times organically alongside each loop demo. Final CTA at the caveats section feels earned. Also promotes free consulting sessions mid-video.
