Big Idea

The argument in one line.

The shift from prompting AI agents to designing judge-gated loops that recurse until they pass is the single biggest leverage upgrade available to developers today, and a library of 45 ready-made loops makes that shift accessible in an afternoon.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…

You use Claude Code or Codex daily and still manually check every result before prompting again.
You have heard loop engineering mentioned but could not map it to a concrete workflow or copy-paste prompt.
You maintain an active codebase and want automated sweeps for docs, tests, errors, or performance on a schedule.
You are a founder on a lean team who needs agents that can triage, fix, and open pull requests without babysitting.
You ship web products and want onboarding, accessibility, or CSS bloat caught automatically before users notice.

SKIP IF…

You want step-by-step implementation walkthroughs -- this is a survey video that reads each prompt, not a build-along.
You are already deep in loop engineering and want novel techniques beyond what is published on the library site.

TL;DR

The full version, fast.

The video draws a hard line between automations (sequential, single-pass, terminates) and loops (recursive, judge-gated, recurs until an eval passes). With that frame in place, Andy reads the full 45-loop inventory from Matthew Berman's Forward Future site, covering each loop's one-paragraph prompt and verify/stop condition. The loops span software engineering (doc drift, test coverage, flaky tests, performance), product quality (onboarding, accessibility, CSS trim), AI-on-AI workflows (adversarial Codex review, multi-LLM convergence, self-improving champion prompts), and creative work (thumbnail generation, product podcasts). The actionable conclusion: pick any loop whose verify/stop condition names a problem you already have, copy the prompt, paste it into your agent.


Chapters

Where the time goes.

00:00 – 02:39

01 · Intro: Automation vs. Loop

Steinberger tweet sets the stakes. Andy explains the only real difference between automation and loop: a judge that sends the process back.

02:39 – 06:48

02 · Loops 1-7: Engineering Foundations

Doc sweep, architecture satisfaction, sub-50ms page load, production error sweep, 100% test coverage, SEO/GEO visibility, logging coverage.

06:48 – 11:49

03 · Loops 8-15: Scheduled Maintenance

Nightly changelog, quality streak, full product evaluation, test-suite speed, repository cleanup, stale-safe batch release, production data cleanup, post-release baseline.

11:49 – 15:31

04 · Loops 16-20: Review and Coordination

Ticket-to-PR, customer AI deployment, product update podcast, Codex adversarial review, loop harness verification.

15:31 – 19:15

05 · Loops 21-25: Creative and Safety Tests

Boeing 747 3D benchmark, War Loops frontend reconstruction, self-improving champion, devil's advocate, fresh clone onboarding test.

19:15 – 22:49

06 · Loops 26-30: Tooling and Optimization

Infinite clickbait thumbnail, autonomy-loop builder-reviewer, Codex completion contract, Revolve versioned experiment, 5-minute repository maintainer.

22:49 – 26:56

07 · Loops 31-35: Audit and Alignment

Recent feedback sweep, promise-to-proof marketing audit, propagation compliance, multi-LLM convergence, Goal Forge planning workflow.

26:56 – 35:30

08 · Loops 36-45: UX, Performance, and Quality

UI/UX score, cold-load trimmer, pixel-safe CSS trim, easy onboarding, accessibility repair, housekeeper, Axelrod subagent arena, prepare-a-new-project, test stabilizer, artifact-to-skill.

35:30 – 36:31

09 · Outro

Subscribe ask, Skool community plug, next video card.

Atomic Insights

Lines worth screenshotting.

A loop is not an automation -- the only thing that makes it a loop is a judge at the end that can send the agent back.
Your new job as an AI builder is writing checklists and eval criteria, not writing code.
Every loop in the library has a verify/stop condition -- that condition is the entire value, not the prompt body.
The doc sweep loop detects documentation drift and opens a pull request automatically -- no human needs to notice the gap first.
The 100% test coverage loop stops only when the full suite passes at 100%, so the AI hunts uncovered paths rather than you.
The multi-LLM convergence loop requires two genuinely different model families to approve the exact same version -- no model grades its own homework.
The devil's advocate loop forces every high-impact objection to be resolved with evidence before you build, not after.
The fresh clone loop proves onboarding actually works by rebuilding from scratch until a first-timer can follow the README without help.
The artifact-to-skill loop converts a one-time proven artifact into a reusable method validated on a second, fresh case.
The self-improving champion loop only promotes a challenger prompt when it wins on holdout cases it was never edited against.
The loop harness verification loop ships output only after a second independent Claude session confirms it -- never one agent approving its own work.
The goal forge loop produces SPEC.md and GOAL.md before any code runs, making completion criteria explicit before execution starts.
The Boeing 747 benchmark tests visual judgment by having an agent build a Three.js 3D model and self-correct against nine fixed camera angles.
Community contributors have already expanded the library beyond the original set, making it a living open resource you can submit to.
The promise-to-proof loop audits every customer-facing marketing claim against current evidence and fixes or narrows the ones the product cannot back up.

Takeaway

Every loop lives or dies by its verify/stop condition.

WHAT TO LEARN

The prompt body of an agent loop is almost secondary -- what defines a real loop is the judge at the end that decides whether to ship or recurse.

01 · Intro: Automation vs. Loop

A loop is not an automation -- the only distinguishing feature is a judge that can restart the process on failure.
Writing checklists and eval criteria is the new hard skill; execution is delegated to the model.
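
The automation/loop distinction above can be sketched in a few lines of Python. This is a minimal illustration, not code from the video or the Loop Library: `run_loop`, `toy_step`, and `toy_judge` are hypothetical names, and the "agent" and "judge" are toy stand-ins for a real model call and a real eval.

```python
from typing import Callable, Optional, Tuple

# Hypothetical types: a step takes the task plus the judge's last feedback,
# and a judge returns (passed, feedback).
Step = Callable[[str, Optional[str]], str]
Judge = Callable[[str], Tuple[bool, str]]


def run_loop(task: str, step: Step, judge: Judge, max_rounds: int = 5) -> Tuple[str, int]:
    """Run the step, judge the result, and go back until the judge passes."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        result = step(task, feedback)
        passed, feedback = judge(result)
        if passed:
            return result, round_no
    # Report a stall instead of pretending the work is done.
    raise RuntimeError(f"budget exhausted after {max_rounds} rounds: {feedback}")


def toy_step(task: str, feedback: Optional[str]) -> str:
    """Stand-in for an agent call: applies the feedback if there is any."""
    return task if feedback is None else f"{task} [fixed: {feedback}]"


def toy_judge(result: str) -> Tuple[bool, str]:
    """Stand-in for a checklist or eval: passes only once a fix was applied."""
    if "fixed" in result:
        return True, ""
    return False, "missing tests"


if __name__ == "__main__":
    final, rounds = run_loop("add login page", toy_step, toy_judge)
    print(rounds, final)
```

An automation would be a single `toy_step` call with no judge. The loop differs only in the judge and the budgeted retry around it.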

02 · Loops 1-7: Engineering Foundations

The doc sweep and logging coverage loops are maintenance tasks you can schedule immediately on any existing codebase.
The sub-50ms page load loop uses the same benchmark on every run, making regression detection automatic and objective.

03 · Loops 8-15: Scheduled Maintenance

The nightly changelog loop removes the human bottleneck from keeping release notes current.
The quality streak loop only ships after a defined consecutive run of passes -- a streak requirement is a cheap forcing function for real stability.
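
The streak requirement above reduces to a small counter that resets on every failure. A minimal sketch, assuming each realistic case yields a plain pass/fail boolean; `streak_reached` and `required` are hypothetical names, not taken from the loop's prompt.

```python
from typing import Iterable


def streak_reached(outcomes: Iterable[bool], required: int) -> bool:
    """Return True once `required` consecutive passes have been seen."""
    if required < 1:
        raise ValueError("required must be at least 1")
    streak = 0
    for passed in outcomes:
        # Any failure restarts the streak from zero.
        streak = streak + 1 if passed else 0
        if streak >= required:
            return True
    return False


if __name__ == "__main__":
    print(streak_reached([True, True, False, True, True, True], required=3))
```

In a real loop, each `False` would also trigger the fix and the new regression test before the next case runs.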

04 · Loops 16-20: Review and Coordination

The Codex adversarial review loop treats one model as builder and another as adversarial reviewer -- separation of roles catches blind spots.
The loop harness verification loop is the safety primitive for any unattended agent: independent verification before any external change ships.

05 · Loops 21-25: Creative and Safety Tests

The Boeing 747 benchmark is a useful proxy for any agent task requiring multi-angle visual judgment, not just 3D modeling.
The fresh clone loop exposes unstated dependencies and manual workarounds teams have normalized -- the README must be the only instruction.
The devil's advocate loop is most valuable before major architectural commits, not during implementation.

06 · Loops 26-30: Tooling and Optimization

The self-improving champion loop prevents overfitting by requiring wins on holdout cases the prompt was never edited against.
The goal forge loop (SPEC.md + GOAL.md) is a portable planning primitive for any long-running coding agent session.
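
The holdout rule in the self-improving champion loop above can be sketched as a promotion check. This is a hedged illustration under stated assumptions: `pick_champion`, `mean_score`, the scorer signature, and the `must_pass` callback are all hypothetical, and a real scorer would call a model rather than match keywords.

```python
from typing import Callable, Sequence

# Hypothetical scorer: (prompt, case) -> score, higher is better.
Scorer = Callable[[str, str], float]


def mean_score(prompt: str, cases: Sequence[str], score: Scorer) -> float:
    """Average score of a prompt over a set of cases."""
    if not cases:
        raise ValueError("need at least one case")
    return sum(score(prompt, case) for case in cases) / len(cases)


def pick_champion(
    champion: str,
    challenger: str,
    holdout: Sequence[str],
    score: Scorer,
    must_pass: Callable[[str], bool],
) -> str:
    """Promote the challenger only on a strict holdout win with must-pass intact."""
    if not must_pass(challenger):
        return champion
    if mean_score(challenger, holdout, score) > mean_score(champion, holdout, score):
        return challenger
    # Ties and uncertainty keep the current champion.
    return champion


if __name__ == "__main__":
    # Toy scorer: a prompt "handles" a case if it mentions the case keyword.
    toy_score = lambda prompt, case: 1.0 if case in prompt else 0.0
    holdout_cases = ["refunds", "shipping"]
    winner = pick_champion(
        champion="answer questions about refunds",
        challenger="answer questions about refunds and shipping",
        holdout=holdout_cases,
        score=toy_score,
        must_pass=lambda prompt: "answer" in prompt,
    )
    print(winner)
```

The holdout cases are never shown to whoever edits the challenger, which is what keeps the comparison honest.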

07 · Loops 31-35: Audit and Alignment

The promise-to-proof loop is a marketing audit in disguise: it finds the claims your product cannot actually back up.
The multi-LLM convergence pattern is the structural answer to the self-grading problem -- two families, same version, no edit between reviews.
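
The convergence rule above (two families, same version, no edit between reviews) can be expressed as a check over a review log. A minimal sketch, assuming each review records the reviewer's model family and a hash of the exact text it saw; `Review`, `fingerprint`, and `has_converged` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Review:
    family: str        # model family of the reviewer, e.g. "family-a"
    version_hash: str  # fingerprint of the exact version that was reviewed
    approved: bool


def fingerprint(text: str) -> str:
    """Hash the document so 'exact same version' is checkable."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def has_converged(reviews: List[Review]) -> bool:
    """True when the last two reviews are clean, cross-family, and on one version."""
    if len(reviews) < 2:
        return False
    first, second = reviews[-2], reviews[-1]
    return (
        first.approved
        and second.approved
        and first.family != second.family
        and first.version_hash == second.version_hash
    )


if __name__ == "__main__":
    v1 = fingerprint("plan, draft 1")
    v2 = fingerprint("plan, draft 2")
    log = [
        Review("family-a", v1, approved=False),
        Review("family-b", v2, approved=True),
        Review("family-a", v2, approved=True),
    ]
    print(has_converged(log))
```

Any edit changes the hash, so an approval of an earlier draft cannot count toward consensus on a later one.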

08 · Loops 36-45: UX, Performance, and Quality

The pixel-safe CSS trim loop removes stylesheet bloat one rule at a time with pixel-identical screenshots as the regression gate.
The accessibility repair loop fixes confirmed barriers in priority order against an agreed standard -- not against an automated score.
The artifact-to-skill loop is how you stop redoing proven work: extract the method, validate it on a second fresh case.
The test stabilizer loop attacks flaky test root causes directly instead of papering over them with sleeps or retries.
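
The pixel-safe CSS trim idea above (remove one rule at a time, keep the removal only if the screenshot is unchanged) can be sketched as a greedy pass. This is an illustration under assumptions: `trim_rules` is a hypothetical name, and `render` stands in for a real screenshot step by returning any value that can be compared for exact equality.

```python
from typing import Callable, Hashable, List, Sequence


def trim_rules(rules: Sequence[str], render: Callable[[List[str]], Hashable]) -> List[str]:
    """Drop each rule whose removal leaves the rendered output identical."""
    baseline = render(list(rules))
    kept = list(rules)
    for rule in list(rules):
        # Candidate stylesheet without this rule (removes duplicates of it too).
        candidate = [r for r in kept if r != rule]
        if render(candidate) == baseline:
            kept = candidate  # regression gate passed, keep the removal
    return kept


if __name__ == "__main__":
    # Toy renderer: only rules in USED affect the "pixels".
    USED = {".nav", ".hero"}
    toy_render = lambda rules: tuple(r for r in rules if r in USED)
    print(trim_rules([".nav", ".old-banner", ".hero", ".unused"], toy_render))
```

A real gate would compare screenshots across every page and viewport the stylesheet touches. The toy renderer only demonstrates the accept/reject logic.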

Glossary

Terms worth knowing.

Loop: An agent workflow with a judge, checklist, or eval at the end that determines whether the result passes or the process should restart. Distinguished from an automation by its recursive, self-checking structure.
Automation: A workflow that runs a fixed sequence of steps to reach a predetermined end state without checking its own output for quality or correctness.
Verify/Stop condition: The explicit, measurable criterion that ends a loop. Defines what observable evidence proves the job is done correctly.
Loop Library: Matthew Berman's curated collection of 45 ready-made agent loop prompts hosted at Forward Future, each with a copy-paste prompt and verify/stop condition.
Judge-gated recursion: The pattern in which an agent's output is evaluated by an explicit criterion and the process restarts if the criterion is not met, continuing until it passes or a budget is exhausted.
Multi-LLM convergence: A review pattern requiring two AI systems from different model families to independently approve the exact same version of a document or code change before it is accepted.
Holdout cases: Test examples kept separate from those used during prompt iteration, used to evaluate whether a challenger prompt genuinely generalizes rather than overfitting to known inputs.
Forward Future: The platform hosting Matthew Berman's Loop Library, where anyone can browse, submit, and copy agent loop prompts.
Doc drift: The gradual divergence between a codebase's documentation and its actual current implementation, typically caused by code changes not reflected in docs.
Stale-safe batch release: A release pattern that audits pending changes before shipping, excludes incomplete or stale work, and combines only fully proven changes into a single release artifact.

Resources

Things they pointed at.

00:00 · link · Forward Future Loop Library

00:55 · channel · Peter Steinberger (@steipete)

35:55 · link · Skool AI Mate Community

Quotables

Lines you could clip.

00:00

“You should not be prompting your coding agent anymore. You should be designing loops that prompt your agents.”

Strong contrarian opening, no setup needed, lands in under 10 seconds → TikTok hook

01:25

“A loop -- the key differentiator -- is that by the end of a process it has a judge, a checklist, a set of criteria, an eval... and then it goes back again.”

Clean standalone definition of a loop, works with zero context → IG reel cold open

02:10

“Your new job is to write these loops and checklists. That's the new hard part.”

Tight two-sentence reframe of what AI-era developer skill looks like → newsletter pull-quote

36:15

“We're entering a new era of prompting where it's more about loop engineering and I'm all for it.”

Clean outro pull quote, optimistic framing, shareable sentiment → newsletter pull-quote

The Script

Word for word.


00:00Matthew Berman just launched Loop Library, a curated list of agent loops you can use right now. But if you've been hearing about loops, maybe it has gone a bit over your head as it has mine.

00:14This tweet from two weeks ago from Peter Steinberger, the creator of OpenClaw, said: here is your monthly reminder that you shouldn't be prompting your coding agent anymore.

00:26You should be designing loops that prompt your agents. But I was still going like, what do you mean?

00:33Loops? I've heard of goals and loops and schedules and automations, but to me, I haven't been able to get to the point of loops being at the level of these people, including the creator.

00:48I don't prompt Claude anymore. I have loops that are running. They're the ones that are prompting Claude and kinda figuring out what to do.

00:54My job is to write loops. So I feel a little bit left behind, like there's an inner circle of people that know how to run loops and the people that do the exact same as you and me where we basically prompt it, and then we check the result, and then we prompt it again, and then we check the result and back and forth, back and forth, that should now completely, what they're saying, be automated by AI.

01:19That's why we're checking out the loop library today. But there's a big misconception that you might have that an automation is the exact same as a loop.

01:30So before we begin, there's a very clear distinction between an automation and what is a loop. An automation runs step by step by step to reach an end destination.

01:41A loop, the key differentiator and the only thing really is that by the end of a process, it has a judge, a checklist, a set of criteria, an eval, some measurable or non-deterministic results as well that can be interpreted by AI, and then it goes back again, and it keeps going and checking.

02:10And that's the main difference that it checks and checks and checks until something is correct. So your new job is to write these loops and checklists.

02:22That's the new hard part. So now that you have an AI college degree worth of knowledge on this, let's get into the prompts and think about it from the framework of how can you take inspiration from these loops starting with loop number one, the doc sweep, a reusable AI coding agent workflow for comparing documentation with the current codebase, fixing drift and opening a reviewable pull request.

02:51And as you can see the end state is that documentation matches the current implementation finish with a reviewable pull request.

03:00So this is for people that want to keep their documentation up to date. Loop number two, the architecture satisfaction loop.

03:09A bounded refactoring workflow that live tests the system, runs an independent review, commits checkpoints and records progress.

03:18The verify and stop for this is the architecture is satisfactory and checks pass.

03:24Live tests, auto review and commit each significant step. So this is for developers that want to clean up their code which if you're using Claude Code or AI it's heavily recommended.

03:37Loop number three: The sub fifty millisecond page load loop. A performance optimization workflow for coding agents that uses one repeatable benchmark and stops only when a target page meets the threshold.

03:51The verify and stop is every page loads in under fifty milliseconds. Use the same benchmark and confirm there are no regressions. So this is a simple one, page load speed.

04:03If you haven't tried this, it's insane. Loop number four, the production error sweep. A scheduled production log workflow that traces actionable errors to root causes, verifies fixes, opens a pull request and stops cleanly when no action is needed.

04:20The verify and stop here is actionable production errors are fixed and verified, finish with a pull request or stop when no actionable errors are present.

04:31So this is for developers who want fewer production fires so you can turn real log errors into verified fixes automatically and skip the work when there's nothing to fix. Loop five: The 100% test coverage loop A goal based coding agent workflow that identifies uncovered behavior, adds meaningful tests and stops when the full suite passes at 100% coverage.

04:58As you can see, the loop is very small and the verify is that it passes at 100% coverage.

05:06So this is for developers who want a real safety net so you can drive your codebase to 100% test coverage with meaningful tests instead of chasing the number by hand. Loop number six, the SEO/GEO visibility loop, a repeatable search visibility workflow that fixes the highest impact crawl, indexation, page intent, citation and answer readiness gaps first.

05:31So as you can see this is visibility and the verify stop is priority pages are indexable, answer ready and technically sound. The repeatable crawl and query benchmark finds no remaining high impact gap. So this is for founders and creators who want to show up in both Google and AI Answers.

05:51So you just run this one repeatable loop that fixes your highest impact visibility gaps instead of guessing what to optimize next. Loop seven: The Logging Coverage Loop A goal based observability workflow that audits important paths, adds useful structured logs and verifies success and failure events with tests.

06:13The verify stop is every important path emits useful tested logs. Representative success and failure tests prove coverage without exposing sensitive data. So this is for developers that don't want to debug blind but make sure that every code path produces useful tested logs to make every event in the future easier.

06:36So that's all of this video. Thanks so much for watching. Just kidding.

06:40We got a lot more. Are you learning anything so far? If yes, click the subscribe button.

06:45And thanks to Matthew for putting this all together. Loop number eight, the nightly changelog loop, a scheduled coding agent workflow that reviews the previous day's changes and keeps user facing release history complete and current.

07:01Again a very short prompt where the VERIFY stop is: Every user relevant change from the previous day is accounted for, the change log is updated and validated or the no change result is recorded. This one seems like just a no brainer.

07:20So this is for teams who hate writing changelogs so you can keep yours current with a nightly auto update of what users need to know. Loop nine The Quality Streak Loop A realistic product testing workflow that turns every failure into documented regression coverage and restarts the success streak after each fix.

07:41The verify and stop here is the latest N realistic cases pass in a row. Every earlier failure is documented, fixed and protected by regression and benchmark coverage.

07:54So this is for teams who need quality to actually hold so you can turn every failure into permanent regression coverage and only ship after a real streak of passes.

08:06Loop 10: The Full Product Evaluation Loop A comprehensive product quality workflow that evaluates realistic scenarios across every major capability, fixes weak outcomes and reruns them to the defined bar.

08:23The verify is that every one of the N scenarios meets the defined quality bar. The final evaluated run covers every major capability under the original conditions. So this is for teams shipping a real product so you can prove quality across every major capability at once instead of trusting a few hand picked tests.

08:45And quick side note, this one is featured on his website highly recommended by Matt. Loop 11: The Test Suite Speed Loop A performance workflow for reducing test run time under repeatable conditions without weakening coverage, assertions, isolation, or behaviour.

09:05The verify stop here is that the suite is faster with no coverage or behaviour regression. Repeatable timing, the full passing suite and the original coverage report prove the result. So this is for developers stuck waiting on slow tests so you can cut suite run times without losing any coverage or changing behavior.

09:27Loop number 12, the repository cleanup loop. A repository hygiene workflow that audits branches, pull requests, commits and work trees, recovers valuable changes and removes proven stale state.

09:42The end state is valuable work is recovered and remaining repository state is intentional. Branches, pull requests, commits and work trees are current, owned or safely removed with evidence. So this is for developers drowning in stale branches and work trees so you can clean up the clutter without losing any valuable unmerged work.

10:03Loop 13 The stale safe batch release loop. A release coordination workflow that excludes stale or unfinished work, combines valid changes and ships complete artifacts from the latest integrated main.

10:20The verify stop is only current, complete changes ship in the combined release. The release revision is the latest integrated main that contains every selected change.

10:32So this is for you if you're shipping several changes at once so you can batch only the complete current work into the release and keep stale or unfinished code out. Loop number 14, the production data cleanup loop.

10:48A production data quality workflow that removes disallowed records, improves classification logic and verifies the remaining dataset against an explicit definition. The verify stop is every remaining record meets the allowed definition.

11:04Representative classification tests and a post cleanup audit prove the retained data is valid. So this is for you if you have messy production data so you can purge records that don't meet your rules and fix the classifier that lets them in.

11:19Loop 15, the post release baseline loop, a triggered release workflow that runs standard benchmarks against the completed release and records a reproducible baseline for future comparisons. As you can see the short prompt says after current releases finish, run the standard benchmarks and record the results as the new baseline.

11:44So the verify stop is that the new baseline belongs to the completed release. Loop number 16, the Ticket to PR Ready loop. A bounded engineering workflow that turns a ticket, failing behavior or customer complaint into a proven root cause, minimal patch, and reviewer ready handoff.

12:06We're now starting to see new contributions by the community it seems like. And the verify stop here is that the failure is fixed, verified and ready for review.

12:18The issue reproduces before the fix, no longer reproduces afterward and relevant regression checks pass. So this is for developers handed a vague bug ticket so you can turn it into a reproduced root cause minimal fix that's ready for review.

12:35Loop 17: The Customer AI Deployment Loop A supervised delivery workflow that advances one customer priority into a validated, gradually released AI system with monitoring, approvals, and outcome evidence.

12:52The verify stop is one customer priority reaches a proven terminal state. The workflow reaches its agreed rollout stage, a production issue is fixed, or a blocker is escalated with an owner and next step.

13:09So this is for teams shipping AI into real customer workflows so you can take one priority from idea to monitored production with approvals and ROI evidence at every stage.

13:22Loop 18: The Product Update Podcast Loop A scheduled editorial workflow that turns meaningful public product changes into a short source grounded podcast episode.

13:36The end state is that the episode accurately covers every meaningful public update and you can see that it triggers each night.

13:45Finish with a review ready three-to-five-minute episode or a confirmed no episode result when nothing meaningful shifts.

13:54So this is for you if you ship often so you can turn each day's meaningful product updates into a short source verified podcast episode users can actually follow. Loop 19 The Clodex Adversarial Review Loop A Claude and Codex workflow that opens a pull request, runs an independent Codex review, fixes blocking findings and repeats.

14:20As you can see the end state is the pull request reaches the configured review bar. Codex approves it or only explicitly accepted findings remain; errors, stalls and exhausted limits are reported as such.

14:36So this is for developers who want a second set of eyes on every change. So you can have Claude build while Codex adversarially reviews each round until the PR truly passes.

14:49Loop 20 The Loop Harness Verification Loop A scheduled loop harness workflow that runs Claude in an isolated work tree and ships staged output only after a second Claude session verifies it. The verify stop is only independently verified outputs ship.

15:08A second agent pass releases the configured output. A failed verification preserves evidence and produces no external change.

15:19So this is for you if you're running agents unattended so you can let scheduled repo work ship only after a second Claude session independently verifies it. Never one agent approving its own output.

15:31Loop 21: The Boeing 747 Benchmark A vision benchmark in which the agent builds a Boeing 747 from Three.js primitives, renders nine repeatable angles and fixes what each view reveals.

15:47So the end state here is the Boeing 747 meets the visual bar from all nine angles. The same camera rig and rubric show every required view meeting the preset threshold. Or the run reports stagnation, budget exhaustion and remaining gaps.

16:07So this is for you if you're testing an agent's visual judgement. So you can have it build a 3D model and self correct against nine fixed angles instead of one flattering hero shot. Loop number 22, Warloops Frontend Reconstruction.

16:23A Warloops workflow that captures a real page, builds a static pencil mirror and moving forge version then repairs the weakest fidelity signals. The verify stop is the builds match the source across all three fidelity axes.

16:43Static appearance, experimental motion and responsive reflow pass their gates or the run reports stagnation or a blocked capture.

16:54So this is for you if you're trying to rebuild a real interface from a URL so you can recreate its look, motion and responsive behavior and fix only the parts that don't match the source. Loop number 23, the self improving champion loop, a prompt optimization workflow that tests challengers on a working set, promotes only fresh holdout wins and keeps the current champion on uncertainty.

17:22The verify stop is the best holdout tested champion is returned. Every challenger is logged, and accepted changes beat the previous champion on untouched cases without weakening a must pass check.

17:36So this is for you if you're tuning prompts or policies so you can promote only the changes that beat your current best on fresh holdout cases instead of overfitting to the examples you edited against.

17:51Loop number 24, the devil's advocate loop, a critic and builder workflow that attacks a design, tracks every objection, and requires evidence before an objection can be closed.

18:04The verify stop is no high impact objection remains open. Every logged objection is verified as resolved or explicitly accepted with evidence or the final report truthfully records a two round stalemate. So this is for you if you're about to commit to a big design decision so you can have a critic attack it and force every serious objection to be resolved with evidence before you build.

18:32Loop 25: The Fresh Clone Loop A disposable environment workflow that follows the readme from scratch, fixes every hidden setup assumption and restarts until onboarding works cleanly.

18:47The verify stop is a clean environment reaches the documented ready state using only the readme. The final run uses only the onboarding guide and needs no unstated dependency, configuration or manual repair.

19:03So this is for developers that want to prove onboarding actually works by rebuilding from a clean clone until a first timer could follow it without help.

19:15Loop 26: The Infinite Clickbait Thumbnail Loop A thumbnail workflow that creates 10 concepts, scores the top three against relevant YouTube channels and improves the winner without misleading viewers.

19:30The verify stop is when an accurate thumbnail clears the fixed quality threshold so the winner outscores the alternatives under the same conditions, remains legible at realistic sizes and represents the video accurately.

19:46So this is for YouTube creators who want thumbnails that actually get clicks so you can generate, score and refine 10 concepts down to the one winner that's legible and honest about the video. Loop number 27, the autonomy loop builder-reviewer loop.

20:04An autonomy loop workflow in which a builder and adversarial reviewer pass a git baton between work trees and prove each new test can catch its fix. The verify stop is every accepted wave passes the autonomy loop's proof and test gates.

20:23The new test fails without the change and passes with it. Every configured gate passes and protected production changes remain human gated.

20:32Loop 28: The Codex Completion Contract Loop. A goal planner codex workflow that defines completion upfront, tracks proof for every requirement and prevents partial codex work from being reported as done.

20:48As you can see every codex goal requirement has current adequate proof. The final audit contains no weak, missing or contradicted required item.

20:59Otherwise the work remains open, blocked or exhausted. Loop 29: The Revolve Versioned Experiment Loop A Revolve workflow that improves prompts, code and configurations through checkpointed experiments whose scores remain comparable across sessions.

21:19As you can see, the best Revolve checkpoint wins within one evaluation revision.

21:26The incumbent and candidates have comparable recorded runs.

21:31Accepted changes pass every guard, rollback is available and live promotion has approval. So this is for teams running long term optimization experiments so you can improve a prompt or code path through checkpointed rounds whose scores stay comparable and reversible across sessions.

21:53Loop number 30, the five minute repository maintainer loop. A five minute Codex workflow that triages repositories, directs bounded maintenance to dedicated threads and requires proof and permission before work lands.

22:11As you can see the verify stop is: Every repository item reaches a proven handoff or terminal state. Authorized automation work lands with evidence. Other items are decision ready, blocked with one exact ask or recorded as a clean no op.

22:33Once again from Peter Steinberger. So this is for you if you maintain several active repos so you can have an agent triage and advance one bounded task per repo on a heartbeat with proof and permission required before anything lands.

22:49Loop 31: The Recent Feedback sweep A project audit that turns recent user reported problems into reusable failure patterns, fixes every confirmed match and verifies a clean final sweep.

23:04The verify stop is, the issue inventory is closed and a fresh pattern audit is clean. Every reported issue and newly found match has current proof of resolution. Blocked, approval gated or budget exhausted items remain explicitly open.

23:22So this is for you if you're sitting on scattered bug reports so you can turn recent complaints into reusable failure patterns and hunt down every sibling defect across the whole project so not just the one that got reported. Loop 32: The Promise to Proof Loop A product review that compares claims in marketing, documentation, demos and AI answers with current evidence then fixes or narrows unsupported promises.

23:53The verify stop is every high risk customer promise is supported, narrowed or waiting on an explicit decision. Each promise links to current evidence.

24:05And every high risk mismatch is fixed, narrowed to what the product can prove or clearly approval gated.

24:14So this is for teams whose marketing may have outrun the product, so you can check every customer-facing promise against real evidence and fix or narrow the ones your product can't actually back up.

24:28Loop number 33, the propagation compliance loop.

24:33A consistency check for values copied across a code project. Update every affected copy, find leftovers and prove that only intentional old references remain.

24:46So the verify stop here is: no unintended copy of the old value remains. The final searches find only references that are intentionally historical or required for examples, migrations, or compatibility, with a reason recorded for each one.
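The final search described above can be sketched as a scan that fails on any old value lacking a recorded reason. This is a minimal illustration, not the loop's published prompt: the function name `find_stale`, the in-memory `files` mapping, and the `allowed` table keyed by path and line number are all assumptions made for the example.

```python
def find_stale(files, old_value, allowed):
    """Return every line that still contains old_value without a recorded reason.

    files:   hypothetical mapping of path -> file text (a real run would read from disk)
    allowed: hypothetical mapping of (path, line_number) -> reason the old value is kept
    """
    stale = []
    for path, text in sorted(files.items()):
        for lineno, line in enumerate(text.splitlines(), start=1):
            # An entry with an empty reason does not count as intentional.
            if old_value in line and not allowed.get((path, lineno)):
                stale.append((path, lineno, line.strip()))
    return stale


files = {
    "README.md": "Install v1.2.0\n",
    "CHANGELOG.md": "## v1.2.0\n- initial release\n",
    "setup.py": 'version = "v1.3.0"\n',
}
allowed = {("CHANGELOG.md", 1): "historical release entry"}
leftovers = find_stale(files, "v1.2.0", allowed)
```

The loop would stop only when `leftovers` is empty; here the README line is still reported because nothing records why it should keep the old version.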

25:05So this is for you if you change a value and it lives in dozens of places, so you can update every copy and prove no stale version is left except the ones that you meant to keep.

Loop number 34, the Multi-LLM Convergence Loop. It alternates two AI systems from different providers to review a plan, document, or code change until both approve the exact same version.

25:31The verify stop is: two different AI model families approve the exact same version. The final two clean reviews come from different model families with no edit between them. A pass limit, repeating disagreement, unavailable reviewer, or approval boundary is reported as a stall instead of consensus.
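One way to picture that stop condition is the sketch below. It is an assumption-laden illustration, not the library's prompt: `Review`, `converge`, and the reviewer callables are hypothetical stand-ins for real model calls, and only the pass limit and the unavailable-reviewer stall are modeled.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Review:
    approved: bool
    revised: Optional[str] = None  # replacement text when the reviewer wants changes


# Hypothetical reviewer signature: takes the current version, returns a Review.
Reviewer = Callable[[str], Review]


def converge(draft, reviewers, max_passes=10):
    """Alternate reviewers until two clean reviews in a row, from different
    model families and with no edit between them, approve the same version."""
    version = draft
    last_clean_family = None
    for i in range(max_passes):
        family, review = reviewers[i % len(reviewers)]
        result = review(version)
        if result.approved:
            if last_clean_family not in (None, family):
                return "consensus", version
            last_clean_family = family
        elif result.revised is None:
            # Reviewer rejected without offering a revision: report a stall.
            return "stall", version
        else:
            version = result.revised
            last_clean_family = None  # any edit resets the approval streak
    return "stall", version  # pass limit reached without consensus
```

Because any edit clears `last_clean_family`, a version only counts as agreed when both families have seen exactly the same text.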

25:51So this is for you if you don't want one model grading its own homework, so you can have AI systems from two different providers review until both approve the exact same version.

Loop number 35, the Goal Forge Loop.

26:06A planning workflow that interviews the user, writes what should be built in spec.md and writes how Codex should execute and verify it in goal.md.

26:18The verify stop is: the planning files say what to build, how to judge it and when to stop. Every done-when completion check names observable evidence, the quick and final checks can actually run, the environment is ready and unresolved decisions are clearly marked not ready.

26:40So this is for you if you want to go from a vague idea to a long-running coding agent, so you can pin down the scope, completion checks and safety boundaries up front, before you write the goal, instead of letting the agent guess.

26:56Loop number 36, the UI/UX score loop.

27:00A browser based review that completes a real user task, scores each meaningful screen with the same checklist, improves weak spots and retests the whole task.

27:11The verify stop being the complete user task scores better without making another important screen worse. The final dashboard shows the same entry point, fresh browser state, screen sizes, modes, scoring rubric, screenshots, score changes and stop reason for every retained improvement.

27:34So this is for teams whose sign up or checkout flow feels off so you can score every screen of a real user task against one rubric and fix the weakest spots without breaking the rest.

27:46Loop number 37, the cold load trimmer loop. A web performance workflow that reduces the data downloaded before the first screen appears, while tests and screenshots protect behavior and appearance.

27:56The verify stop is: the first screen downloads less data without a tested behavior or pixel changing.

28:07The same production-like measurements report fewer downloaded bytes, existing tests pass, every representative screenshot is pixel-identical and any uncertain dependency removal remains approval-gated.

28:22So this is for you if your app feels heavy on first load. So you can trim down the bytes downloaded before the first screen appears while tests and pixel identical screenshots guarantee nothing breaks.

28:34Loop number 38, the pixel-safe CSS trim loop. A stylesheet cleanup workflow that removes one piece of unused or redundant CSS at a time and keeps it removed only when every test screen looks identical.

28:51The verify stop is: the delivered stylesheet is smaller while every tested screen remains pixel-identical. The same project checks and screenshots pass after each retained deletion. The built CSS file sent to users is smaller, and untested browsers, screens or interactions remain explicit risks.

29:15So this is for developers who are sitting on bloated stylesheets, so you can strip out unused CSS one rule at a time while pixel-identical screenshots prove every screen still looks the same.
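The one-deletion-at-a-time rule can be sketched as follows. This is a simplified illustration under stated assumptions: `trim_rules` is a hypothetical name, and `render` stands in for whatever screenshot step a real project uses, returning bytes that are compared for exact equality.

```python
def trim_rules(rules, render):
    """Try deleting one rule at a time; keep a deletion only when the
    rendered output is byte-identical to the untouched baseline.

    render: hypothetical callable that takes a list of CSS rules and
            returns screenshot bytes for the tested screens.
    """
    baseline = render(list(rules))
    kept = list(rules)
    i = 0
    while i < len(kept):
        candidate = kept[:i] + kept[i + 1:]
        if render(candidate) == baseline:
            kept = candidate  # deletion retained; index i now points at the next rule
        else:
            i += 1  # deletion changed a pixel; restore the rule and move on
    return kept
```

Comparing against the original baseline on every step, rather than against the previous step, keeps small visual drifts from accumulating unnoticed.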

29:28Loop number 39 is the easy onboarding loop. A first-time user test that starts with no saved account or browser state, fixes one confirmed onboarding obstacle and retries the entire experience.

29:45The verify stop is: a first-time user can complete onboarding in one uninterrupted clean session. The full experience succeeds from the real starting point without saved browser state, secret setup, guest routes or manual repairs, and every real requirement remains intact.

30:06So this is for you if your onboarding makes sense only to people who already know it, so you can test it as a true first-time user from a clean session and fix what trips newcomers up until one pass succeeds.

Loop 40: The Accessibility Repair Loop. An accessibility review that confirms barriers against an agreed standard, fixes the issue with the greatest user impact, and repeats the same checks.

30:34The end state being: No confirmed accessibility barrier remains in the agreed pages, components, or user tasks.

30:43The same automated scans, available manual checks, affected user tasks and regression tests pass after each retained fix without lowering the chosen accessibility standard. So this is for you if you need the product to work for everyone, so you can confirm real accessibility barriers against a standard like WCAG and fix the highest-impact ones first instead of chasing an automated score.

31:10Loop number 41, the housekeeper loop. A conservative code project cleanup that proves one small opportunity is safe, makes the smallest useful change and keeps it only after existing checks pass.

31:24The verify stop being: no confirmed low-risk cleanup remains and existing behavior still passes. Every retained cleanup is supported by direct evidence, relevant builds and tests pass, the application still runs where applicable, unrelated work is untouched and uncertain candidates are deferred rather than deleted.

31:46So this is for developers whose project has collected dead code and clutter. So you can clean up one proven safe item at a time while existing tests guarantee you never delete something that's actually in use.

31:59Loop 42: The Axelrod sub-agent arena loop.

32:04A controlled tournament where two reasoning AI agents repeatedly choose to cooperate or defect then are compared with players that always make one choice.

32:17The verify being all 18 matches and 180 rounds can be reproduced from the recorded moves and fixed scoring rules.

32:28Each agent chooses before seeing the opponent's move. Every move is recorded before scoring. Totals reproduce from the full history.

32:36Invalid responses are logged and any partial or invalid tournament remains explicitly incomplete. So this is for researchers curious about how AI agents behave in repeated games.

32:51So you can run a controlled cooperate or defect tournament that reveals whether they retaliate, forgive, or exploit with every move auditable.
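The reproducibility check in this loop can be sketched as replaying the recorded moves under fixed scoring rules. The transcript does not state the payoff values, so the classic prisoner's dilemma payoffs below are an assumption, as are the names `replay` and `verify_match`. The 10 rounds per match follows from 180 rounds across 18 matches.

```python
# Assumed payoffs (classic prisoner's dilemma values; the source does not list them).
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def replay(history):
    """Recompute both totals from the full recorded move history."""
    total_a = total_b = 0
    for move_a, move_b in history:
        if (move_a, move_b) not in PAYOFF:
            raise ValueError(f"invalid move pair: {(move_a, move_b)!r}")
        points_a, points_b = PAYOFF[(move_a, move_b)]
        total_a += points_a
        total_b += points_b
    return total_a, total_b


def verify_match(history, reported_totals, rounds=10):
    """A match counts only if it is complete, valid, and its totals reproduce."""
    if len(history) != rounds:
        return False  # partial matches stay explicitly incomplete
    try:
        return replay(history) == tuple(reported_totals)
    except ValueError:
        return False  # invalid responses never count as a clean match
```

With these assumed payoffs, nine rounds of mutual cooperation followed by one defection by the first agent replay to totals of 32 and 27.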

33:01Loop number 43, the prepare a new project loop. A planning workflow that closes documentation gaps until requirements, technical design, acceptance criteria and test strategy describe one buildable system.

33:16The verify stop is: two independent reviewers derive substantially the same build from the project documents.

33:24Their descriptions agree on the components, data model, dependencies and the definition of done and every required artifact is specific, consistent, traceable and testable.

33:36So this is for teams about to build from rough project docs. So you can close every gap and contradiction until two independent engineers would build the same system instead of three different ones.

33:50Loop 44: The test stabilizer loop.

33:53A flaky test repair workflow that measures inconsistent results, fixes one root cause at a time and stops after a defined streak of stable full suite runs.

34:06The verify stop being the full test suite passes for the required consecutive run streak. The repaired test passes repeatedly and consecutive full suite runs are green under the recorded conditions and no blind sleep or retry hides an unresolved cause.
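The streak requirement can be sketched as a counter that resets on any red run. This is a minimal sketch: `run_suite` is a hypothetical callable that runs the full suite once and returns True when it is green, and the run budget is an added assumption so the loop always terminates.

```python
def reached_stable_streak(run_suite, required_streak, max_runs):
    """Run the full suite repeatedly; succeed only after `required_streak`
    consecutive green runs. Any failure resets the streak to zero.

    Returns (stable, runs_used).
    """
    streak = 0
    for run_number in range(1, max_runs + 1):
        if run_suite():
            streak += 1
            if streak >= required_streak:
                return True, run_number
        else:
            streak = 0  # a single red run means the flake is not fixed yet
    return False, max_runs
```

Note that the function only measures; it never retries a failed test inside a run, which matches the rule that no blind sleep or retry may hide an unresolved cause.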

34:25So this is for developers plagued by tests that pass one run and fail the next, so you can fix each flake at its real root cause, instead of papering over it with sleeps and retries, until the suite runs green every time.

Loop 45: The Artifact to Skill Loop. A reusable workflow for turning one proven artifact into a transferable skill, playbook or procedure and validating it on a second case.

34:55The end state being: the extracted method succeeds on a fresh second case without the original artifact. An independent reviewer applies the reusable version under criteria defined before extraction, and the second result meets the source artifact's demonstrated quality bar, or the method is honestly marked provisional or not generalizable.

35:17This is for people who keep redoing work they've already done once, so you can extract the repeatable method behind a proven artifact into a reusable skill and confirm it works on a fresh case.

So those were all the current 45 loops on the loop library. I'm definitely taking some of these away, but more importantly they've given me inspiration to make my own loops better.

35:43We're entering a new era of prompting where it's more about loop engineering, and I'm all for it.

35:51But honestly it still goes a little bit over my head but that's why we keep learning. So drop your favorite loop prompt in the comments below.

36:01Check out our school community in the link in the description, where we just made a seven-day "build and sell your first web app" challenge with Claude Code, and I'll give away my loops once I've got them nailed down and maybe even submit them to the Forward Future website. Once again, thanks to Matt for bringing all these together.

36:22Subscribe to the channel, and if you didn't check out the last video, it's on the screen right now. Tap it and I'll see you tomorrow at lunch.

The Hook

The bait, then the rug-pull.

A tweet from Peter Steinberger -- stop prompting your coding agent, start designing loops that prompt your agents for you -- is the real opening act. Andy admits he felt left behind by that idea, and that honest confession turns a 36-minute catalog video into something more: a public catching-up session for everyone still on the manual prompt-check-prompt treadmill.

Frameworks

Named ideas worth stealing.

01:25concept

Loop vs. Automation Distinction

Automations run step-by-step to a fixed end. Loops have a judge/eval at the end that can restart the process. The verify/stop condition is what makes something a loop.

Steal for: Any time you explain to a client or team why your agent workflow is more robust than a simple script

02:39list

Loop Library Taxonomy

Engineering (docs, tests, performance, errors)
Scheduled maintenance (changelog, baseline, data cleanup)
Review/coordination (PR readiness, adversarial review, multi-LLM)
Creative benchmarks (thumbnail, 3D visual judgment)
UX/performance (onboarding, accessibility, CSS trim, load speed)
Audit/alignment (promise-to-proof, propagation compliance, feedback sweep)

The 45 loops naturally cluster into six functional areas. Picking the right cluster for a given pain point is faster than reading all 45.

Steal for: Organizing your own agent loop library or pitching AI automation to a team

01:25model

Judge-Gated Recursion Pattern

Every loop prompt contains: (1) what to do, (2) what to measure, (3) when to stop, (4) what a clean exit vs. a stall looks like. The four-part structure is the portable pattern.

Steal for: Writing your own loops from scratch once you understand the structure
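The four-part structure can be sketched as a small control loop. This is an illustrative skeleton rather than anything shown in the video: `run_loop`, `act`, `judge`, and the verdict strings are hypothetical names chosen for the example.

```python
def run_loop(act, judge, state, max_rounds):
    """Judge-gated loop: do the work, measure it, and recur until the judge passes.

    act:   hypothetical callable, performs one round of work and returns new state
    judge: hypothetical callable, returns "pass", "retry", or "blocked"
    Returns (outcome, final_state, rounds_used), where outcome is
    "clean_exit" or "stall".
    """
    for round_number in range(1, max_rounds + 1):
        state = act(state)        # (1) what to do
        verdict = judge(state)    # (2) what to measure
        if verdict == "pass":     # (3) when to stop
            return "clean_exit", state, round_number
        if verdict == "blocked":  # (4) a stall is reported, never hidden
            return "stall", state, round_number
    return "stall", state, max_rounds  # budget exhausted is also a stall
```

An automation in the video's sense would be a single call to `act` with no `judge`; the verdict feeding back into another round is what makes this a loop.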

CTA Breakdown