The argument in one line.
The shift from prompting AI agents to designing judge-gated loops that recurse until they pass is the single biggest leverage upgrade available to developers today, and a library of 45 ready-made loops makes that shift accessible in an afternoon.
Read if.
- You use Claude Code or Codex daily and still manually check every result before prompting again.
- You have heard loop engineering mentioned but could not map it to a concrete workflow or copy-paste prompt.
- You maintain an active codebase and want automated sweeps for docs, tests, errors, or performance on a schedule.
- You are a founder on a lean team who needs agents that can triage, fix, and open pull requests without babysitting.
- You ship web products and want onboarding, accessibility, or CSS bloat caught automatically before users notice.
Skip if.
- You want step-by-step implementation walkthroughs -- this is a survey video that reads each prompt, not a build-along.
- You are already deep in loop engineering and want novel techniques beyond what is published on the library site.
The full version, fast.
The video draws a hard line between automations (sequential, single-pass, terminates) and loops (recursive, judge-gated, recurs until an eval passes). With that frame in place, Andy reads the full 45-loop inventory from Matthew Berman's Forward Future site, covering each loop's one-paragraph prompt and verify/stop condition. The loops span software engineering (doc drift, test coverage, flaky tests, performance), product quality (onboarding, accessibility, CSS trim), AI-on-AI workflows (adversarial Codex review, multi-LLM convergence, self-improving champion prompts), and creative work (thumbnail generation, product podcasts). The actionable conclusion: pick any loop whose verify/stop condition names a problem you already have, copy the prompt, and paste it into your agent.
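The automation/loop split described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the video or the Loop Library: `run_automation`, `run_loop`, the `judge` callback, and the `max_rounds` budget are hypothetical names, and the "agent" is a plain function standing in for a real model call.

```python
from typing import Callable, Optional

# Hypothetical stand-ins: an "agent" takes the previous attempt (or None)
# and returns a new attempt; a "judge" returns True when the attempt passes.
Agent = Callable[[Optional[str]], str]
Judge = Callable[[str], bool]


def run_automation(agent: Agent) -> str:
    """Single pass: run the step once and return whatever comes out."""
    return agent(None)


def run_loop(agent: Agent, judge: Judge, max_rounds: int = 5) -> tuple[str, bool, int]:
    """Judge-gated loop: send the agent back until the judge passes
    or the round budget (an assumed safety limit) is exhausted.

    Returns (last_attempt, passed, rounds_used).
    """
    attempt: Optional[str] = None
    for round_number in range(1, max_rounds + 1):
        attempt = agent(attempt)
        if judge(attempt):
            return attempt, True, round_number
    return attempt or "", False, max_rounds


# Toy agent: appends one "fix" per round. Toy judge: wants three fixes.
def toy_agent(previous: Optional[str]) -> str:
    return (previous or "draft") + "+fix"


def toy_judge(attempt: str) -> bool:
    return attempt.count("+fix") >= 3
```

With the toy pair, `run_automation(toy_agent)` stops after one pass whether or not the result is good enough, while `run_loop(toy_agent, toy_judge)` keeps going until the judge is satisfied. The only structural difference is the judge.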
Where the time goes.

01 · Intro: Automation vs. Loop
Steinberger tweet sets the stakes. Andy explains the only real difference between automation and loop: a judge that sends the process back.

02 · Loops 1-7: Engineering Foundations
Doc sweep, architecture satisfaction, sub-50ms page load, production error sweep, 100% test coverage, SEO/GEO visibility, logging coverage.

03 · Loops 8-15: Scheduled Maintenance
Nightly changelog, quality streak, full product evaluation, test-suite speed, repository cleanup, stale-safe batch release, production data cleanup, post-release baseline.

04 · Loops 16-20: Review and Coordination
Ticket-to-PR, customer AI deployment, product update podcast, Codex adversarial review, loop harness verification.

05 · Loops 21-25: Creative and Safety Tests
Boeing 747 3D benchmark, War Loops frontend reconstruction, self-improving champion, devil's advocate, fresh clone onboarding test.

06 · Loops 26-30: Tooling and Optimization
Infinite clickbait thumbnail, autonomy-loop builder-reviewer, Codex completion contract, Revolve versioned experiment, 5-minute repository maintainer.

07 · Loops 31-35: Audit and Alignment
Recent feedback sweep, promise-to-proof marketing audit, propagation compliance, multi-LLM convergence, Goal Forge planning workflow.

08 · Loops 36-45: UX, Performance, and Quality
UI/UX score, cold-load trimmer, pixel-safe CSS trim, easy onboarding, accessibility repair, housekeeper, Axelrod subagent arena, prepare-a-new-project, test stabilizer, artifact-to-skill.

09 · Outro
Subscribe ask, Skool community plug, next video card.
Lines worth screenshotting.
- A loop is not an automation -- the only thing that makes it a loop is a judge at the end that can send the agent back.
- Your new job as an AI builder is writing checklists and eval criteria, not writing code.
- Every loop in the library has a verify/stop condition -- that condition is the entire value, not the prompt body.
- The doc sweep loop detects documentation drift and opens a pull request automatically -- no human needs to notice the gap first.
- The 100% test coverage loop stops only when the full suite passes at 100%, so the AI hunts uncovered paths rather than you.
- The multi-LLM convergence loop requires two genuinely different model families to approve the exact same version -- no model grades its own homework.
- The devil's advocate loop forces every high-impact objection to be resolved with evidence before you build, not after.
- The fresh clone loop proves onboarding actually works by rebuilding from scratch until a first-timer can follow the README without help.
- The artifact-to-skill loop converts a one-time proven artifact into a reusable method validated on a second, fresh case.
- The self-improving champion loop only promotes a challenger prompt when it wins on holdout cases it was never edited against.
- The loop harness verification loop ships output only after a second independent Claude session confirms it -- never one agent approving its own work.
- The goal forge loop produces SPEC.md and GOAL.md before any code runs, making completion criteria explicit before execution starts.
- The Boeing 747 benchmark tests visual judgment by having an agent build a Three.js 3D model and self-correct against nine fixed camera angles.
- Community contributors have already expanded the library beyond the original set, making it a living open resource you can submit to.
- The promise-to-proof loop audits every customer-facing marketing claim against current evidence and fixes or narrows the ones the product cannot back up.
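The self-improving champion rule in the list above (promote a challenger only when it wins on holdout cases) might look roughly like this. It is a sketch under assumptions: the exact-match scoring, the `Runner` callback, and the tie-keeps-champion policy are illustrative choices, not details taken from the library prompt.

```python
from typing import Callable, Sequence

# Hypothetical types: a case is (input, expected_output), and a runner
# executes a prompt against an input and returns the model's output.
Case = tuple[str, str]
Runner = Callable[[str, str], str]


def score(prompt: str, cases: Sequence[Case], run: Runner) -> float:
    """Fraction of cases where the prompt's output matches the expected value."""
    if not cases:
        raise ValueError("need at least one case")
    hits = sum(1 for given, expected in cases if run(prompt, given) == expected)
    return hits / len(cases)


def promote_if_better(champion: str, challenger: str,
                      holdout: Sequence[Case], run: Runner) -> str:
    """Return the challenger only if it strictly beats the champion on
    holdout cases; a tie keeps the current champion (assumed policy)."""
    if score(challenger, holdout, run) > score(champion, holdout, run):
        return challenger
    return champion


def toy_run(prompt: str, given: str) -> str:
    # Toy "model": upper-cases the input only if the prompt asks for it.
    return given.upper() if "uppercase" in prompt else given


# Holdout cases must never be shown to whoever edits the challenger prompt.
holdout_cases = [("a", "A"), ("b", "B")]
```

The point of the design is that `holdout_cases` stays out of the editing loop: the challenger is tuned on other examples and only scored here.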
Every loop lives or dies by its verify/stop condition.
The prompt body of an agent loop is almost secondary -- what defines a real loop is the judge at the end that decides whether to ship or recurse.
- A loop is not an automation -- the only distinguishing feature is a judge that can restart the process on failure.
- Writing checklists and eval criteria is the new hard skill; execution is delegated to the model.
- The doc sweep and logging coverage loops are maintenance tasks you can schedule immediately on any existing codebase.
- The sub-50ms page load loop uses the same benchmark on every run, making regression detection automatic and objective.
- The nightly changelog loop removes the human bottleneck from keeping release notes current.
- The quality streak loop only ships after a defined consecutive run of passes -- a streak requirement is a cheap forcing function for real stability.
- The Codex adversarial review loop treats one model as builder and another as adversarial reviewer -- separation of roles catches blind spots.
- The loop harness verification loop is the safety primitive for any unattended agent: independent verification before any external change ships.
- The Boeing 747 benchmark is a useful proxy for any agent task requiring multi-angle visual judgment, not just 3D modeling.
- The fresh clone loop exposes unstated dependencies and manual workarounds teams have normalized -- the README must be the only instruction.
- The devil's advocate loop is most valuable before major architectural commits, not during implementation.
- The self-improving champion loop prevents overfitting by requiring wins on holdout cases the prompt was never edited against.
- The goal forge loop (SPEC.md + GOAL.md) is a portable planning primitive for any long-running coding agent session.
- The promise-to-proof loop is a marketing audit in disguise: it finds the claims your product cannot actually back up.
- The multi-LLM convergence pattern is the structural answer to the self-grading problem -- two families, same version, no edit between reviews.
- The pixel-safe CSS trim loop removes stylesheet bloat one rule at a time with pixel-identical screenshots as the regression gate.
- The accessibility repair loop fixes confirmed barriers in priority order against an agreed standard -- not against an automated score.
- The artifact-to-skill loop is how you stop redoing proven work: extract the method, validate it on a second fresh case.
- The test stabilizer loop attacks flaky test root causes directly instead of papering over them with sleeps or retries.
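The quality streak gate from the list above is simple enough to show directly. A minimal sketch, assuming each run reports a plain pass/fail and that any failure resets the count; the function name and the `required` parameter are hypothetical.

```python
from typing import Iterable


def streak_reached(results: Iterable[bool], required: int) -> bool:
    """Return True once `required` consecutive passes have been seen.

    Any failure resets the streak to zero, so isolated passes never add up.
    """
    if required < 1:
        raise ValueError("required must be at least 1")
    streak = 0
    for passed in results:
        streak = streak + 1 if passed else 0
        if streak >= required:
            return True
    return False
```

For example, `streak_reached([True, True, False, True, True, True], 3)` is true only because of the final three passes; the two passes before the failure do not count toward it.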
Terms worth knowing.
- Loop: An agent workflow with a judge, checklist, or eval at the end that determines whether the result passes or the process should restart. Distinguished from an automation by its recursive, self-checking structure.
- Automation: A workflow that runs a fixed sequence of steps to reach a predetermined end state without checking its own output for quality or correctness.
- Verify/Stop condition: The explicit, measurable criterion that ends a loop. Defines what observable evidence proves the job is done correctly.
- Loop Library: Matthew Berman's curated collection of 45 ready-made agent loop prompts hosted at Forward Future, each with a copy-paste prompt and verify/stop condition.
- Judge-gated recursion: The pattern in which an agent's output is evaluated by an explicit criterion and the process restarts if the criterion is not met, continuing until it passes or a budget is exhausted.
- Multi-LLM convergence: A review pattern requiring two AI systems from different model families to independently approve the exact same version of a document or code change before it is accepted.
- Holdout cases: Test examples kept separate from those used during prompt iteration, used to evaluate whether a challenger prompt genuinely generalizes rather than overfitting to known inputs.
- Forward Future: The platform hosting Matthew Berman's Loop Library, where anyone can browse, submit, and copy agent loop prompts.
- Doc drift: The gradual divergence between a codebase's documentation and its actual current implementation, typically caused by code changes not reflected in docs.
- Stale-safe batch release: A release pattern that audits pending changes before shipping, excludes incomplete or stale work, and combines only fully proven changes into a single release artifact.
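The multi-LLM convergence entry above hinges on "the exact same version", which is easy to enforce with a content hash. The sketch below is one possible bookkeeping scheme, not the library's own: `ConvergenceLedger`, `version_id`, and the family labels are hypothetical, and the actual model reviews are assumed to happen elsewhere.

```python
import hashlib
from dataclasses import dataclass, field


def version_id(text: str) -> str:
    """Identify a version by the SHA-256 of its exact content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class ConvergenceLedger:
    # Maps a version id to the set of model families that approved it.
    approvals: dict[str, set[str]] = field(default_factory=dict)

    def approve(self, family: str, text: str) -> None:
        """Record that a model family approved this exact text."""
        self.approvals.setdefault(version_id(text), set()).add(family)

    def converged(self, text: str, required_families: int = 2) -> bool:
        """True when enough distinct families approved this exact text.

        Any edit changes the hash, so earlier approvals stop counting.
        """
        families = self.approvals.get(version_id(text), set())
        return len(families) >= required_families
```

Because approvals are keyed by hash, a second approval from the same family adds nothing, and an approval given to an edited draft never transfers to the original or the other way round.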
Lines you could clip.
“You should not be prompting your coding agent anymore. You should be designing loops that prompt your agents.”
“A loop -- the key differentiator -- is that by the end of a process it has a judge, a checklist, a set of criteria, an eval... and then it goes back again.”
“Your new job is to write these loops and checklists. That's the new hard part.”
“We're entering a new era of prompting where it's more about loop engineering and I'm all for it.”
The bait, then the rug-pull.
A tweet from Peter Steinberger -- stop prompting your coding agent, start designing loops that prompt your agents for you -- is the real opening act. Andy admits he felt left behind by that idea, and that honest confession turns a 36-minute catalog video into something more: a public catching-up session for everyone still on the manual prompt-check-prompt treadmill.
Named ideas worth stealing.
Loop vs. Automation Distinction
Automations run step-by-step to a fixed end. Loops have a judge/eval at the end that can restart the process. The verify/stop condition is what makes something a loop.
Loop Library Taxonomy
- Engineering (docs, tests, performance, errors)
- Scheduled maintenance (changelog, baseline, data cleanup)
- Review/coordination (PR readiness, adversarial review, multi-LLM)
- Creative benchmarks (thumbnail, 3D visual judgment)
- UX/performance (onboarding, accessibility, CSS trim, load speed)
- Audit/alignment (promise-to-proof, propagation compliance, feedback sweep)
The 45 loops naturally cluster into six functional areas. Picking the right cluster for a given pain point is faster than reading all 45.
Judge-Gated Recursion Pattern
Every loop prompt contains: (1) what to do, (2) what to measure, (3) when to stop, (4) what a clean exit vs. a stall looks like. The four-part structure is the portable pattern.
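The four-part structure can be written down as a small data type plus a runner. This is a sketch under stated assumptions: `LoopSpec`, `run`, the lower-is-better metric, the `patience` rule for declaring a stall, and the toy page-load numbers are all illustrative and not taken from any library prompt.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoopSpec:
    act: Callable[[], None]       # (1) what to do: one round of work
    measure: Callable[[], float]  # (2) what to measure (lower is better here)
    target: float                 # (3) when to stop: metric strictly below target
    patience: int = 2             # (4) rounds without improvement that count as a stall
    max_rounds: int = 10          # overall budget, an assumed safety limit


def run(spec: LoopSpec) -> str:
    """Return 'clean_exit', 'stalled', or 'budget_exhausted'."""
    best = spec.measure()
    if best < spec.target:
        return "clean_exit"
    rounds_without_gain = 0
    for _ in range(spec.max_rounds):
        spec.act()
        current = spec.measure()
        if current < spec.target:
            return "clean_exit"
        if current < best:
            best = current
            rounds_without_gain = 0
        else:
            rounds_without_gain += 1
            if rounds_without_gain >= spec.patience:
                return "stalled"
    return "budget_exhausted"


class ToyPage:
    """Toy stand-in for a page whose load time drops a fixed amount per round."""

    def __init__(self, load_ms: float, gain_per_round: float) -> None:
        self.load_ms = load_ms
        self.gain_per_round = gain_per_round

    def optimise(self) -> None:
        self.load_ms -= self.gain_per_round


def spec_for(page: ToyPage, max_rounds: int = 10) -> LoopSpec:
    # Mirrors the sub-50ms page load loop: stop once load time is under 50 ms.
    return LoopSpec(act=page.optimise, measure=lambda: page.load_ms,
                    target=50.0, patience=2, max_rounds=max_rounds)
```

Separating a stall from an exhausted budget matters in practice: a stalled loop is telling you the approach has stopped working, while an exhausted budget only says it needs more rounds.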
How they asked for the click.
“Subscribe to the channel and if you did not check out last video, it's on the screen right now.”
Standard end-card CTA with next video card. Secondary ask: join Skool community for 7-day Claude Code challenge. Low pressure, well-paced.