Theo - t3․gg · YouTube

I guess we're writing loops now

A 25-minute case for letting agents run their own loops — and how one 2:29 AM prompt produced four merged PRs by morning.

Posted

June 18th

1 month ago

Duration

24:44

Format

Talking Head

educational

Views

66.3K

2.4K likes

Big Idea

The argument in one line.

The highest-leverage move in AI-assisted coding is not writing better prompts — it is designing loops where agents orchestrate other agents, close their own feedback cycles, and handle every step that used to require a human messenger.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…

A developer already using Claude Code, Codex, or Cursor who handles multi-PR projects and finds themselves copy-pasting review comments back into the agent.
Someone on a $100–$200 flat-rate subscription plan who suspects they are not getting full value from the token budget.
Anyone who has tried sub-agents before but found the pre-defined-role approach (security reviewer, exploration agent) awkward and unproductive.
A solo builder running several code bases who wants context-fetching and review cycles to happen while they sleep.

SKIP IF…

You are using pay-per-token API pricing — the cost math here assumes a flat subscription plan, and the calculus is completely different at API rates.
You want a step-by-step tutorial on setting up a specific loop tool — this is a conceptual essay with real examples, not a how-to guide.
You are not yet comfortable delegating code changes to an agent without reading every line before it lands.

TL;DR

The full version, fast.

Most developers run agent loops manually: they prompt, read the output, copy it somewhere, paste it somewhere else, tell the agent what to do next. The insight here is that every one of those handoff steps can be included in the original prompt. The author's real example: a single late-night message asked the agent to (1) spin up a thread to make the PR, (2) spin up a reviewer when the PR landed, (3) keep the first thread in a loop addressing comments until approval, and (4) merge and trigger the next PR. Four stacked PRs were reviewed and merged by morning. The practical takeaway is to audit what you do after an agent finishes and ask whether the agent could do those steps too.
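
The four-step prompt above can be read as a small state machine. The following is a minimal sketch of that reading, not the workflow the agent actually generated: the class name `StackedPrLoop`, its four callables, and the `max_rounds` guard are all hypothetical stand-ins for real agent threads.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StackedPrLoop:
    """Drives each planned PR through implement -> review -> fix -> merge."""
    implement: Callable[[str], str]            # plan -> PR id (stand-in for an agent thread)
    review: Callable[[str], List[str]]         # PR id -> outstanding review comments
    address: Callable[[str, List[str]], None]  # PR id, comments -> pushes fixes
    merge: Callable[[str], None]               # PR id -> merged
    max_rounds: int = 10                       # assumption: cap review rounds per PR
    log: List[str] = field(default_factory=list)

    def run(self, plans: List[str]) -> List[str]:
        merged: List[str] = []
        for plan in plans:
            pr = self.implement(plan)              # step 1: thread that makes the PR
            for _ in range(self.max_rounds):
                comments = self.review(pr)         # step 2: reviewer thread
                if not comments:
                    break                          # approved, nothing left to fix
                self.address(pr, comments)         # step 3: loop on comments
            else:
                # Never approved: stop the stack, later PRs depend on this one.
                self.log.append(f"{pr}: gave up after {self.max_rounds} rounds")
                break
            self.merge(pr)                         # step 4: merge, then next PR
            merged.append(pr)
        return merged
```

In practice each callable would wrap an agent thread or a CLI call; the cap on review rounds is one way to limit the wrong-direction risk the breakdown mentions.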

Chapters

Where the time goes.

00:00 – 01:03

01 · Pete's tweet + old approach

Opens with Pete Steinberger's viral tweet as a hook; host admits the memo didn't reach him and describes his old hand-holding workflow.

01:03 – 02:00

02 · The pivot

Started building more with loops — agents reviewing code, triggering re-reviews, watching PRs, using Hermes to bring context instead of hunting for it.

02:01 – 03:55

03 · Sponsor — Magic Patterns

AI design tool that works within your existing code base; design system selector, Figma import, multi-model support, frame testing.

03:55 – 04:37

04 · The tweet that actually got him to try loops

A Pete post about using Codex threads that spawn other Codex threads clicked in a way the earlier one didn't — specifically the orchestrator-skill pattern.

04:37 – 05:55

05 · Anthropic's recursive self-improvement arc

Uses Anthropic's published article to frame the historical progression: copy-paste → agent edits code → agent orchestrates agents.

05:55 – 07:17

06 · Pre-defined personas are the wrong abstraction

Rejects the trend of assigning role-based identities to sub-agents (security reviewer, exploration agent) — the point is dynamic context, not hard-coded roles.

07:17 – 09:32

07 · First real loop — PR monitor

Describes setting up Claude Code to SSH into another machine, monitor a PR, and address CodeRabbit/Greptile/Macroscope comments as they arrive — ran for 6+ hours autonomously.

09:32 – 11:06

08 · Lakebed isolate layer — the scope of work

5.5 performance analysis of Lakebed reveals data architecture problems beyond runtime; dependency-aware invalidation, mutation coalescing, subscription batching all needed.

11:06 – 12:27

09 · HTML plan + first thread

Model breaks work into 3 PRs, writes HTML plan files (a pattern credited to Thoric), and creates the first Codex thread. Host merges PR 1 then realizes he should loop harder.

12:27 – 13:44

10 · The big loop prompt — 4 stacked PRs overnight

Single message asks for a 4-step loop: spin PR thread -> reviewer thread -> address comments until approval -> merge and trigger next. Set at 2:29 AM, woke to 4 merged stacked PRs at 6:50 AM.

13:44 – 15:36

11 · Why dynamic loops beat static agile shapes

Contrasts the rigid agile sprint (work fits the shape) with dynamic loop design (shape fits the work). Extends to monitoring PRs, morning briefings, even a 5G hotspot deal-finder.

15:36 – 17:47

12 · Practical: audit what you do after the agent finishes

Concrete advice: list every step you take after the agent completes — run server, check it works, commit, push, file PR, copy review comments, address them — and hand each step to the agent.

17:47 – 18:50

13 · The spicier take — you are reading code too early

If you read agent code before another agent reviewed it, you are doing the agent's job. Let peer-agent review happen first; by the time a human looks, only the hard stuff remains.

18:50 – 21:05

14 · Cost reality + token math

3 million tokens for one Opus loop addressing three comments. But on the $200 plan: 5 deep loops across several days = 29% of weekly budget. June: $8,600 of inference across machines on ~$600 subscription.

21:05 – 24:43

15 · Wrap — treat limits as challenges

/goal primitive for single-thread never-ending tasks; Hermes Rust rewrite running 12+ hours. Unused subscription budget = money lost. Ask the agent to do the next step.
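
The figures quoted in chapter 14 can be checked with some quick arithmetic. This is a rough sketch only: it assumes budget use scales linearly with the number of loops, which the video does not claim, and the function names are hypothetical.

```python
import math


def loops_that_fit(share_used: float, loops_run: int) -> int:
    """How many similar loops would fill the weekly budget, assuming linear scaling."""
    share_per_loop = share_used / loops_run
    return math.floor(1 / share_per_loop)


def leverage(inference_value: float, subscription_cost: float) -> float:
    """Dollars of inference obtained per dollar of subscription."""
    return inference_value / subscription_cost


# Five deep loops used 29% of the weekly budget on the $200 plan.
print(loops_that_fit(0.29, 5))        # 17
# Roughly $8,600 of inference on about $600 of subscriptions in June.
print(round(leverage(8600, 600), 1))  # 14.3
```

Under that linear assumption, each deep loop costs a little under 6% of the weekly budget.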

Atomic Insights

Lines worth screenshotting.

The loop you are running manually — read output, copy it, paste it, tell the agent what is next — is itself a prompt the agent can run for you.
Pre-assigning personas to sub-agents (security reviewer, exploration agent) misses the point: the agent's value is building the context it needs dynamically, not following a role you wrote in advance.
A PR monitor loop that watches for CodeRabbit and Greptile comments and addresses them on arrival requires no custom tooling — stock Claude Code with a work-tree-per-PR is enough.
One loop prompt can produce loops inside itself: the orchestrator spawns an implementation thread and a reviewer thread, each of which runs its own sub-loop.
Token cost scales with loop depth, but on a flat subscription plan the risk is not money — it is undetected wrong-direction work that burns many hours before a human checks in.
Agents write better review cycles than humans reviewing raw diffs: by the time you read the code, another agent has already caught the obvious problems and the only thing left is the genuinely hard judgment calls.
The /goal primitive keeps a single thread running indefinitely on one task and is qualitatively different from a dynamic workflow — it is a linear never-ending loop, not a branching orchestration.
The static agile sprint forces work to fit a predetermined shape; agent-designed loops let the shape of the work determine the shape of the loop.
On the $200 Claude Code plan, five concurrent deep loops running over several days consumed only 29% of the weekly token budget — the perceived cost ceiling is much higher than the real one.
The moment to stop reading your agent's code and start having another agent read it first is earlier than almost every developer currently thinks.

Takeaway

Stop running the loop — hand the whole cycle to the agent.

WHAT TO LEARN

Every step you take after the agent finishes — running the server, checking output, filing the PR, copying review comments back in — is a step the agent could handle if you asked it to.

The bottleneck in most agent workflows is the human acting as messenger between steps: read the output, copy it somewhere, paste it back, say what is next. That handoff is itself a prompt.
Before prompting your agent, list every action you will take after it finishes. Each item on that list is a candidate to include in the original prompt.
Pre-assigning roles to sub-agents (security reviewer, exploration agent) misses the point. An agent's value is building the context it needs dynamically; hard-coded roles constrain that.
A PR monitor loop — where an agent watches for incoming code review comments and addresses them as they arrive — works with stock Claude Code and a work-tree-per-PR setup, no custom tooling required.
Agents can spawn agents that spawn agents. A single orchestrating prompt can produce a stacked sequence of PRs, each reviewed and merged before the next begins, while you sleep.
Token cost scales with loop depth, but on a flat subscription plan the real risk is not money — it is undetected wrong-direction work that runs for hours before a human checks in.
The /goal primitive in Claude Code and Codex keeps a single thread running until it self-reports completion. This is different from a dynamic multi-agent workflow and is better suited to long single-objective jobs like large rewrites.
Read the code your agent produced only after another agent has already reviewed it. The first review pass should be peer-to-peer; by the time a human looks, only the genuinely hard judgment calls should remain.
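
The audit described above (list what you do after the agent finishes, then fold it into the prompt) can be sketched as a tiny prompt builder. The function name and the closing "stop and ask" instruction are assumptions added for illustration; the example steps are the ones listed in the video.

```python
from typing import List


def build_loop_prompt(task: str, follow_up_steps: List[str]) -> str:
    """Fold the steps a human would take after the agent finishes into the prompt."""
    lines = [
        task.strip(),
        "",
        "When the change is complete, continue without waiting for me:",
    ]
    for number, step in enumerate(follow_up_steps, start=1):
        lines.append(f"{number}. {step.strip()}")
    # Assumption: an explicit escape hatch so the loop does not run blind.
    lines.append("Stop and ask me only if a step fails twice or needs a judgment call.")
    return "\n".join(lines)


print(build_loop_prompt(
    "Build the feature described in the plan.",
    [
        "Run the dev server and check that the change works.",
        "Commit, push, and file a PR.",
        "Watch the PR for review comments and address them as they arrive.",
    ],
))
```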

Glossary

Terms worth knowing.

Loop (agent loop): A workflow where an agent's output triggers the next agent run automatically, closing a feedback cycle without a human initiating each step. Distinct from a single agent run.
Work tree: A separate directory that checks out a specific branch of a Git repository, allowing multiple agents to work on different PRs in parallel without touching each other's files.
PR monitor loop: An agent configuration that watches a pull request for incoming review comments and addresses each comment automatically, running indefinitely until the PR is approved and merged.
Heartbeat loop: An agent pattern that wakes on a schedule (e.g., every 5–10 minutes), checks state (PR status, new comments, CI results), acts on what it finds, and sleeps again.
/goal primitive: A built-in Claude Code and Codex slash command that keeps a single thread running until it self-reports the task as complete, looping on the same thread rather than spawning sub-agents.
Stacked PRs: A sequence of pull requests where each one depends on the previous being merged, used to break a large refactor into reviewable increments without a single massive diff.
Recursive self-improvement (Anthropic framing): Anthropic's description of the progression from humans using computers, to using chatbots, to agents editing code directly, to agents orchestrating other agents — each step compressing the human's role.
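
To make the heartbeat loop entry concrete, here is a minimal sketch of the wake, check, act, sleep cycle. Everything in it is hypothetical: the function name, the `check`, `act`, and `is_done` callables, and the `max_beats` cap are illustrative, and a real setup would have the agent itself do the checking.

```python
import time
from typing import Callable, Optional


def heartbeat(check: Callable[[], Optional[str]],
              act: Callable[[str], None],
              is_done: Callable[[], bool],
              interval_seconds: float = 300.0,
              max_beats: int = 100,
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Wake, check state, act on any finding, sleep; returns the number of beats run."""
    beats = 0
    while beats < max_beats and not is_done():
        beats += 1
        finding = check()            # e.g. PR status, new comments, CI results
        if finding is not None:
            act(finding)
        if is_done():
            break                    # finished: skip the final sleep
        sleep(interval_seconds)
    return beats
```

Passing `sleep` as a parameter keeps the loop testable without waiting, and `max_beats` bounds how long an unattended loop can run.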

Resources

Things they pointed at.

00:00 · channel · Pete Steinberger (@steipete)

04:37 · link · Anthropic recursive self-improvement article

02:01 · product · Magic Patterns

08:20 · tool · CodeRabbit

08:20 · tool · Greptile

08:20 · tool · Macroscope

Quotables

Lines you could clip.

01:47

“The majority of your agent runs should probably not be running with prompts that you wrote. That is a crazy thing for me to say.”

Self-aware reversal from a known prompt-quality advocate, lands as a genuine surprise. Suggested use: TikTok hook.

17:50

“We are looking at the code too early. If you are reading the code your agent put out before another agent read it and gave feedback on it, you're wasting your own time.”

Tight counterintuitive claim, no setup needed, immediately actionable. Suggested use: IG reel cold open.

14:32

“My loops created loops, and they did a great job at it.”

One-line summary of the entire video's thesis. Suggested use: newsletter pull-quote.

22:51

“If you're on the expensive plan, you should be trying to get close to maxing it out because that's just money you're losing if you're not.”

Reframes the cost fear: unused budget becomes a loss rather than savings. Suggested use: TikTok hook.

The Script

Word for word.

Here's your monthly reminder that you shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents. I don't know about y'all, but this memo didn't make it to me.

Of course, I've seen loops before. Things like the Ralph loop really helped me think about how agents can do more over time, but it also massively increased the error rate of the changes that I was having my agents make. They were really cool, but they didn't seem that productive, and I found myself going back to the usual, which was asking the model to make a plan, reading the plan, saying, yeah, that looks good.

Go do this part, and then the next part, then the next part, then having another agent review it, then bringing the feedback back to the first agent and just the usual looping of work. But I was the one running the loop. I was the one doing the hand holding and bringing things from part one to part two, and making sure all of my agents had the context they needed to build well.

And Pete, as always, is a bit ahead of the curve. I have been a huge fan of him since way before the OpenClaw chaos, because he knew how to think about building with agents in a fundamentally different way that made him way more productive. I think of Pete as an experimental figure in many ways, where rather than being the role model we should all be copying, he's the person figuring out what the future looks like in a weird jank duct tape version now, and we can all learn from that and see where things are going.

At least, that's how I used to think about them, and honestly, I'll admit I still do in a lot of ways. But then I started building more with loops. I started getting my agents to prompt themselves.

I started setting up systems where agents would review code, give feedback, adjust, and then trigger re-reviews. I started building systems that would watch pull requests and watch existing issues on other repos to tell me when updates happen. I started using Hermes Agent to bring context to me instead of me having to go out and find it.

And I've accepted now that Pete's right. We should still be writing prompts though.

More importantly, I would argue now that the majority of your agent runs should probably not be running with prompts that you wrote. That is a crazy thing for me to say because it was one of those, like, I never thought I would see the day things.

But now that I've explored it myself and I've shipped a lot of code using these types of loops, I have a lot of thoughts I want to share. But I have one other thing I wanna share quickly first, which is today's sponsor.

AI should be good at design. It knows all the things it needs to about code, designs, visuals, more. But every time I try to have it redesign things that I'm working on, it just doesn't do it right.

At least that was my experience before I started using today's sponsor, Magic Patterns. These guys really cracked good design flows with AI. They're not trying to replace your whole stack or be a full site generator.

They're trying to work within the real constraints of your real code base on just the front end in order to get great designs out. The first thing that makes them different is the design system selector. Unlike other tools that will just generate a bunch of slop code, you can pick between existing real based systems like the one that they provide, a wireframe system, or even classics like shadcn, Chakra, Mantine, and MUI, or you can create your own, and you can also import things from Figma too, which is super helpful.

You can then switch between different models, obviously the ones that we all love and know are decent at design like Opus and Gemini 3.1, but also their auto router has really impressed me. It was able to grab the real SVGs for the logos for the things that I wanted to put on there once I showed them where they were.

I can open up the preview and send this to other people on my team, which has already been super helpful. I can also leave comments on any point on the screen to tell the agent what else I wanted to fix, which has been a lifesaver when you're working on these types of things. They even have a visual editor for when you wanna edit fonts, content, and things yourself.

So when you notice the agent's just not getting something right, don't fight it, change it yourself. This is a small thing, but it's one of my favorites. The ability to choose different frames to test your site in to see how it looks on like a mobile display or an iPad display is so helpful as you're trying to get these fine tuned pieces right.

I can't tell you how many times I had a design that seemed good, but as soon as I shipped it and opened it on my phone, it was awful. No more, just do it here. Starting to see why companies like DoorDash, Vappy, Granola, and more are leaning so hard on what Magic Patterns has built, these guys get it.

Design better with AI at soydev.link/magicpatterns. So this post by Pete is the one that started this new era of looping discourse.

But this is not the tweet that got me to go try loops. It was this one. Here's a simple loop.

Tell Codex to maintain your repos. Wake up every five minutes and direct work to threads. That makes it easy to parallelize and steer work as needed.

He uses an orchestrator skill combined with his triaging and auto review and computer use skills, so some work can land autonomously. This helped a lot click for me. In particular, the idea of your agent directing work to threads.

I didn't realize Codex had a feature where a thread in Codex could spin up another thread in Codex. And now that I know it has that, I have been pushing it much harder. I wanna contextualize this in a bit of a weird way.

I'm gonna reference the article Anthropic did about recursive self improvement because they did a great job describing how our work has changed over time. Previously, a person would use a computer and they eventually would use that to build a chatbot or an AI model. Once we had the AI model, the person could use the computer to ask the chatbot questions and get outputs that they could then use in their code to make better software and eventually maybe make a better model too.

But the loop was the person uses the computer, asks the chatbot a question, it gives a result to the person, who then copy-pastes it into their code and then asks another question. I know a lot of people use stuff like my chat service, T3 Chat, as a way to just do code, but they would bring it code questions and then copy paste the answers.

It really kinda emphasized the whole like coding is just copy paste meme. Chatbots pushed it way further, but now we've gone far beyond that because copy pasting is not the best use of our time. So instead of copy pasting the result from the chatbot into our code base, we started to just use our IDEs, our terminals, and other tools to talk to the model and get it to edit the code directly.

And that's where things have been for a while now. But then we had another big change with workflows and sub agents. I know a lot of people haven't even made this move yet, and I was hesitant to do it myself.

Obviously, tools like Claude Code and Cursor will do some amount of this to go explore and find things in your code base, but the idea of telling my agent to spin up five agents to go break up work was something I just wasn't that interested in. Especially when I saw all the crazy shit people were doing, trying to create different personas and roles for all of those workers, where they had a skill that wrote down in markdown files, this is the adversarial reviewer, this is the security reviewer, this is the grokker in finder, this is the exploration agent.

That made no fucking sense. And I would argue that still makes no fucking sense. The idea of predefining personas to go do things in your code base fundamentally misses the cool part of agents and AI as a whole.

It's dynamic. The agent can build the context it needs and do the things it needs to without having everything pre built and hard coded ahead of time. Imagine a coding template for a project where every file's already created and you have to edit things in the existing files.

It's stupid. And that's how I felt about most of the sub agent stuff that people were doing. Workflows pushed me hard here, and the video I just recently published about the things I like about Claude Code goes a little more in-depth there on the things I like about workflows.

The idea of your agents constructing this method that they're going to use to tackle a problem was really enticing to me. But now I'm going a bit further.

Closing the loop. Where the model doesn't just pick and spin up what sub agents it needs, it audits the work it does, and then sends the result back to run again, and again, and again, and again.

I am not at the fully autonomous loop point yet. I am not claiming the same things people like Boris are claiming, where they're writing the loop and now the code is just happening by itself with no oversight. That is stupid.

But I wanted a taste. I wanted to get an idea of how this could work, so I could play with it myself and see what benefits exist. So I started to play a bit.

I started to do stuff like this. I had Claude Code spin up a PR for a pretty big refactor. I used sub agents a bunch to go address specific concerns, to take over specific parts of the code base.

I didn't even say how to break it up. I let Opus figure that out itself. Man, I miss Mythos right now.

But one specific thing I did do was tell the agent to monitor the PR for comments. Because I have a lot of awesome code review tools that are watching my PRs when they're filed and leaving feedback. And I moved away from copy pasting code out of chatbots and into my code base, and instead, I found myself copy pasting the comments that things like CodeRabbit, Greptile, and Macroscope would leave, and pasting those into the agent so that it would go address them.

It wasn't great. So what I started doing instead, and this was the first step into heavier looping for me, and I would highly recommend you guys try the same, because it's actually really cool. Once you have your setup in such a way where you have different work trees that are monitoring and working around specific pieces of work, where this code is in a directory that is specific to this PR.

That means I don't care about this directory. It's not blocking other work. Once you have this broken out, in this case, I'm SSH'd to another machine on my network that is running this code base, that has this fork of this code base, this work tree for it.

And then I told it, monitor the comments. Watch the PR. Wait for comments to come in.

And when they come in, address them. And it did it. And it's been doing it now for like six plus hours.

It has made a ton of improvements through this. And then I had a taste. And then I got really excited to play more.

I wanted to push the limits of how much I could land without having to do the follow-up prompting myself.
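
As an illustration of the monitor described above (not taken from the video), the core bookkeeping of a PR monitor is remembering which comments were already handled so each one is addressed once. This sketch assumes comments arrive as dictionaries with `id` and `body` keys; `fetch` and `address` are hypothetical stand-ins for however the agent reads the PR and applies fixes.

```python
from typing import Callable, Dict, Iterable, List, Set

Comment = Dict[str, str]  # assumed shape: {"id": "...", "body": "..."}


def new_comments(fetched: Iterable[Comment], seen: Set[str]) -> List[Comment]:
    """Return comments not handled yet and record their ids as seen."""
    fresh: List[Comment] = []
    for comment in fetched:
        if comment["id"] not in seen:
            seen.add(comment["id"])
            fresh.append(comment)
    return fresh


def monitor_once(fetch: Callable[[], Iterable[Comment]],
                 address: Callable[[Comment], None],
                 seen: Set[str]) -> int:
    """One pass of the monitor: fetch, keep only unseen comments, address each."""
    fresh = new_comments(fetch(), seen)
    for comment in fresh:
        address(comment)
    return len(fresh)
```

Calling `monitor_once` on a schedule gives the watch, wait, address cycle; the `seen` set is what keeps a long-running loop from re-addressing old feedback.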

And I'll be honest, I still found myself hopping over to Codex and saying, hey, can you review this code? And then copy pasting the results of that review over. I played a little bit more there where I told Claude, hey, when you're done, run Codex with this command to get it to give feedback, and then address what it gives as feedback.

And that worked pretty well, too. But this is still for traditional work, where I have one PR that does one thing that is being watched by my agent to address the comments that come in. There's a lot of work that can't be broken down into just one PR.

I recently ran into one of those pieces of work. I have been rebuilding the isolate layer inside of Lakebed to make it a little more financially reasonable to deploy the way I want to deploy it. I did a deep dive on performance and alternative runtime options for how we could architect this with 5.5.

And it had really good suggestions. But one of the things it pointed out was that my data architecture had a lot of room to improve that could help performance even more than runtime changes.

Here's where it gave that feedback. The isolate architecture may not be the first scaling bottleneck. Current subscription invalidation reruns every query subscription for an app after each mutation.

For hot apps, we should implement dependency-aware invalidation, mutation coalescing and per-app invalidation batches, shared results for identical subscription arguments, and backpressure and maximum refresh frequency. This is when I realized there was a lot of work that needed to be done. So I asked upfront, from these features you think we should implement, which should be done separately and which should be done in tandem?

Would it be realistic to do all of this in one PR? It very quickly said, no, I would not implement all of this. It's one project, but at least three PRs.

Current implementation synchronously, yada yada. And then it broke up what the different PRs could look like. I asked if they could be worked on separately or should they be stacked?

And it said they should be mostly stacked, but there's some opportunity for parallelizing. And then I told it to write an HTML plan, my beloved. Thank you again to our friend Thoric for introducing me to this wonderful pattern where it is so much easier to see what my agents want to do and read it in a way that I can even open on my phone.

It's so nice. And it wrote these plans for each of the portions that it needed to complete.

I also told it here, after the plan, like, please make the plans piece, to create a new thread with the first plan as a starting point. And it did. It created a PR by itself in a new thread to go implement that first plan.

And then it landed, and I did my usual thing, where I had a bunch of back and forth review. I spun up another thread to review it. I copy pasted back and forth.

It got into a good state, so I merged it. I then asked to make a fresh thread for part two, which it did. It only took a few seconds.

But I realized I should be looping harder. This is the single message I have sent to an agent that has impacted my psychosis the most. Would it be possible to make a workflow of some form that first, will spin up a separate thread to make the PR, second, spin up another thread to review that PR when it's filed.

Three, puts the thread from one in a loop reviewing comments until it gets all approvals. And then fourth, the thread would merge the PR and trigger another one for the next piece. I didn't think it'd be able to do this, but I was curious how it would try.

And it made a kind of broken diagram showing the workflow it had in mind. It said it would use a heartbeat attached to this thread, polling every five to ten minutes. On each wake up, it would read the implementation thread status, detect filed PRs, create a fresh review thread when a new PR has a new head SHA, send actionable findings back, re-review after the fixes are pushed, yada yada, and then pull latest main before creating the next work tree.

So I said, make the workflow and use it to file the remaining PRs. And it did it. This was Sunday at 02:29AM, and it eventually finished and broke everything in my editor pretty aggressively at 06:50AM.

I set this off before going to bed, and I woke up the next day with four stacked PRs reviewed to hell and back, all merged.

It was fucking awesome. Do I think you should do this on real production code bases that have millions of users? Probably not.

At least not yet. But goddamn is it cool to spin up work in this way, where complex multi stage problems requests, that need their own reviews and cycles and loops.

Because that's the craziest thing here. I asked the model if I could make this loop, and it made a loop that makes sub loops dynamically. This isn't a hard coded, every time I make a change, I spin up one reviewer that reviews it, and then they go back and forth.

This is a dynamic workflow that was created based on the specific needs of this specific problem I was solving. My loops created loops, and they did a great job at it.

This was real code that landed, and sadly, I couldn't have Fable come in and review it because this was after the ban. But the idea of your agents being able to orchestrate dynamic work in a way that is specifically tailored to the problem is so cool.

Throughout most of my career, when I worked at real companies, we would follow some form of the traditional agile sprint loop, where we would put tickets inside of our backlog, and then once every week or two weeks, the start of the week, we would pull up the backlog and decide what was worth working on and how long we thought it would take.

And then try to make sure work that's blocking other work was prioritized accordingly, that everybody had unblocked work to do. But the actual flow of all of this was pretty static.

It was the classic, agile, waterfall-y structure.

And we kind of had to force our work to fit that shape. The most productive teams were the ones that would build their own alternative shape around the problems they were trying to solve.

That is what makes this so cool. The shape of the loop, the shape of the structure, the shape of how work happens can be dynamically generated based on the shape of the work that you're doing.

And you can use this for all sorts of crazy stuff. You can use this to monitor pull requests that need to be merged. You can use this on a schedule to every morning start your day with feedback on what PRs are worth merging and what ones are worth forgetting about.

I use this type of thinking to find the best solution for a 5G hotspot. Since I had a loop checking what the best deals were, I got early information about the new Verizon plan they just put out because my loop pointed it out to me randomly on Discord. It's so cool.

And again, to my earlier point, I wrote a handful of prompts in this thread. I wrote most of the prompts. Actually, I didn't because it got in that schedule after.

But up until the schedule started, I wrote all the prompts and I read the responses. And I said, yeah, that sounds good.

Let's see what happens. And then I did see what happened. And what happened was kinda fucking awesome.

So what I would highly recommend you do here, the info you take from here, is to think about the work you do before, during, and after you prompt your agent. When your agent completes its task, pay attention to what you do next.

For me, what would happen is I would tell the agent to build the thing, and then once it built it, I would run the thing and go see if it worked. And if it did, I would commit the thing and then push the thing, and then make a pull request on GitHub for the thing. I would then wait for my code review agents to give feedback.

I would address that feedback. I would then ask my team for feedback. I would address that feedback, and then I would merge it.

Start from where you started there. The first thing I did after the changes were completed was run a dev server. Tell the agent to do that.

I then checked if the work worked. Tell the agent to do that too. Computer use has gotten really good.

After I verified the work, I would then commit. Tell the agent to do that once it's verified things are correct. Tell it to push up the code and file a PR once it's ready.

Then I would go get those code review comments and copy paste them into the agent to fix. Tell the agent to do that itself too. Maybe tell the agent to spin up other threads to do its own reviews.

The other spicier way of putting this is that we are looking at the code too early. If you are reading the code your agent put out before another agent read it and gave feedback on it, you're wasting your own time. That's time that the agent could have spent instead, that you could have used to find other work worth doing, or to relax a little, or go spin up a sidebar.

I don't know what you're gonna do with your free time. But I have had far too many instances where I read the agent's code, was like, that's obviously wrong, and then told it to go fix it. They can figure that shit out themselves, too.

And now, when the human comes in, all the bullshit is gone, and you can focus on the hard stuff. It's so much more fun. Try to find where you have to be involved, and see what it takes to prompt yourself out of it.

I'm not saying you need a bunch of custom skills, I have almost none here. I'm not saying that you need to build fancy plugins or install a bunch of shit. I'm just using stock Codex.

I'm not even using T3 Code for this. I do hope to get these features added to T3 Code soon because they're really cool. But I'm just using stock Codex with a normal account here.

There is one catch though. Cost. You will burn many more tokens when you run things in loops like this.

And if it's going down the wrong path, it might go down that wrong path for longer to burn more tokens and potentially cost you more money. If you're paying API prices, you probably shouldn't be doing loops yet.
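One way to limit how long a loop can run down the wrong path is a hard token cap. The sketch below is an assumption added for illustration, not something the video demonstrates: `run_with_budget` is a hypothetical function, and the per-turn token counts are made-up numbers standing in for whatever usage your harness reports.

```python
def run_with_budget(turn_costs: list[int], token_limit: int) -> tuple[int, int]:
    """Run turns in order and stop starting new ones once the cap is reached.

    A turn's cost is only known after it finishes, so the total can overshoot
    the cap by at most one turn.
    """
    tokens_used = 0
    turns_run = 0
    for cost in turn_costs:
        if tokens_used >= token_limit:
            break  # cap reached: do not start another turn
        tokens_used += cost
        turns_run += 1
    return turns_run, tokens_used


# Made-up per-turn token counts for a loop that keeps going.
costs = [400_000, 900_000, 1_200_000, 800_000, 500_000]
turns_run, tokens_used = run_with_budget(costs, token_limit=3_000_000)
print(turns_run, tokens_used)
```

With these numbers the fifth turn never starts, because the fourth turn already pushed the total past the cap.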

That said, you might be surprised how far you can go with them. Remember that loop I mentioned earlier that I was using Opus and Claude Code for, where it's watching the PR and updating it constantly?

Not only is it doing that, I've noticed that every time it gets feedback, it spins up a workflow with eight steps or more to address all of it. I had one agent spend under ten minutes leaving feedback.

And based on that feedback, the Opus workflow ran for eight hours and used over 3,000,000 tokens to address like three small comments. It was brutal.

It was absurd. If I was blocked during that time, it would have been very rough. And honestly, I was kind of blocked at that point because this is a big overhaul and I want this in before doing other changes.

Because I'm unfucking the TypeScript that looks like Python that GPT 5.5 wrote. Because as great as the model is at writing code that functions, it does not write code I like looking at. Anthropic models write better looking code.

I wanted to do this with Fable. Fable was taken. So instead, I burned a shitload of Opus tokens.

This thread is so long that it's like breaking my SSH and Claude Code. I can't even scroll up far enough to get to my first prompt, because this thread is just so much shit going on, very little of which has involved me at all. So how was my usage?

I do have two Claude Code accounts right now, so I'm sure this burned through it really aggressively. Right? Well, this combined with everything else I've been working on for the last few days using Opus, still has me at only 29% of my weekly limit, which expires in eight hours.

I was maxing out my limits with Fable. And with Opus in a loop like this, I'm not even close to hitting my limits.

And I've had like five of these types of loops running in that time. Really big piles of changes happening. And it doesn't fucking matter.

It's not getting close to my limits. I am on the $200 plan. I will also say that I ran a workflow using the new Claude Code with Opus four eight when it came out on the $100 plan, and I hit the five hour limit instantaneously.

I have never come close to the five hour limit with Opus and loops. And I'm also not coming close to the weekly limit with it either on that $200 plan. So if you're already on a $200 plan or you're willing to be on one and you find that your usage is not getting like lethal, like you're not getting close to maxing out, start looping more.

And since you usually can't use these plans at normal companies, because of the differences and restrictions in how you're supposed to use an enterprise plan at API prices, go use this for crazy shit that you don't think should be possible. I would also recommend experimenting with the tools that are included with our harnesses now.

A lot of them are pretty powerful. Codex's ability to spin up new threads is really, really cool. Both Codex and Claude Code have a slash goal primitive, which allows you to get one thread going forever on a task where it keeps double checking at the end of a turn, did you finish the work?

If no, okay, keep going. That type of like linear never ending loop is different from a dynamic workflow like I showed earlier where it creates dynamic work based on a preplanned goal versus a traditional slash goal where it just keeps plugging along on that one thread until it completes.
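The linear version of that loop (run a turn, ask whether the work is finished, and keep going if not) can be sketched in a few lines. This is a toy model under stated assumptions, not how either harness implements its goal feature: `run_goal`, `fake_turn`, and the `max_turns` safety cap are all hypothetical names introduced here.

```python
from typing import Callable


def run_goal(
    run_turn: Callable[[int], str],
    is_done: Callable[[str], bool],
    max_turns: int = 50,
) -> tuple[bool, int]:
    """Keep one thread working until the end-of-turn check passes."""
    for turn in range(1, max_turns + 1):
        output = run_turn(turn)
        if is_done(output):  # the "did you finish the work?" check
            return True, turn
    return False, max_turns  # gave up after the safety cap


def fake_turn(turn: int) -> str:
    # Stand-in for a real agent turn: reports completion on the third turn.
    return "DONE" if turn >= 3 else "still working"


finished, turns_used = run_goal(fake_turn, lambda out: out == "DONE")
print(finished, turns_used)
```

A dynamic workflow differs in that the next unit of work is decided at run time and may fan out into new threads, whereas this loop only ever continues the same thread.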

I have a goal running right now that's over twelve hours in that's trying to rewrite Hermes agent in Rust so that I can run it in isolates that are much smaller and use less resources. Because my Hermes agent uses over a gig of RAM. It is getting close.

It'll probably work. It probably won't be production ready. It probably won't be something I wanna put out there and sell or anything.

But it's a fun use of my spare tokens. And it's really interesting to see what types of problems can be solved when you throw these crazy rate limits at them. The point I'm trying to make here is that you should be treating these limits like challenges.

If you're on the expensive plan, you should be trying to get close to maxing it out because that's just money you're losing if you're not. The 70% I'm not gonna hit in my weekly limit here at eight hours is thousands of dollars of inference that I paid for, that I could've done, that I didn't do.

But again, I need to be realistic with you guys. In all of May, on this computer, I did about $1,900 of inference.

I didn't pay that obviously because I'm using the subscriptions with Claude Code and Codex. But this month, June, which we're only seventeen days into, I'm at nearly $6,000 of usage.

But that's just this computer. As I mentioned earlier, I'm using multiple computers. My Mac Mini has another $2,600 of inference on it.

I'm at $10K for the month across all of my machines. And that's on three of those $200 plans, two Claude Code, one Codex, and I haven't used the second Claude Code account since Fable was taken from us.

That's a shitload of value that I'm getting for relatively cheap. To spend $600 and get back $10K of inference?

That means you can do a lot. And if you're not pushing loops to their limits, you're not using that as much as you could be. I've been having way more fun with loops than I expected to, and I'm curious if you guys will as well.
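The cost comparison can be worked through using only the per-machine figures quoted in the transcript. This is a rough sketch: "nearly $6,000" is rounded up to 6,000, only the two machines with stated numbers are counted, so the ratio is a lower bound on the all-machines total, and the variable names are made up for illustration.

```python
# Subscription side: three $200 plans.
PLAN_PRICE_USD = 200
PLAN_COUNT = 3
subscription_cost = PLAN_PRICE_USD * PLAN_COUNT  # 600

# Usage side: API-price equivalents quoted for June, per machine.
june_main_machine = 6_000  # "nearly $6,000", rounded up
june_mac_mini = 2_600
known_usage = june_main_machine + june_mac_mini  # 8600

# Dollars of inference per subscription dollar (lower bound).
leverage = known_usage / subscription_cost

# Daily burn on the main machine: all of May versus the first 17 days of June.
may_daily = 1_900 / 31
june_daily = june_main_machine / 17
growth = june_daily / may_daily

print(f"subscription cost: ${subscription_cost}")
print(f"known June usage: ${known_usage}")
print(f"leverage: {leverage:.1f}x")
print(f"daily burn growth, May to June: {growth:.1f}x")
```

The same arithmetic run at pay-per-token prices has a leverage of 1x by definition, which is why the advice changes for API billing.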

Take a look at what you do when you're done prompting, see what additional steps you take, and ask the model, can you do this? You might be surprised at what it's capable of.

I know for a fact that I was very surprised myself. What I'm trying to say here is that loops are cool not because the technology or the mindset's really cool, but because the idea of letting agents do more is unbelievably powerful.

If you take anything from this video, it really should be that. Ask your agent to do the next step and see if it impresses you. I know it impressed me.

Let me know how it goes and until next time. Peace, nerds.

The Hook

The bait, then the rug-pull.

Pete Steinberger's tweet landed like a memo no one forwarded. The host had been running his own loops for months — reading the plan, approving it, kicking off the next step, copy-pasting review comments back in — never quite noticing that the loop itself was the thing the agent could run.

Frameworks

Named ideas worth stealing.

16:36 · concept

Audit the steps you take after the agent finishes

Run the dev server
Check if the work worked
Commit
Push and file a PR
Copy review comments back to the agent
Ask the agent to spin up reviewers itself

Every post-completion action you take manually is a candidate to include in the original prompt. Walk the list once, then fold it into the agent's instructions.

Steal for: Any agent workflow where you find yourself doing follow-up steps after the agent says it is done

12:27 · model

4-step PR loop

Spin up a separate thread to implement and file the PR
Spin up a reviewer thread when the PR is filed
Keep the implementation thread in a loop addressing comments until all approvals land
Merge and trigger the next PR in the sequence

A single orchestrating prompt that closes the entire PR cycle without human intervention at each step.

Steal for: Multi-PR refactors where each stage depends on the previous being reviewed and merged
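The four steps can be modeled as a small simulation to see how the stack advances. This is a toy sketch, not the prompt or tooling from the video: `PullRequest`, `run_pr_loop`, and `run_stack` are hypothetical names, the PR names and comment counts are invented, and each review round is assumed to resolve exactly one comment.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    name: str
    open_comments: int  # comments left by the reviewer thread (steps 1 and 2)
    review_rounds: int = 0
    merged: bool = False


def run_pr_loop(pr: PullRequest, max_rounds: int = 10) -> PullRequest:
    # Step 3: the implementation thread keeps addressing comments until approval.
    while pr.open_comments > 0 and pr.review_rounds < max_rounds:
        pr.open_comments -= 1  # stand-in: one comment resolved per round
        pr.review_rounds += 1
    # Step 4: merge only when nothing is left open.
    pr.merged = pr.open_comments == 0
    return pr


def run_stack(prs: list[PullRequest]) -> list[str]:
    """Process stacked PRs in order; a PR that never gets approved blocks the rest."""
    merged_names = []
    for pr in prs:
        run_pr_loop(pr)
        if not pr.merged:
            break
        merged_names.append(pr.name)  # merging triggers the next PR
    return merged_names


stack = [
    PullRequest("pr-1", open_comments=2),
    PullRequest("pr-2", open_comments=0),
    PullRequest("pr-3", open_comments=3),
    PullRequest("pr-4", open_comments=1),
]
merged = run_stack(stack)
print(merged)
```

The `max_rounds` cap is the one design choice worth copying: without it, a PR whose comments never clear would hold the loop open indefinitely.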

CTA Breakdown

How they asked for the click.

VERBAL ASK

24:18 · next-video

“Let me know how it goes and until next time. Peace, nerds.”

Soft close with no explicit subscribe ask — relies on implicit community follow-up

MENTIONED ON CAMERA

00:00 · channel · Pete Steinberger (@steipete)
02:01 · product · Magic Patterns
08:20 · tool · CodeRabbit
08:20 · tool · Greptile

FROM THE DESCRIPTION

PRIMARY CTA: Where the creator wants you to go next.

Thank you Magic Patterns for sponsoring! Check them out at ↗


Storyboard

Visual structure at a glance.

hook · Pete's tweet · 00:00
context · Anthropic arc diagram · 04:37
value · Lakebed perf analysis · 09:32
value · Big loop prompt · 12:27
proof · 4 PRs merged by morning · 13:44
proof · Claude usage stats — 29% · 18:50
cta · Wrap — ask the agent · 24:00

