Big Idea

The argument in one line.

Claude Code's $200 plan subsidizes up to $8,000 of monthly inference, and the right response to that subsidy is to run the most ambitious multi-agent workflows you can, not to be conservative with tokens.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…

You pay for a Claude Code Max or Pro subscription and want to use every token before your weekly reset.
You maintain one or more repos with a backlog of unreviewed PRs and want an agent to do triage instead of you.
You have a second machine (Mac Mini, spare laptop) and want to understand how to run agents remotely without staying glued to your desk.
You've run simple single-threaded Claude Code sessions and want to understand how workflows and sub-agent orchestration actually behave at scale.

SKIP IF…

You're on the API pay-as-you-go tier and looking for cost optimization — this video is explicitly about burning tokens, not conserving them.
You want a model benchmark or objective capability comparison between Fable and other models.

TL;DR

The full version, fast.

Fable is in Claude Code subscriptions through June 22 (it is removed on June 23), giving $200/month users access to what would otherwise cost up to $8,000/month of inference. Theo has burned over $5,400 in ten days across two machines. The core techniques are: keep rate-limit timers running by sending a dummy message immediately after login, run parallel workflows with eight-plus sub-agents to drain limits fast, swap auth tokens mid-session between two accounts to prevent workflow interruption, and use daily repo-triage agents and multi-judge PR review workflows for real work, not demos. The closing argument is a mindset one: approach the window with ambition, not fear.

Chapters

Where the time goes.

00:00 – 01:50

01 · The Window — Fable, the subsidy, and why this video exists

Fable is in Claude Code Pro and Max through June 22; it leaves those plans on June 23. Theo has burned $5,470 in ten days across two machines. The video is explicitly about spending tokens, not saving them.

01:50 – 03:43

02 · Sponsor — Render

Render's agent-ready cloud: private networking, blueprints (Terraform-style YAML), durable workflows SDK. $50 credit with RENDER-THEO.

03:43 – 05:54

03 · Rate Limit Architecture — how the two buckets work

Five-hour rolling session window + weekly limit, both starting on first message. Theo can hit the five-hour cap in ~1 hour with parallel workflows. Weekly = ~4x the five-hour cap.

05:54 – 06:47

04 · Automating the warmup — cron via Hermes

Hermes agent in Discord sets up a cron that runs 'claude -p hi' every five hours, keeping timers ticking on both accounts automatically.

06:47 – 12:43

05 · Account swapping mid-session — the auth token trick

Running the /login command in any Claude Code terminal swaps the auth token machine-wide. All running workflows route through the new account on their next tool call. No restart needed.

08:07 – 13:48

06 · PR review workflow demo — Lakebed file storage decision

Three competing PRs (#35, #37, #39). Theo triggers a 100+ sub-agent workflow: 13 audit agents, 7 judges, harvest + synthesize phases. 1.8M+ tokens in under 30 min.

14:44 – 16:50

07 · Daily repo triage — surfacing PRs worth merging

Morning agent reads every PR across all repos, outputs ranked HTML queue by merge-readiness. A month-old bug-fix PR got merged within 5 minutes of being surfaced via the link.

16:50 – 17:44

08 · HTML plans as the agent handoff protocol

Self-hosted web service for HTML plan files. Browser-readable, phone-readable, URL-pasteable into the next agent. More useful than Markdown for cross-agent context passing.

17:44 – 19:46

09 · Skills as queued work — shadcn improve + Raycast notes

When near a session limit, queue skills that trigger workflows. Raycast notes as a lightweight prompt backlog — jot ideas throughout the day, batch them into agent runs.

19:46 – 22:43

10 · Remote machine setup — stop locking yourself to the laptop

Mac Mini running Codex + Claude Code + Hermes. SSH (local) + Tailscale (remote). macOS Screen Sharing over Tailscale. T3 Code remote system. Close the lid; work continues.

22:43 – 30:40

11 · Mindset — ambition is the right input

Every dev can do 10-100x more work. The constraint is no longer tokens — it's ambition. Sawyer Hood framing: rewrite production apps, add multiplayer, give agents browsers and debuggers. Approach with excitement, not fear.

Atomic Insights

Lines worth screenshotting.

Claude Code's $200/month plan subsidizes up to $8,000/month of inference — a 40x multiplier that disappears when Fable leaves subscription tiers.
Rate-limit timers don't start until you send your first message; triggering a dummy message on login starts the countdown immediately.
Swapping auth tokens mid-session with Claude Code's /login command transparently reroutes all running workflows to a new account on the very next tool call.
Running eight-plus sub-agents in parallel via Claude Code workflows drains a five-hour session window in under an hour.
The weekly rate limit is roughly 4x the five-hour cap — you can max out the five-hour window four times before hitting the weekly ceiling.
HTML plans are more useful than Markdown for agent handoffs because they're readable in a browser, on a phone, and can be pasted as a URL into another agent.
A morning triage agent that ranks PRs by merge-readiness across all your repos produces more throughput than any amount of manual GitHub review.
When Fable orchestrates sub-agents, it prefers to use Fable for those sub-agents too; explicitly tell it to use Opus or Sonnet, or cost compounds quickly.
Running agents from a remote machine with Tailscale eliminates the session-killing cost of closing your laptop.
Custom lint rules tuned for common agent errors are a high-leverage tool most developers haven't reached for yet.
One agent writing a plan, a second agent validating each claim in that plan, is more reliable than asking a single agent to do both.
Having agents review each other's PRs via GitHub comment loops — without human involvement — is already possible with Claude Code's built-in loops.

Takeaway

Five habits that extract maximum value from a subsidized AI window.

WHAT TO LEARN

When inference is effectively free, the constraint shifts from cost to workflow design — and these five habits are what separate productive token burning from watching a percentage ticker go up.

Trigger a dummy message the moment you log in so rate-limit timers start counting down immediately, guaranteeing the fastest possible reset.
Run multi-agent workflows with eight or more parallel sub-agents instead of single-threaded sessions — this is what actually drains limits fast enough to get real resets.
Build your heaviest, most ambitious tasks for a subsidized window: full PR audit workflows, daily repo triage, entire-codebase improvement passes — not toy demos.
When Fable orchestrates sub-agents, explicitly tell it to use Opus or Sonnet for those sub-agents, or it defaults to Fable everywhere and cost compounds.
Host agent outputs as HTML at a URL rather than passing Markdown — a URL is readable in a browser, on a phone, and can be handed to the next agent directly without human intermediation.

Glossary

Terms worth knowing.

tokenmaxxing: Using every available token in a subsidized AI subscription window, typically by running the largest possible workflows rather than conservative single-threaded tasks.
five-hour session window: Claude Code's rolling rate-limit bucket that resets five hours after your first message in a session, not five hours after you logged in.
weekly limit: A second, separate rate-limit bucket in Claude Code that accumulates across all sessions in a rolling week and resets once the weekly clock started by your first message expires.
Fable / Mythos: The same Anthropic model released under two names: Mythos is the internal/API name, Fable is the subscription-accessible version with additional safety layers. Available on Claude Code Pro and Max tiers only through June 22, 2026; it is removed from those plans on June 23.
workflow (Claude Code): Claude Code's built-in orchestration mode where a parent agent spins up multiple sub-agents that run in parallel on separate tasks, then synthesizes their outputs.
Hermes Agent: A Discord-based AI agent that runs on top of a Codex subscription, used here to automate cron jobs and run work on a remote Mac Mini.
HTML plan: An HTML file generated by an agent describing its plan, findings, or recommendations — hosted at a URL so it can be read in a browser, on a phone, or pasted directly into another agent's context.
Lakebed: Theo's unreleased project: a cloud platform designed for agents to operate without needing to access dashboards or manage API keys directly.
sub-agent: An individual agent instance spawned by an orchestrating (parent) agent as part of a workflow, typically handling one parallel slice of the overall task.
Tailscale: A VPN product built on WireGuard that allows devices to connect as if on the same local network, enabling SSH and screen sharing to remote machines from anywhere.

Resources

Things they pointed at.

02:52 · product · Render

05:54 · tool · Hermes Agent

05:54 · tool · OpenClaw

08:07 · product · Lakebed

14:44 · channel · T3 Code

18:30 · tool · shadcn/ui improve skill

19:11 · tool · Raycast Notes

20:50 · tool · Tailscale

21:48 · product · T3 Connect

Quotables

Lines you could clip.

00:42

“These plans can get you up to $8,000 a month of inference for just $200.”

Concrete number that reframes the subscription as absurdly underpriced → TikTok hook

15:15

“Reading all those PRs and keeping up with all those changes is way too much work for any reasonable human to even consider trying, which is why I don't. I let my agents do it because they don't get bored.”

Clean one-liner that articulates the agent-vs-human triage argument → IG reel cold open

27:57

“Your focus should not just be writing more code. Your focus should be solving more problems.”

Tight reframe from output to outcome; stands alone without context → newsletter pull-quote

28:32

“Lower your bar for what's worth building and raise your bar for how far you bother going.”

Pithy paradox: counterintuitive phrasing that sticks → TikTok hook

The Script

Word for word.

You might have noticed I haven't been posting as many videos the last few days, and there's a good reason for it. It is indeed the Fable release. It's such a powerful model, and I've been pushing it to its absolute limits to the point where it's affecting my sleep, my day to day, and of course, the content I'm putting out.

I don't wanna do a normal video where I just show some of the cool things I've been doing with it, because this release is different. As you all probably know by now, Fable, aka Mythos with some safeguards, is available on the pro and max tier subscriptions on Claude Code, but only until June 22, because as of June 23, it will be removed from those plans.

The model seems to be just too expensive and too compute heavy for them to run in the subscription tier that is as subsidized as it is. And those subsidies are crazy. We're talking like $8,000 a month.

These plans can get you up to $8,000 a month of inference for just $200. And I'm not talking out of my ass here, I'm speaking from experience. I've spent the last few days going hard on my subscriptions for Claude Code as well as Codex to an extent, and I've managed to do $4,358 of inference in the last ten days alone.

There's a catch though, that's just on this laptop. I also have a Mac Mini I've been doing a lot of work on too, and this one's up to $1,112 of additional inference on top of all of this.
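As a quick check on the numbers quoted so far, the arithmetic can be sketched in Python. The dollar figures are the ones stated above; the variable names are illustrative only.

```python
# Figures quoted above (USD). Variable names are illustrative.
PLAN_PRICE = 200          # monthly price of one plan
MAX_INFERENCE = 8_000     # claimed monthly inference ceiling for that plan
LAPTOP_SPEND = 4_358      # ten days of inference on the laptop
MAC_MINI_SPEND = 1_112    # ten days of inference on the Mac Mini

subsidy_multiple = MAX_INFERENCE / PLAN_PRICE
total_spend = LAPTOP_SPEND + MAC_MINI_SPEND

print(f"Subsidy multiple: {subsidy_multiple:.0f}x")  # Subsidy multiple: 40x
print(f"Ten-day total: ${total_spend:,}")            # Ten-day total: $5,470
```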

That's a shitload of tokens, and I wanna show how I've been using them to get real work done. And I wanna be clear, not all of this is super useful. This is not a video about maximizing how much you get per token or per dollar.

It's kind of the opposite. It's how you can take advantage of these really generous subsidized subscription plans in the ten days we have left with Fable on them, which is why I'm rushing this video out of course, so that you can get the most possible value and see and taste what the future can kind of look like with models this powerful assuming cost is no issue.

Obviously, this is not realistic long term. Nobody should be spending $10,000 a month personally on inference for day to day work, but spending $200 to $400 and getting that much inference, that is kind of compelling.

I've never burned so many tokens in my life. In fact, I'm pretty sure I did more in the last ten days than I did in the rest of my life prior to that, and I've learned a ton since. The good, the bad, the ugly, and more.

From loops to OpenClaw and Hermes Agent to crazy workflows that will automate value being poured out of your code bases and more, there's a lot to dig into here and a lot of cool opportunities to maximize your usage of these tools and models. But before I can dive into all that, I highly recommend maximizing on today's sponsor.

AI's gotten great at coding, but deploying, not so much. As great as things like AWS are, good luck trying to configure things properly using an agent, especially once you're trying to do multiple things, have preview environments, spin your team up as well. It's just not a great story.

And that's why Render is so, so cool. These guys build an enterprise ready cloud that is also agent ready. While every other cloud is struggling to keep up with what's going on, they're making nice changes like dropping the per seat pricing and instead just charging you for your infrastructure.

So it doesn't matter how many people or agents you have on your team, all that matters is how much you're shipping. You can use Render to host anything. Web services, databases, cron jobs, workflows, static sites, CDN content, KVs, everything is supported by these guys, and they have integrations for everything you'd ever want.

And as I mentioned before, these guys are actually enterprise ready. They have private networks built in, so your services don't have to talk over the internet. This is huge for building real systems.

And you can configure this all with blueprints, simple YAML files that describe your entire infrastructure. Imagine if Terraform went way deeper and further, that's what you get with their Blueprints. That said, if you want Terraform, they support that too, don't worry.

And if you're building heavily distributed work with lots of pieces with high failure rates, Render Workflows are gonna save your butt. It's a simple system for queuing and scheduling your work that has a great SDK for TypeScript and for Python, making it easy to make resilient, durable workloads on the same cloud that hosts your database, your CDN, and more.

If you're not convinced yet, would $50 convince you? Because they'll give you $50 of credit if you use code RENDER-THEO. Try it now at soydev.link/render.

Before we talk about burning all of your tokens, I wanna talk about getting as many as possible, as in maximizing your use of the limits that you have. As I mentioned in my last video, I'm actually dual wielding right now. I have two Claude code $200 plans that I rotate between whenever I max one out, and I found a lot of ways to gamify that that we'll talk about in a second.

But first, I wanna talk a little bit about how these rate limits work. This is my second account, which I haven't had to use today, which means that my five hour session hasn't started. This is the first important thing to start hacking around.

You want this timer resetting as often as possible. Realistically speaking, I can hit the five hour limits now in about an hour. So if I don't have this timer ticking and I start this work session that will kill my limit in an hour, I then have to wait four hours for it to reset because the timer doesn't start until you send a message.

Thankfully, this is easy to work around. This is all it takes. I am just in the claude.ai site.

I'm going to say hi. I'm going to send. I'm going to stop before it does any real work.

And now the timer is ticking. It will now reset in four hours and fifty minutes even though I haven't dinged any of my usage. Okay.

A tiny, tiny bit was probably used, basically nothing. Doing this gets you reset as soon as possible. So now if in four hours, let's say four and a half hours, I start working, I don't have to worry about hitting my limit because it's gonna get reset in thirty minutes.
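The timing argument above can be made concrete with a small calculation. This is a simplified sketch, assuming the window opens at the first message and resets exactly five hours later, as described in the video; the function name and the example timestamp are made up for illustration.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=5)  # session window length described above


def wait_after_hitting_cap(first_message: datetime,
                           work_start: datetime,
                           time_to_cap: timedelta) -> timedelta:
    """Idle time between exhausting the cap and the window resetting.

    Simplified model: the window opens at `first_message` and resets
    exactly WINDOW later. Returns zero if the reset arrives before the
    cap would be hit.
    """
    reset_at = first_message + WINDOW
    cap_hit_at = work_start + time_to_cap
    return max(reset_at - cap_hit_at, timedelta(0))


start = datetime(2025, 1, 1, 9, 0)   # arbitrary example timestamp
one_hour = timedelta(hours=1)        # roughly how fast the cap is hit

# No warm-up: the first real prompt is also the first message.
cold = wait_after_hitting_cap(start, start, one_hour)

# Warm-up "hi" sent four and a half hours before real work begins.
warm = wait_after_hitting_cap(start - timedelta(hours=4, minutes=30), start, one_hour)

print(cold)  # 4:00:00
print(warm)  # 0:00:00
```

In the warm case the reset lands thirty minutes into the work session, so the cap is never reached inside that first window.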

That's only one of the limits you have to worry about though because there are two. You have the current session limit and separately, you have the all models weekly limit which resets once a week. Similar to the five hour window, this one starts counting down after you send your first message, again, incentivizing you to make sure you go hop somewhere and trigger something that gets these limits burning.

I wanna automate this though, which you'll notice is a theme for a lot of the things I wanna show off here. I could keep going to the website. I could even have a browser use agent go there for me.

But at this point in time, claude -p still counts towards your rate limits. That will change in the near future, which means it probably won't work for that long, but at the very least the strategy will work now, so I'm going to set something up. This is not meant to be like, oh, look at how smart I am.

It's meant to try and get you into the mindset of taking advantage of the limits that we have today. I'm currently using a Hermes agent inside of Discord as my main like default AI agent that does random work on my Mac Mini, and I'm going to take advantage of that here.

It already has Claude code set up, so watch and learn. I wanna make sure my Claude code account has a recent message at all times. Set up a cron that triggers every five hours.

It should run in an empty directory. The command should be claude -p hi. And now my Hermes agent is going to go set that up so going forward, I will always have something triggered on that account to keep my rate limits moving at all times.
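For readers who would rather script the warm-up than dictate it to an agent, here is a minimal sketch of what such a job might run. The only detail taken from the video is the command itself, claude -p hi, executed in an empty directory; the function name, the injectable runner parameter, and the use of a temporary directory are assumptions for illustration, not what Hermes actually generated.

```python
import subprocess
import tempfile
from typing import Callable, Sequence

# The command dictated above; everything else here is illustrative.
WARMUP_COMMAND = ("claude", "-p", "hi")


def warm_up(command: Sequence[str] = WARMUP_COMMAND,
            runner: Callable[..., object] = subprocess.run) -> None:
    """Run the warm-up command once inside a fresh, empty directory."""
    with tempfile.TemporaryDirectory() as empty_dir:
        # cwd keeps the agent away from any real project files.
        runner(list(command), cwd=empty_dir, check=True)


# Call warm_up() from whatever scheduler you use (cron, launchd, an agent).
```

One scheduling caveat: a standard cron expression such as 0 */5 * * * fires at hours 0, 5, 10, 15 and 20, so the gap across midnight is four hours rather than five.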

Also, just a pro tip I got from Ben, I much prefer using OpenClaw and Hermes in something like Discord because every message gets its own thread that I can keep following up in to maintain context. It's really really nice. And I'm using this for a ton of different stuff.

Obviously, you can't use your Claude Code auth directly inside of a tool like this because Anthropic really wants you to use Claude Code with your sub, not tools like this. But you can use your Codex sub with these if you wanna min max that too, which is what I do. It works really good.

Fantastic. Now that that's done, I'm never gonna have to worry about my limits not counting down. Again, this isn't trying to get more out of your plan, like I'm not trying to push past my weekly limit or anything here.

I'm just trying to make sure that you can get to a 100% and get your reset as quickly as possible. But as I mentioned before, I'm juggling two accounts. This might seem like really really annoying to do especially if you're using the Claude desktop app because when you sign out and into a different account, you lose all of your history, all of your sessions, all of your everything.

I put out a warning about this in the Claude Code desktop app for us account switching folks. And Anthony from Anthropic actually confirmed that they're going to fix this behavior soon. I don't think that will happen within the range we have Fable for in the subscriptions, so I wouldn't count on this.

And honestly, the CLI still is a much better tool than the desktop app at this point in time. They've made meaningful improvements since I did my video roasting it, but I tried it a bunch the last few days.

I wouldn't recommend it. So what does account switching look like when you're actually doing real work with these tools? I could tell you how to take advantage of the multiple accounts, but I'd rather just show you.

And when I do this, I'm also gonna show one of the real use cases I've been doing with this stuff. I've talked a bunch about my recent project Lakebed in various videos. I'm trying to make a better cloud for building apps that agents can easily operate without needing to access dashboards, deal with API keys, and all those types of things.

And I have a lot of work in flight on this project right now. In fact, some of it overlaps. PRs 35, 37, and 39 are all attempts to add file storage to Lakebed that have their own benefits and negatives, respectively.

Thirty five and thirty seven differed a lot in their implementations. I had Codex and Claude Code both building their own solutions and then comparing each other's, and they couldn't have disagreed more. But when I synthesized the best parts of both and made a new PR 39, it came out a lot better.

I'm still not 100% confident though, and rather than doing the normal thing, which is sitting there and reading all of this yourself, I'm gonna be lazy and take advantage of my limits. I have Claude Code open in a random work tree of this repo and I'm going to tell it to get some work done here. Notice that I'm using UltraCode.

I don't recommend using UltraCode and the workflow feature in Claude Code if you're trying to conserve tokens at all, but if you're trying to hit those limits, you're trying to take advantage of your remaining inference before a reset comes up, doing things like this is actually really useful and finds more information than I would have expected.

I'm gonna Wispr Flow this one because I don't feel like typing and talking at the same time, so forgive me. I currently have three pull requests open on this project, 35, 37, and 39, all of which are implementing roughly the same feature, which is user facing object storage that developers can implement in their Lakebed apps.

I wanna decide which one is the best choice to merge. Make a workflow where you break these PRs up and have judges review each of them independently to conclude and figure out which one is best and what pieces of each we can bring into the best solution.

Audit all of them independently and help me come to a conclusion as to which of these PRs should be the one that we continue iterating on and merge eventually. I put the word workflow in here specifically to trigger it to start a workflow because workflows are a great way to do this type of giant bulk work.

I've actually found it very nice for this type of thing. It's now creating the workflow and orchestrating all of these sub agents to be the judges, and I'll show what that looks like in Claude Code because it is really cool. What I wanna show you guys first is the account maxing that I've been doing.

So here I have my personal Claude code account, and here I have my secondary one I was mentioning before. The personal is the one I currently have signed in. So if I refresh, you're gonna see that usage start to tick up pretty fast.

And not? Okay. Not that fast because it's still figuring out what the workflow is, so I only have one thread going.

And honestly, running Mythos 24/7, or Fable in this case, 24/7 in just one thread, so you only have one going at a time, you're probably not going to hit the limits too aggressively. But when you start getting workflows going where you have eight or more running at the same time, you'll burn through those limits fast, like easily under an hour.

Okay. It's still setting up the environment, but we do see the percent starting to go up. We're now at one.

It'll go to two momentarily. Well, it would, but I wanna demo the account swapping. It's a really, really complex process.

I'll show you just how complex. You run Claude Code, you run the slash login command, you grab the URL that it puts out, and you put it in whatever browser profile has the Claude Code account that you want.

I'm using Helium, which has browser profiles just like Chrome does, so I can swap between the two. The purple is for my secondary account. I authorize here, I go back to my terminal, and now I'm logged in on a different account.

And ready for the crazy part? We have this workflow going, generating tokens. The next turn it takes, as in the next time it does a tool call or kicks off a workflow or starts a next step of any form, that one's gonna use the new auth token that it just got, and it's going to start routing to the other account.

It doesn't care when you swap accounts mid session. So if I have five workflows going across different projects or different work trees on the same project, and I wanna make sure that I am not going to have them all stop immediately because I hit a limit, which is super annoying by the way because it doesn't recover these workflows well.

So you have a workflow with like 100 plus sub agents, you get to the ninety fourth and then your limit hits, you have to often rerun the entire workflow when you hit those limits.

So it's worth it to keep an eye on how close you're getting and swap your auth out before you hit those limits, or you can do what I do which is burn a shitload of money by turning on overage credits, and once you see those getting burned, then you switch over. But now you see this account is no longer going up because this is not the account that the traffic is being routed through.

Okay. Now the workflow is going and you can see just how many tokens get burned with these things. We have the audit stage, which will be 13 separate agents, and depending on what their results are, they'll pass it on to the judge section, which will have even more, then verify, harvest, and synthesize.

This run looks like it might be a 100 plus of these sub agents, which is gonna be crazy. Burns absurd tokens. We're already at 368,000 because it's running eight of them in parallel right now.
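The shape of that run, independent audits fanned out in parallel and then handed to judges, can be sketched with ordinary Python concurrency. This is not how Claude Code implements workflows; the audit and judge functions are stand-ins that return canned data, and the focus areas are invented for the example. Only the PR numbers and the eight-way parallelism come from the video.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 8  # the run above had eight sub-agents going at once


def audit(task: tuple[int, str]) -> dict:
    """Stand-in for one audit sub-agent; a real one would call a model."""
    pr_number, focus = task
    return {"pr": pr_number, "focus": focus, "finding": f"notes on {focus}"}


def judge(pr_number: int, findings: list[dict]) -> dict:
    """Stand-in for a judge that weighs every audit finding for one PR."""
    relevant = [f for f in findings if f["pr"] == pr_number]
    return {"pr": pr_number, "findings_reviewed": len(relevant)}


def run_workflow(prs: list[int], focuses: list[str]) -> list[dict]:
    tasks = [(pr, focus) for pr in prs for focus in focuses]
    # Fan out: audits are independent, so they run in parallel.
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
        findings = list(pool.map(audit, tasks))
    # Fan in: each judge sees only the completed audit results.
    return [judge(pr, findings) for pr in prs]


# Focus areas are hypothetical examples.
verdicts = run_workflow([35, 37, 39], ["api design", "tests", "security"])
print(verdicts)
```

Because every audit is a separate model call in the real thing, token use scales with the number of tasks, which is why a run like this drains a session window so quickly.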

So now if we go here, you'll see these rate limits for my second account are going up super fast. We're already at 5% when it took the last twenty minutes to even hit 1% here because now we're running it eight x harder. But now this account isn't going up anymore because it's not the one being used for the sub agent run.

Super helpful for hopping between accounts, and the fact that you can just do slash login in any Claude code terminal on a given machine, and it updates all of the things on that machine is wonderful. Jesus Christ, we're almost at a million tokens down, and this has been like under a minute. Yeah, this percentage is going up fast.

Like, this is real time. We're already at seven. This is why you should be careful with your workflows, but you should also be careful with your weekly limits.

My two accounts reset on Wednesday and Thursday respectively right now, and I'm already at pretty high percentages on both. I'm expecting to max out my weekly on at least one of these accounts and I'm kind of counting on a reset for some reason. Hopeful that they'll have some reason to reset things, but even if they don't, I'll probably just grab another $200 account and push it to its limits as well, specifically during this short testing window of ten days where we can use Fable.

I will not be going anywhere near this hard on Anthropic Models when Fable is no longer in the subscription tiers, but when it is, I'm pushing my limits. So you might be wondering now, how big are those weekly limits? Well, from my experience, just from basic testing and the secondary account in particular, it seems like you can get roughly 25% of your weekly when you hit a 100% of your five hour.

Put simply, you can max out a five hour window four times in a given week before you're out of usage. So should you always be trying to hit the five hour limits? Maybe, especially during the end, but you should absolutely be aiming to hit the weekly limit to get the most out of your usage.
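Putting the two rough figures together gives a sense of how little flat-out time a week actually holds. Both inputs are Theo's own estimates from informal testing, not documented limits, and the variable names are illustrative.

```python
# Rough figures quoted above; treat them as estimates, not documented limits.
WEEKLY_SHARE_PER_WINDOW = 0.25   # one maxed five-hour window is ~25% of the week
HOURS_TO_MAX_WINDOW = 1.0        # with eight or more parallel sub-agents

windows_per_week = 1 / WEEKLY_SHARE_PER_WINDOW
burn_hours_per_week = windows_per_week * HOURS_TO_MAX_WINDOW

print(f"Full windows per week: {windows_per_week:.0f}")                      # 4
print(f"Hours of flat-out parallel work per week: {burn_hours_per_week:.0f}")  # 4
```

Under those assumptions, one account supports only about four hours of eight-way parallel work per week, which is the practical case for rotating a second account.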

So I've already burned like $400 of inference just with these basic tests as I've been filming, and you might be thinking, wow, that's a lot of inference for things that haven't actually panned out much yet. Like you're reviewing three slop PRs and trying to decide which one is best, like how valuable is that? First off, that is actually quite valuable when you're trying to figure out how to deal with a giant pile of PRs, but second off, you can modify this slightly to be way more useful.

Here's a much more realistic example that I find is actually super useful in my day to day work. I maintain a handful of different repos. From Lakebed, which still isn't public yet, to T3 Chat, which also isn't a public repo, but at the very least has real people contributing every day, to T3 Code, which is a big open source project that has a lot of people throwing stuff at it.

Even with our best efforts, keeping it under 400 issues and under 300 PRs feels nearly impossible. So rather than try to do that, I've been putting more effort into highlighting and surfacing the best work that is worth pursuing. Reading all those PRs and keeping up with all those changes is way too much work for any reasonable human to even consider trying, which is why I don't.

I let my agents do it because they don't get bored. They can just sit there and do the thing indefinitely. So one of the things I have Mythos doing for me, well, Fable, same difference, is every morning going through all of my PRs on all of my repos and helping me surface the ones that are the easiest to merge, the most justifiable to get done, etcetera.

This example is using my Hermes agent on top of my Codex sub, but this is the one that Mythos made and it's really good. I ran this one on just T3 Code and it built this ranked queue section where it goes through every PR that's currently open, gives it a status and ranks them based on how easy is this to just go merge and how much of my attention does it deserve.

So number one here, we have disabling external git diffs. This is a PR that Magnus opened to fix a bug with the external diff viewer some people use with git when they're using T3 Code. Simple PR, fixes a real bug, people will be very happy if we merge this.

It was surfaced by the agent even though this PR was originally filed over a month ago. And as soon as I sent this link with all of these ranked PRs to Julius, it got merged within five minutes. Getting this type of overview of all of your work is so much better than trying to dig through the GitHub PR tab trying to get useful information out of it.
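The ranked queue idea can be illustrated with a deterministic stand-in. In the video the agent ranks PRs using its own judgment; the scoring formula, field names, and sample PRs below are all hypothetical and exist only to show the shape of "rank by how easy it is to merge."

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    number: int
    title: str
    fixes_bug: bool
    lines_changed: int
    checks_passing: bool


def merge_readiness(pr: PullRequest) -> float:
    """Hypothetical score: higher means easier to justify merging now."""
    score = 0.0
    if pr.fixes_bug:
        score += 2.0
    if pr.checks_passing:
        score += 1.0
    # Smaller diffs are quicker to review; cap the penalty at 2 points.
    score -= min(pr.lines_changed / 500, 2.0)
    return score


def ranked_queue(prs: list[PullRequest]) -> list[PullRequest]:
    return sorted(prs, key=merge_readiness, reverse=True)


# Made-up sample data, not real PRs from any repo.
queue = ranked_queue([
    PullRequest(101, "Rewrite settings page", False, 1800, True),
    PullRequest(102, "Fix external diff viewer bug", True, 40, True),
    PullRequest(103, "Add experimental theme", False, 300, False),
])
for position, pr in enumerate(queue, start=1):
    print(position, f"#{pr.number}", pr.title)
```

The small bug fix with passing checks lands at the top, which mirrors the kind of month-old PR the agent surfaced.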

If you're wondering how I got my agent to spit out these HTML plans, that's another service that I threw together. This one's just for me and my team, but I recommend building something like it and maybe I'll even open source it in the future if enough people want it. I built a simple service for hosting plan files, which are just HTML plan descriptions on a real web service so my agent can spit out a URL that I can click on and then go see what it was thinking or what it's planning.

I find HTML plans to be way more readable than Markdown. I have a whole video about this and I've been abusing these for all sorts of different things, including reviewing agent work and having one agent review another agent's work, then just pass the HTML over, but also having the ability to read it in a good format. That's what's cool about HTML is I can look at it in my browser or on my phone in my browser and get a good idea of what's going on, and then just paste the link to an agent and say, hey, go deal with this, and it figures it out.
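A plan file in this sense can be as small as a single standalone HTML page. The sketch below shows one way to produce such a page; the function name and the sample plan text are invented, and the hosting service itself, which Theo has not published, is not shown.

```python
from html import escape


def render_plan(title: str, steps: list[str]) -> str:
    """Return a small standalone HTML page for a plan."""
    # Escape all agent-supplied text so it cannot break the markup.
    items = "\n".join(f"    <li>{escape(step)}</li>" for step in steps)
    return (
        "<!doctype html>\n"
        '<html lang="en">\n'
        f'<head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        "<body>\n"
        f"  <h1>{escape(title)}</h1>\n"
        "  <ol>\n"
        f"{items}\n"
        "  </ol>\n"
        "</body>\n"
        "</html>\n"
    )


# Made-up plan content for illustration.
page = render_plan(
    "Review notes",
    ["Keep the storage interface", "Add tests for uploads > 5 MB"],
)
print(page)
```

Writing the returned string to a file behind any static web host is enough to get a URL that works in a browser, on a phone, or pasted into another agent.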

By the way, that agent run I started earlier with the workflow is now at 1,600,000 and 21% of this fresh usage window used.

And that is in under thirty minutes. Kinda crazy. You can burn through usage fast, but if you steer it in the right way, you can get actually useful stuff.
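For a sense of scale, the figures above can be turned into a quick projection. This is a back-of-the-envelope sketch that assumes usage grows linearly and that this one workflow accounts for all of the window's usage, neither of which the video confirms.

```python
def minutes_until_exhausted(percent_used: float, minutes_elapsed: float) -> float:
    """Linear projection of how long until a usage window reaches 100%."""
    if percent_used <= 0 or minutes_elapsed <= 0:
        raise ValueError("percent_used and minutes_elapsed must be positive")
    rate = percent_used / minutes_elapsed  # percent of the window per minute
    return (100 - percent_used) / rate


def tokens_per_percent(tokens: int, percent_used: float) -> float:
    """Rough tokens consumed per 1% of the window (assumes one workload)."""
    return tokens / percent_used


if __name__ == "__main__":
    # 21% used in (at most) 30 minutes, with the workflow at 1,600,000 tokens.
    print(round(minutes_until_exhausted(21, 30)))   # about 113 minutes left
    print(round(tokens_per_percent(1_600_000, 21))) # about 76,190 tokens per 1%
```

Because the run took under thirty minutes, the real burn rate is at least this fast, so roughly 113 minutes is an upper bound on how long the rest of the window would last at that pace.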

I actually did have PR 39 get reviewed previously by Mythos. And I have its review here, which is super convenient because Codex was actually the agent that wrote this PR. So I have Codex write it, have Mythos review it, and then I can take this URL, hand it to Codex and say, hey, go make the changes that this suggested and give me feedback on the ones you don't like as much.

It's such a useful way to just pass context around in a way I can read. So yeah, again, points in favor of HTML plans, highly recommend it. If you're already pushing the limits of workflows and reviews and you're still not getting close to those limits, there are plenty more things you can do, don't worry.

One that I would highly recommend is looking into the skills others have made to audit your projects and your code and find ways to improve them. For example, shadcn's shadcn improve skill is really nice. Vercel built this nice plugin for adding skills.

If you wanna actually add the skills, there's cool commands like the Vercel skills package, but you can also just go copy the content of the skill directly. Because as you hopefully know by now, skills are for the most part just markdown. So here we have all the markdown for this skill.

I can go to raw, I can grab this all, I can open up Claude, I could put it in workflow mode or ultra code mode, but I am already burning enough inference and I wanna save some of this for later.

So I'm just gonna paste, enter, and let it do its thing. Again, very useful if you notice yourself at the end of a weekly limit or an hourly limit, and you want to get a little bit more inference out.

Just keep a set of these things in your head that you would like to run, and write them down, maybe even queue them. I've also recently been loving Raycast's notes feature.

It's super easy to just like write something down and then hide and reopen it with a hotkey. And you can easily iterate through the notes that you have saved here to find random prompts or things you want to do and go grab them in the future if you don't want to send them just yet.

Looks like it's going to kick off a workflow for the improving anyways, which means I'm about to burn a lot of money. Great. Thankfully, it's not my money.

And since starting this video recording, we've already done another like $400 of inference. Jesus Christ.

Take advantage of these generous limits while we have them folks. We gotta escape that permanent underclass. One thing I touched on earlier that I haven't really dug into yet is the effect this has had on my sleep.

I'm sure we've all been there, glued to our laptop just waiting to get the results so you can run the next prompt and then go to bed, and then the one more prompt effect keeps you going and not leaving your desk again and again until eventually you're like falling asleep at your keyboard. The AI vampires of the Silicon Valley is a trope that I'm seeing more and more.

Even some of the most skilled maintainers that are fathers of loving families are finding themselves staying up till four in the morning prompting because it's so addicting. Do I have a solution to that? I'm not gonna sit here and pretend I do.

What I do have is a Mac Mini on the same network that has Codex, Claude Code, and Hermes Agent all set up and ready to go on it, which has been wonderful for being able to get shit done from my phone and from other computers, and most importantly, being able to close the lid on my laptop and not worry about the work stopping.

I have three ways I interface with that Mac Mini. The one that I admittedly use the most is SSHing directly into it. When I'm on the same network, it's very easy to SSH into.

When I'm not, I rely on Tailscale to do it, which has been much better than I was expecting. As an old WireGuard fan, seeing Tailscale in its current state is awesome, highly recommend. I also highly recommend using a terminal like CMux, which has both a sidebar and tabs, so you can pin your Mac Mini or whatever other computer you have over SSH as its own section and easily manage and navigate it.

It's been super nice. Since I'm on a Mac and the remote computer's a Mac, I can also use the built-in Screen Sharing utility, which honestly I didn't even know about until recently. It makes it super easy to access the computer with native hotkeys and all the other things you would expect when you're using a computer remotely, which has been, again, a lifesaver for doing this type of thing.

This also works over Tailscale, which let me configure some things that were blocking when I was away from my computer. But none of those are my favorite way to access that remote machine. And this is where I will admit we're about to get into a bit of a self plug, but I hope you guys appreciate it because I think this is the coolest thing ever.

Julius has put a ton of work into the T3 Code remote system, where you can control a T3 Code instance from another computer. I usually do it by just connecting directly to the machine that I have T3 Code set up on, but there's lots of other options too. If you have T3 Code installed on one machine, it's really easy to give access to other ones.

You can create a link or expose it over something like Tailscale and then access it through that link or by going to the app.t3.code site and creating the connection there. Julius is also deep in the building of T3 Connect, which is a new method to connect to a remote T3 Code instance that you have on one of your machines from another machine or even your phone with the upcoming T3 Code app.

Very excited about everything we're cooking there. But I'll be real, you can build all this yourself. Codex has some of it built in already.

Claude Code pretends to, but it never works when I try. There's lots of ways to control your agents remotely, and I highly recommend finding methods like this because you'll start pushing the length of your jobs a ton when it's no longer locking you to your laptop. Before I had this remote Mac Mini setup, I found myself running shorter jobs so I wouldn't have to worry about closing my laptop, because I didn't wanna be one of those people walking around with a half open laptop.

I really didn't. Once I had the Mac Mini setup and I could run things from there, I found myself using agents entirely differently. Letting them go off on long exploratory journeys, not expecting to be able to use the results, but expecting to have interesting enough findings to talk about to my agents or even to my other coworkers when we build these types of things.

I don't care what method you end up using to control these remote machines. You can even build your own, which is a fun way to both burn tokens and really refine your workflow to fit your specific needs and expectations. If you take anything from this video, I really want you to take home the creativity you can apply to how you burn your tokens now.

You can have agents reviewing each other's work and you can even automate it. Something like Claude Code is capable of doing babysitting with built-in loops. So you can tell one agent to make a PR and watch for whenever a new comment comes in, and then tell another agent to watch a PR, and whenever a new push happens, give it a bunch of feedback and leave a review.

Now you have two agents that aren't even aware of each other, giving each other what they need to keep pushing work forward before a human's even involved. You can give agents stuff like browser use. This is much stronger on the Codex side, but Claude Code is starting to get there as well.
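The two-agent loop described above can be sketched as a small simulation. Everything here is a stand-in: the PR is an in-memory object, no GitHub API or agent CLI is called, and the review budget is an added assumption so the loop terminates. The point is only that neither agent references the other; each reacts to changes on the PR.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequestState:
    """In-memory stand-in for a real PR (hypothetical)."""
    pushes: list[str] = field(default_factory=list)
    comments: list[str] = field(default_factory=list)


class AuthorAgent:
    """Watches for new comments and answers each one with a push."""

    def __init__(self) -> None:
        self.seen_comments = 0

    def tick(self, pr: PullRequestState) -> None:
        new_comments = pr.comments[self.seen_comments:]
        self.seen_comments = len(pr.comments)
        for comment in new_comments:
            pr.pushes.append(f"address: {comment}")


class ReviewerAgent:
    """Watches for new pushes and answers each one with a review comment."""

    def __init__(self, max_reviews: int) -> None:
        self.seen_pushes = 0
        self.reviews_left = max_reviews  # budget so the loop cannot run forever

    def tick(self, pr: PullRequestState) -> None:
        new_pushes = pr.pushes[self.seen_pushes:]
        self.seen_pushes = len(pr.pushes)
        for push in new_pushes:
            if self.reviews_left == 0:
                return
            self.reviews_left -= 1
            pr.comments.append(f"feedback on '{push}'")


def run(rounds: int, max_reviews: int = 2) -> PullRequestState:
    """Poll both agents for a fixed number of rounds and return the PR state."""
    pr = PullRequestState(pushes=["initial commit"])
    author, reviewer = AuthorAgent(), ReviewerAgent(max_reviews)
    for _ in range(rounds):
        reviewer.tick(pr)
        author.tick(pr)
    return pr


if __name__ == "__main__":
    final = run(rounds=5)
    print(len(final.pushes), "pushes,", len(final.comments), "comments")
```

In a real setup each `tick` would be an agent polling the PR, and the stop condition would be an approval or a human stepping in rather than a fixed budget.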

This allows an agent to spin up the changes it made, record the screen showing that the changes work, and then send you that or post it in the PR when they're done, not bothering you until they have video evidence that the work actually worked. As always, Pete is far ahead of the game and he has awesome examples. Codex has the ability to spin up threads by itself, so it's not just sub-agents.

It can trigger new threads and make new work trees and make real changes through that. So you set up a really simple loop here. Tell Codex to maintain your repos.

Wake up every five minutes and direct work to threads. Makes it easy to parallelize and steer work as needed. Since each thing that it thinks should be going on gets its own thread, it's very easy to hop into a given thread and tell it, hey, that's wrong.

Go do it the other way, or just stop it if it doesn't make sense. If you combine this with something like your Sentry bug tracking or your Datadog analytics, you can give all of the resources and data available to you as the engineer to your agents, which lets them find information that's useful, self-improve, and just push things forward in a way that's really powerful and surprisingly valuable.
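The "wake up and direct work to threads" loop above boils down to a dispatcher that gives each work item its own thread, which can then be steered or stopped individually. This is a minimal in-memory sketch; the class and method names are hypothetical and do not correspond to Codex's actual thread API.

```python
from dataclasses import dataclass, field

WAKE_INTERVAL_SECONDS = 5 * 60  # "wake up every five minutes"


@dataclass
class WorkThread:
    topic: str
    notes: list[str] = field(default_factory=list)  # steering messages
    stopped: bool = False


class Dispatcher:
    """Routes each finding to its own thread (hypothetical sketch)."""

    def __init__(self) -> None:
        self.threads: dict[str, WorkThread] = {}

    def wake(self, findings: list[str]) -> list[str]:
        """Start a thread per new finding; return the topics newly started."""
        started = []
        for topic in findings:
            if topic not in self.threads:
                self.threads[topic] = WorkThread(topic)
                started.append(topic)
        return started

    def steer(self, topic: str, message: str) -> None:
        """Hop into one thread and redirect it."""
        self.threads[topic].notes.append(message)

    def stop(self, topic: str) -> None:
        """Stop a thread that does not make sense."""
        self.threads[topic].stopped = True

    def active(self) -> list[str]:
        return [t.topic for t in self.threads.values() if not t.stopped]


if __name__ == "__main__":
    dispatcher = Dispatcher()
    dispatcher.wake(["flaky test", "stale PR"])
    dispatcher.wake(["stale PR", "new bug report"])  # only the new topic starts
    dispatcher.steer("stale PR", "That's wrong, do it the other way.")
    dispatcher.stop("flaky test")
    print(dispatcher.active())
```

A real loop would call `wake` on a timer every `WAKE_INTERVAL_SECONDS` with findings pulled from the repo or from tools like the bug tracker mentioned above.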

Jesus Christ, that workflow is now at 1,800,000 tokens. We haven't even started the judges yet because we got one more of these audit agents going.

I know it's like I'm lighting money on fire here, but I've already paid the money and it's actually really fun to watch. I didn't think I'd ever be this type. I've always been the like single threaded just get the thing done guy.

I'm having a lot of fun right now. I've been going a little further than I normally do and I really like how Sawyer framed this. You need to be more ambitious than you have been before.

Ask the model to rewrite your entire production app from scratch. Ask it to make an entire product or internal tool for you, but don't stop there. Ask it to deploy it and add accounts, multiplayer, etcetera.

All the things you wouldn't normally have in a throwaway personal app. Raise your bar. Only by pushing more do you learn.

Orchestrate. When coding and having an agent write a plan, have another agent or multiple validate each claim in that plan. Claude workflows are fucking amazing for this.

Sub-agents in Codex are getting there now too. They're not quite as good, but they're better than I thought they would be. Fable's really good at orchestration.

One tip with that though, if you don't wanna burn too much, explicitly tell Fable during orchestration to use Opus or Sonnet for its sub agents because for whatever reason, Fable, more than any other model, really likes using Fable for its sub agents. It seems to not trust other models.

So be explicit. Tell it to do other things. Also see there, it just finished the audit stage and then it reviewed what it got back and decided it needed seven judge agents to do more work there, as well as these harvest agents that spun up in the background as well.

But again, Fable's incredible at orchestration. Fable is not needed for the individual sub agents. If you're not trying to actually burn as much money as possible, maybe tell Fable to use something else.

Back to what Sawyer said here. Have it use Codex dash p to delegate parts of it to Codex. Have it use Opus sub agents.

Have it use workflows. Don't be afraid to give those models big chunks of work. Give your agent a browser and computer use.

Let them click around and take screenshots. Use the JS debugger and take performance profiles. LLMs aren't just coding machines.

They can do the debugging and reproducing work as well. For token maxing, get some way to kick off coding tasks from your phone. It could be the Codex app, ChatGPT, Claw, etcetera.

As I mentioned before, I threw a bunch of this in my Discord with my Hermes agent. Whatever you choose to do doesn't matter. Make it easy to get the idea out of your head into an agent running.

I don't care how you set this up, there's lots of good options, but your goal should be to make it as easy as possible to go from, oh, that's a nice idea, to having something running, testing out that theory for you as quickly and smoothly as possible. The more of these opportunities you take, the more of them you'll find, and the more you find, the more you can do, and the more you'll find the limits of the tools that we're relying on now every day.

My last thoughts about where this is all going, and more importantly, how we should feel about it. The amount of work somebody can do as a developer has gone up exponentially as a result of all of this.

Any one dev can do 10 to 100 times more work than they could before. They can't necessarily validate that that work is good or worth shipping, but they can absolutely get more code out than they ever could. Your focus should not just be writing more code.

Your focus should be solving more problems. And you shouldn't be doing this out of fear either. If you come into this scared that you're gonna lose your job if you don't token max, you're gonna burn a bunch of tokens and a bunch of your own sanity and come out way more stressed and not happy and probably still unemployed.

If you go into this with excitement, the excitement that we all went into software with, with the ability to customize our computers and the things on them to do what we want and need specifically for ourselves; if you go into your agents with that mindset; if you go into the token maxing with that goal, trying to get more out of the things that you do and build every day; if you approach it with excitement, what you'll get back is incredible.

When I went into this stuff skeptical, what I got out was some okay code and a bunch of bugs. When I went into these things excited to push the limits of what I could do and build, I came out with my own custom cloud.

I came out with a pile of NPM packages I use every single day for a ton of different stuff. I started forking the software I rely on to tailor it to my specific needs. I started building my own control planes for managing all of this.

I started having way more fun and building way more too. So don't approach this with the attitude of keeping your job and making sure you'll still be employed in a few years. Approach this with the excitement that we can build all the things we ever imagined.

Lower your bar for what's worth building and raise your bar for how far you bother going. And what you'll find is a lot of fun, but also a lot of rate limits being hit.

We're at 42% now and we still have four hours left. God.

Yeah. I'm at a photo by two subs, and I think you should too. Mythos is an incredible model capable of things I never would have expected even a year ago for agents to be able to do.

It builds awesome things, it writes awesome plans, it describes the stuff it's doing in ways that are actually digestible. And you can use this to push the limits of what your code is doing to get better information and spend less of your time doing the things you don't like and more of your time shipping things you do.

If you find yourself stopping to ask, can an agent do this? Reset your mind. Just ask the agent to do it and see if it fails.

And if it fails, try to figure out why. Ask it what it struggled with. Look through the history and the bad tool calls it made and try to steer it in the right direction.

Maybe make a skill that points it in the right way. Maybe build your own verification in your code base to push it in the right direction. If you're not already writing your own custom lint rules for weird shit that your agents are doing, you're not being creative enough yet.
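As one example of a custom lint rule, here is a minimal sketch using Python's standard `ast` module to flag `except` blocks whose body is only `pass`. The choice of rule is an assumption for illustration; the video does not name specific rules, and yours should target whatever your agents actually get wrong.

```python
import ast


def find_swallowed_exceptions(source: str) -> list[int]:
    """Return the line numbers of `except` blocks whose body is only `pass`."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ExceptHandler):
            if len(node.body) == 1 and isinstance(node.body[0], ast.Pass):
                hits.append(node.lineno)
    return sorted(hits)


if __name__ == "__main__":
    sample = (
        "try:\n"
        "    risky()\n"
        "except Exception:\n"
        "    pass\n"
    )
    for line in find_swallowed_exceptions(sample):
        print(f"line {line}: exception swallowed with a bare pass")
```

Wiring a check like this into CI, or telling the agent to run it before it reports back, gives it a verification signal it can act on without a human in the loop.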

And that's fine. This is a different way of thinking and it's a really exciting one. So go out and build.

Take advantage of these subscriptions, push the limits of the best models and tools available today, and don't get too locked into them because new things will be available tomorrow. Just try to be excited when you go in, and you'll be amazed what you can pull out. Now that I got these agents queued, I'm gonna go catch up on some sleep.

I recommend you do the same, but come back fresh and build some cool stuff. Let me know what y'all are building, and until next time, peace.

The Hook

The bait, then the rug-pull.

Theo opens by explaining his posting gap as a symptom of Fable addiction — a move that reframes what could sound like an excuse into proof of the model's pull. The urgency is real: Fable leaves Claude Code subscriptions on June 23, which gives this video a hard expiration date and transforms standard tutorial content into a time-sensitive briefing.

Frameworks

Named ideas worth stealing.

06:47 model

Dual-account rotation

Two $200 accounts
Staggered weekly resets (Wed/Thu)
Auth swap via 'claude /login'
No workflow restart needed

Maintain two Claude Code Max accounts with different weekly reset days. Swap auth mid-session before hitting a limit so workflows continue uninterrupted on the fresh account.

Steal for: Anyone running long multi-agent jobs who can't afford workflow interruption at 94/100 sub-agents

05:54 concept

Timer warmup cron

Send a dummy message ('claude -p hi') immediately after login — or automate it on a cron — so the five-hour rate-limit clock starts ticking. This guarantees the fastest possible reset window regardless of when you actually start working.

Steal for: Any heavy Claude Code user who wants to minimize idle time between resets
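A minimal sketch of the timer warmup, assuming the five-hour window starts at the first message as described in the card above. The command is the one quoted there; the function names and example times are hypothetical, and nothing here is run automatically.

```python
import subprocess
from datetime import datetime, timedelta

WINDOW = timedelta(hours=5)              # five-hour window, per the card above
WARMUP_COMMAND = ["claude", "-p", "hi"]  # the dummy message quoted above


def window_reset(first_message_at: datetime) -> datetime:
    """When the window opened by the first message would reset (assumed model)."""
    return first_message_at + WINDOW


def send_warmup() -> int:
    """Send the dummy message and return the CLI's exit code.

    Requires the `claude` CLI to be installed and logged in.
    """
    return subprocess.run(WARMUP_COMMAND, check=False).returncode


if __name__ == "__main__":
    # Example times are hypothetical: warming up at 07:00 means the reset
    # lands at 12:00 even if real work only starts at 09:00.
    warmed = window_reset(datetime(2025, 1, 1, 7, 0))
    unwarmed = window_reset(datetime(2025, 1, 1, 9, 0))
    print(warmed.time(), unwarmed.time())
```

Scheduling `send_warmup` from cron or any other scheduler shortly before you usually start work is what buys the earlier reset.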

16:50 model

HTML plan as handoff

Agent generates an HTML plan file hosted at a URL. That URL can be read in a browser, shared on mobile, or pasted directly into another agent's prompt. Removes the need for humans to intermediate between agents.

Steal for: Multi-agent pipelines where one agent's output is another agent's input

14:44 concept

Morning repo triage

Daily agent run across all repos that reads every open PR and outputs a ranked queue: easiest to merge first, highest attention-value next. Output is an HTML plan URL the maintainer clicks to see what needs them today.

Steal for: Open-source maintainers with PR backlogs too large to manually review

CTA Breakdown

How they asked for the click.

VERBAL ASK

29:17 next-video

“Go out and build. Take advantage of these subscriptions, push the limits of the best models and tools available today, and don't get too locked into them because new things will be available tomorrow.”

Implicit call-to-action — no subscribe ask, no link, just the mindset pitch. Closes on encouragement rather than extraction.

MENTIONED ON CAMERA

02:52 product: Render ↗

14:44 channel: T3 Code ↗

FROM THE DESCRIPTION

PRIMARY CTA: Where the creator wants you to go next.