Why Modern Creator?

Theo - t3․gg · YouTube

How I code with AI changed a lot

Theo scraps cursor, plan mode, and Claude after five months — here is exactly what replaced them.

Posted

May 27th

2 months ago

Duration

47:33

Format

Talking Head

educational

Views

117.7K

3.3K likes

Big Idea

The argument in one line.

The most productive AI coding workflow is the simplest one: short threads per task, voice-to-text prompts, an agent.md written as a letter to the model rather than a ruleset, and actually reading what the model says instead of skipping straight to the code.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…

You are a developer using Claude Code, Codex, or Cursor and hitting friction or limits with your current workflow.
You have been trying plan mode or elaborate agent.md scaffolding and feel like you are fighting the tool rather than using it.
You want to understand how remote-first agentic coding works in practice, not theory.
You are deciding between CLI-based and desktop-app-based AI coding environments.

SKIP IF…

You are brand new to AI coding tools and have not shipped anything yet — this assumes existing familiarity.
You are committed to a specific tool and are not open to rethinking the harness layer.

TL;DR

The full version, fast.

Five months after a popular workflow video, everything recommended then is gone: Cursor, plan mode, and Claude are all dropped. The replacement is GPT-5.5 via the Codex harness managed through t3.code, voice-to-text for two-sentence prompts, and a fresh thread per task. The core insight is that reading and steering the model text output — not the code diff — is the discipline that determines quality, and that clean per-task context beats any amount of scaffolding.

Free for members

Chat with this breakdown — free.

Sign in and you get 23 free chat messages on us — ask for the hook, quote a framework, find the exact transcript moment, generate a markdown action plan. Bring your own key when you want unlimited.

Create a free account →

Chapters

Where the time goes.

00:00 – 00:49

01 · Cold open: everything changed

Hook: five months ago I made a workflow video, now I'd walk back most of it.

00:49 – 02:07

02 · Sponsor — Clerk

Auth + billing for unlimited apps at $20/month; Stripe alternative built in.

02:07 – 03:45

03 · Overview of what changed

Slide listing models, harnesses, IDEs/apps, prompt styles, plans, remote control, PR flows.

03:45 – 05:16

04 · Models: why GPT-5.5 replaced Claude

Effectively unlimited inference on $200 plan with 10x event bonus; Claude almost entirely dropped.

05:16 – 07:35

05 · Harnesses explained

What a harness is; Codex CLI vs t3.code vs Cursor; why Codex desktop is best for most people.

07:35 – 09:10

06 · Desktop app vs CLI

App wins on every dimension: image paste, thread switching, remote control. Stop using SSH + tmux.

09:10 – 11:17

07 · t3.code tour + Conductor comparison

Open source, supports all harnesses, one-click PR. Conductor ghosted feedback; t3.code is forkable.

11:17 – 13:32

08 · Remote coding: Codex failures vs t3.code success

Codex remote had 30-second keyboard lag and broken model picker. t3.code remote via Tailscale works perfectly.

13:32 – 18:26

09 · t3.code remote deep dive

Helium network setup, Jack's Android tablet via Replit, Tailscale integration, upcoming React Native app.

18:26 – 19:33

10 · Context management: reference codebases

Clone a related repo and give the agent the local path — produces more reliable outputs than pasting snippets.

19:33 – 21:17

11 · Sponsor — DNSimple

DNS hosting with TypeScript SDK and CLI; agents can run the CLI to debug DNS instead of humans.

21:17 – 22:47

12 · Lakebed origin thread

Voice-to-text thought dump as first prompt; asked the model to roast the plan before proceeding.

22:47 – 26:55

13 · The most important habit: read the model

Most devs care more about code output than text output — that's backwards. Steer by what it says.

26:55 – 28:30

14 · agent.md as a letter

Wrote the context file as first-person explanation of thinking, not rules. No file paths, no enforcements.

28:30 – 30:20

15 · HTML plans + screenshot annotation

Ask the model to write plans as HTML; use screenshot annotation tools (Shotter) to point at problems.

30:20 – 36:22

16 · Thread discipline: one task, one thread

Sequential threads on main, not parallel work trees. Trust the model to find the right file.

36:22 – 39:47

17 · Voice prompts + concrete examples

Two sentences is enough. For complex asks, give a real URL or example instead of over-specifying.

39:47 – 42:00

18 · Computer use for verification

Set up Codex computer use, then use it via t3.code remotely. Agent deploys and checks its own work.

42:00 – 46:32

19 · PR flow: branching, stale PRs, CodeRabbit loop

PRs for security/hosting/large changes; stale branch inspection; run CodeRabbit CLI in a loop.

46:32 – 47:33

20 · Close: keep it simple

Most developers are over-engineering their workflows. Simpler is more productive.

Atomic Insights

Lines worth screenshotting.

Theo stopped using Claude models almost entirely — GPT-5.5 on the $200/month plan has been effectively unlimited for building a full cloud framework from scratch in five days.
A desktop app for agentic coding outperforms CLI in every real-world dimension: image paste, multi-thread switching, remote control, and visual feedback.
The model text output is what you need to read and steer — most developers skip it and only review the code, which is backwards.
An agent.md written as a letter explaining how you think beats a 5,000-line ruleset with file paths and technical enforcements.
Starting a new thread for every task keeps context clean and is faster than worrying about codebase re-exploration — the model does it in seconds.
Voice-to-text makes prompts better, not longer — it removes the friction that causes developers to under-specify what they actually want.
Almost all of Theo's threads on a real cloud framework were two sentences or less, and most completed correctly on the first try.
Asking the model to write plans as HTML instead of markdown makes them readable enough to actually catch wrong assumptions before execution.
Letting the agent use browser control to verify deployed changes raises first-try success rate dramatically.
A glossary of project-specific terms in agent.md eliminates ambiguity when a system has multiple layers of users and agents.
Parallel work trees increase cognitive overhead without matching the throughput of sequential fast-model threads.
When the same mistake recurs, fix it in agent.md — do not write longer prompts.
CodeRabbit loop: tell the agent to run the CLI until it gets zero feedback rather than manually shepherding each round.
Stale PR cleanup is faster when you ask the agent to diff the branch against main and decide whether its changes are already superseded.

Takeaway

The workflow that scales is the one you actually read

WHAT TO LEARN

Reading what the model says — not just the code it writes — and keeping every thread to one task is the discipline that separates productive AI builders from frustrated ones.

04Models: why GPT-5.5 replaced Claude

Switching models mid-project resets context advantages; committing to one model and learning to steer it beats constantly chasing the new state of the art.
Effective inference limits on flat-rate plans are often far more generous than advertised when you stay within the intended usage patterns.

05Harnesses explained

A harness is not an IDE — it is the set of tools that lets an agent act on your machine; choosing the right one is a separate decision from choosing a model.
Open-source harnesses let you fork and customize when defaults do not match how you build; closed-source ones leave you stuck when they stop responding to feedback.

06Desktop app vs CLI

Image paste in prompts is not a nice-to-have — a third to half of real prompts include a screenshot, and CLI tools make this painful or impossible.
The best desktop app for agentic coding outperforms any CLI workflow for multi-thread management, remote control, and real-time feedback.

08Remote coding: Codex failures vs t3.code success

Remote agentic coding is only practical when thread switching, image paste, and keyboard input all work at full speed over the connection.
Tailscale combined with a remote-hosted IDE removes the half-open-laptop problem: start a task, close the machine, check results later.

10Context management: reference codebases

Giving the agent a path to a reference codebase on the same machine produces more reliable outputs than pasting code snippets or describing the pattern.

12Lakebed origin thread

Starting a complex project with a deliberate request for the model to critique the plan surfaces wrong assumptions before any code is written.
The model text output is what steers the output quality — skipping straight to the code diff means skipping the part that tells you whether the agent understood correctly.

14agent.md as a letter

An agent.md written as a letter explaining the project purpose and the developer thinking style reduces incorrect assumptions without requiring explicit rules.
A glossary of project-specific terms in agent.md is most valuable when multiple layers of user or agent exist in a single system.

15HTML plans + screenshot annotation

Asking the agent to write plans as HTML pages rather than markdown makes them readable enough to actually catch wrong assumptions before execution.
Once one well-formatted plan exists in the codebase, subsequent plans inherit the format automatically.

16Thread discipline: one task, one thread

Starting a new thread per task eliminates context bleed where earlier task discussion biases decisions in a later unrelated change.
Trusting the model to find the right file to edit produces fewer confused diffs and more correct first-try results than specifying the file yourself.

17Voice prompts + concrete examples

When a task is complex, giving the model a concrete working example eliminates more ambiguity than any amount of added specification.
The goal is prompts short enough to dictate in two sentences; if a prompt is longer, the problem is in agent.md, not the prompt.

18Computer use for verification

Letting the agent deploy and verify the result using browser control removes the human from the feedback loop on routine changes.
First-try success rate rises when the agent has a way to confirm the work is done rather than just declaring it done.

19PR flow

PRs add most value for changes that are security-related, affect the hosting layer, or are large enough that a second opinion is worth the overhead.
Asking an agent to inspect a stale branch against main and decide if its changes are superseded is faster than reviewing it yourself.

Glossary

Terms worth knowing.

Harness: The toolchain and runtime that lets an AI agent read, write, and execute code on your machine. Examples: Claude Code CLI, Codex CLI, Cursor.
t3.code: An open-source desktop app that manages multiple AI coding harnesses in a unified UI with multi-project thread management and remote-hosting support.
Lakebed / Span: The side project built during this period: a full-stack TypeScript framework and cloud runtime built almost entirely with AI agents over five days.
agent.md / CLAUDE.md: The project-level context file AI coding tools read at session start to understand the project and how the developer wants to build.
Plan mode: A restricted agent mode where the agent outlines changes before executing them. Abandoned here in favor of natural back-and-forth.
Computer use: A capability where the agent controls a browser or desktop application to verify that deployed changes behave as expected.
Fast mode: A Codex feature that increases generation speed at the cost of higher token consumption against plan limits.
CodeRabbit: An AI-powered PR review tool with a CLI interface; used here in a loop where the agent runs it until all inline feedback is resolved.
Tailscale: A mesh VPN that connects machines on different networks securely without exposing them to the public internet — used for remote IDE access.

Resources

Things they pointed at.

02:07productClerk ↗

21:17productDNSimple ↗

06:22toolt3.code ↗

05:16toolCodex (OpenAI)

21:28toolWhisperFlow

13:20toolTailscale ↗

29:28toolShotter

42:10toolCodeRabbit ↗

09:12toolConductor

14:14toolReplit ↗

21:17linkLakebed source thread ↗

Quotables

Lines you could clip.

02:07

“Going back to this old video now just hurts me a little bit because I wouldn't make most of the recommendations I made there today.”

strong hook, zero context needed→ TikTok hook↗ Tweet quote

25:40

“Just talk to the fucking model. They're smart enough now.”

punchy, contrarian, standalone→ IG reel cold open↗ Tweet quote

28:14

“When the agent pushed back, I listened.”

tight punchline, subverts expectation→ TikTok hook↗ Tweet quote

24:04

“Devs have this instinct where they care more about the code output and not enough about what it said — and that's entirely backwards.”

contrarian, shareable claim→ newsletter pull-quote↗ Tweet quote

47:17

“If you're looking at the code more than you're looking at the conversation about the code, you're already behind.”

strong closer, quotable thesis→ YouTube endcard↗ Tweet quote

The Script

Word for word.

Read-along

Don't just watch it. Burn it in.

See every word as it's spoken — crank it to 2× and still catch all of it. The same dual-channel trick behind Amazon's Kindle + Audible.

About five months ago, I made a video about how I build with AI. A lot of you guys really liked it, just seeing into a workflow of somebody who's trying a little too hard to push the limits of these tools and build awesome things. And since then, I've kind of entirely changed my workflow.

Back in that video, I was heavily using cursor and plan mode with Opus models and going really in-depth on how I would use plans as the core piece to generate the outputs. That's not really how I'm building at all anymore. From the IDEs that I work in, to the models that I use, to the workflows that I've built for myself to get the most value out of these agentic tools, everything's kind of changed.

And I really wanna go in-depth on it because to be frank, going back to this old video now just hurts me a little bit because I wouldn't make most of the recommendations I made there today. So what am I doing now? That's a really good question and I can't wait to answer it after I tell you guys a bit about today's sponsor.

I don't know about you guys, but I'm shipping more apps than ever. Little things for my team, big things I put in front of people, and more. But I keep hitting the same friction points when I do that.

Building the stuff's easy, but authenticating users and charging them money tends to be really hard. I could just use today's sponsor Clerk, but that would get kind of expensive if I have 30 different projects and I'm paying $20 a month for all of them. That's way too much.

Right? Nope. Because I successfully bullied them into doing unlimited apps on the paid plan and on the free plan.

I still can't believe I won on this one because I was pushing them for a while about this because I always was the guy that wanted a bunch of different apps and I hated that Clerk was becoming my biggest expense. Now it's basically free. I pay the $20 per month.

I already have like 40 projects on my clerk for $20 a month and I couldn't be happier with it. Especially now that I'm using their billing product. Yes, they built a Stripe alternative in.

Don't worry, it's still using Stripe. They just have the best Stripe implementation I've ever seen. And believe me, I've tried a lot of them and I've invested in a bunch too.

The cost comes out to exactly the same as if you use Stripe yourself, but it's a 100 times easier to set up. All the billing data is set up on users, they have access to it through the user component that you already should be using, and setting up a pricing table couldn't be easier. You just call the pricing table component.

Waste less time on billing and auth and spend more time making good software at soidave.link/clerk. There's a lot to break down here from the models I pick to the harnesses I use to what UIs I'm working within, IDEs, apps, etcetera, the way I style my prompts, the way I use or don't use plans, how I manage things remotely, which is a really fun angle I wanna dive into as well.

And most importantly, I would argue, how I actually think about the pull requests that I am making when I build with AI. I wanna give a bit of context on where my experience is coming from here because the way I've been building and the things I've been building has been very different recently. I took on a pretty bold, pretty stupid project called Lakebed.

It's a new full stack framework, a new runtime, a new back end, a new database, a new cloud, a new lot of things to try and make it easier to build quick apps with agents. The point is to be a shitty cloud for shitty apps, but this means I had to do a lot of different things. Thankfully, I was mostly working on it myself though, which balances out some of the chaos I put myself through when making it.

But it gave me a lot to reflect on in terms of how I wanna build with agents now, and I've shifted a lot of my perspectives through the experiences I've had here. When When working on Lakebed, I tried a lot of different tools. I used Cursor a little bit, I used the Codex app a lot, I used t three code, I tried using Claude code of GoodBit as well.

I even played with OpenCode and some of the cool new like open source models throughout. It was a very eye opening experience to see how those different models could work in a code base like this, but also more importantly, how they handled different prompt styles, different harnesses, and different agent m d and quad m d files in particular.

But I wanna answer this first question around models. What models am I using now? Obviously, saying anything about this is gonna make the video incredibly dated.

In fact, by the time this video is published, it's possible that a new state of the art is out. But at this point in time, I really struggle to use anything other than GPT 5.5.

I will explain more on why I'm enjoying Five Five so much now, especially compared to how little I enjoyed it in the past. I have pretty much entirely stopped using Claude models.

I'll occasionally pull one in to like make a quick landing page for me, but for the most part, I'm just using Five Five. I do think Composer Two Five is really cool, but I have effectively unlimited inference on Five Five because I'm on the $200 a month plan, and I still am under that weird 10x thing they did for people who attended the Five Five event.

So I, despite my best attempts, cannot get these numbers down. They just did a reset that they probably didn't even need to do, but the worst I could get is about 6% down on my weekly usage when building a full cloud from scratch.

So yeah. I think the limits are very generous on that plan.

They are 10 x more generous for me because I'm on that weird things of the event. I don't know how drastically that will change when the 10 x is gone, but like the worst I was able to do damage wise would have been the equivalent of 60% with the normal limits.

So I don't think it's a big deal. That said, the cheaper plans are much easier to hit limits on, especially if you use some of the cool new features in the harness. Harnesses are a much more interesting conversation in various ways.

If you're not familiar with what a harness is, you should definitely watch my video on how Claude code works. It is an in-depth overview of what it takes to allow an AI to use your computer to edit code and actually do things for the work you're trying to do.

To be very, very simple with it, the harness is the set of tools and the actual application runtime whatever that allows an agent to do things on your computer to edit code and whatnot. Most harnesses come as some form of CLI, whether that is the harness in quad code or the harness in something like codex, but some aren't necessarily focused on that.

They are more like an SDK. The cursor harness, for example, is more in that direction. They do have a CLI for it, but yeah.

There are a lot of really cool harnesses nowadays, especially in the experimental space. Things like Py are really, really cool, and I do need to do a better deep dive on it, and I haven't yet.

That said, as part of me using five five more, I have found myself pretty much always defaulting to the Codex harness. There are a lot of things that Codex has been doing really well lately. One of the biggest ones is that they've kept the CLI boring.

They're not loading it up with all sorts of crazy stuff. The CLI has just kinda stayed a simple, minimal, boring CLI. But they have been building really cool things into the Codex app.

And a lot of those do carry over to the CLI as well, which means they carry over to other things using the harness like, you know, my favorite way to actually build up these things, t three code. A lot of you all seem to be confused about what t three code is. T three code is not a harness.

It is not able to be paid for, and it is not, most importantly, t three chat. T three chat is a different thing. T three chat is an app for chatting with AI agents similar to ChatGPT.

T three code is an app for managing your other AI harnesses, similar to something like Conductor or kind of Cursor, but really it's more similar to the Codex app.

I'm very proud of what we have built with t three code, and by we, I mostly mean Julius. He's done the vast majority of the work. I have put a lot of time into the Codex app though.

I was using it for most of the building of Lakebed because I wanted to just do another deep dive on the competition, see what its strengths are, see what its weaknesses are, and get a rough idea of where things are at. As I mentioned before, the Codex app is really good. And if you're just using the GPT models and you're happy with Codex, totally fine to stick with it.

If you've never tried one of these styles of agentic IDEs where you have multiple projects open at the same time, you have threads that are easy to swap between on said projects, this style of building is really, really nice.

When Anti Gravity added a view like this in its original release, I started to see the potential value in this way of building. Since then, I've quadrupled down, this is how I build. I only really open up editors to edit environment variables now.

So if I do actually use cursor, which I'll be real, I don't use it on my computer very much, I usually use it in their new cursor glass UI which is from my experience but very broken and laggy. Apparently, it's getting better. I still have not had a good experience with it.

The Codex app is very, very good and is probably the best bet for most people right now. Obviously, I'm biased and I find t three code to be much more pleasant, much more stable, much more reliable, especially with remote stuff. But realistically, Codex is what most people are going to experience.

It's a great experience. I will also say if your only experience with an app like this is the Claude Code desktop app, then you've not experienced an app like this because the Claude Co desktop app is like a third class citizen at Anthropic. No one there really uses it.

No one there is really trying to make it great. It's been stapled into the Claude desktop app in a way that isn't very pleasant.

The Codex app is how most people at OpenAI interface with Codex. In fact, I've heard many more people who aren't even technical using it as an alternative to ChatGPT itself. The Codex desktop app is the best way to use Codex.

If you've only used it through the CLI, you've not really used Codex. So I'm using t three code. I would absolutely support people for choosing Codecs instead.

There's lots of other cool options to consider like Conductor. My main gripes with Conductor are that I had a really rough time trying to give them feedback. I was surprised how quickly my team got ghosted, because we really wanted Conductor to be good.

It's also closed source, which means if you don't like anything about it, you're kinda stuck. Whereas t three code is entirely open source. You can do whatever you want to, and in fact, a huge portion of our users are running forks of t three code.

It is also worth noting that if you're primarily using a ClaudeCode subscription, we are about to get heavily limited because they marketed a bullshit change where they started giving you credits when you're a subscriber, the credits are what we would use instead of your actual subscription limits, which is absolute bullshit.

If you use Claude Code via the Claude Code CLI or the Claude Code desktop app, you can get up to $5,000 of usage for $200 a month. But if you use it in something like Conductor or t three code or if you SSH into a computer and call Claude dash p instead of just the Claude CLI directly, then you only get $200 of usage.

And if you go over that, it costs you money directly. It is a shit change. I can't believe they actually did it.

I have a whole dedicated video on that. But again, we can't do anything.

It sucks that we can't do anything. We will probably have crappy terminal UI that will open when you use Cloud Code and t three code, because there isn't really another option for us. Blame Anthropic.

We can't do anything. We do support everything else you might wanna use though. We support cursor and open code.

You can enable them in settings. They just use the existing CLIs through ACP. It's nice.

It's really cool having a tool like this that supports everything in one consistent cohesive UI, especially once you get into the remote stuff. I know I'm skipping some steps here, but I really wanna emphasize this when we're talking about the apps, because I think it's one of the most important things.

I know what a lot of y'all are thinking. Why don't you just SSH into a computer instead? Because like the problem we're trying to solve with remote control is what happens when I shut my laptop and an agent is running?

I wanna open and see the result. I don't want it to get stopped. I don't wanna have the meme where I have the laptop half open as I walk around.

I just wanna be able to run the thing, close my laptop, or go offline, or code from an Uber, and then have it updated reliably. I really thought Codecs would have this figured out. And to their credit, the mobile integration is largely there.

The fact that there's this little button you can click to set up your phone through the ChatGPT app to control Codecs on your computer remotely is actually really, really cool. As I hinted at before, I like to code on machines that aren't this laptop, but I still like to control it from this laptop. And on this local network, I have my Mac Mini and I try to control it from the codecs app here with codecs running on there.

This is an actual Mac Mini running on my network that I can connect to remotely over like the Mac screen share stuff. I have Codex on here open and ready to go. I also have t three code open on here ready to go.

I spent the last week trying to do everything through Codex. God damn, did I hit a lot of issues. The mobile integration is mostly fine, but I'm just gonna quickly go over some of the insultingly bad problems I had trying to control it remotely through the desktop app.

The first one, and this was like an absolute stopper for me, occasionally, the model picker would disappear. And when that happened, even though I was connected, I was unable to do anything on the remote machine.

So I had to connect back over to it via the screen share, close and reopen the Codex app, come back here, close and reopen the Codex app, and maybe it would work. Like, maybe.

Usually not, but sometimes it would. When it does work, it still has all sorts of fun issues. When you actually just like send a prompt, usually responds fast.

But you see how slow that was to pop up like the history? It's really bad. And where it gets even rougher is if you have to open up the terminal for any reason.

I'm gonna type one through nine just by sliding my hand down the keyboard starting now. Oh, it actually did it kinda fast that time. It's still like super sticky keys.

Like, I am typing as I speak. When I was trying this yesterday, there was up to a thirty second to two minute delay. I cannot stand.

God, it's sticky keys so bad even when you type correctly. It's it's unusably bad. And then when you try to paste an image, it's like fifty fifty if it works at all.

I have not been happy with the remote experience outside of mobile on Codex. I tried it because I wanted to try the mobile stuff, was pretty impressed with that, assumed the desktop app would work well, and it just didn't. Then my team reminded me that Julius went really hard on the remote stuff for t three code.

So I decided to try that instead. And I'm actually trying this a few different ways. I'm trying the baked into the app remote version, but I'm also trying the remote hosted version that was very easy to set up.

I hop over to Helium, you'll see here, I have my t three code instance over my local network fully functioning. I can hop here.

I can go to an existing thing. I can open the terminal. It types full speed, no issues, because it's effectively actually SSH ing over.

It has been significantly easier to build. Image pastes work perfectly, and I actually love having it in the browser because now this is a different thing from my local instance. I don't have to like remember which ones on which machine and deal with all of that.

It's been very very nice. One of the coolest examples I've seen of somebody pushing the limits of the t three code remote stuff is Jack here. He doesn't have a computer.

He does most of his work from an Android tablet. So what he did in order to get t three code working on his tablet is he actually spun it up as a t three code server in Replit. And this gives him the ability to use the t three code web app hosted on Replit to build projects remotely on Replit, which I think is really, really cool.

Point being, t three code remote hosting is great, especially if you combine it with something like Tailscale, which allows you to connect to devices on other networks very trivially without having to expose it to the whole web. It's been very nice to work with. I have been blown away with how stable it is.

I've used this for the majority of the changes I've been making to Lakebed for the last day and a half. And I'm also using it to spin up new projects remotely too, which has been very surprisingly solid. The one catch here is mobile.

We do have mobile web and it works better than I would have expected. Julius is cooking a React Native app now. We should hopefully have that pretty soon, fingers crossed.

And if you do wanna test this out yourself, you can just go to settings in t three code, hop over to connections, and relatively easily set up Tailscale or network access remotely or on your existing network very easily.

You can even do custom SSH connections even though in codecs they're a bit rough. We actually have a UI to store a password when you set it up yourself in t three code. As I mentioned before, Julius went really hard on this stuff because he really likes these remote flows and the result is that it is one of the best experiences I've ever had doing remote coding with AI by far.

No longer am I stuck using tools like Termius to SSH into some shitty Linux box and try to make a terminal UI work on my phone. I still can't believe people are doing that. I will crash out briefly on the SSH terminal people for a second because I need you guys to understand how much you're suffering when you do that.

First off, you have to deal with TMux or ZellaJ or Gnu screen or something. So if you disconnect, you don't kill the work that you're doing.

Second off, you now have to have a bunch of weird key bindings or some other abstraction to switch between different threads you're working on. God forbid you wanna go do something in a work tree.

Good luck, have fun with that. You're gonna be writing a lot of handwritten git commands to spin up that work tree.

And then what happens when you wanna paste an image? I would estimate that a third to half of my prompts have an image in them, whether I'm just quickly grabbing an error screenshot and pasting it, or if I have some UI that I wanna throw over to the agent to tell it like, hey, can you make this better? Hey, can you make it look like this other thing?

I use images a lot in my prompts. The fact that they kind of almost work in traditional CLIs and then don't work at all over SSH is insulting. I've been fighting this fight for a while.

I, as a person who spent the vast majority of their computer time in terminals throughout their life, I really really don't like doing agent decoding in terminals anymore. I'll still do it occasionally for like quick demos or like editing things on my computer. Like if I'm trying to config some directory or like fix my dot file, stuff like that, I absolutely will still use Codec CLI.

But when I am trying to do real work in a real code base, I want the app every single time. I still can't believe the pasting images over SSH thing is as absurd as it is. And while it's cool people are like building Raycast workflows in order to upload an image easily to send it as a URL over for the agent, like, it's cool.

You shouldn't have to do that. The amount of workarounds I've seen people make just to be able to use Claude code over SSH is hilarious and painful and no. I I I will not support that mental illness.

Please try a good desktop app for agentic coding before assuming the terminals are the only solution because you tried the Codex desktop app six months ago and it was shit. It's still shit. I understand if that's the only experience you've had that you think CLIs are better.

They're not. A good desktop app for coding will shit all over a CLI any day. As I mentioned before, working remotely is really really nice, especially if you're like me, and you find yourself randomly having to like leave your office to go to some event or go to some coffee shop meeting, or you just like wanna check-in on your work as you're pacing around the office.

I really like being able to do work remotely, and I find myself spinning up more and more of my work remotely. When I know I'm gonna be sitting at my desk for a while, like when I'm streaming or when I'm working with my team, I will still often run things in t three code locally, but almost all the time otherwise, I am connecting to that remote Mac and doing things through that.

But now I need to talk about how I actually do the work in these tools. Because thus far, we've just been focusing on the tooling side. And in order to understand why I like Five Five so much and why I like remote coding so much, I think my actual ways of working with the models are worth understanding more.

Context management really is the name of the game for getting these things right. An underrated trick that I found really useful is giving the agent the ability to explore other code bases that might be relevant. In this case, was trying to set up auth for the user facing apps people would deploy and generate with Lakebed.

So I needed to have a good auth solution. I wanted to see if my implementation for my auth servers I built before called Shoe would be a good fit. So rather than like describing it or throwing it at the docs, I just cloned down the repo and told the model, take advantage of the auth implementation I have in Shoe with a link to the path on this computer with that implementation so it could use that as a reference point.

This type of context manipulation results in much more reliable outputs from the models and has been one of the biggest improvements I have personally experienced. Even just telling the model like, go clone this repo and throw it in some scratch directory in order to figure things out. There's one other super reliable thing I want you to know about though.

Our sponsor. Agents can write surprisingly good software as long as the problems they're solving are simple. There are certain things that just aren't though.

You know, like DNS, the thing that sucks for everybody. It'd be really nice if DNS was simpler. Oh, DNSimple's on the screen, isn't it?

I love these guys. I've been so blown away with every interaction I've had with them, all of the cool things they do. The fact that they made a good SDK for managing your DNS is incredible.

If you're trying to build services where users can like register domains or sub domains or manage masks and set up forwarding and do all these types of things, good luck doing that programmatically because the APIs that exist for it suck unless you're using dn simple. Like how cool is it that I can call client.register or check domain to see if a domain's available and then buy it all from just writing TypeScript.

This is so useful that I wish I could do it through a CLI. Uh-oh. Is the CLI on the screen now?

Yep. This one was so cool. I did a call with the guy who made it because I was blown away and I wanted to give him some feedback to make it better for agents.

Because having an agent able to run a CLI to debug DNS issues is like a thing that I would have killed four years ago and now that I have it, it's like, yeah, obviously. The CLI is no joke. This is a full time project for one of the engineers there and he went hard on it.

Everything you would do through the SDK is available and it's all also exposed with the help commands so that your agent can see how to use it. You're too busy to be debugging DNS, let your agents do it for you at soydev.link/dnsimple. Here is the original thread where I started Span a k a Lakebed, my new cloud framework everything project that I'm very proud of.

You might notice this is quite a blob of text. The reason for this is that I used voice to text. I found that using WhisperFlow or other voice to text tools makes me write much better prompts.

And this one was meant to be a very much thought dump. Like I wanted to plan out with the model. Especially for these types of large changes, I do like planning.

And I wanna be clear about something. I do not mean plan mode. I like to work with the model and not be scared of letting it write code or make changes or test things.

And plan mode is a little too restrictive. And there are problems here. Like I've had times where I wanted to talk to the model about a thing and I just went and did the thing.

Five five is really guilty of this. But again, I prefer working this way where I'm not necessarily in the plan mindset and then in the edit mindset.

It's more of a natural back and forth. So here, I started with a very important thing, my end goal. And I find that this is the thing most developers miss.

You guys love focusing on the details and I understand why. That's been what mattered our whole career, especially when you're advising more junior engineers. It is important to be detailed about how the thing should work, not just what the thing is.

I found myself moving over to this more higher level, like here is what I want it to do, not how I want you to implement it. And I started with roughly that.

This directory is for a new project I wanna start called Span. The goal is admittedly complex. I want to rethink how clouds work from first principles.

I wanna build a new full stack TypeScript framework that has all the pieces you need to build full applications including simple, minimal, reliable, off layer, database synchronization, file storage, and more. The goal is to make everything you need available via the code instead of having to go through other layers of platforms.

If a user has to open a dashboard, we have failed because they want this to work for agents in all different ways that they would need to initialize projects. The formatting got all screwed up because I was voice to texting, I wasn't even paying attention. I thought this was gonna be a throwaway.

But the fact that I did this as a throwaway and it went so well shows how powerful these tools have gotten. To open, I said I want first for it to roast the plan and give me all my thoughts and feedback before we proceed. Do whatever research you need, yada yada.

Says, I love the ambition, but the roast is simple. This is not one project. It's a runtime database, sync engine, object store, deployment control plane, local emulator, security model, migration system, observability stack, and agent interface wearing one trench coat.

The idea is viable only if span starts with a brutally small thesis. It tried to insist that things like Convex, Instant, and Jazz meant it wasn't necessary. Also, Cloudflare being as good as Cloudflare is, yada yada.

I had to slowly convince it. The important thing here, and I know this is hard for people, you have to read what it says.

I know people tend to gloss over the text the model puts out, and instead read the code it puts out, or maybe they'll even read the plan it puts out, but rarely. I find that devs have this instinct where they care more about the code output and not enough about what it said, and that's entirely backwards.

You have to read what it says. And if you don't like how it's saying things or it says too much, steer it the way you want it to talk instead.

Tell it to be more brief and concise. Tell it that it wrote way too much shit. Tell it to format things in the ways you want to read.

But you need to get the model talking to you in a way you'll actually read what it says. Mark is a victim of this. And I was planning on yelling at him about this when he was here and I forgot, I will do it later.

You gotta read the text, especially when you're doing big sweeping changes. And accordingly, you have to respond to the model based on what it said.

So I went through here and typed by hand the different sections and my thoughts on what it had to say, saying here are the things that I agree with, here's the parts I don't agree with, trying to get the context of this thread to be more in the direction I want this project in.

You gotta kinda treat it like you're convincing somebody of your way of wanting to do things. I put the work in to convince the model to do the thing I want. I see a lot of questions in chat already, like what skill command or framework are you using to get the grilling?

What skills do you have set up? Have I tried the superpowers plugin, all this shit? No.

You're all coping. You don't need all of that shit. I have almost zero skills installed.

Just talk to the fucking model. They're smart enough now. I don't even have the super small and useful grill me skill from Matt setup, because I just have it as a binding in WhisperFlow.

Grill me skill. I just hold down my WhisperFlow key, I say grill me skill, and it just pastes the exact markdown into the input. So I can just do that and it will do the same thing.

Super easy. Even then, I can just tell it to and it usually does it. You guys care too much.

I don't bother with skills usually. After I gave my feedback, the model agreed more. This context makes the idea much sharper.

I buy it more now. The category isn't new cloud. It's an agent native app substrate for tiny full stack apps.

Yes. It got it now. And this is why you talk to the model and you read what it says.

You need to make sure you and the agent and the context are on the same page. And this is the biggest thing I want you to take from this video. The most important thing when you're building with AI is that the AI understands what you want and how you build.

You can do that by writing a 5,000 line agent MD that's global on your computer with all of the things you like and do and such. You could also become a famous influencer and tell the model, I'm Theo, build the way I like to build, Which works sometimes. But the easiest thing to do is to just read the outputs and steer the model in the direction you want to go in.

And if you notice it making the same mistakes over and over again, go into the AgentMD and try to give the model your psychosis. That's what I've done. And I think that's one of the strongest things I did that made it possible for me to make this project so quickly.

You'll notice that in this Lakebed AgentsMD, there are no file paths. There are no technical decisions or enforcements.

There's a couple small general rules at the bottom that I might even delete because I haven't found them to be super useful. The point of the AgentsMD, at least how I use it now, is to make the model more steered towards what I'm trying to do.

I wrote this one almost like a letter from me to the agent to tell it how we're thinking, what we're building, and why we're doing this, so that it's less likely to have bad assumptions or ask weird questions or work outside of the technical constraints that I want it to work within. This document has helped so much. I noticed almost immediately after writing this, by hand, by the way, my agents did not write this file, I wrote this file.

After I wrote that, I found that I didn't really have to do much to get the agent to build how I wanted it to. Once it had the context of how I was thinking about this, it started behaving way, way better.

The craziest thing that I did throughout this project, and I know this is going to be hard for a lot of you, when the agent pushed back, I listened. When it said certain things were not necessarily the right idea or were too hard to justify, I listened and I delayed those.

And one more hack that I found really nice is one that comes from our friends over at Anthropic, having the model write an HTML file for the plan. It's so much nicer to read.

I found this much easier to, like, read and go through the whole plan and get feedback and answer all of the remaining questions. It was so nice. That said, the first HTML page it made was horrible.

It looked awful. So I had to like yell at the model a whole bunch about that. Also apparently, in the Codex app, the questions tool just didn't work in the remote mode.

Because at this point, I was controlling this from my phone because I was busy and it couldn't ask me questions. It might be that mode, it might be the not using play mode. I don't know what it was, but it was insisting it couldn't ask me questions, so I had to like go through and answer all of them directly.

And then I was like, UI for the pages is awful. I got it to clean it up. After a while, it like still was full of useless crap.

I gave it as much as feedback. There's also another trick that I found really useful. Get a screenshot tool that lets you actually do shit.

Like being able to point an arrow at a thing and say, this sucks. And makes it so much easier to steer the model to make the right changes with things. I press control c, the thing goes away.

It's on my clipboard. I use Shotter. It's great.

There's lots of other options that are good too. Get a good screenshot tool. It makes life so much easier.

Once I got the HTML in a good state, I found that every plan I did from that point was really nice looking because it would just look at the existing one and be like, oh shit, I'm gonna copy this formatting. And this is like one of the coolest things is once you get the agent to behave how you want, if there is enough proof of that behavior in your code base, whether that is HTML plans, whether that is your agents md steering it certain ways, whether it's the code itself.

Once you get the model to behave how you want it to, everything else almost stops mattering. And that's one of the coolest things to discover as I've been using Five Five, like in its default state, copying the prompts I used to use with the big, bloated, useless agents m d, Five Five sucks. When you take the time to condense and steer it the way you want it to work, it becomes the coolest way to work I've ever built with.

And here's where we're gonna get into the other things about how I built. You might have noticed I have a lot of threads here. These are just the ones I did in the Codex app.

I have another few dozen in t three code, all for the same project that I did in five days. I probably started over a 100 threads on this one project in five days. And I know what you're thinking already.

Oh, so you have all of those running in separate work trees and you're hopping between 15 of them. Cool. Everyone says they do that now, but there's no way you're productive.

Nope. Almost every single one of these threads was run by itself on main alone. I found that I am just genuinely way less interested in these more parallel workflows lately, because it's just too much context to keep track of.

And when you spin up a smart enough model, especially if you're taking advantage of fast mode, which I've surprisingly been using a lot, it's not worth the price increase. But if you're not getting close to hitting your limits on the codex plan, it's very nice to use. The fact that fast mode is included on your plan, it just increases how fast you go through your limits on Codex when it costs actual money on quad code is hilarious to me.

But yeah, I've been using fast mode on my Codex subscription. I've never come even close to hitting my limits. It's been very nice.

So much so that I don't even find myself leaving extra high as much. If I'm just doing UI stuff where I want it to respond faster, I'll hop over to low. But I've gotten good enough at keeping extra high on task that I found it fine.

You'll notice a lot of these threads are literally just one prompt. This one was very simple. What would it look like to let users bring environment variables for server side code?

Ideally, they'd be able to update a dot env dot lakebed dot server file and run npx lakebed deploy to push those environment variables into the cloud for their deployment. A minute and fifty four seconds later, it wrote a whole model for how this should work that I thought was great. And it really seemed to understand what I wanted here.

The important design choice section shows the name of the file. It said that this should be the source of truth with replace semantics. If we add something and deploy, it gets created and updated.

If we edit something locally, the deploy will rotate it. And if we delete something from it, then Lakevideo Deploy will delete it. Exactly what I wanted.

I didn't have to get more specific. I didn't have to write a long ass prompt. I wrote two sentences, and then it specked out exactly what I wanted.

To which I responded, love it. Build it.

Ten minutes later, the whole thing is working exactly how I wanted. No additional changes needed to be made.

I just pushed it to GitHub, and I was done. It was great. And I started my next thread, which was I wanna be able to manually bump rate limits for a given user.

This is because one of my friends who was trying it out was loving it and wanted a higher rate limit, so I just told it had this feature. And I gave it a little more detail here. I want a new users table in the admin dashboard, so I wanna be able to control this myself, with the ability for me, specifically me, to set custom overrides for someone's limits.

Another important thing, and I had this in my agent m d, I didn't talk about it before, a glossary of terms and language to help the model understand what you're saying can be very, very helpful. I found that with this project, it was difficult because there is me, the person working on this thing. There's also me as a user of the thing.

There's the agent building Lakebed, and then there's the agent using Lakebed to build apps for you, the user of Lakebed. So I gave it these specific terms where you is the agent that is actually going to make changes.

Me, we, and us are the humans that are building Lakebed itself. Developers refers to our users, people who are gonna build things on top of Lakebed, and then agents, which is the thing the developers are using to build with this.

Again, the point here is to make it easier for the model to know what I'm referring to as I discuss these things. Here's another one of the themes you'll notice. Generally speaking, you should try to keep things simple.

And if it doesn't work, that doesn't mean make it more complex. It means fix the things that prevent it from being simple. If talking with the model in plain language was preventing it from getting what you're saying, you shouldn't get way more over specific in all of your prompts.

You should make slight changes to your agents md and to your quad md, so that you can keep your prompt simple. And you'll notice, almost all of my prompts here are two sentences or less.

Oh, no. This one was three sentences in a list of things. But it did exactly what I wanted the first try, and you'll notice this looks almost identical to how the homepage looks now.

Nice and simple. And as I said before, none of these threads will run-in parallel. I would do a task, I would complete the task, and then I would make a new thread and start the next task.

And I would just do that over and over again. As Morg just said in chat, make the difficult change easy, then make the change easily. Yep.

Why new threads? I always make new threads because I don't want old context getting in the way. I treat every thread as a pile of information that is steering the model.

And if I'm doing something different, like this thread, I was working on environment variables. And then this thread, I'm working on user limits. These are different concerns, and having the same thread with different concerns within it just biases the model towards things that aren't necessarily correct.

Remember, the way these models work is a bunch of parameterization. All of these sets of characters that are in your history, that are in the model, your history changes what points to what in the model.

So the more stuff in your history, the more customized the model is to an extent. Every additional word in your chat history is changing how the model behaves. I don't wanna deal with that.

And to those saying, wait, does that mean the model has to explore the codebase every time? Yeah, it does. It doesn't fucking matter.

It does a great job. It still completes all of these things in seconds. And I find that in real codebases, there is so much stuff going on that the history of your previous change is just gonna confuse the next one.

Also chat pointing out, they noticed that I'm not mentioning files specifically or applying skills for to work on. Correct. I'm not doing anything more specific.

I trust the model to find the right file. I am more likely to recommend the wrong file than the model is half the time. If it turns out the file that I set isn't necessarily the right one for the change, the model's gonna get confused and try to make it work in that file.

If I have a good dev on the team, I'm not telling them what file to edit, I'm telling them what to do, and they'll figure out what to change. And I would find half the time when I look at the diff for the changes it made, that I am surprised what files it ended up changing. Again, don't add details unless it needs the details.

Try to be more sparse with your requests and prompts. Figure out what it doesn't understand, and figure out how to fix that without making your shit too complex.

Yeah. I'm just looking through all my threads here. Almost all of them are actually just two sentences.

This was a voice to text, which is why it's a bit longer, and I need a little more context to what I wanted there. Yeah, for the most part, all of these are very simple. Here's an example of me sharing a log screenshot instead of actually copy pasting logs, because it's so much easier to do.

One more pro tip. When you have ideas that are a bit complex or the model struggles to understand it, don't overexplain. Just give examples.

The more simple an example you give that contains the problem you're trying to solve, the better job the model will do at solving it. So this example, I was trying to discuss custom domain flows for Lakebed, which coming soon.

I have a lot of layers to fix here to get it right. I gave the example of, I have a project on Lakebed and gave it an actual URL to an actual Lakebed project I had deployed. I have a domain on Vercel, t3.gg.

I've configured a c name for Lakebed demo t three g g, where the value is this. What's the best path to make this work? Handling SSL and whatnot.

This is to tell it like specifically what my question is, like what my where my problem is located. And then the goal as well. The goal is to be as easy as possible for our users without massively ballooning costs.

It's silly, but I think this is a great example of a prompt for something complex. I said what I wanted, I gave it a very clear concise example so it would understand exactly what I wanted to do, and then I steered it towards the parts that I wanted the most help with and gave it specific goals in order to keep it within my constraints.

Gave me a bunch of info. It did not give me enough info about costs though, so I asked it very specifically, give me a breakdown of all of the costs I would incur by going with your proposal. And then it did exactly that.

I didn't like the proposal, so I got more specific with an exact flow I would want the user to go through. This example, I say the user would run this command, npx lakebed domains add demo one dot t three dot g g, then the CLI would output to add the domain, set the following records. And you go set these records.

And I specify I would go and assign those values, wait for lakebed to pick up the change, issue SSL, and good to go. Ask more questions for more clarification. But I really just use this thread to figure out the how to do this.

What I would normally do at the end if I was ready to go with this is I would ask it to write down a simple plan based on what we discussed that I could have as the markdown or the HTML plan that I would read, confirm, and then hand off to a new thread to go actually build it. Once you actually get the model to start doing the work, one of the most important things is to give it the tools it needs to verify that the work was done.

This could be CLI commands it can run, this could be a test suite it writes, this could be computer use where it actually goes to the page to see if it's correct or not. This is one of the things I have found to be the best about using Codecs. Their plugins, and specifically the computer use stuff that they have built, is really, really good.

Codex's computer use will let Codex control apps on your computer, it'll let it control a full browser with an extension, and it can even do it with the computers locked now, which is really, really cool. I've been blown away with how useful this is. My issue is that I don't like it working on my computer because I wanna use my computer when the agents are running.

So again, this is where the remote stuff got really nice. I've also noticed that once you set these things up in the Codecs app, that you can use them through the Codex CLI, which means you can use them through t three code. So I set up all the computer use stuff in the Codex app and then I went back to using t three code remotely and now my t three code can verify changes by deploying an app on Lakebed and then going to Lakebed to see if it actually deployed or not and make sure it behaves as expected.

It's so powerful. And I found that it makes the likelihood that by the time the agent pings me that it's done, that it actually did it correctly, is way higher. Throughout this whole project, I would say of the like 50 plus threads I've done, maybe four or five of them didn't do what I wanted first try.

The rest all behaved exactly how I expected them to. And it worked very well. And also to be very clear, I'm not using the new like goal skill feature thing inside of Codex.

It seems really cool. It didn't work in the app for me when I tried it. And I don't use the CLI a whole lot and certainly not for these long running things.

And most of the stuff I'm doing, even the stuff that seems really difficult, ends up being done under ten minutes, especially on fast mode. Here, I overhauled the runtime for anonymous deployments, and it took seven minutes.

Coderabbit found issues, so I screenshotted the issues in the Coderabbit PR, told it to fix them, And it did. And then there was conflicts on main.

So I told it, there's conflicts with latest main. Please address them and push up your changes when done. And then it did.

And then I merged it. And then it was good. If your takeaway so far is that this is surprisingly simple, you have the right takeaway.

It's very simple. The simpler your flow, the better. Do things the stupid easy way.

And if it doesn't work, figure out how to make it work. I wanna talk a bit about my PR flow now, because I think this is important. When I start work, I usually start it with a rough idea of how complex the change will be in my head.

It varies a lot depending on what I'm doing, and sometimes the task ends up being a lot more complex than I expected it to be. If I'm unsure of the complexity of the task, I'll usually start by asking the model for its thoughts, and we'll have that back and forth. When I have a good gut feel of how big the change is, at that point, I'll often decide if it's worth doing on the branch I'm currently on, and then the work tree I'm currently on, or if I should go make a new work tree to finish up this task.

Sometimes, once I'm halfway through the planning process, I realize this is gonna be a lot more work than expected, so I just copy the initial prompt, I go spin up a new thread in a work tree, I paste that prompt with a couple additional steering comments to make sure it doesn't go down the bad pass I was going down earlier.

This has worked very well for me. That said, in this project that I've done hundreds of commits on, again, it's a solo project, so it is different in that sense.

I have not actually found myself reaching to make PRs that often, and I even closed two of the ones that I made because I didn't find myself needing it. When I made big enough changes that I really wanted to sit with and get second opinions on, things that are security related, things that change the hosting layer, thinking through what changes need extra eyes, not just like a human looking at it, but other agents or code review tools looking at it.

Like building the intuition for what changes would benefit from that is really important. And sometimes you'll put up the PR and something like CodeRabbit, Macroscope, Greptile, whatever you're using, will catch a bunch of issues, and then you tell the agent to fix them and it does, and then it catches more. At that point, I'll just tell the agent to run-in a loop until it resolves all of the issues, usually using something like the CLI for CodeRev, which has been very helpful for this type of thing.

I had two or so of these where there were just so many pieces of feedback that I didn't wanna keep copy pasting back and forth or telling the agent, go check the PR again over and over. And instead, just told it, run the CLI until you don't get any feedback. And that worked great.

That ended up working specifically for a lot of the ownership stuff that I was trying to get right with this project. That said, in real projects with lots of developers, you're gonna start running into problems with PR bloat, especially, and this is one of the coolest but also the most painful things in a tool like t three code, we made it a bit too easy to make a PR.

You can spin up a new thread, make changes, and then click one button to commit them, branch them, PR them. Super nice when you're using PRs as an artifact for review. Not so nice when you're accidentally spamming your projects with a bunch of PRs that'll never merge.

I do love the one click. I file a lot more PRs when I have that feature. Getting a good workflow in Codecs for turning your changes into a PR was annoying.

So, yeah, this was very, very helpful. But sometimes, those PRs will just sit around, and you don't know if they're worth merging or not anymore, or even looking at.

I've had that happen a few times, even solo on this. And again, bring in the agent. I noticed one of my branches kind of got stale and had a lot of conflicts.

So I brought this branch up in a new thread. I asked, how up to date is the Theo slash admin overhaul branch? Compare against the latest main.

If these changes are still worth merging, fix the conflicts and push up the finished branch. And it inspected it. The work tree is clean and currently on a local helper branch.

Branch is significantly behind. It has 49 commits not in this, while the admin overhaul has two commits on a main. Diffs concentrated reading the changes.

And at the end here, said after inspecting both sides, the branch's useful works already been superseded on main. Main already has the admin user detail route, this route, user detail UI, and smoke coverage for the admin user shell API. Boarded the merge and did not push because pushing a fixed branch would just add noise for changes that are already represented in newer forms on main.

Recommendation, close and delete. Awesome. Generally, it is important to try and keep PRs from getting too stale because it gets really bad really fast.

Take it from us, with 414 open PRs on t three code right now, it gets bad. I think I covered everything I have to here.

I know this video is a bit chaotic, but I kinda just wanted to show you the chaos. What does it actually look like with real projects I'm working on? That's why I showed you the actual threads that I was using to build real projects that I'm deploying now.

I've never been happier with my flow, which means it's probably all about to change again, and I'll be sure to do another updated video if these things shift over time. But for now, just try to keep it simple. I find most developers are trying too hard to engineer their workflows, and I get that.

I love engineering my stuff too. But generally speaking, keeping it simpler makes things much better. And the fact that I have made my set of tools here and my way of thinking about things here so simple has made me way, way more productive as a result.

And while I do dearly love t three code and what it enables for me, especially around the remote stuff, the Codex app is fine. Just like, stop using sidebars in your IDs, guys. Stop using CLIs.

Stop using these things that make it too hard to just start a new thread or start working on a new change. If you're looking at the code more than you're looking at the conversation about the code, you're already behind. It's time for us all to let go a little bit and try to build the way the AI is strongest, which is with a good conversation that builds the right context for the model to go do the right thing.

I know it's a bit different, and I really hope it was helpful for y'all. Let me know how you guys feel, and until next time. Peace starts.

The Hook

The bait, then the rug-pull.

Five months is a long time in AI tooling — long enough for an entirely different workflow to make the last one feel embarrassing. Theo opens by admitting that watching his own previous video now hurts, and that almost nothing he recommended then is how he builds today.

Frameworks

Named ideas worth stealing.

30:20concept

Thread-per-task pattern

One isolated thread per change. Start fresh every time. Trust the model to find the right file. Eliminates context bleed between concerns.

Steal forany agentic coding workflow where task isolation matters

27:07model

agent.md as letter

Write the context file as a first-person explanation of how you think and what you are building, not as a list of rules or file path enforcements. Include a glossary of project-specific terms.

Steal forany project where you want consistent model behavior without constant steering

44:00concept

CodeRabbit loop

After opening a PR, tell the agent to run the CodeRabbit CLI in a loop until it receives zero feedback. Avoids manual copy-paste review cycles.

Steal forteams using automated PR review tools with CLI access

28:36concept

HTML plan review

Ask the agent to write its plan as an HTML file instead of markdown. The resulting page is readable enough to actually catch wrong assumptions before execution.

Steal forany large change where you want to review the plan before the agent writes code

21:28concept

Voice-to-text first prompting

Use WhisperFlow or similar to dictate prompts. Produces better-specified, more natural language than typed prompts by removing friction.

Steal forany workflow where you find yourself under-describing what you want

CTA Breakdown

How they asked for the click.

VERBAL ASK

46:32next-video

“I know it's a bit different, and I really hope it was helpful for y'all. Let me know how you guys feel, and until next time.”

Low-key close with no hard CTA — relies on algorithmic recommendation for next video.

MENTIONED ON CAMERA

02:07productClerk ↗

21:17productDNSimple ↗

06:22toolt3.code ↗

13:20toolTailscale ↗

42:10toolCodeRabbit ↗

14:14toolReplit ↗

21:17linkLakebed source thread ↗

Storyboard

Visual structure at a glance.

cold open

hookcold open00:00

sponsor Clerk

sponsorsponsor Clerk02:40

overview slide

promiseoverview slide03:43

models: GPT-5.5

valuemodels: GPT-5.504:45

harnesses explained

valueharnesses explained06:36

desktop app vs CLI

valuedesktop app vs CLI10:35

Lakebed project

valueLakebed project13:32

reading model output

valuereading model output21:21

agent.md philosophy

valueagent.md philosophy27:07

thread discipline

valuethread discipline35:00

computer use verification

valuecomputer use verification39:48

PR flow

valuePR flow42:00

ctaclose46:33

Frame Gallery

Visual moments.

cold open

Frame at 00:37 from How I code with AI changed a lot

Frame at 01:16 from How I code with AI changed a lot

sponsor Clerk

Frame at 02:40 from How I code with AI changed a lot

overview slide

Watch next

More from this channel + related breakdowns.

27:47

Theo - t3․gg · Tutorial

I hated making this video...

A reluctant 28-minute tour of the Claude Code features every competing harness should steal.

June 17th

30:40

Theo - t3․gg · Tutorial

Mythos is here, it's time to start tokenmaxxing

A 30-minute field report on burning $5,400 of subsidized AI inference in ten days — and what actually came out of it.

June 12th

Video of the Day44:29

Theo - t3․gg · Review

Opus 5 Is My New Go-To Model

Theo spends a full day inside Claude Opus 5, pits it against Fable 5 and GPT-5.6-Sol on benchmarks and real coding tasks, and argues the cheaper, weirder model just won his default slot.

July 25th

19:12

Theo - t3․gg · Reaction

Claude Code's creator has some really good advice

Theo reacts line-by-line to Boris Cherny's post arguing that automation — CLAUDE.md rules, lint checks, CI — matters more than ever in the agent era, not less.

July 21st

30:48

Theo - t3․gg · Review

GPT-5.6-Sol Is Better Inside Claude Code Than Inside Codex

Theo runs OpenAI's GPT-5.6-Sol through Claude Code instead of Codex and gets visibly better designs and cheaper orchestration — then reads Codex's system prompt on camera to find out why.

July 16th

41:35

Theo - t3․gg · Review

Kimi K3 Is the Best Open-Weight Model Ever Made (Sometimes)

Theo spends a day stress-testing Moonshot's 2.8-trillion-parameter open-weight release — and comes away convinced it's frontier-class, cheap enough to matter, and genuinely dangerous once the weights go public on July 27.

July 17th

Chat about this