Modern Creator
Matthew Berman · YouTube

You NEED to know these vibe coding secrets

A 27-minute systems playbook for turning AI coding tools into a self-managing development flywheel.

Posted
3 days ago
Duration
Format
Tutorial
educational
Views
62.5K
2K likes
Big Idea

The argument in one line.

Expert agentic coders stop prompting and start building systems of skills, automations, and loops that run the development cycle autonomously so the human only reviews outcomes.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…
  • You are already shipping code with an AI tool like Cursor or Codex but still manually prompting for every task.
  • You want to run 10-20 agents in parallel without them trampling each other on the same repo.
  • You spend time on repetitive review tasks (PR checks, documentation drift, overnight errors) that an agent loop could handle unattended.
  • You want to cut your AI token spend by routing planning, writing, and review to different models by capability.
SKIP IF…
  • You have never shipped a project with an AI coding tool — this skips fundamentals entirely.
  • You want a head-to-head tool comparison; the focus here is workflows, not which tool wins.
TL;DR

The full version, fast.

There are two kinds of AI coders: beginners who prompt and wait, and experts who build systems. The expert stack has five layers: skills (reusable slash-commands for anything you do twice), automations (event-triggered agent runs that fire on PRs, schedules, or conditions), loops (autonomous cycles with a trigger, repeated action, and an end-goal), cloud agents (isolated parallel environments that never conflict), and multi-model routing (frontier models for planning, cheaper models for execution). The one thing that remains genuinely unsolved is getting a dozen parallel agents to merge to main without deadlocking CI.

Chapters

Where the time goes.

00:00–03:46

01 · Coding Tools

Cursor and Codex as primaries; Claude Code, Devin, Factory as solid alternatives. Key differentiator: model flexibility and concise agent output.

03:46–08:28

02 · Skills

Four uses: anything done more than once, domain-specific rules, tool instructions, quality gates. Off-the-shelf packs available on GitHub.

08:28–11:29

03 · Automations

Event-triggered agent runs in Cursor and Codex. Demo: PR opened, then wait for Greptile comments, then fix, then push.

11:29–15:20

04 · Loops

Autonomous loops: trigger + repeated action + end goal. Three production examples: overnight docs sweep, sub-50ms page load enforcer, production error sweep.

15:20–16:27

05 · Best Practices

The three-part flywheel: 100% test coverage + perfect documentation + exhaustive logging. All maintainable via automations with no manual overhead.

16:27–22:16

06 · Cloud vs Local

Cloud: infinitely parallel, isolated, accessible anywhere, unique features. Local: faster startup, more control, latest features. Worktrees as a local parallel primitive.

22:16–24:19

07 · Multi-model

Route by cost and capability: planning to a frontier model, code writing to a fast mid-tier, review to a second frontier model. Codify as a skill.

24:19–26:49

08 · Merging and Deploying

The unsolved problem: parallel agents racing to merge main cause deadlocks and repeated CI runs. Current best workaround: batch-commit via a single consolidating agent.

Atomic Insights

Lines worth screenshotting.

  • Experts don't prompt AI coding tools — they build systems of skills and automations that prompt themselves.
  • If you've typed the same instruction to an AI agent more than once, you've already lost time you can't get back.
  • A loop needs exactly three things: a trigger, a repeated action, and a goal that ends it — without the goal it runs forever.
  • Running 100% test coverage is no longer a resource question — it's an automation question.
  • Cloud agents give you infinite parallelism but local agents give you faster startup and the latest features — most heavy users will eventually run both.
  • The merge-to-main problem with parallel agents is genuinely unsolved in 2025 — even Cursor is building a new Git alternative to address it.
  • You can define which AI model handles which phase of a task in a single skill file, and the agent will route itself accordingly.
  • An overnight docs-sweep loop costs almost nothing to run and eliminates documentation drift entirely.
  • The production error sweep loop means that when you wake up, fixes for any logged error from the night before are already in a PR waiting for review.
  • Worktrees let multiple local agents write to the same repo in isolation — without them, agents writing to the same file will conflict and spin out.
  • Cloud agents run in completely isolated environments, which eliminates the local worktree conflict problem at the cost of startup latency.
  • The three properties worth automating in every codebase are test coverage, documentation, and logging — all three have zero-maintenance paths with current tools.
  • A sub-50ms page-load loop that runs until every route passes is a practical, real-world example of autonomous goal-conditioned execution.
  • Agents can discover and invoke skills at runtime without being told which one to use — a well-organized skill library becomes self-directing.
Takeaway

How to stop prompting and start automating your codebase.

WHAT TO LEARN

The gap between beginner and expert AI coders is not the tool — it is whether you have built a system of skills, automations, and loops that handles the repetitive work without you.

01 · Coding Tools
  • Tool choice matters less than depth of configuration — Cursor, Codex, Claude Code, Devin, and Factory all work, but none are expert tools out of the box.
  • Concise agent output is a practical feature to evaluate when comparing tools — long explanations from the agent about what it just did are a productivity tax.
02 · Skills
  • A skill is the unit of reuse in an agentic workflow — if you are typing the same instruction twice, it should already be a skill.
  • Agents can discover and invoke skills at runtime without being explicitly told which one to use, which means a well-organized skill library reduces the cognitive load of directing agents.
  • Off-the-shelf skill packs cover the full dev cycle from PRD to deploy and can be installed with a single paste-and-enter.
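A minimal sketch of the two ways a skill gets used, explicit slash invocation and runtime discovery. Everything here is an illustrative assumption: `Skill`, `SkillLibrary`, and the word-overlap matching are hypothetical stand-ins, not how Cursor, Codex, or any other tool actually stores or selects skills.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str         # invoked as "/<name>"
    description: str  # what the agent reads when deciding whether to use it
    prompt: str       # the reusable instruction text


class SkillLibrary:
    """Hypothetical in-memory skill store (real tools read skill files)."""

    def __init__(self) -> None:
        self._skills: dict[str, Skill] = {}

    def register(self, skill: Skill) -> None:
        self._skills[skill.name] = skill

    def invoke(self, command: str) -> str:
        """Resolve an explicit slash command such as "/auto-review"."""
        if not command.startswith("/"):
            raise ValueError("skills are invoked with a leading slash")
        return self._skills[command[1:]].prompt

    def discover(self, task: str) -> Skill | None:
        """Naive stand-in for runtime discovery: pick the skill whose
        description shares the most words with the task."""
        task_words = set(task.lower().split())
        best, best_score = None, 0
        for skill in self._skills.values():
            overlap = task_words & set(skill.description.lower().split())
            if len(overlap) > best_score:
                best, best_score = skill, len(overlap)
        return best


library = SkillLibrary()
library.register(Skill(
    name="auto-review",
    description="review code changes before opening a pull request",
    prompt="Review the current diff and list concrete problems.",
))
library.register(Skill(
    name="run-tests",
    description="run the test suite locally and fix failures",
    prompt="Run all tests locally; fix any failure before continuing.",
))
```

Typing `/auto-review` maps to `library.invoke("/auto-review")`, while `library.discover("please run the test suite")` models the agent choosing a skill on its own from the descriptions.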
03 · Automations
  • An automation is just a trigger plus an agent prompt — the complexity comes from chaining them correctly, not from the technology.
  • You can instruct an agent to wait for a precondition before acting, turning multi-step async workflows into a single automation definition.
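The "trigger plus prompt, with a wait for a precondition" idea can be sketched as below. This is an assumption-laden model, not the Cursor or Codex automation API: `Automation`, `run_automation`, the `"pull_request.opened"` label, and the polling approach are all hypothetical names chosen for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Automation:
    trigger: str                      # event label (illustrative)
    prompt: str                       # instructions handed to the agent
    precondition: Callable[[], bool]  # must be True before the agent acts


def run_automation(
    automation: Automation,
    event: str,
    run_agent: Callable[[str], str],
    poll_seconds: float = 0.0,
    max_polls: int = 10,
) -> Optional[str]:
    """Run the agent if the event matches and the precondition becomes true.

    Returns the agent's result, or None when the event does not match
    the trigger or the precondition never holds within max_polls checks.
    """
    if event != automation.trigger:
        return None
    for _ in range(max_polls):
        if automation.precondition():
            return run_agent(automation.prompt)
        time.sleep(poll_seconds)
    return None
```

In the PR example from the video, `precondition` would be a check for review comments on the pull request, and `run_agent` would hand the prompt to the coding agent. The `max_polls` cap is a design choice added here so a review that never arrives cannot leave the automation waiting forever.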
04 · Loops
  • A performance loop can run unattended for hours and produce measurable, verifiable outcomes — the goal condition is what makes this safe.
  • A production error sweep loop means errors from the previous day already have PR fixes waiting when you arrive in the morning.
  • The Loop Library at signals.forwardfuture.ai/loop-library is a free, community-contributed resource with ready-to-use loop prompts.
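The loop structure (repeated action plus a goal that ends it) can be sketched in a few lines. This is a toy model under stated assumptions: `run_loop`, the page names, the load times, and the "halve the slowest page" action are all made up for illustration, and the `max_iterations` cap is an extra safety stop not mentioned in the video.

```python
from typing import Callable


def run_loop(
    action: Callable[[], None],
    goal_met: Callable[[], bool],
    max_iterations: int = 100,
) -> int:
    """Repeat `action` until `goal_met()` is true; return how many times it ran."""
    iterations = 0
    while not goal_met():
        if iterations >= max_iterations:
            raise RuntimeError("goal not reached within max_iterations")
        action()
        iterations += 1
    return iterations


BUDGET_MS = 50.0
# Hypothetical measurements; a real loop would re-measure each page.
load_times_ms = {"/home": 120.0, "/settings": 40.0}


def all_pages_fast() -> bool:
    return all(ms < BUDGET_MS for ms in load_times_ms.values())


def optimize_slowest_page() -> None:
    # Stand-in for "agent optimizes the page": halve the slowest load time.
    slowest = max(load_times_ms, key=load_times_ms.get)
    load_times_ms[slowest] /= 2


iterations = run_loop(optimize_slowest_page, all_pages_fast)
print(iterations, load_times_ms)  # 2 {'/home': 30.0, '/settings': 40.0}
```

The trigger is whatever calls `run_loop` (a schedule, an automation, or a manual start). The goal check runs before every action, so a codebase that already meets the budget does no work at all.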
05 · Best Practices
  • There is no longer a resource justification for incomplete test coverage, stale documentation, or missing logs: all three can be maintained automatically.
  • The flywheel framing is useful: good tests catch regressions that break docs; good logs surface errors that need tests; good docs reduce the chance of introducing new bugs.
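One way to picture the coverage half of the flywheel is a gate that turns a coverage report into an agent task. The sketch below assumes the report has already been reduced to a `{file: percent}` mapping; `coverage_gaps`, `build_agent_prompt`, and the file names are hypothetical, and the prompt wording is only an example.

```python
from __future__ import annotations


def coverage_gaps(coverage_by_file: dict[str, float], target: float = 100.0) -> list[str]:
    """Return the files whose coverage is below the target, sorted by path."""
    return sorted(path for path, pct in coverage_by_file.items() if pct < target)


def build_agent_prompt(gaps: list[str]) -> str | None:
    """Build the task for the agent, or None when there is nothing to do."""
    if not gaps:
        return None
    return "Write tests until coverage reaches the target for: " + ", ".join(gaps)


# Hypothetical report values for illustration only.
report = {"app/models.py": 100.0, "app/views.py": 82.5, "app/auth.py": 97.0}
prompt = build_agent_prompt(coverage_gaps(report))
```

Returning `None` when coverage is already complete keeps a scheduled automation from spending tokens on nights when nothing changed.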
06 · Cloud vs Local
  • Cloud agents are the right choice when you are running more agents than your machine can handle in parallel — the isolation guarantee also eliminates an entire class of file-conflict bugs.
  • Local agents are faster to start and get new features first — most practitioners will end up using both, with cloud for scale and local for iteration speed.
  • Worktrees give local agents the same isolation benefit as cloud environments within a single machine, at the cost of some management overhead.
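Outside the editor UI, a per-agent worktree can be created with Git's own `git worktree add` command. The helper below is a sketch: the sibling-folder layout and the `agent/<id>` branch naming are assumptions chosen for illustration, not a convention from the video.

```python
import subprocess
from pathlib import Path
from typing import List


def worktree_command(repo: Path, agent_id: str, base: str = "main") -> List[str]:
    """Build the git command that gives one agent its own working folder.

    Assumed layout: a sibling directory "<repo>-<agent_id>" checked out on a
    new branch "agent/<agent_id>" that starts from `base`.
    """
    path = repo.parent / f"{repo.name}-{agent_id}"
    branch = f"agent/{agent_id}"
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


def create_worktree(repo: Path, agent_id: str, base: str = "main") -> None:
    """Run the command; raises CalledProcessError if git reports a failure."""
    subprocess.run(worktree_command(repo, agent_id, base), check=True)
```

Each agent then edits files only inside its own folder, and conflicts surface later at merge time rather than while two agents are writing the same file.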
07 · Multi-model
  • Not every task needs the most expensive model — routing code-writing to a capable mid-tier can reduce costs significantly with no quality loss on the execution step.
  • Defining multi-model routing inside a skill makes it automatic: the agent reads which model to use at which phase without you specifying it on every run.
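Routing by phase amounts to a lookup table the agent consults before each step. In this sketch the model names are deliberate placeholders and `MODEL_BY_PHASE`, `model_for`, and `plan_run` are hypothetical; in practice the table would live in a skill file in whatever format the tool expects.

```python
from typing import Dict, List, Tuple

# Placeholder model names, not real product identifiers.
MODEL_BY_PHASE: Dict[str, str] = {
    "plan": "frontier-model-a",    # strongest reasoning for planning
    "write": "mid-tier-model",     # cheaper, faster model for execution
    "review": "frontier-model-b",  # a second frontier model for review
}


def model_for(phase: str) -> str:
    """Return the model assigned to a phase, failing loudly on unknown phases."""
    try:
        return MODEL_BY_PHASE[phase]
    except KeyError:
        raise ValueError(f"no model routed for phase {phase!r}") from None


def plan_run(phases: List[str]) -> List[Tuple[str, str]]:
    """Pair each phase of a task with the model that should handle it."""
    return [(phase, model_for(phase)) for phase in phases]
```

Raising on an unknown phase, rather than silently falling back to the most expensive model, keeps an unrouted step from quietly inflating token spend.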
08 · Merging and Deploying
  • The parallel-merge problem is an architectural gap in current agentic tooling — every agent that merges to main forces all others to rebase and rerun CI, which compounds with agent count.
  • Batch-merging via a single consolidating agent is the most reliable current workaround, though it sacrifices some of the parallelism benefit.
  • Cursor building a Git alternative designed for agent-scale deployment signals this problem is recognized at the platform level and will eventually be solved by the tools themselves.
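To see why the rebase-and-rerun cost compounds, here is a deliberately simplified count of CI runs. It assumes every branch runs CI once before merging, every merge forces each still-waiting branch to rebase and rerun CI exactly once, and no run ever fails; real pipelines will differ. Both function names are hypothetical.

```python
def ci_runs_sequential(agents: int) -> int:
    """CI runs when each agent merges to main on its own, one after another."""
    runs = agents        # every branch passes CI once before any merge
    waiting = agents
    while waiting > 0:
        waiting -= 1     # one branch merges into main
        runs += waiting  # every branch still waiting rebases and reruns CI
    return runs


def ci_runs_batched(agents: int) -> int:
    """CI runs when a single consolidating agent merges all branches at once."""
    if agents == 0:
        return 0
    return agents + 1    # one run per branch, plus one run on the combined result


for n in (3, 12, 20):
    print(n, ci_runs_sequential(n), ci_runs_batched(n))
```

Under these assumptions the sequential count is n(n+1)/2, so 12 agents cost 78 CI runs against 13 for the batched approach. The batched path pays for that by making all branches wait on one consolidating step.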
Glossary

Terms worth knowing.

agents.md
A markdown configuration file in a project repo that tells an AI coding agent how to behave — coding style, commit format, personality, workflow preferences. Most tools except Claude Code support it natively.
Automation
A feature in tools like Cursor and Codex that triggers an agent to run automatically when a specified event occurs (e.g. a PR is opened), rather than waiting for a manual prompt.
Loop
An autonomous agent task with a trigger, a repeated action, and an end condition. The agent runs the action repeatedly until the goal is met, then stops.
Cloud Agent
An agent that runs in an isolated cloud environment rather than on the developer's local machine, allowing unlimited parallel instances without consuming local CPU or RAM.
Worktree
A second working copy of a Git repository on the same machine, used to let multiple local agents work on the same codebase simultaneously without overwriting each other.
Skill
A reusable, slash-command-invokable prompt or code definition that tells an agent how to handle a specific repeated task, API, or workflow. Agents can discover and invoke skills at runtime.
Multi-model routing
The practice of assigning different AI models to different phases of a task based on capability and cost — e.g. a frontier model for planning, a cheaper model for execution, a second frontier model for review.
Greptile
A code-review tool that automatically reviews pull requests, assigns a merge-confidence score from 0 to 5, and generates fix prompts that can be pasted directly into an AI agent.
Resources

Things they pointed at.

Quotables

Lines you could clip.

00:00
There are levels to AI coding. Beginners are prompting. Experts figured out how to automate the entire workflow.
Complete hook in two sentences, no setup required · TikTok hook
15:26
There is no reason to have suboptimal code at this point because you can have 100% test coverage at all times.
Bold claim that reframes developer expectations, high engagement potential · IG reel cold open
26:08
It is broken. There really is not a good way to fix this.
Rare honest admission from a practitioner, stands out in a sea of hype · newsletter pull-quote
The Script

Word for word.

00:00There are levels to AI coding. Beginners are prompting. They're waiting for their agents to finish.
00:06They're reviewing the work, and then they're prompting again. But experts figured out how to automate the entire workflow.
00:14And in this video, I am going to show you what the absolute experts are doing. So this is all of what we're gonna be going over in this video. But first, which tools do you use to start?
00:25So I use all of the agentic coding tools out there. I have to. It's part of my job.
00:31And so I've tested and have experience with all of them. Right now, my two primary coding agents are Cursor and Codex.
00:40Cursor is definitely one of my favorites for multiple reasons. Number one, you can have models from different AI companies. OpenAI, Anthropic, even Cursor themselves has their own model.
00:51And not only that, Cursor was one of the first to have Cloud Agents. And I'm gonna get into more details about what Cloud Agents are, but just know it's a really great feature. So this is Codex, definitely one of the best coding harnesses out there.
01:03What I like most of all about it is, first of all, the design. It's beautiful. And second, it is able to describe what it's building in a really concise way, and just the overall interaction with the model, the vibe of the model is great.
01:18I really appreciate how concise the explanations are. So you can see that right here. It runs commands, then it gives you a one to two sentence summary of what it just did and so on and so forth.
01:29And that's what I really appreciate. I cannot stand having to read essays about what the agent is doing.
01:35I want it short and sweet. Now Claude Code is great. I don't use it all that often just because I ran out of quota so quickly and so frequently.
01:44I just stopped using it as much. Devin and Factory are fantastic options too. Highly recommend all of them.
01:52They all have different harnesses. They all have different pros and cons. You just need to go out and use them and figure out what works best for you.
02:00Next, we're gonna be talking about rules, agents.md, and also CLAUDE.md.
02:07So what are these? These are the ways to tell these tools exactly how you want them to work, exactly what your workflow is, how you like your commits structured, how you like your commit messages written, the personality of the model when it's replying back to you, your coding preferences in general.
02:26This is where you define them. Now basically, all of these tools support agents.md with the exception of Claude Code.
02:34They have their own CLAUDE.md. Cursor has rules, but it's basically just writing to the agents.md file, and it very much does support agents.md.
02:43Alright. So if you're gonna be using it in Cursor, go ahead and go into preferences.
02:49Then on the left side, you're gonna click this little button, rules, skills, sub agents, and then right here are where the rules are written. So if I click into one, here we go. Keep responses short and simple.
02:59Avoid showing code snippets. I can just click in and see it. Respond in plain English only.
03:05Avoid talking about specific parts of the code. Then we have our project approach. Avoid writing one time scripts and permanent files.
03:12Don't mock data except for tests, etcetera. And then, of course, we have the agents file right here. These are actually learned preferences that Cursor writes to as you use it.
03:23And you could just add an agents.md file to any project that you're working on. You can define exactly how you want the model to behave, exactly what your workflow is, your deploy process, everything.
03:33That's where you put it. And so if you're not using agents.md, I highly recommend you do. Just start with the vibe of the model, the personality of the model, define how you want it to behave and talk to you.
03:44And then from there, you can learn what you like to do. Alright. Next, one of the most important things that you need to use, skills.
03:53I cannot stress this enough. You want to use a lot of skills. Anything that you do more than once, make it into a skill.
04:01Go browse off the shelf public skills. There are so many great ones you need to use.
04:07They are so very important. And so here are some examples of what you're going to use skills for. First, anything that you do more than once.
04:15If you do it more than once, it should have been a skill to begin with. You create the skill, and rather than having to, let's say, copy paste a prompt over and over again, you simply type slash and then invoke the skill, and then it will do that thing for you, whatever it is.
04:30So here's an example. I type slash. It brings up a list of commands and skills.
04:35And what we're going to do is we're going to type auto review, hit enter, and then hit enter again.
04:41And now that skill is invoked, it's going to do the auto review skill. And next, one of the most important tools that I use for reviewing all of the code that AI is writing for me is Greptile. Greptile is fantastic.
04:55They're also the sponsor of this video. Let me show you how I actually use them in my coding workflow. So I have a Greptile account.
05:02I connect it to every new repository that I create, and it automatically does this incredible thing.
05:09As soon as a PR is opened, Greptile goes in and starts reviewing the code. Check this out.
05:16Right here. So here's a PR that I opened, fixed skill import context and scan false positive, Greptile summary.
05:23It gives me a summary of what changed. It also gives me a confidence score, zero through five. And that is the confidence that if I merge this code, if I merge this PR, it's going to land successfully, and there's not going to be bugs or errors.
05:38And it details the different files that changed and what changes were made to them. It gives me a nice flowchart of what was changed and the pieces of code, and then it tells me specifically issues that it would fix and gives me a prompt to copy paste into AI to fix it.
05:56Greptile is used already by the biggest companies in the world, including NVIDIA, Compass, WorkOS, Zapier, Brex, Scale.
06:06So many different companies use Greptile. I highly recommend it. I'm gonna drop a link down below so you can go check them out.
06:12Let them know I sent you. It really does help our channel to let them know that I sent you, so please go check them out. They've been a great partner.
06:20Link's down below. Next is when you have domain specific rules.
06:25So if your company has a specific writing style, if you have a certain way you like to write up GitHub issues, if you have certain company information you wanna provide to the agent, do that all within a skill. Next, and maybe one of the most important uses of skills, tool instructions.
06:42Tools are executable pieces of code that can be called from a skill. So if you have a specific way that you kick off tests, for example, or if you want to only run a subset of the tests or how to use a certain API or CLI, all of this can be defined in a skill, and that's how you use it. You don't have to redefine all of it.
07:04You don't have to provide that context about what the API endpoints are, what responses it should expect.
07:10It's all going to be defined in that skill that you can just reuse as many times as you want. And the cool thing is the agents can actually discover and determine which skills it should be using at runtime. So you don't actually have to say slash whatever the skill is.
07:25The agent will know when to use it. And then last, quality gates. So if you want to say, okay, before we open a PR, I want to run all tests locally, and I want to make sure we have a 100% pass rate.
07:36And if we don't pass, fix the test. If you want all of that process defined and easily invoked, you can put that in a skill.
07:44And by the way, there are tons of off the shelf skills that you can use right now. So for example, here's one called agent skills. It has 61,000 stars on GitHub, and it gives you everything you need for your development cycle.
07:57Everything from refining an idea to spec in the PRD, implementing the code, testing, QA, and deployment. It's just all there.
08:05It has very opinionated ways of doing things. So if you like that, great. Just use it.
08:10All you have to do is grab the URL, go to cursor, go to codex, go to factory, wherever you want, put it in and say install this skill. And then you just hit enter, and it's going to install the skill for you. You really don't need to do anything else, and then it'll be available.
08:24Sometimes you have to restart the software for the skill to become available, but that's about it. The next two things I want to talk about are different but very related, automations and loops.
08:35Automations allow you to prompt your model automatically depending on some trigger. I'm going to show you what that means. And loops allow your agent to run indefinitely until it hits a certain goal, and I'm gonna show you that specifically as well.
08:50This is what the best of the best agentic coders out there are using. So in most tools, I'm gonna show you this in Cursor and in Codex, there is a first class feature called automation.
09:01So this is Cursor. In the top left, I have this automations right here. We're gonna click it, and what we can do is click this create new automation button right there.
09:11The first thing you need is a trigger, then you're gonna give your agent instructions, a prompt, and then you can also include memories or add tools or MCP servers. We'll keep it simple. So as I just showed you with Greptile, I want my agent, after Greptile leaves its comments, to automatically review the comments, fix them, and then resubmit the PR.
09:33And so let's just automate that. Let me show you how. For the trigger, we'll select GitHub, and we can see pull request opened.
09:41So that's when a pull request gets opened. Now there's one problem. The pull request will get opened and trigger the automation, but Greptile may not have had enough time to actually review the code.
09:52So what do we do? We'll just say, wait until you see Greptile's comments on the PR. Now because I wrote that, it will literally just wait, which is nice.
10:02Then once you do, go through each of them, each of the comments, and address the comments. Once you're done, push the new code back to the PR. And that's it.
10:11Now every single PR that opens, Greptile will review it. This agent will wait until the comments are there from Greptile. Then it will address the comments and push the code.
10:22Make sure you're selecting the right repo. So I'm gonna select Astro Hub, Buy Anyone. And then last before we create this, Cursor does this cool thing where it automatically identified tools that we might need to make this automation work.
10:35So it highlighted this address the comments. Some tools might not be configured yet. Let's click Tools, go down to the GitHub tool, comment on pull request, and then we're done.
10:46Hit Create. And that's it. Now we have that running automatically.
10:50Super useful. And also in Codex, it's kind of the same thing. Click up here to automations.
10:55You can either create via chat and just describe in natural language the automation you want, or you can click this dropdown, create it manually, and then you use a title. You add the prompt.
11:07You can select which repo down here, how it's scheduled. You can give it memories and tools. It's very similar to how they do it in Cursor.
11:15I cannot recommend using these automations enough. If you, again, are typing the same thing over and over again or you're doing the same process over and over again, automations are the way to save you a ton of time.
11:29Now let's talk about loops. And in fact, I've been thinking so much about loops, I actually created a loop library, which I'm announcing for the first time today.
11:40It is a completely free library of loops that I have used, that I've found others have used. And if you have your own loops and wanna submit them, you can do that. So here it is, signals.forwardfuture.ai/loop-library.
11:55I know it's long. I'll drop it down in the description below. All you gotta do is bookmark it.
11:59Here's the Loop Library. And we have a few right now, but I'm gonna be growing this list, and you can always come here. It will always be free.
12:07And I'm hosting it on here. Now, so thank you to them for hosting and partnering with me on the Loop Library. Alright.
12:13So what is a loop? Well, it's kind of exactly what it sounds like. You have some kind of process that loops over itself.
12:20Right? Over and over again. Very simple.
12:22But what does that actually mean? A loop contains three things.
12:26One, some trigger to start the loop. Two, some action that it does over and over again. And then three, some goal, some end goal so that it just doesn't run forever, and the loop will stop once that goal is met.
12:41Now back to the loop library, what does that actually mean in practice? A lot of people talk about this in very hand wavy theoretical ways, but I wanted to actually give you very concrete practical loops that you could start using today.
12:53And I'm also going to explain why automations and loops kind of go hand in hand a lot of times. They don't always need to, but it's nice to be able to kick off a loop automatically. So here's an example.
13:04This is the overnight docs sweep loop. Basically, what it does is it says, each night, review the code base in full and make sure all documentation reflects the latest changes from the previous day.
13:15Update the documentation as needed, then open a pull request with those changes. The point is to keep all of the documentation in my app, whether it's the public facing readme or internal documentation, as up to date as possible at all times.
13:29And so I run this in an automation, and I say, Okay, at 1AM, run this automation. So it looks at all the changes that I made from the previous day, compares it to the documentation, and sees if there are any gaps in the documentation and updates them appropriately.
13:46Here's another amazing one that has really just saved me a ton of time. This is called the sub 50 ms page load loop.
13:54I basically set up a loop for my agent to go through my entire app, load every single page, every single modal, every single sidebar, everything. And if any one of them loads in over fifty milliseconds, I want it to optimize the queries, optimize the website, do whatever it needs to do to make sure every single thing loads in under fifty milliseconds.
14:19So the loop is continue until everything loads in under fifty milliseconds.
14:25And I've had this thing run for hours and hours and hours, and it really does help. When it was finally finished, the app was lightning fast.
14:35Now I want to show one more loop. And again, I'll drop a link to the loop library down below so you can check out all of them. And please submit your loops.
14:43If you have awesome loops that you use all the time that are generalized and anybody can use them, please go submit them. So this is called the production error sweep. I do this every single night.
14:53I have an agent kick off that looks at our production logs and looks for any errors and analyzes the error, tries to figure out what caused it, writes up a fix for it, and then submits a PR.
15:06And so anytime there's an error and I really do have full log coverage, which I would highly recommend. I'll get to more of those tips later. But any error that happens, any error that shows up in the log, when I wake up, there's already a fix for it.
15:19It's so cool. Alright. So now that you know about automations and loops, let me give you some quick best practices.
15:26Essentially, there is no reason to have suboptimal code at this point because you can have 100% test coverage at all times. You can kick off an automation that checks if you do not have full coverage.
15:39And if you don't, write tests to make sure you have full coverage. There is really no reason not to. There is no reason to have stale or missing documentation for the same exact reason.
15:50You kick off an agent and make sure all of the functionality in your app every single day as it changes gets updated in that documentation. I cannot recommend that enough.
16:01And then last, have exhaustive logging.
16:05Log everything. It really doesn't cost that much. You can always have some, like, thirty day window for logging or seven day window for logging, but you wanna store all logs because you could just task your agent with fixing any errors that come up.
16:17It's so brilliant, this flywheel of perfect tests, perfect documentation, and perfect logging.
16:23Have these three in your code base. I cannot recommend this enough. Alright.
16:28Next, let's talk about cloud versus local agents. Most AI coding tools have both. The big ones that you've heard of definitely have both.
16:37Cursor was really the first one to have Cloud Agents, but Claude Code has it. Codex has it. And what it basically means is that you can spin up a completely isolated environment for your code base for each individual agent, and it's not running on your computer.
16:54And this is really good for a lot of reasons. Number one, it is infinitely parallel because you're not depending on the CPU or the RAM of your computer, your home desktop or laptop, to run a ton of agents in parallel.
17:08You're using the cloud. You are using a massive data center to power this, so you really don't have to think all that much about, can I spin up ten, twenty, thirty agents? It'll just work.
17:19Next, it is accessible from anywhere. Most of these AI tools have mobile apps, and you can log in and manage your Cloud Agent from anywhere.
17:28And it's very useful for coding on the go. Now, of course, Claude Code and Codex both allow you to control your local agents remotely. But again, you start running into some of those bandwidth constraints because you're running it locally.
17:41Next, one of the most important reasons to use Cloud Agents is that they run on completely isolated environments, which means if you have multiple agents all writing to the same repo, they're not going to conflict with each other, which is an issue that I have all the time.
17:58Even if I am spinning up new worktrees locally for every one of my agents, I still run into these weird edge cases, and it doesn't always work flawlessly like it does if you're using a Cloud Agent. Also, when you use Cloud Agents, there are some really unique features depending on which AI tool you're using.
18:18For example, Cursor has this incredible feature that gives you a video and screenshots of changes it made. You don't have to ask for it.
18:28It just does it. So rather than just trusting that it got something done, you can actually see it. Check this out.
18:35So here it is. I added a new loading icon to my app, and we can see there it is. And it literally just showed me a video of it.
18:43So really cool, useful feature. Now there are some drawbacks to using Cloud Agents. Let me tell you why sometimes local is better.
18:51Number one is it's faster. It is much faster because you always have an environment ready to go on your local machine versus the cloud, which has to spin up a new environment for every single agent that you kick off. And there's a little bit of latency that you pay there.
19:08It's not huge, but it is something. Number two, you get more control. When it's running on your own computer, when you can actually see the files being changed on your own computer, you do have a better sense of control over what's going on.
19:22Also, cloud agents don't always have the latest and greatest features released by these AI coding tools. So most likely, the latest and greatest features are gonna ship with your local agents and then later show up in the cloud.
19:36But to be honest, I am most likely gonna be moving my entire workflow to Cloud Agents. There are just too many benefits to moving all of this to the cloud, especially when you start running a bunch of agents in parallel, which, you know, when I'm running twelve, fifteen, twenty agents in parallel on my computer, my computer slows to a crawl.
19:57There is no avoiding it. Now I mentioned worktrees. I just wanna touch on that one more time.
20:02Alright. So what is a worktree? A worktree is a second working folder, basically a copy of your repo that is separate from your other one.
20:12So I typically spin up worktrees for every agent. And so that means each agent can make changes to the same set of files, to the same methods, and then the merge, when I finally merge it later, that's when we're going to resolve all the conflicts.
20:28The problem with not using worktrees is if you have a bunch of agents and they start writing to the same file, they're going to get confused and they're going to spin out of control. It's very frustrating. So try to use worktrees as much as possible.
20:42Now there is some latency that you pay with using worktrees, but overall, there really isn't much downside to just using worktrees for all of your agent threads. Now worktrees are very easy to spin up.
20:54Here it is in Cursor. So here's my repo. Here's the branch that I'm using.
20:59And right here where it says cloud, this is if you wanted to spin up a Cloud Agent. You can select just the repo itself, and all of the agents are going to work in the same worktree. And if you click right here, new worktree, that allows you to spin up a new worktree for that agent.
21:13And so that's it. You're done. It's that easy.
21:16In Codex, very similar. Right here where it says cloud, you click it and instead choose new worktree.
21:22Okay? And it automatically selected main, but that's it. Then when I kick it off, as you can see with this thread, this one's using a worktree.
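Outside of the Cursor and Codex buttons shown here, the same one-worktree-per-agent setup can be sketched with plain `git worktree add`. This is a minimal sketch, not the tools' own implementation: the `agent/<name>` branch naming, the sibling-folder layout, and the helper names `worktree_command` and `create_worktrees` are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def worktree_command(repo: Path, agent: str, base: str = "main") -> list[str]:
    """Build the `git worktree add` command for one agent.

    The branch name `agent/<name>` and the sibling-folder layout are
    illustrative choices, not something the video prescribes.
    """
    path = repo.parent / f"{repo.name}-{agent}"
    return [
        "git", "-C", str(repo),
        "worktree", "add",
        "-b", f"agent/{agent}",  # new branch for this agent
        str(path),               # separate working folder for this agent
        base,                    # commit or branch to start from
    ]


def create_worktrees(
    repo: Path, agents: list[str], base: str = "main", dry_run: bool = True
) -> list[list[str]]:
    """Return one command per agent, and run them unless dry_run is set."""
    commands = [worktree_command(repo, agent, base) for agent in agents]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

With `dry_run=True` (the default) the function only returns the commands, so they can be inspected before anything touches the repo.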
21:30Now the time that you really don't need worktrees is when you have agents running on completely different areas of the codebase. One last note about Cloud Agents. Make sure to set them up with a full environment, the same thing you would give your local environment.
21:45So local keys, .env.local, all of the things that you would give to your local environment to make sure that it runs well, to make sure it has access to the different tools it needs, you also need to do that in the cloud environment. Each one, Cursor, Codex, Claude Code, Factory, they all have interfaces on the web in which you can go in and input your client secrets, input your environment variables, and you wanna treat that as its own environment and give it full power by doing so.
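One way to check that a cloud environment has everything the local one has is to compare variable names. The sketch below assumes a simple `KEY=value` file format and assumes you have copied the list of names configured in the tool's web interface by hand; `parse_env_text` and `missing_in_cloud` are hypothetical helpers, not part of any of the tools mentioned.

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments.

    Assumes a plain dotenv-style file; `export` prefixes and multi-line
    values are not handled in this sketch.
    """
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_in_cloud(local_env: dict[str, str], cloud_keys: set[str]) -> list[str]:
    """Names defined locally that the cloud environment does not have yet."""
    return sorted(key for key in local_env if key not in cloud_keys)
```

For example, `missing_in_cloud(parse_env_text(Path(".env.local").read_text()), {"API_KEY"})` (with `from pathlib import Path`) would list every local variable other than `API_KEY` that still needs to be entered in the cloud settings.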
22:17Alright. Now one of the benefits of using a Cursor or a Factory or a Devin is that you have multi-model functionality.
22:27That means you're not completely dependent on an OpenAI model if you're using Codex. You're not completely dependent on using an Anthropic model if you're using Claude Code. That's one of the benefits of using one of these alternatives.
22:39But why is multi-model important? If Anthropic or OpenAI has the most frontier model, the best model on the planet, why don't I just use that? Well, there's two reasons, speed and cost.
22:52Not everybody has infinite tokens. And if you have to be mindful about your token spending, using multiple models is actually a really good way to reduce your AI costs. Plus, if you're not using the top model all the time, you're actually going to be able to complete tasks faster.
23:09And let me show you how I do this. So here's an example of a multi-model workflow, and you can set this up as a skill, which is really cool. You can define in a skill which model to use at which point and for what use.
23:22So for example, let's say I'm building a brand new feature. I will do the planning with Fable.
23:28I want it to look at my entire codebase. I want it to come up with a detailed plan about how to actually build this feature.
23:36But once I come up with this overall plan, I don't necessarily need a Fable level model to execute it, to actually write the code. In fact, a model like Composer is actually excellent at writing code.
23:51Maybe it's not as good at seeing around corners and knowing every little bit about the codebase and planning this massive feature, but once that's done and it knows what to write, it is excellent at doing so. And then last, maybe I do the review with GPT-5.5. So after Composer wrote everything, rather than sending it back to Fable, I'm going to give it to a different model just to get an alternative viewpoint on what was written.
24:15So review the code. And all of this, again, can be written into a skill very easily. All right.
24:20Next, I have to share this because it is an unsolved problem. I have spoken to the OpenAI team. I've spoken to the Cursor team.
24:30I've talked to the best agentic engineers on the planet, and this is an unsolved problem, and that is merging and deploys. And specifically, if you have, like me, potentially a dozen agents running in parallel and you're trying to get all of that code onto production around the same time, it gets so frustrating and so slow.
24:50So let's say you have one agent that is looking to merge into main. They do so. And then all of a sudden, it kicks off the CI.
24:59It kicks off the deploy process. Great. You have to wait a couple minutes for that.
25:03Then the second agent, right around the same time, comes in, and it's like, Okay. I want to get my code into main as well. Let me do that.
25:11And then it says, oh, wait. There's new changes there. I haven't seen those changes.
25:16Okay. Let me rebase on my local repo.
25:20Let me rerun all of those tests. And then let me try merging again. And then once it finally does merge, it has to actually run all of those CI and deploy processes again and again.
25:31And, basically, if you can imagine, you have a third one and a fourth one, and they're all trying to do the same thing on the same code base, and they start stumbling over each other.
25:43They start locking the commit process. They start locking the deploy process.
25:48They're all just waiting. And then every single time one of them gets through, every other one of them has to restart the process completely. It's broken.
25:58There really isn't a good way to fix this. I've heard of a couple ways, but none of them are perfect. The only real thing to do is to just be patient, and one trick that I sometimes use is set up a bunch of PRs and then do batch commits.
26:14Just allow a single agent to look at all the changes, combine them, and then merge and deploy all at once. Definitely far from perfect. And in fact, it's such a known problem that literally today, Cursor just announced they're building their own Git alternative specifically built for agent scale deployment.
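To see why the pile-up gets slow so quickly, it helps to count runs. The sketch below is a deliberately simplified worst-case model, not a measurement: it assumes every agent is ready at the same moment, that only one merge lands at a time, and that every agent still waiting rebases and reruns its tests after each merge. The function names are hypothetical.

```python
def serial_merge_cost(agents: int) -> dict[str, int]:
    """Count test and CI runs when agents merge to main one at a time.

    Simplified worst case: after each merge, every agent still waiting
    has to rebase and rerun its tests before trying again.
    """
    waiting = agents
    test_runs = 0
    ci_runs = 0
    while waiting > 0:
        test_runs += waiting  # everyone still waiting tests against the new main
        ci_runs += 1          # the one merge that lands triggers CI and deploy
        waiting -= 1
    return {"test_runs": test_runs, "ci_runs": ci_runs}


def batch_merge_cost(agents: int) -> dict[str, int]:
    """Count runs when a single agent combines all changes and merges once."""
    if agents <= 0:
        return {"test_runs": 0, "ci_runs": 0}
    return {"test_runs": 1, "ci_runs": 1}
```

Under these assumptions, a dozen agents merging one by one means 78 test runs and 12 CI and deploy cycles, against one of each for a batch. The batch still has to resolve the combined conflicts, which this model does not count.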
26:34So this is still a big problem. It's not really solved, and hopefully it will be soon. And again, one of the most important things in this entire video that I want you to go away with is automations and loops.
26:44And if you wanna learn more about loops, I made a whole video about it. Check it out right here.
The Hook

The bait, then the rug-pull.

There are levels to AI coding — and most people are still on the bottom floor. The host opens with a clean contrast: beginners prompt, wait, review, repeat; experts have already built the system that does all of that for them.

Frameworks

Named ideas worth stealing.

15:20 · list

The Three-Flywheel

  1. 100% test coverage
  2. Perfect documentation
  3. Exhaustive logging

Three properties of a codebase that can all be maintained automatically by agent automations. Together they create a self-correcting system.

Steal for: Any team setting up an AI-assisted CI/CD pipeline
22:16 · model

Multi-model routing

  1. Plan with a frontier model such as Fable
  2. Write code with a fast mid-tier such as Composer
  3. Review with an alternate frontier model such as GPT-5.5

Match model capability to task complexity to reduce cost and latency without sacrificing quality.

Steal for: Any skill or automation that spans multiple AI tasks
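The three routing steps can be sketched as a small pipeline. This is only an illustration of the idea: the model names are the ones used in the video, while `call_model` is a hypothetical stand-in for whatever API or tool actually runs a prompt against a named model, and the prompt wording is made up.

```python
from typing import Callable

# Stage-to-model mapping taken from the video's example.
ROUTES: dict[str, str] = {
    "plan": "Fable",
    "write": "Composer",
    "review": "GPT-5.5",
}


def run_feature(
    request: str,
    call_model: Callable[[str, str], str],
    routes: dict[str, str] = ROUTES,
) -> dict[str, str]:
    """Plan, write, and review a feature, each stage on its own model.

    `call_model(model_name, prompt)` is a hypothetical callback supplied by
    the caller; it is not a real library function.
    """
    plan = call_model(routes["plan"], f"Plan this feature: {request}")
    code = call_model(routes["write"], f"Implement this plan:\n{plan}")
    review = call_model(routes["review"], f"Review this code:\n{code}")
    return {"plan": plan, "code": code, "review": review}
```

Keeping the mapping in one dictionary means a stage can be moved to a cheaper or faster model without touching the pipeline itself.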
12:25 · model

Loop structure

  1. Trigger (what starts it)
  2. Action (what runs repeatedly)
  3. Goal (the end condition)

A minimal three-part template for any autonomous agent loop. The goal prevents infinite runs.

Steal for: Any recurring quality-enforcement task
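The trigger, action, and goal template maps onto a few lines of code. The sketch below is one possible shape, not the video's implementation: `run_loop` and its parameters are hypothetical, and the `max_iterations` cap is an added safety assumption for the case where the goal is never reached.

```python
from typing import Callable


def run_loop(
    trigger: Callable[[], bool],
    action: Callable[[], None],
    goal_met: Callable[[], bool],
    max_iterations: int = 10,
) -> int:
    """Run `action` until `goal_met` returns True, if `trigger` fired.

    Returns the number of times the action ran. `max_iterations` is an
    extra guard added in this sketch so a goal that is never met cannot
    keep the loop running forever.
    """
    if not trigger():
        return 0
    iterations = 0
    while iterations < max_iterations and not goal_met():
        action()
        iterations += 1
    return iterations
```

For a test-fixing loop, for instance, the trigger could be "a PR was opened", the action "ask the agent to fix one failing test", and the goal "no failing tests remain".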
CTA Breakdown

How they asked for the click.

VERBAL ASK
26:35 · next-video
If you wanna learn more about loops, I made a whole video about it. Check it out right here.

Clean callback to the video's strongest section at the very end, a smart reinforcement of the Loop Library announcement.

Storyboard

Visual structure at a glance.

00:00 · hook · open
00:21 · promise · topic map
00:50 · value · tools
03:46 · value · skills
08:28 · value · automations
11:29 · value · loops
16:27 · value · cloud vs local
22:16 · value · multi-model
24:19 · value · merge problem