Big Idea

The argument in one line.

When code — not the model — carries results between sub-agents, the orchestrator context never fills, letting you chain 20, 30, or 100 agents without degradation or a mounting token tax at every join.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…

You already use Claude Code sub-agents or skill-based workflows and have felt the context bloat or sloppy orchestration after ten or more agents.
You run repeatable multi-step jobs (issue triage, code review loops, outreach drafting) and want them to auto-retry on failure without babysitting.
You are comfortable reading JavaScript and want deterministic, code-defined control flow instead of hoping the model keeps its place.

SKIP IF…

You are still learning the basics of Claude Code and haven't shipped a sub-agent workflow yet — the token-tax problem won't feel real until you've hit it.
You only run one-off tasks; the video explicitly says those don't need a workflow.

TL;DR

The full version, fast.

Claude Code 2.1.147 adds a Workflow tool that moves orchestration from the model into a JavaScript file in .claude/workflows/. Instead of every sub-agent result flowing back through the main context, results pass directly through the code so the orchestrator context stays flat no matter how many agents run. The workflow file exposes agent(), parallel(), pipeline(), schema typing, phase logging, runtime arguments, and a token-budget guard. Three live demos (Sentry triage, dead-code sweep, personalized outreach) show the pattern in practice. Reach for a workflow when a job is repeatable, fans out across multiple agents, or is long enough that a mid-run failure is worth auto-recovering from.

Chapters

Where the time goes.

00:00 – 01:21

01 · Introduction

Hook — announces the Workflow feature, shows /workflows slash command, previews a previous Sentry triage run.

01:21 – 03:18

02 · The Big Picture

Explains the old model-as-orchestrator pattern, the token tax at each sub-agent join, and why context fills and causes sloppy orchestration.

03:18 – 04:52

03 · How Workflows Look

Introduces the code-as-orchestrator concept, shows slide diagrams contrasting the two patterns, introduces agent() and loops in code.

04:52 – 06:21

04 · Making a Workflow

Walks through triage-sentry.js manually — meta, schemas, args, phase definitions, plain JS filtering.

06:21 – 07:42

05 · Workflow Demo

Shows the triage-sentry workflow running live — phase log, parallel fix agents, retrying, background execution.

07:42 – 09:31

06 · Running the Workflow

Continues live demo — navigating /workflows UI, pausing and resuming, watching the verifying stage.

09:31 – 10:14

07 · Workflow 2: Dead Code Sweep

While loop up to 8 rounds, parallel removal with test-and-revert safety check, early exit when no dead code remains.

10:14 – 11:10

08 · Workflow 3: Personalized Outreach

CSV leads, Haiku research stage, Opus writing stage via pipeline(), model-switching per stage.

11:10 – 12:24

09 · Results

Triage-sentry completes — 7 sub-agents, 400K tokens, three fixes verified. Personalized outreach output folder shown.

12:24 – 12:46

10 · Workflow Creator

Mentions a GitHub skill that teaches Claude Code to generate workflow files; expects Anthropic to ship an official one.

12:46 – 13:31

11 · The Toolkit

Rapid summary of all workflow primitives: agent, parallel, pipeline, schema, phase log, args.

13:31 – 13:49

12 · Budgets

Shows budget.remaining() in a while loop — token-aware self-stopping loops to prevent runaway spend.

13:49 – 14:24

13 · My Suggestion

Recommends asking Claude Code to audit previous sessions and identify workflow opportunities.

14:24 – 15:04

14 · When to Workflow

The three-condition decision rule: repeatable, fans out, worth resuming. Everything else: just let Claude do it.

15:04 – 15:10

15 · Conclusion

CTA to newsletter and masterclass.

Atomic Insights

Lines worth screenshotting.

Every time a sub-agent result passes back through the main orchestrator context, you pay a token tax — with 15 agents that tax compounds to serious money and serious degradation.
Moving orchestration into a JavaScript file means the main Claude session context stays flat whether you run 3 agents or 300.
The pipeline() primitive starts stage 2 on the first completed item from stage 1 — you don't wait for all parallel work to finish before the next stage begins.
Workflows auto-retry each failed sub-agent up to three times, which makes long jobs resumable without manual re-runs.
You can mix plain JavaScript conditionals and loops directly with agent() calls — a JS filter that drops low-impact Sentry issues runs for free before any agent is spawned.
A budget.remaining() guard in a while loop lets the workflow self-regulate token spend instead of running until an arbitrary loop count.
The /workflows slash command gives live visibility into running and completed workflows — phases, agent counts, tool calls, and token usage per agent.
For one-off tasks the video explicitly says to skip workflows and just run Claude manually — the overhead only pays off on repeatable or multi-agent jobs.
Schemas on agent() calls give downstream agents typed, structured data instead of raw text, reducing prompt-engineering needed to parse intermediate results.
The same five-step job: with model-as-orchestrator you pay a token tax at every arrow; with code-as-orchestrator you pay nothing at each join.
Conditional branching is reliable when it lives in code — when it lives in a filling context window it degrades over time.
Parallel outreach research with a cheap Haiku model, then a switch to Opus only for writing, is a real cost pattern the workflow makes easy to express.

Takeaway

Replace the model with code and pay nothing at every join.

WHAT TO LEARN

The token tax in multi-agent pipelines is not a fixed cost: it is paid again at every sub-agent join, and moving orchestration into a JavaScript file removes it from the orchestrator's context.

01 · Introduction

The Workflow feature shipped in v2.1.147 as an opt-in with a /workflows slash command for browsing history — visibility that previously required parsing scrolling terminal output.

02 · The Big Picture

Every sub-agent result that passes back through the main orchestrator context burns tokens twice — once as output, once as input to the next agent — and this tax compounds with agent count.
An orchestrator running 15 agents degrades because its context window fills with intermediate results it never needed to hold.
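
The arithmetic behind that claim can be sketched with a deliberately simplified cost model. It assumes that every time a sub-agent result comes back, the orchestrator re-reads its whole context, including the new result, to decide the next step. Prompt caching and pricing are ignored, and the function names and token counts are illustrative assumptions, not measurements from the video.

```javascript
// Simplified model (an assumption): each returned result is appended to the
// orchestrator context, and the orchestrator then re-reads that whole context.
function modelAsOrchestrator(baseTokens, resultTokens) {
  let context = baseTokens;
  let orchestratorReads = 0;
  for (const size of resultTokens) {
    context += size;              // the sub-agent result lands in the main context
    orchestratorReads += context; // the orchestrator re-reads everything to pick the next step
  }
  return { finalContext: context, orchestratorReads };
}

// With code as the orchestrator, results are handed from agent to agent by the
// workflow file, so no model turn happens at a join and the context stays flat.
function codeAsOrchestrator(baseTokens, resultTokens) {
  return { finalContext: baseTokens, orchestratorReads: 0 };
}

// Hypothetical numbers: a 2,000-token session and 15 results of 3,000 tokens each.
const results = Array(15).fill(3000);
console.log(modelAsOrchestrator(2000, results)); // { finalContext: 47000, orchestratorReads: 390000 }
console.log(codeAsOrchestrator(2000, results));  // { finalContext: 2000, orchestratorReads: 0 }
```

Under this model the final context grows linearly with agent count, while the total tokens re-read by the orchestrator grow roughly with its square, which is why the tax is felt more sharply at 15 agents than at 4.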

03 · How Workflows Look

Replacing the model orchestrator with a code file means joins between agents are free — no token cost at handoff, no context growth, no degradation over long runs.
Conditional branching and loops expressed in JavaScript are deterministic; the same logic in a model context drifts as the context fills.

04 · Making a Workflow

Plain JavaScript runs for free between agent() calls — a filter that drops low-impact issues before spawning agents is a zero-cost code operation, not a model operation.
Schemas defined at the top of the file and passed to agent() calls give downstream stages typed data, removing the prompt-engineering needed to parse free text.

05 · Workflow Demo

The phase log in /workflows gives real-time visibility into which agents are running, how many tools they called, and how many tokens they spent — replacing the wall-of-text problem.

06 · Running the Workflow

Workflows run in the background — the main session stays interactive while agents execute, and multiple workflows can run simultaneously.
Auto-retry per sub-agent means a failed MCP call does not abort the whole workflow; Claude Code retries that agent before moving on.

07 · Workflow 2: Dead Code Sweep

A while loop with a round counter and an early-exit condition is the correct pattern for iterative cleanup jobs — it terminates when the job is done, not after a fixed number of runs.

08 · Workflow 3: Personalized Outreach

pipeline() is the right primitive when stage 2 can start on item 1 before all stage-1 work is done — it reduces total elapsed time compared to parallel-all-then-batch.
Assigning different models to different pipeline stages is a cost pattern that only works cleanly when orchestration is in code.

11 · The Toolkit

The six workflow primitives — agent, parallel, pipeline, schema, phase, args — cover the full space of fan-out, fan-in, sequential, and conditional patterns without needing a model to manage control flow.

12 · Budgets

budget.remaining() turns a while loop into a token-budget-aware run that self-terminates when budget is depleted, preventing open-ended loops from exceeding cost targets.
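
A budget-guarded loop of this kind might look like the sketch below. The source only shows that budget.remaining() is compared against 50,000 tokens inside a while loop, so the budget object here is a local stand-in, and makeBudget, spend, huntBugs, and attemptFix are hypothetical names used for illustration.

```javascript
// Local stand-in for the runtime's budget object (an assumption; only
// budget.remaining() is mentioned in the source).
function makeBudget(totalTokens) {
  let used = 0;
  return {
    remaining: () => totalTokens - used,
    spend: (tokens) => { used += tokens; },
  };
}

// Keep attempting fixes while more than minTokens remain in the budget.
// attemptFix is a caller-supplied async function returning { fixed, tokensUsed }.
async function huntBugs(budget, attemptFix, minTokens = 50000) {
  let attempts = 0;
  while (budget.remaining() > minTokens) {
    const { fixed, tokensUsed } = await attemptFix(attempts);
    budget.spend(tokensUsed);
    attempts += 1;
    if (fixed) return { fixed: true, attempts };
  }
  // The loop ended because the budget guard tripped, not because of a round limit.
  return { fixed: false, attempts };
}
```

The design point is that the stop condition tracks spend rather than iteration count, so a run of cheap attempts can continue longer than a run of expensive ones.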

14 · When to Workflow

Repeatable, fans out, worth resuming — any job that fails all three conditions is a one-off and should be run directly without a workflow file.
The value of a workflow file is highest when the job will run many times; the upfront cost of writing the file amortizes across every future run.

Glossary

Terms worth knowing.

Workflow tool: A Claude Code feature (enabled via CLAUDE_CODE_WORKFLOWS=1) that reads a .js file from .claude/workflows/ and runs its defined agents and phases as a managed background process, visible via /workflows.
Token tax: The cumulative cost of passing sub-agent results back through the main orchestrator context window; grows with each agent handoff and causes the orchestrator to degrade as its context fills.
Code-as-orchestrator: The pattern where a JavaScript file — not the model — decides sequencing, conditionals, and data passing between sub-agents, keeping the main session context flat.
pipeline(): A workflow primitive that streams items through sequential stages, starting the next stage on each completed item rather than waiting for all items in the prior stage to finish.
parallel(): A workflow primitive that launches N agent() calls simultaneously and waits for all of them to complete before continuing.
schema: A typed return definition attached to an agent() call that tells the sub-agent what structured shape to return, so the next stage can reference named fields without parsing free text.
budget.remaining(): A workflow-runtime call that returns the number of tokens still available in the current budget, usable as a while-loop guard to self-terminate expensive loops before they overspend.

Resources

Things they pointed at.

12:33 · tool · Workflow Creator skill (GitHub)

15:04 · link · Master Claude Code masterclass

Quotables

Lines you could clip.

02:05

“Four sub-agents is not one tax, it is a tax at every join. The more you fan out, the more you pay.”

Visceral one-liner that reframes the cost model for anyone who has run multi-agent workflows. → TikTok hook

03:25

“Same five steps — one run pays a tax at every arrow; the other pays nothing.”

The before/after contrast is complete in one sentence, no setup needed. → IG reel cold open

06:10

“Reviewing is no longer the model's decision, it is the file's.”

Tight philosophical punchline about determinism vs. model drift. → newsletter pull-quote

14:38

“Repeatable, fans out, worth resuming — that is the sweet spot.”

Portable decision rule, stands alone. → TikTok hook

The Script

Word for word.

00:00Okay. So a few hours ago, Anthropic added a brand new feature that may fundamentally change the way that you use Claude Code going forwards, and that is by adding the workflow tool for deterministic multi agent orchestration. Now they haven't yet made an announcement about this feature, so by watching this video, you're gonna be ahead of the curve.

00:17So I'll be going through exactly what the feature is and how you can use it to be a Claude Code power user. So they do say that it is off by default, and we have to set this environment variable to enable it. So I will copy this over, go to the terminal, paste this in, add a space, and then run claude.

00:32And after doing that, you will see we have a brand new slash command, which is slash workflows, and this will allow you to browse your workflow history running and completed. So going over here, I have no workflows I have run in this session.

00:44So we basically gotta go ahead and make ourselves a workflow. But, essentially, I can go back to a previous session where I was testing out the workflows feature. And if I do slash workflows inside of here, then I can see that these are all the workflows that I previously ran.

00:57So I ran this, like, triage workflow for Sentry, then I can see that it looks like this. So we have different stages of a workflow that I predefined, where each workflow is running as an individual subagent. Then I can press enter on any of these workflows, and then I can see the run that happened, all the tool calls that were made, the prompt, and so forth.

01:14And then I can go over to any stage of the workflow and see all the different agents that have run inside of the workflow, how many tools they use, tokens, and so forth. Okay. So before actually going ahead and making a workflow, I wanna talk about what people have been doing until now before the update and why this new approach is better and solves a lot of the problems that people have been previously facing.

01:33So until now, you may have had some kind of skill that described a set of steps, a workflow, whereby Claude, your main session would be the Orchestrator, and then tell a sub agent to do something like implement a feature. It would then pass that information back into main session, and that would then go into the next sub agent.

01:51That would then do a review, pass it back to main session, go to next sub agent, and so forth. So, essentially, the orchestrator is deciding what to run next and holding every intermediate result. And this is the old, less effective way of making workflows that people have been doing until now.

02:05Now this can have some drawbacks. The first of all being a token tax. So every time Claude Code spins up a sub agent and then it gets a result back, that goes into the main context window of the orchestrator, and then it's passed again into the next sub agent, goes back, and so forth.

02:20And this can consume quite a lot of tokens in your main session, especially if we have something like ten, fifteen sub agents that run. And your main orchestrator would do worse over time as its context window fills up with all these results, and these tokens are going back and forth unnecessarily between the main orchestrator and the sub agents.

02:37So ideally, results should be passed directly from one to the next without ever entering the context window of the orchestrator. We also have some other problems whereby you may trigger a workflow and you have no visibility into what is going on. You basically see some kind of like a wall of text scroll by as a workflow is triggering over the next twenty, thirty minutes.

02:56And finally, because the orchestrator's context window is filling up, by passing tokens back and forth between different sub agents, eventually, it just starts forgetting and acts more sloppy and lazy, which is not what you want from an orchestrator. And this forgetting in the orchestrator can be even worse if you have conditionals.

03:11So you may have a conditional like, if we get this particular result, then spawn another sub agent to verify it. If not, don't do that. So then the Claude Code team were like, okay.

03:20So what if instead of having a nondeterministic model being a wrapper of all these sub agents, we had a code wrapper instead? So instead of a model passing information back and forth between sub agents and incurring a token tax every single time and also getting sloppier the longer it runs, what if we just had a workflow file, some kind of code that could pass the result directly without ever entering the main conversation, which means that you can have, like, twenty, thirty, 100 agents run one after another and the context of the main Orchestrator never actually filling because the Orchestrator is now code instead.

03:54So the way that the workflow tool works is we have a workflow dot j s file, which is a JavaScript file, and this basically defines how our workflow is gonna look. So we have phases that we can define. So for example, we can have a review phase, and then we can define an agent that runs, like does a basic review.

04:11This is a prompt. And then if the review passes, it would just end as it is by doing a return. If it doesn't, then it will fix the issues.

04:19And because this is code, we can also implement loops as well. So you can see in this example, we would have, like, implement the feature. And then over here, it would do up to three reviews.

04:28And if the review passes, then it would just break. If it doesn't pass, then it would fix it with a different agent. So every time you see agent over here, that is a different sub agent running.

04:37And what's interesting about this as well is that we can define schemas as well. So we want this review sub agent to return a schema of, like, either passed or issues that we can then reference in the next agent. Okay.
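
The implement, review, and fix loop narrated above can be sketched in plain JavaScript. The video does not show the exact signature of agent(), so this sketch passes in a stub agent that returns canned results. The option name schema, the return shape { passed, issues }, and the helper names are assumptions made so the control flow can run on its own.

```javascript
// Stub standing in for the workflow runtime's agent() call (an assumption:
// the real signature is not shown in the source). It replays canned reviews.
function makeStubAgent(reviewResults) {
  const calls = [];
  let reviewIndex = 0;
  async function agent(prompt, options = {}) {
    calls.push(prompt);
    if (options.schema === 'review') {
      const result = reviewResults[Math.min(reviewIndex, reviewResults.length - 1)];
      reviewIndex += 1;
      return result; // shape assumed here: { passed: boolean, issues: string[] }
    }
    return { done: true };
  }
  return { agent, calls };
}

// Implement once, then review up to maxReviews times, fixing after each failed review.
async function implementAndReview(agent, maxReviews = 3) {
  await agent('Implement the feature');
  for (let round = 1; round <= maxReviews; round += 1) {
    const review = await agent('Review the change', { schema: 'review' });
    if (review.passed) return { passed: true, rounds: round }; // the loop exit lives in code
    await agent(`Fix these issues: ${review.issues.join('; ')}`);
  }
  return { passed: false, rounds: maxReviews };
}
```

Because the round limit and the exit condition are ordinary code, they behave the same on the first run and the hundredth, regardless of how much text the sub-agents produced along the way.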

04:49Now let's go through the process of actually making a workflow. So I'll be manually going through the file so you can see what it's like behind the scenes. But, ideally, you will be getting Claude Code to make you workflows instead.

04:59So you wanna go to your .claude folder in your project and make a brand new folder, which is workflows. And then inside of this, we can define a workflow. So I'm gonna call this triage sentry dot j s.

05:10So this will be a JavaScript file. And, essentially, at the very top, we have to define a meta for the workflow. So the workflow meta looks kind of like this.

05:18We have the name, the description that will appear inside of Claude Code, and then we have the different phases as well. So then we can define some schemas. So this will be one of our schemas, which is an issues schema.

05:30So you can see over here, it has the issue ID, the title, and the user count of how many users are affected. Then we will define another schema here, verdict, which is whether or not it was fixed and any notes about this.
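
The two shapes described here (an issue with an ID, title, and user count, and a verdict with a fixed flag and notes) can be written down and checked in plain JavaScript. The schema syntax the Workflow tool actually expects is not shown in the source, so the field-to-type maps and the matchesSchema helper below are illustrative assumptions rather than the tool's format.

```javascript
// Illustrative shape descriptions (assumed format): field name -> typeof result.
const issueSchema = { id: 'string', title: 'string', userCount: 'number' };
const verdictSchema = { fixed: 'boolean', notes: 'string' };

// Returns true when value is an object whose fields have the expected types.
function matchesSchema(value, schema) {
  if (value === null || typeof value !== 'object') return false;
  return Object.entries(schema).every(([key, type]) => typeof value[key] === type);
}

// Returns true when list is an array in which every item matches the schema.
function allMatch(list, schema) {
  return Array.isArray(list) && list.every((item) => matchesSchema(item, schema));
}
```

With a structured result like this, a later stage can read issue.userCount directly instead of asking a model to pull the number out of free text.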

05:42And then because we can pass in arguments into our workflows, for example, you can see an argument would be passed in kinda like this, Minimum number of users affected, 20 users.

05:51We have to then parse the arguments so they are loaded in properly. So it looks kinda like this. So what happens here is that 20 will be the default.

05:59So any issue that affects more than 20 users on Sentry will be loaded in unless we have specified an argument instead. So firstly, we'll define our first phase that will load in the issues. So phase over here will pull in issues.

06:13So we define an agent. So you can see that it says, use a Sentry MCP to list unresolved issues; for each, return its ID, title, and affected user count. And the schema that we define up here for issues is also passed into agent, so it knows what kind of schema to return back.

06:27Now the interesting thing is that we can now define some plain JavaScript. So if we were to write something like this, then we can basically filter the issues that are returned down to the ones that are above the threshold that we defined earlier. So this will then log inside of the workflow for us to see.

06:42And if no issues are present that are big enough, then it will just end the workflow as it is right over here. So say fixed, no issues affecting more than the threshold number of users. And now next up, we can define a pipeline.
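
The default threshold, the plain-JavaScript filter, and the early exit described above might be sketched as follows. The argument name minUsers and the function names are hypothetical; the source only establishes that the default is 20 and that issues affecting more than the threshold are kept.

```javascript
// Hypothetical argument name (minUsers); the source only shows that 20 is the default.
function parseMinUsers(args = {}) {
  const value = args.minUsers == null ? NaN : Number(args.minUsers);
  return Number.isFinite(value) ? value : 20;
}

// Plain JavaScript: no agent is spawned and no tokens are spent in this step.
function selectBigIssues(issues, minUsers) {
  return issues.filter((issue) => issue.userCount > minUsers);
}

// Decide whether the workflow should stop early or continue to the fix stage.
function planTriage(issues, args) {
  const minUsers = parseMinUsers(args);
  const bigIssues = selectBigIssues(issues, minUsers);
  if (bigIssues.length === 0) {
    return { done: true, message: `No issues affecting more than ${minUsers} users` };
  }
  return { done: false, bigIssues };
}
```

In the demo later in the video, this kind of filter is what reduces 25 unresolved issues to the three that get a fixing sub-agent.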

06:55So this is our pipeline. And essentially, all the big issues that we have will be passed into a two stage process. The first stage, what it will do is that for each of the issues that are returned so let's say we have five issues that are returned, it will then investigate and fix the issue, and then it will go to next stage of verifying that the fix is actually real and working.

07:15And then finally, at the very bottom, we can report how many issues were actually fixed, how many, uh, issues we found, and the final results. So the interesting thing that you will notice is that we're basically mixing in plain JavaScript with the sub agent that will be running inside of Claude Code.

07:31So the issue ID from earlier, the title, and the user count are being passed directly into the prompt. So they're not going back in through the main orchestrator. They're going in through the prompt instead.

07:41So after that, triage sentry, then you will see that it appears over here with the workflow tag next to it. And if I press enter, then you will see that it begins to trigger a workflow. So it's triggering the triage sentry workflow, and it's running with zero out of one agents right now.

07:57It has one out of three phases, and then I can see right over here as a background workflow. So if I go over here, then I think I have to zoom out. I can see right now it's pulling the issues.

08:07If I press enter, then I can see that, uh, it's running here. Press enter again.

08:12I can see the different tools that it's calling and the prompt and stuff like that, and then I can go back, or I can pause this workflow as well by pressing the p command. So let me go back to workflows and then press p again to resume it. And you can see the workflow has now returned a result of 25 unresolved sentry issues.

08:29So if I go back and then go over to next stage here, which is a fixing stage, then I can see that it only decided to spin up three sub agents because only three of them were affecting more than 20 users. So all of these sub agents are running in the background right now, and I can skip any of the sub agents by pressing x or retrying them.

08:46And one nice thing about this is that we have automatic retrying. So if one of the sub agents were to fail for any reason, like the MCP server stopped working, then Claude Code would then retry that particular sub agent. And then if I press down, I can see that the verifying has not been started yet.

09:00So whilst this workflow is running in the background, I can just continue to send messages to Claude and work with Claude, or I can trigger another workflow as well. So I can have multiple different workflows running at the same time. So by default, it's using the Opus model of the main session, but I could define a different model.

09:16So whilst we're waiting on this, let's go through a different workflow. So here is another workflow that I made, what's called dead code sweep. So find and remove unused code round by round.

09:27So we don't have any arguments. We have the meta defined over here, so the name and description. Then we have some types defined over here, and then we've defined our own variables.

09:36So how many rounds do we want in this loop? So then we have a while loop right over here whereby one agent will find unused code in the code base. It will then list it out according to schema that we defined earlier.

09:49If no dead code has been found, then it would basically end. But if dead code has been found, then for each of those issues, each of that dead code, it would then remove it one by one. And then once this is over, it will then run another time up to eight times, and it will exit early if it finds out there's no more dead code available.
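
The round-by-round sweep narrated above reduces to a bounded while loop with an early exit. In this sketch the two agent steps are supplied by the caller as plain async functions, because the real workflow file is not reproduced in the source. The names deadCodeSweep, findDeadCode, and removeAndTest are hypothetical, and removals run one at a time, as narrated.

```javascript
// findDeadCode(round) resolves to a list of unused items; removeAndTest(item)
// resolves to true if the removal survived the test run, or false if it was reverted.
async function deadCodeSweep({ findDeadCode, removeAndTest, maxRounds = 8 }) {
  const removed = [];
  let round = 0;
  while (round < maxRounds) {
    round += 1;
    const candidates = await findDeadCode(round);
    if (candidates.length === 0) break; // early exit: nothing left to remove
    for (const item of candidates) {
      const kept = await removeAndTest(item);
      if (kept) removed.push(item);
    }
  }
  return { rounds: round, removed };
}
```

The loop ends either when a round finds nothing or when the eighth round finishes, whichever comes first, so a stubborn codebase cannot keep the workflow running indefinitely.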

10:07So that's a pretty good example of combining loops inside of these workflows and adding some conditional logic as well. Another example is for personalizing outreach. So you may wanna load a set of leads from a file, research them inside of SubAgent, and then draft and save a personalized message.

10:23So we have our leads object up here, then it loads in any arguments such as the leads file, where we want our emails to be stored after we have written it, and then we have the different phases. So first phase will load in the leads from the file. We could also load it in programmatically if we have structured data like a CSV.

10:41So the first pipeline will research the lead, and you can imagine this, like, does research with a cheap API or cheap MCP server. If it can't find anything for that lead, then it would switch over to a more expensive one in a conditional. And you can see this first model is running with Haiku over here, and then any information will be passed into the next stage where it will basically write that copy.
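
The cheap-first, escalate-on-miss conditional can be sketched as below. How the Workflow tool selects a model per agent is not shown in the source, so the model option, the 'haiku' and 'opus' labels as plain strings, and the stub agent are all assumptions for illustration.

```javascript
// Stub agent (an assumption): the cheap model only "finds" leads it was told about,
// while the expensive model always finds something.
function makeStubAgent(knownToCheapModel) {
  const calls = [];
  async function agent(prompt, { model }) {
    calls.push({ model, prompt });
    const found = model === 'opus' || knownToCheapModel.some((name) => prompt.includes(name));
    return { found, details: found ? `details via ${model}` : '' };
  }
  return { agent, calls };
}

// Try the cheap model first; escalate only when it comes back empty.
async function researchLead(agent, lead) {
  const quick = await agent(`Research ${lead.name}`, { model: 'haiku' });
  if (quick.found) return { lead: lead.name, model: 'haiku', details: quick.details };
  const deep = await agent(`Research ${lead.name} in depth`, { model: 'opus' });
  return { lead: lead.name, model: 'opus', details: deep.details };
}
```

The expensive call is only made for leads where the cheap pass fails, and that decision is an if statement rather than something the orchestrating model has to remember.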

11:03Now, of course, this is a simple example. You may wanna make it more complicated for your own workflows that you have defined. And now the workflow has completed.

11:11So it used seven sub agents, 400,000 tokens, and this was a stage that I went through. So we verified each of those fixes successfully, and, yeah, this looks pretty good.

11:21Now let's actually do our personalized outreach workflow. So I have a list of leads over here that I'm gonna be contacting, and this is completely for demo purposes.

11:30And now our workflow is underway. So going over to workflow, we can see that it's loading the leads right now. And now we have eight research agents, all Haiku ones spawned in parallel with one agent per lead, and that's because we have eight leads inside of our CSV.

11:45And now after it researches all of them, so it seems to be going pretty fast, then it will move on to next stage where it will write eight different messages ideally for each of the leads separately. And you can kind of imagine a workflow whereby, like, it couldn't really find the contact details using a lighter model, and then it switches over to a heavier model, and it uses more expensive APIs to get those details for the leads.

12:07So I finished writing the personalized outreach with the Opus model. So we can then see we have a brand new folder with the output, and then we have all the personalized outreach for Bill Gates or whatever, blah blah blah. Like, this is outreach.

12:20And, of course, I don't think this would get a reply, but, like, this is for demo purposes. Now to make making a workflow easier for you, I have a workflow creator skill that you can download from below from my GitHub. And this basically teaches Claude Code how to make a workflow file, all the functionality that is available, and stuff like that.

12:36But I think once the feature is officially announced, then maybe Anthropic will add their own official workflow creator skill, in which case you should use that once it is released. Anyways, so to summarize, we have a toolkit over here. We have an agent where we spawn one fresh sub agent every time.

12:51We can run them in parallel, so we can batch, like, 10 agents in parallel, wait for all of them to complete. We can use pipeline, which streams items through stages instead.

13:01So for example, with the lead outreach workflow, if you are paying close attention, you may have noticed that as soon as one research agent was done, then the next writing a message agent would start immediately rather than waiting for all eight research agents to be done, and that is because of the pipeline here. So we can combine these both together, parallel and pipeline.
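
The difference between streaming items through stages and waiting for a whole batch can be demonstrated with ordinary promises. The functions below are local stand-ins and not the Workflow tool's pipeline() or parallel(), whose signatures are not shown in the source. pipelineSketch, batchSketch, makeStages, and the delayMs field are hypothetical names used only for this illustration.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Pipeline-style: each item moves to the next stage as soon as its own
// previous stage finishes, independent of the other items.
async function pipelineSketch(items, stages) {
  return Promise.all(
    items.map(async (item) => {
      let value = item;
      for (const stage of stages) value = await stage(value);
      return value;
    })
  );
}

// Batch-style: every item finishes a stage before any item starts the next one.
async function batchSketch(items, stages) {
  let values = items;
  for (const stage of stages) values = await Promise.all(values.map((v) => stage(v)));
  return values;
}

// Demo stages that record when they start; delayMs simulates research time.
function makeStages(events) {
  const research = async (lead) => {
    events.push(`research:start:${lead.name}`);
    await sleep(lead.delayMs);
    return { ...lead, notes: `notes on ${lead.name}` };
  };
  const write = async (lead) => {
    events.push(`write:start:${lead.name}`);
    return `Hi ${lead.name}, ${lead.notes}`;
  };
  return [research, write];
}
```

Run against one slow lead and one fast lead, the pipeline version starts writing for the fast lead while the slow lead is still being researched, whereas the batch version starts no writing until both research calls have returned.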

13:19Then we have a schema, so we get a structured answer back. We have the phase log, which basically gives us a live view of what is happening. And then finally, we have the arguments that would be passed in as well.

13:30Now we also have budgets when it comes to our workflow too. So there is a budget parameter. So for example, in this case, we can have a while loop.

13:38And whilst the budget remaining is more than 50,000 tokens, we can try and find and fix a bug, for example. So this can kinda keep your workflow structured and prevent them from growing out of control. Now I would suggest basically getting Claude Code to look through your previous sessions, identify any opportunities for making workflows, and make workflows around them.

13:57So for example, if you're a big fan of Ralph Loops and you like using Ralph Loops to quickly go through a backlog of issues, for example, then you can make a workflow where it loads in the issues from GitHub. It will then go through a loop whereby for each issue, it will then make a fix.

14:13It will do verification. It may also do an adversarial review as well, and then it will move on to the next issue. So I've made a whole bunch over here that I am testing myself, like implement and review.

14:24So when should you be reaching for a workflow? Firstly, anytime you wanna do something repeatable, so you will be doing it over and over again, probably every single day, anytime you want to fan out agents, for example, based on conditionals or loops or getting some data in some way, and anything that may seem long enough to fail halfway.

14:42So you can split that down into workflow, which is automatically resumable because Claude Code will retry each sub agent up to three times if it does fail. But for any one off task, you probably should just let Claude do it manually. There's no point making a workflow because we get to take advantage of the fact that the results aren't being passed back and forth from the main session and the sub agents.

15:02It just goes directly because of the code. And if you want to get free insights delivered from me on a regular basis, then you may want to join my newsletter as well linked

The Hook

The bait, then the rug-pull.

Claude Code 2.1.147 shipped a feature Anthropic hadn't announced yet: a Workflow tool that replaces the model-as-orchestrator pattern with a JavaScript file, and with it, the token tax that compounds with every sub-agent handoff.

Frameworks

Named ideas worth stealing.

14:24 · list

When to Workflow (3 conditions)

Repeatable — you will run it over and over
Fans out — conditionals, loops, or parallel agents based on data
Worth resuming — long enough that a mid-run failure is costly

A portable decision rule for deciding when to invest in a workflow file vs. just running Claude manually.

Steal for: Any decision framework for AI automation; use as an audience filter or CTA hook

03:18 · model

Code-as-Orchestrator Pattern

Replace the model orchestrator with a JS file — agent(), parallel(), pipeline(), schema, phase(), args, budget — so context stays flat and joins are free.

Steal for: Explaining deterministic multi-agent architecture to a non-technical audience

CTA Breakdown