Modern Creator
Mike Szach | AI Automation · YouTube

13 Ways to Save Claude Code Tokens

A ranked walkthrough of 13 structural fixes, from basic to advanced, that extend your Claude Code session before you hit the limit.

Posted: yesterday
Format: Tutorial (educational)
Views: 29 · 2 likes
Big Idea

The argument in one line.

Most Claude Code token burn is structural, not conversational - a lean CLAUDE.md, /compact at 60%, and Sonnet as default eliminate the bulk of it before you need any advanced tooling.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…
  • You hit the Claude Code session limit mid-project and want a concrete checklist to extend your runway.
  • You use Claude Code daily but have never audited your MCP server list or measured your CLAUDE.md token footprint.
  • You default to Opus for all tasks and want to know what you are actually losing by switching to Sonnet.
  • You have tried batching prompts or using plan mode inconsistently and want a structured rationale for each habit.
SKIP IF…
  • You already run a lean CLAUDE.md, compact at 60%, and audit MCPs regularly - the advanced tips are the only new ground.
  • You do not use Claude Code; the commands covered are CLI-specific and do not apply to Claude.ai chat.
TL;DR

The full version, fast.

Claude Code token drain has three structural root causes: a bloated CLAUDE.md that reloads every session, MCP servers consuming context even when idle, and running Opus where Sonnet is sufficient. The video presents 13 fixes tiered by complexity: it starts with keeping CLAUDE.md under 200 lines and running /compact at 60% context fill, moves on to Sonnet as default, @ file references, and an MCP audit, and finishes with the Caveman skill (a reported 65-87% output reduction), live monitoring of Claude's commands to catch reasoning loops, and Codex handoff for bugs Claude cannot resolve. The six-item quick-start combo at the end is the highest-ROI entry point for anyone who wants immediate gains without touching advanced settings.

Chapters

Where the time goes.

00:00-00:29

01 · Intro

Context-limit screen shown; 13 tips promised ranked basic to advanced.

00:30-01:49

02 · Tip 1: Lean CLAUDE.md

Keep under 200 lines / 2,000 tokens. Use conditional triggers for larger instruction sets.
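A minimal sketch of checking a CLAUDE.md against the video's two targets (200 lines, 2,000 tokens). Token counts here use a rough 4-characters-per-token heuristic, which is an assumption and will not match Claude's real tokenizer; the function name `check_claude_md` is illustrative.

```python
from pathlib import Path

MAX_LINES = 200        # line target from the video
MAX_TOKENS = 2_000     # token target from the video
CHARS_PER_TOKEN = 4    # rough heuristic (assumption); real tokenizer counts differ


def check_claude_md(text: str) -> dict:
    """Report line count, estimated tokens, and whether both targets are met."""
    lines = text.splitlines()
    est_tokens = len(text) // CHARS_PER_TOKEN
    return {
        "lines": len(lines),
        "est_tokens": est_tokens,
        "within_budget": len(lines) <= MAX_LINES and est_tokens <= MAX_TOKENS,
    }


if __name__ == "__main__":
    path = Path("CLAUDE.md")  # assumes the file sits in the current directory
    if path.exists():
        print(check_claude_md(path.read_text(encoding="utf-8")))
```

Run it from the project root; if `within_budget` is false, moving rarely needed sections behind a conditional trigger is the fix the video suggests.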

01:50-02:38

03 · Tip 2: Run /context regularly

Audit token spend per category; identify MCP servers eating context silently.

02:39-04:40

04 · Tip 3: /clear and /compact

/clear starts fresh; /compact at ~60% summarizes history before it stacks up. Never wait past 90%.
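The 60% / 90% rule can be written as a small decision function. This is a sketch, assuming you read the used and total token counts yourself (for example from /context); `context_action` and the window size in the usage line are hypothetical, and Claude Code does not expose such a function.

```python
COMPACT_AT = 0.60  # the video's suggested trigger for /compact
DANGER_AT = 0.90   # the video: never wait this long


def context_action(used_tokens: int, window_tokens: int) -> str:
    """Suggest a slash command based on how full the context window is."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    fill = used_tokens / window_tokens
    if fill >= DANGER_AT:
        return "/compact now (past 90%)"
    if fill >= COMPACT_AT:
        return "/compact"
    return "keep working"


print(context_action(130_000, 200_000))  # 65% full with a hypothetical 200k window
```

If you do not need the earlier conversation at all, /clear is the cheaper choice; /compact is for when Claude should keep the gist.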

04:41-05:56

05 · Tip 4: Default to Sonnet

Opus costs 2x tokens; use Sonnet for everyday tasks, Opus only for major architectural work.
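To see why the default matters, here is a rough usage tally. It assumes the video's 2x Opus-to-Sonnet ratio as a flat multiplier and counts each task as one hypothetical unit of work; real usage depends on the task, not just the model.

```python
# Relative usage multipliers with Sonnet = 1.0. The 2x Opus figure is the
# video's claim from the model picker, treated here as an assumption.
USAGE_MULTIPLIER = {"sonnet": 1.0, "opus": 2.0}


def session_usage(tasks: list[tuple[str, float]]) -> float:
    """Sum usage for (model, base_units) pairs, scaled by each model's multiplier."""
    return sum(USAGE_MULTIPLIER[model] * units for model, units in tasks)


all_opus = [("opus", 1.0)] * 10                      # ten tasks, all on Opus
mostly_sonnet = [("sonnet", 1.0)] * 9 + [("opus", 1.0)]  # Opus for one big task only
print(session_usage(all_opus), session_usage(mostly_sonnet))  # 20.0 11.0
```

Under this assumption, keeping Opus for the single architectural task nearly halves the session's usage.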

05:57-06:44

06 · Tip 5: @ file references

Pin exact files with @ so Claude does not traverse the full project tree.

06:45-07:18

07 · Tip 6: Keep external docs lean

Only attach what is directly relevant; trim irrelevant PDF pages before attaching.

07:19-08:34

08 · Tip 7: Disconnect unused MCPs

/mcp audit; CLI is cheaper than MCP. Disconnect anything not actively used.
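A simplified model of what idle MCP servers cost: their definitions sit in context and ride along with every message. The per-server token figures below are made up (take real ones from /context), and the sketch ignores prompt caching, so treat the result as a rough estimate rather than a bill.

```python
def idle_mcp_overhead(server_tokens: dict[str, int], used: set[str], messages: int) -> int:
    """Tokens spent re-sending the definitions of MCP servers that were never called."""
    idle = sum(tokens for name, tokens in server_tokens.items() if name not in used)
    return idle * messages


# Hypothetical per-server footprints; replace with the numbers /context shows you.
servers = {"github": 3_000, "browser": 5_000, "notion": 2_000}
print(idle_mcp_overhead(servers, used={"github"}, messages=40))  # 280000
```

The point of the audit is that the idle total is multiplied by every message in the session, so disconnecting a server you never call removes that cost for good.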

08:35-08:58

09 · Tip 8: Batch prompts

Combine small tasks into one message list; follow with /compact.
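A back-of-the-envelope model of why batching helps, assuming each new message re-sends the whole conversation so far as input. Prompt caching and output tokens are ignored, and the sizes (10,000-token starting context, 100-token prompts, 500-token replies) are hypothetical.

```python
def total_input_tokens(base_context: int, prompt_tokens: list[int], reply_tokens: int) -> int:
    """Input tokens across sequential messages when each message re-sends the history."""
    total, history = 0, base_context
    for p in prompt_tokens:
        history += p              # the new prompt joins the conversation
        total += history          # the whole conversation is sent with this message
        history += reply_tokens   # Claude's reply becomes part of the next turn's history
    return total


separate = total_input_tokens(10_000, [100] * 5, 500)  # five small fixes, one by one
batched = total_input_tokens(10_000, [500], 500)       # the same five fixes in one list
print(separate, batched)  # 56500 10500
```

With these numbers, five separate prompts cost over five times the input tokens of one batched prompt, and the gap widens as the conversation grows.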

08:59-10:05

10 · Tip 9: Use plan mode first

Read-only exploration before execution prevents token-heavy trial-and-error loops.

10:06-10:38

11 · Tip 10: Work off-peak

Schedule heavy sessions outside 5AM-11AM PT / 1PM-7PM GMT peak windows.
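A small helper for checking whether a given moment falls in the stated peak window. It assumes the 5AM-11AM Pacific window from the video (as of June 2026) applies every day; the video does not say whether weekends are exempt. Converting through the `America/Los_Angeles` zone also sidesteps a summer wrinkle: in June the video's "1PM-7PM GMT" matches UK clock time (BST), which is 12:00-18:00 UTC. Pass timezone-aware datetimes.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")
PEAK_START_HOUR, PEAK_END_HOUR = 5, 11  # 5AM-11AM PT, per the video (June 2026)


def is_peak(moment: datetime) -> bool:
    """True if a timezone-aware datetime falls inside the 5AM-11AM Pacific window."""
    local = moment.astimezone(PACIFIC)
    return PEAK_START_HOUR <= local.hour < PEAK_END_HOUR


print(is_peak(datetime(2026, 6, 15, 14, 0, tzinfo=timezone.utc)))  # True (7AM PDT)
print(is_peak(datetime(2026, 6, 15, 20, 0, tzinfo=timezone.utc)))  # False (1PM PDT)
```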

10:39-11:43

12 · Tip 11: Install Caveman skill

Reportedly cuts output tokens by 65-87%. Three intensity levels: lite, full, ultra. Install from GitHub.
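The reported range translates into concrete numbers like this. The 65-87% figures are the video's "reportedly" claim, not something verified here, and they apply to output tokens only; the input context Claude reads is unaffected.

```python
def remaining_output(tokens: int, reduction_low: float = 0.65, reduction_high: float = 0.87) -> tuple[int, int]:
    """Output tokens left after the reported reduction, as (best case, worst case)."""
    best = round(tokens * (1 - reduction_high))  # 87% reduction
    worst = round(tokens * (1 - reduction_low))  # 65% reduction
    return best, worst


print(remaining_output(10_000))  # (1300, 3500)
```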

11:44-13:00

13 · Tip 12: Watch Claude live

Monitor thinking tabs and commands; interrupt as soon as you see 4-5 repeated attempts.
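The "four or five retries in a row" rule can be approximated in code. This sketch assumes you have the visible reasoning paragraphs as a list of strings (Claude Code is not assumed to provide them through an API), and the retry phrases in `RETRY_PATTERN` are hypothetical markers chosen for illustration.

```python
import re

# Hypothetical phrases that signal a fresh attempt after a failed one.
RETRY_PATTERN = re.compile(
    r"\b(?:let me try|try this instead|that won't work|actually,)", re.IGNORECASE
)


def count_retry_streak(paragraphs: list[str]) -> int:
    """Length of the trailing run of paragraphs that read like retries."""
    streak = 0
    for para in reversed(paragraphs):
        if RETRY_PATTERN.search(para):
            streak += 1
        else:
            break
    return streak


def should_interrupt(paragraphs: list[str], threshold: int = 4) -> bool:
    """The video's rule of thumb: stop after four or more retries in a row."""
    return count_retry_streak(paragraphs) >= threshold


log = [
    "Reading src/app.ts",
    "Let me try updating the import.",
    "Actually, that won't work. Let me try this instead.",
    "Let me try a different path.",
    "That won't work either; let me try reverting.",
]
print(should_interrupt(log))  # True
```

Once it fires, the video's advice is to stop Claude, then either /clear and restart from a plan or hand the problem to Codex.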

13:01-13:27

14 · Tip 13: Codex for debugging

Hand unsolvable bugs to Codex; return the solution to Claude rather than burning tokens on loops.

13:28-14:14

15 · Bonus quick-start combo

Six highest-ROI habits: MCP audit, batch prompts, plan mode, @ refs, compact at 60%, off-peak.

14:15-14:56

16 · Conclusion

Full three-tier grid recap on screen.

Atomic Insights

Lines worth screenshotting.

  • CLAUDE.md loads into context at every session start, so a 500-line file costs tokens on every single message you send.
  • Running /compact at 60% context fill costs far fewer tokens than waiting for 90%, because every message sent before compaction re-reads the full, growing history.
  • Opus costs twice the tokens of Sonnet; using it for a font change is like hiring a surgeon to change a lightbulb.
  • MCP servers consume context even when you never call them - disconnecting unused ones is a one-time free saving.
  • The Caveman skill forces terse responses and reportedly cuts output tokens by 65-87%, with no accuracy loss reported in the video.
  • Batching five small UI fixes into one prompt is cheaper than five separate prompts because each message reincludes the full context.
  • Plan mode keeps Claude in read-only exploration - no file edits until you approve, eliminating token-heavy trial-and-error.
  • When Claude loops on the same problem four or five times in a row, stopping immediately saves more tokens than continuing.
  • CLI tool calls are cheaper than MCP for the same operation - prefer CLI when both options exist.
  • Peak hours are 5AM-11AM PT / 1PM-7PM GMT as of June 2026; scheduling heavy sessions outside that window is free.
  • Using @ to pin a specific file stops Claude from reading the entire project folder to find what you meant.
  • A conditional trigger in CLAUDE.md loads large instruction sets only when a file name matches, so you pay for them only when needed.
  • Handing a stuck bug to Codex and bringing the solution back to Claude is cheaper than watching Claude attempt the same fix repeatedly.
Takeaway

Six habits that stretch every Claude Code session.

WHAT TO LEARN

Token limits are mostly a structural problem, not a usage problem - and the structural fixes are free, one-time, and compound across every session.

  • CLAUDE.md is loaded at session start and stays in context for every message after that, so every line over 200 is a recurring cost you pay indefinitely, not just once.
  • Running /compact at 60% context fill is cheaper than waiting for 90%, because every message before compaction re-sends the whole history, and compacting earlier keeps that history short.
  • Sonnet handles everyday coding tasks at half the usage of Opus; upgrading is only justified for major work such as completely restructuring a project.
  • MCP servers consume context even when you never call them in a session - a one-time audit to disconnect unused ones is a permanent free saving.
  • Using @ to pin a specific file stops Claude from reading the entire project directory, which matters most in large repos with many files.
  • Plan mode forces a read-only exploration phase before any file writes happen, eliminating the token-heavy back-and-forth of trial-and-error execution.
  • A reasoning loop of four or five repeated attempts is a signal to stop immediately - continuing costs tokens and rarely resolves without a reset.
  • The Caveman skill reportedly reduces output tokens by 65-87% by forcing terse fragment responses; the video reports no accuracy loss on coding tasks.
  • Batching five small fixes into one message is cheaper than five separate messages because each message reincludes the full context history.
  • Scheduling heavy sessions outside peak hours (5AM-11AM PT) stretches your usage without changing any model settings.
Glossary

Terms worth knowing.

/compact
A Claude Code slash command that summarizes the current conversation into compressed context so Claude retains the gist without holding every message verbatim.
/context
A Claude Code slash command that shows how many tokens each category currently loaded in context uses, including MCP servers, skills, tools, and conversation history.
CLAUDE.md
A markdown file in a project root that Claude Code loads automatically at every session start, functioning as a persistent system prompt.
Caveman skill
A third-party Claude Code skill that forces terse, minimal responses and reportedly cuts output token usage by 65-87%, with three intensity levels: lite, full, and ultra.
Plan mode
A Claude Code execution mode that restricts Claude to read-only exploration so it can research and propose a plan but cannot write or edit files until the user approves.
MCP server
A Model Context Protocol server that extends Claude Code with external tool access; each connected server adds tokens to every context load even when unused.
Codex
OpenAI's coding agent, suggested as a place to hand off debugging when Claude Code enters a reasoning loop it cannot resolve.
Resources

Things they pointed at.

13:01 · Product: OpenAI Codex
Quotables

Lines you could clip.

00:53
CLAUDE.md is something that Claude will load at every session start and keep in its context forever. It makes sense to keep it under 200 lines or 2,000 tokens in total.
Counterintuitive for new users - the file most people treat as free is the biggest token cost. Format: TikTok hook.
04:07
Never wait until 90 to 100%. Always auto compact much earlier. Around 60% is a good number.
Specific, actionable, non-obvious threshold. Format: IG reel cold open.
12:37
If you see four or five of those paragraphs in a row, you know Claude is coming across a problem it cannot solve right now. Stop at that point.
Practical interrupt rule with a concrete threshold. Format: TikTok hook.
13:57
If you do those six, you are already in power user territory.
Punchy payoff line. Format: IG reel cold open.
The Script

Word for word.


00:00 You are almost at the end of your Claude Code session. You are just about to finish the project and send it off to a client, and then you get hit with this. We've all been there.
00:08 All Claude Code models are expensive as hell. So in this video, I'm gonna show you 13 tips that you can use today to start saving tokens when using Claude Code. And whether you are a beginner or an advanced Claude Code user, you'll still get some value from this video, even if it's only going to serve as a reminder of what to do and what not to do.
00:27 So I would give it a watch anyway. And I've ranked all of my tips starting from basic, moving on to mid and more advanced, so you can see the complexity bumping up slowly. So with that said, let's start with the basic ones.
00:39 So starting with tip number one, and that is to create and maintain a lean CLAUDE.md file. And that alone is gonna be the biggest saver in terms of tokens, and it can save you as much as 40% session-wide, which is absolutely huge.
00:55 So CLAUDE.md is something that Claude will load at every session start and kind of keep in its context forever. Right? So it's like a master prompt, and it makes sense to keep it under 200 lines or 2,000 tokens in total.
01:10 But, uh, 200 lines is a good target to aim for. So mine, if I showed you mine, for example, is around 75 lines long.
01:20 There you go, 74. So I kept it very short and to the point and only instructed Claude with, uh, the necessary stuff. You want to keep yours under 200 lines.
01:31 And for any larger projects where you need a large set of instructions, I would actually use triggers. So I would use something like, if file name matches this, then load this document, and then that will direct Claude to load a specific markdown file from within your project files.
01:49 And that's actually a good way to, uh, manage the context as well. Because that way, Claude will only load this as and when it needs to. And so moving on to tip number two, and that is to run /context regularly.
02:01 And what that's gonna show is the amount of tokens that you spend on each of the different things, like MCP servers or skills or tools, and basically anything else that Claude has access to at any time. And the idea here is that you have visibility into what Claude is spending its tokens on.
02:22 And then if you notice that there's a lot of tokens going into one area, like MCP servers, then, you know, you're gonna have to disable some of them. And I will show you how to do it later on in the video. But that's literally as simple as coming to Claude and then doing /context.
02:39 And then that way, it'll run the context view, where you can see exactly where your Claude is spending the most tokens. And this is something that you would want to do regularly, so it's a good habit to check that every now and then.
02:51 And so tip number three that I've got is a couple more commands. So use /clear and /compact.
02:59 Those are probably the commands that I use the most. And what they do is, /clear basically clears the current conversation and starts a new session.
03:11 Well, it doesn't clear. It doesn't delete the conversation, but it just starts a new session. And then that way, Claude doesn't have to have all those messages that need to be read every time you send it a message.
03:24 But instead, you kind of start off fresh. And then /compact: what that's gonna do is, when you have a long conversation with Claude, so for example, this one here, and you would want Claude to still have some context of what was said.
03:40 Instead of just doing /clear, you can do /compact. And then what that's gonna do is compact and summarize all of this information that was said here.
03:51 And then going forward, Claude will only have access to the summary, which, uh, will obviously mean Claude has to spend fewer tokens reading all of this context. I usually run it at around 50 or 60% capacity.
04:06 And what that means is, whenever you're chatting to Claude, at the bottom here you will see the little square icon, and, uh, you will see it when the context starts filling up.
04:18 And that will basically tell you what percentage of the context has been filled. I don't tend to wait until it's 90 to 100%, because then Claude has access to all of that context.
04:34 So I would say never wait until 90 to 100. Always auto compact much earlier. So around 60% is a good number.
04:41 If you go above that to, like, 70 or 80, then that's completely fine as well. But 60 is a good number to aim for. And moving on to tip number four, and that is to default to Sonnet for everyday tasks.
04:53 Because at the moment, we've got different Claude models that we can use. So we've got Haiku. This is the cheapest one.
04:59 We've got Sonnet. And then we've got Opus, which is the biggest and the best model that we have. And it's generally a bit of overkill to use something like Opus for simple tasks and simple edits, like changing the font or moving an image on a website.
05:14 And, actually, many users leave it on by default, and that causes them to spend a lot more tokens, twice as many tokens compared to Sonnet, to be exact. And so if I come over to VS Code, if you want to change the model, you click on this icon right here, and then that will allow you to switch model right here.
05:33 And then as you can see, the default is Sonnet 4.6 as of right now, which is what I use for any everyday tasks. And if you're working on anything a lot more complex, like the complete restructuring of the project, then you can try Opus. At the moment, it's Opus 4.8.
05:51 And as you can see, it will consume two times as much usage versus Sonnet. So just be careful with that, and only use it when it's absolutely necessary. So the next tip that I have here is to use @ when referencing any specific files.
06:07 So in your VS Code, over on the left-hand side, you've got different files and folders. And whenever you want Claude to be able to access any specific ones, instead of saying, oh, access agents.md for me, you would do @ and then you can start typing the name of the file.
06:26 So agents, and then that way you can click on it, and then it will direct Claude straight to that .md file. So that way Claude doesn't have to go back to the full folder and read through every single file.
06:38 It'll know exactly where to look. So it's a simple habit to implement, but it can save a little bit of time as well. And, of course, tokens.
06:45 And moving on to tip number six, it kind of goes hand in hand with the previous one. So do not paste any unnecessary documents and files into the project or chat. And the reason why is because we don't wanna give Claude too much irrelevant context.
07:00 You know, if you have, uh, long PDFs with SOPs, for example, or any other, uh, important documentation, always make sure that whatever you give it is actually relevant to what you're working on. And then if it's not, then it might be worth, like, going back to the PDF and removing the pages that you don't need.
07:16 And then that way, you can keep everything clean. So if I go back to VS Code, if you want to attach a file to the chat, all you do is click on the plus button, and then you have the option to upload from computer.
07:30 Or alternatively, you can paste the file inside of the folder itself. So moving on to the mid tier, we've got tip number one, disconnect unused MCP servers. So before, I talked about /context, where you can see how much of your context is going to MCPs.
07:47 So if you want to disable anything, or if you want to take a look at what Claude has actually connected with, all you do is type /mcp, and then that will show you the exact MCP servers that Claude has access to. As you can see, some of mine are disabled here.
08:04 But if you see any that you know you're not gonna need, then you just click them, and then you have the option to disable right here. So this is how you can go through and audit your MCP servers.
08:17 And it's generally better to use something like a CLI instead of MCP anyway, because MCPs consume a lot more context. So just have a little think about whether the MCPs that you have are actually needed, and go through the list and disconnect the ones you know you're not gonna use. So it's a good little one-off task to do.
08:35 And moving on to tip number two, and that's gonna be to batch your prompts into one single message. So whenever you're working on or testing an app and you notice different bugs or different things you want to improve, I would actually batch the small ones.
08:53 So if you have, like, three or five, you can create a list, and you can just give Claude the whole list at once. So for any small tasks and improvements, this is exactly what I do, and follow that with /compact if needed. And tip number three that I've got is to use plan mode before executing any real task.
09:10 So in VS Code, you've got access to different modes. So I've got this right here in the bottom right corner. If you click this, you'll see four different options.
09:19 So they do exactly what they say: ask before edits, edit automatically, plan mode, and then bypass permissions.
09:26 So plan mode is what I usually start with. So whenever I'm starting a project or I'm starting any major improvement, I always switch to plan mode.
09:35 If you're using the CLAUDE.md that I give you, it actually says here that it will default to plan mode whenever working on a substantial task, but you can always just switch it manually before you start chatting to it. And then after that, you can move on to something like bypass permissions if you want to, or edit automatically.
09:53 You know, it's totally up to you. But plan mode is a good starting point whenever working on a substantial task. Because, of course, before you actually approve the plan, Claude stays in read-only, so it won't make any changes.
10:06 It will just, uh, read whatever it needs to read, do a bit of research, and then come up with, uh, the plan of action, and that's a good way to save tokens as well. And there's actually one more tip that I have for you here. So it is to work outside of peak hours.
10:18 So as of the time of this recording, so June 2026, the peak hours are between 5AM and 11AM Pacific time, or 1PM to 7PM GMT. So what I would do is schedule the heavy sessions outside of those hours. So early mornings, late nights, whatever time that is for you, and weekends as well if you can.
10:39 I know it's not always possible, but any kind of heavy tasks, I'd definitely do them outside of those hours and then use those hours to plan properly what you're gonna code and build with Claude Code. So finally, moving on to the advanced tips, and number one is to install the Caveman skill. And that itself is going to save you anywhere between 65 and 87% of output tokens, reportedly.
11:05 So quite a huge saving if you ask me. And what that is, is a skill that forces Claude to respond or even think in very short and to-the-point sentences.
11:16 So a bit like Kevin from one of The Office episodes. Few words do trick. Right?
11:22 So this is the kind of idea there. And then you can install it from this GitHub here. I'll give you a link right here so you can click it and then access it.
11:29 And then, basically, it will go through the files and install them for you. And after that, you're good to go, basically. So it'll just output things like tool work, result, done, move on.
11:39 Right? And this is probably the peak, if you ask me, in terms of the token savings. So tip number two that I've got here is to actually watch Claude do the work, and that's probably the least fun out of all of them.
11:52 Uh, because what you could be doing is, uh, whenever Claude is working on a task, you would go through and then just read whatever Claude is up to. So just open up the thinking tabs, see what kind of approach Claude is thinking about, and then see what commands it's running, what kind of files it's accessing, and what edits as well.
12:13 And then just make sure that this is more or less in line with what you're trying to achieve. And so, you know, instead of just leaving Claude to do the work and then going off to do something else, you will actually sit there and then watch and read.
12:26 Because Claude tends to get stuck in loops, and I've seen this so many times where it's like, okay. Let me try this. Oh, no.
12:32 Actually, but this won't work. Let me try this instead. And it's like, if you see four or five of those paragraphs in a row, even less than that, then you know Claude is coming across a problem that it can't solve right now.
12:42 So it's always better to just stop at that point, uh, maybe even do /clear and then start from scratch, or you can do one other thing that, uh, I'll mention in a second. But, um, it's usually a good idea to just, at this point, just stop.
12:54 So you can press this icon right here, and then it'll stop. And then you can go back to plan mode, come up with the plan again, and then go from there. And then finally, tip number three that I've got is to use Codex for debugging.
13:05 So, again, whenever Claude comes across a problem that it can't seem to be able to solve, if you have Codex, then it's a good idea to just hand over whatever Claude has been working on to Codex and then see if Codex can figure it out.
13:18 And that's especially useful when you have different bugs and different problems that Claude caused but can't figure out how to fix. So if that's the case, then just hand over to Codex and then bring the solution back to Claude. So just before we wrap up, I want to mention this quickly.
13:35 If you don't want to play around with, uh, Codex and any skills, if you don't wanna do any other stuff, these are the six things that I would absolutely make sure that I do. So do your MCP audit, see which MCPs your Claude is connected to, batch your small tasks together in one prompt, use plan mode before executing any real task, and use @ for file references.
13:58 And then I would also compact at around 60% capacity. And, finally, I would work outside of peak hours. And if you do those six, you are already in power user territory.
14:07 So these are all no-cost setups that take under five minutes, and you can use them to massively save your tokens. So if you don't wanna do any other stuff, just make sure that you go through this list and make sure that you do this. So start with one or two, and you will already see a massive improvement.
14:24 And even if you do all 13 and you're still hitting your limits, I would say that's good, because, uh, you're actually using Claude Code close to its maximum potential. So there you have it.
14:33 This is the complete list of my top token-saving tips for Claude Code. I hope you enjoyed it. Feel free to save the video and come back to it when you need to.
14:44 And you've got all the links down below in the description, so you can access all the files, my CLAUDE.md, and everything else. So I hope you got a lot of value from the video, and thank you very much for watching.
14:54 I'll see you in the next one. Bye bye.
The Hook

The bait, then the rug-pull.

The session limit hits at the worst possible moment. The video opens on that exact screen - the context-full warning arriving mid-project - then promises a ranked list of 13 fixes that compound across every session you run.

Frameworks

Named ideas worth stealing.

00:29 · List

Basic / Mid / Advanced token-saving tiers

  1. Basic: Lean CLAUDE.md, /context audits, /clear + /compact, Sonnet default, @ file refs, lean external docs
  2. Mid: Disconnect unused MCPs, batch prompts, plan mode first, work off-peak
  3. Advanced: Caveman skill, watch Claude live, Codex for debugging

Three tiers of token-saving interventions ordered by complexity and setup cost.

Steal for: any AI tooling tutorial - tier by complexity so viewers can exit at the right level.
13:28 · List

Six-item quick-start combo

  1. /mcp audit
  2. batch small prompts
  3. plan mode before executing
  4. @ for file references
  5. /compact at 60%
  6. work off-peak hours

The six no-cost habits, each under five minutes to set up, with the highest combined ROI.

Steal for: any tutorial with a long full list - always give the viewer the minimum effective dose as a closing summary.
CTA Breakdown

How they asked for the click.

VERBAL ASK
14:15 · Subscribe
Feel free to save the video and come back to it when you need to. You have got all the links down below in the description.

Soft close - no hard subscribe push, just resource links and a natural sign-off.

Storyboard

Visual structure at a glance.

  • 00:00 · Hook: hook
  • 00:29 · Promise: tier grid
  • 00:30 · Value: tip 1 doc
  • 05:40 · Value: tip 4 doc
  • 10:39 · Value: caveman
  • 13:28 · CTA: bonus combo
  • 14:15 · CTA: full list