The argument in one line.
Prompt caching is what makes Claude Code's session limits feel generous, and three behavioral habits capture 95% of the savings: respect the one-hour TTL, start fresh between tasks, and avoid mid-session model switches.
Read if. Skip if.
- A Claude user running multi-turn coding sessions or long projects who wants to understand why their token limits feel tight and how caching actually works.
- Someone using Claude via API or sub-agents who's building automation and needs to optimize token spend without overhauling their existing workflow.
- A developer or technical founder who's curious about prompt caching but finds the official Anthropic docs overwhelming and wants the 80/20 version.
- You're using Claude only for occasional chat or one-off queries — caching won't meaningfully impact your token usage or session limits.
- You've already optimized your Claude workflows and understand cache TTL rules, hit rates, and session handoff patterns — this is introductory material.
The full version, fast.
Claude prompt caching quietly cuts token costs to 10% of normal input, but most users burn through session limits by triggering recaches they don't understand. Cached tokens live for one hour on a Claude Code subscription, five minutes on API or sub-agents, and the cache rebuilds in three layers: system instructions and tools, project files like CLAUDE.md, and the growing conversation. Switching models mid-session, including the opus-plan toggle, invalidates the cache entirely, as does pausing past the TTL or editing the system prompt. Three habits cover almost everyone: don't pause longer than an hour, start fresh when switching tasks with a handoff summary instead of compact, and put large documents into projects rather than pasting them into chat.
Chat with this breakdown.
Modern Creator members can chat with any breakdown — ask for the hook, quote a framework, find the exact transcript moment. Unlocks at T2: refer 3 friends + add your own API key.
Create a free account →Where the time goes.

01 · What caching actually costs
Hook on real dashboard numbers. 10% cost for cached tokens. 1hr TTL on subscription, 5min on API/sub-agents. Thoric/Anthropic quote on cache hit rate monitoring.

02 · How the cache grows per turn
System layer globally cached, Project layer per-project, Conversation layer grows every turn. Prefix-matching via Thoric diagram.

03 · The 4-turn visual example
Four-turn diagram showing what is cached vs processed fresh each turn. Danger: changing system prompt at message 16+ means full recache.

04 · Three layers and what breaks each
System / Project / Conversation table with exact events that bust each layer.

05 · Cache lifetime TTL table
Subscription within plan ~1hr. On usage credits: 5min. API key: 5min. Sub-agents: 5min. Addresses April Reddit panic.

06 · Three habits that cover 95%
1. Do not pause too long. 2. Start fresh when you switch. 3. Do not paste big one-off docs. Demo of session-handoff skill.

07 · What else breaks the cache
Model switching = full recache. Opus plan mode = cache-breaking on every plan/execute toggle. Editing CLAUDE.md mid-session is safe.

08 · Token dashboard and CTA
Free GitHub repo in School community. Tracks sessions/turns/tokens/cache reads/cost. Local device only. Setup via one Claude Code command.
Lines worth screenshotting.
- Cached tokens cost only 10 percent of normal input tokens — saving 91 million tokens in a single day is the equivalent of processing just 9 million.
- The Claude Code cache window is one hour on a subscription; leave a session idle for an hour and you pay full price to reload everything.
- If your Claude subscription runs out and you tip into API usage, the cache TTL drops from one hour to five minutes by default.
- Changing your system prompt mid-session forces every token to recache from scratch, no matter how far into the conversation you are.
- Anthropic runs internal alerts on cache hit rates and declares service-level incidents if they drop too low.
- The three habits that fix 95 percent of caching problems are: do not let sessions sit idle for over an hour, do not change the system prompt mid-session, and manage sub-agent cache TTL explicitly.
- Sub-agents using the API default to a five-minute TTL, which means rapid-fire multi-agent runs burn through tokens at an explosive rate unless configured otherwise.
Three habits stop 95% of cache waste
Claude's prompt cache is on by default and saves up to 90% on repeated tokens — but three common habits silently bust it, and knowing the TTL rules is the whole game.
- Cached tokens cost only 10% of normal input, so a high cache hit rate is the primary lever on both session limits and per-token costs.
- The cache grows in three layers — system instructions cached globally, project files cached per-project, and conversation turns cached and extended each turn.
- The cache is rebuilt from scratch on three events: waiting past the TTL, changing the system prompt, or switching models — any of which can turn a cheap session expensive.
- Each cache layer has its own failure mode — system layer breaks on prompt edits or TTL expiry, project layer breaks on CLAUDE.md restarts, conversation layer resets every session.
- On a Claude subscription, the cache window is one hour; on API access or sub-agents, it drops to five minutes by default, which is easy to exceed when managing multiple sessions.
- The three habits that cover most users: do not let a session sit idle past the TTL, start a fresh session when switching tasks, and avoid pasting large one-off documents into the chat.
- A session-handoff pattern — summarize state, clear the session, paste the summary into the new session — preserves continuity without the cost of a cache miss on a long conversation.
- Switching models mid-session resets the cache entirely because the cache relies on prefix matching, and a model switch changes the prefix.
- The Opus-plan-mode setting causes a model switch on every plan/execute toggle, which resets the cache each time — useful to know before enabling it as a token-saving move.
- Editing CLAUDE.md mid-session does not break the cache because the change is not applied until the session restarts.
- For large documents, dropping them into a project rather than the chat gives them better caching treatment, even though the exact mechanism is not fully documented.
Terms worth knowing.
- Prompt caching
- A feature that stores parts of a conversation (system instructions, tools, project files, prior turns) so they don't need to be reprocessed on every new message, dramatically cutting cost and speeding up responses.
- Cache read
- Tokens that the model reuses from an existing cache instead of processing fresh. These are billed at roughly 10% of normal input token cost.
- Cache create
- The one-time act of writing content into the cache the first time it's seen. It costs more than a normal input token but pays off on subsequent turns that read from it.
- Cache hit rate
- The percentage of incoming tokens served from cache versus reprocessed from scratch. A high hit rate means cheaper, faster responses and longer practical sessions.
- TTL (time to live)
- How long a cached snapshot stays valid before it expires and must be rebuilt. On Claude subscriptions it's one hour by default; on direct API calls and sub-agents it's five minutes unless explicitly extended.
- Claude Code
- Anthropic's command-line coding tool that runs Claude inside a terminal or editor extension, with access to file reads, writes, shell commands, and project-level memory.
- Sub agents
- Secondary Claude instances spawned by a main session to handle delegated subtasks. They run on a separate five-minute cache TTL regardless of the parent plan.
- Session limits
- Per-week token or usage caps tied to a Claude subscription tier. Hitting them pushes the user into pay-per-token API billing for any additional work.
- System prompt
- The baseline instructions and tool definitions loaded at the start of every Claude session. Changing it mid-session invalidates the entire cache and forces a full rebuild.
- CLAUDE.md
- A per-project markdown file that Claude Code automatically loads as persistent context, holding rules, conventions, and reference material specific to that codebase.
- Prefix matching
- How the cache decides whether a new request can reuse stored tokens: the incoming conversation must start with the exact same sequence of tokens as what was cached. Any change earlier in the prefix breaks the match.
- /compact
- A Claude Code slash command that summarizes the current conversation into a shorter version to free up context. It breaks the existing cache because the conversation prefix changes.
- /clear
- A Claude Code slash command that wipes the current session and starts a fresh one with no prior conversation history loaded.
- Session handoff
- The practice of summarizing an in-progress session's key files, decisions, and next steps so the work can be resumed cleanly in a new session without losing context.
- Plan mode
- A Claude Code mode where the model drafts an implementation plan before writing code. It's commonly paired with a stronger model for planning and a cheaper one for execution.
- Opus / Sonnet
- Two tiers of Anthropic's Claude models. Opus is the most capable and expensive; Sonnet is the mid-tier workhorse used for most coding tasks.
- model opus-plan
- A Claude Code setting that uses Opus during plan mode and switches to Sonnet for execution. The mid-session model switch resets the cache, so the savings come from cheaper execution, not from cache reuse.
- Claude Projects
- A feature inside the Claude.ai web app that lets users attach reference documents to a workspace so the model can access them across conversations, with caching optimized for repeated document lookups.
- Skill
- A reusable prompt or workflow packaged for Claude Code so it can be invoked on demand, like a custom slash command that performs a specific multi-step task.
Things they pointed at.
Lines you could clip.
“We run alerts on our prompt cache hit rate and declare SEVs if theyre too low.”
“Keep it alive. Keep it focused. Start fresh when you switch.”
“If you switch the model, you are recaching everything.”
Word for word.
The bait, then the rug-pull.
Nate Herk opens on a live token dashboard: 91 million tokens saved in a single day, 300 million in a week, all from prompt caching running silently in the background. The pitch is disarmingly simple: you do not have to change anything, but you do need to know the two or three things that can quietly blow it all up.
Named ideas worth stealing.
Three-Layer Cache Model
- System (globally cached)
- Project (cached per project)
- Conversation (grows each turn)
Every Claude Code session has three stacked layers with different caching rules.
Three Habits That Cover 95% of People
- Do not pause too long mid-task
- Start fresh when you switch tasks
- Do not paste big one-off documents
Opinionated 80/20 reduction of all caching complexity into three daily behaviors.
TTL Table by Setup
- Subscription within plan: ~1 hour
- Subscription on usage credits: 5 min
- API key / Bedrock / Vertex: 5 min
- Sub-agents any plan: 5 min
Definitive reference for how long a cache snapshot lives across different Claude access modes.
How they asked for the click.
“You will go to my free School community. The link is in the description. Click on classroom, click on all YouTube resources.”
Soft community CTA driven by giveaway perceived value. Two free tools bundled makes it feel like a package, not a pitch.








































































