Modern Creator
Ben AI · YouTube

This Skill Instantly 10x's Every Claude Output

Long Claude chats quietly get dumber long before the 1M-token window is full — here's how to catch it and hand off to a fresh chat without losing your rules.

Posted
yesterday
Duration
Format
Tutorial
educational
Views
2.5K
109 likes
Big Idea

The argument in one line.

Claude's output quality degrades well before its 1-million-token context window is actually full, so tracking token usage with /context and handing off to a fresh chat with a comprehensive summary matters more than the window's advertised size.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…
  • You keep reusing the same long-running Claude chat for a recurring task (newsletter review, content ideation, ongoing project) because re-explaining context every time feels wasteful.
  • You use Claude Cowork or Claude Code for knowledge work that leans on saved context documents, brand guides, or reference transcripts.
  • You've noticed Claude's answers get vaguer, more repetitive, or more error-prone the longer a single chat runs.
SKIP IF…
  • You mostly use short, single-purpose chats that never approach heavy context accumulation.
  • You're looking for API-level or code-level context management (this covers the Claude Code/Cowork product UI, not the Messages API).
TL;DR

The full version, fast.

Every Claude chat has a large context window, but output quality starts declining long before that window fills — a phenomenon the video calls context rot. The fix is twofold: use the /context command regularly to see actual token usage broken down by category, and switch to a fresh chat once a session drifts into a degraded zone. Claude's built-in /compact command tries to help by summarizing a chat into a fresh window, but it only works in Claude Code, produces a thin summary, and drops the specific rules and context documents accumulated during the conversation. The video introduces a custom skill called /refresh that asks what the next session's goal is, writes detailed handoff files (outline, decisions, rules, research notes) to the working folder, and generates a paste-ready prompt that tells the new chat to read those files plus the original context documents before continuing — preserving accumulated instructions that /compact would otherwise lose.
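As a rough illustration of the handoff idea described above, the sketch below assembles a paste-ready prompt in Python. It is not the video's /refresh skill: the function name `build_handoff_prompt`, the example file names, and the prompt wording are all assumptions made for illustration. Only the ordering (goal first, then handoff files, then original context documents, then carried-over rules) follows the video's description.

```python
def build_handoff_prompt(goal, handoff_files, context_docs, rules):
    """Assemble a re-priming prompt for a fresh chat (hypothetical helper).

    goal          -- what the next session should accomplish
    handoff_files -- summary files written at the end of the old chat
    context_docs  -- original reference documents the old chat had read
    rules         -- corrections and preferences accumulated in the old chat
    """
    lines = [f"Goal for this session: {goal}", ""]

    # The new chat is told to read the handoff files before anything else.
    lines.append("Before doing anything else, read these handoff files:")
    lines += [f"- {path}" for path in handoff_files]
    lines.append("")

    # Original context documents are listed by path, not summarized.
    lines.append("Then read the original context documents:")
    lines += [f"- {path}" for path in context_docs]
    lines.append("")

    # Accumulated rules are spelled out so they are not lost in a summary.
    lines.append("Rules and corrections carried over from the previous chat:")
    lines += [f"- {rule}" for rule in rules]

    return "\n".join(lines)


if __name__ == "__main__":
    # All file names and rules below are made-up placeholders.
    print(build_handoff_prompt(
        goal="Continue working on the same video",
        handoff_files=["outline.md", "decisions-and-rules.md"],
        context_docs=["strategy.md", "icp.md"],
        rules=["Keep titles short"],
    ))
```

Listing the context documents by path instead of summarizing them mirrors the video's point that a thin summary of each file forces you to re-feed that context anyway.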

Chapters

Where the time goes.

00:00 - 00:47

01 · Intro

Cold open: recurring long chats save re-explaining context but cause context rot.

00:47 - 01:45

02 · How Context Rot Works

Brain analogy for the 1M-token context window: more info loaded in means fuzzier, less accurate answers well before the window is full.

01:45 - 02:34

03 · When to Switch Chats

Whiteboard zone map: 0-200k sweet spot, 200-350k drifting, 350k+ degraded, 400k+ dead zone / 'you are right to push back' zone.

02:34 - 05:08

04 · How to Track Tokens

/context command in Claude Code and Cowork shows token usage by category; four signals for when to check it.

05:08 - 06:11

05 · Why /compact Isn't Enough

/compact only works in Claude Code, makes a thin summary, and drops accumulated rules and context documents.

06:11 - 09:32

06 · How the /refresh Skill Works

Custom skill asks the next session's goal, writes handoff files, and generates a comprehensive re-priming prompt for a fresh chat.

09:32 - 12:01

07 · Re-fresh in Action: Same Task, Fresh Chat

Live demo continuing a YouTube-prep chat in a fresh window with all prior context and rules intact.

12:01 - 14:19

08 · Examples & Use Cases

Second demo: pivoting a YouTube-ideation chat into a newsletter-writing task without polluting the original chat; credits Matt Pocock's handoff skill as inspiration.

Atomic Insights

Lines worth screenshotting.

  • Claude's output can start degrading well before a chat's context window is technically full, which the video calls context rot.
  • A recurring long chat feels efficient because it already has context primed, but that same accumulated context is what eventually degrades output quality.
  • Research, connectors, and MCP tool calls burn tokens far faster than plain conversation, so sessions using them hit degraded zones sooner.
  • Claude's built-in /compact command only runs in Claude Code, not in Claude Cowork, which rules it out for a large share of knowledge-work use cases.
  • A short chat summary forces you to re-explain accumulated context anyway, defeating the purpose of compacting instead of starting over.
  • Small rules and corrections built up through back-and-forth conversation are often the most valuable part of a long chat, and a thin summary silently discards them.
  • Re-using one long chat for a new, unrelated task pollutes its context and can cause the model to drift away from the original task entirely.
  • Selecting a working folder at the start of a Claude Cowork or Claude Code session is a habit worth forming even without a dedicated context-handoff system, since it anchors where handoff files get written.
  • Getting into the habit of checking token usage regularly builds an intuition over time for which actions in a workflow are the most expensive to run.
Takeaway

Long AI chats degrade before they run out of room.

WHAT TO LEARN

A conversation with an AI assistant can start producing worse answers long before it hits any stated limit, so tracking usage and refreshing context deliberately beats waiting for a hard wall.

  • Quality decline in a long AI conversation often starts well before any advertised capacity limit is reached, so don't wait for an error message as your signal to start over.
  • Reusing one long conversation for a new, unrelated task can cause the assistant to drift and lose focus on the original task — keep unrelated work in separate sessions.
  • Heavy use of connected tools, searches, or external data sources burns through a conversation's capacity far faster than plain back-and-forth text.
  • A short summary of a long conversation often loses the specific corrections and preferences built up over time, forcing you to restate them anyway.
  • Building a habit of periodically checking how much of a conversation's capacity has been used helps you learn which activities are the most resource-intensive.
  • When an AI assistant starts apologizing repeatedly or contradicting itself, that behavior is often a signal that the conversation has accumulated too much competing context, not that the tool itself has gotten worse.
Glossary

Terms worth knowing.

Context window
The total amount of text (measured in tokens) a language model can hold in memory at once, including every prompt, response, file, and tool result in the conversation.
Context rot
The colloquial term for an AI model's output quality gradually declining as a conversation grows longer, even before its context window is completely full.
Token
A unit of text (roughly a word fragment) that language models use to measure and price how much content is being processed.
MCP (Model Context Protocol)
A standard that lets an AI assistant connect to external tools and data sources, such as a research tool or a business's own database.
Compaction
The process of condensing a long conversation's history into a shorter summary so a new session can continue with less accumulated content.
Resources

Things they pointed at.

13:50 · tool · Matt Pocock's handoff skill
Quotables

Lines you could clip.

00:00
There's one big problem we're ignoring while doing this, which is context rot.
Names the core problem in one sentence; works as a cold-open hook. (TikTok hook)
06:40
The slash compact, the summary is just not comprehensive enough... it basically means re-explaining and re-prompting a lot of the same stuff in that new chat session again.
Clear, specific critique of a well-known built-in feature. (IG reel cold open)
The Script

Word for word.

00:00 Most people that use Claude, including me, have a few giant chat sessions we keep coming back to for reviewing copy, writing responses, or other repetitive use cases. And the reason we don't start new chats is because it means re-explaining all of the rules and context for that specific task again. But there's one big problem we're ignoring while doing this, which is context rot.
00:22 So in this video, I'll explain what context rot is and why actually managing it makes a huge impact on your AI outputs, show you a simple skill that tells you when to continue in a new chat session, and show you the refresh skill that helps you start a new chat session without ever having to re-explain stuff again. Now you'll be able to download the refresh skill for free in the first link in the description below. But before showing you the skill, it is important to actually understand how a context window works.
00:48 Each chat session you open in Claude Opus, for example, has a 1,000,000 token context window. And every prompt, every answer, every file, every connector you use, or every skill Claude uses, this all fills up that context window. And the easiest way to understand the limitations of this context window is by thinking of the context window as your own brain on a normal day to day.
01:10 The more information you ingest, the more it fills up and the closer you come to your daily processing limit. And the more you start forgetting information you learned earlier in the day, the less clearly you remember things, the more fuzzy all of this info becomes, and the less productive you become. And although Claude has a very large 1,000,000 token window with Opus 4.8, the same thing happens with Claude.
01:32 Even far before hitting that limit, it starts forgetting, becoming fuzzy, and therefore significantly impacting the quality of your AI outputs. Essentially, it becomes dumber the more you use it, which you've probably noticed.
01:45 But this actually starts happening a lot earlier than most people realize. From zero to 200,000 tokens is really where you're gonna get the best outputs from AI. And from 200 to 350,000, it can still be very useful, but it can already start drifting a bit.
02:00 By 350,000 tokens, it really starts significantly impacting your outputs, and after reaching 400,000 tokens, it's usually a point where you really start to get frustrated with AI, and therefore I call it the 'you're right to push back' zone.
02:14 Uh, because when you hear Claude say that, you're probably in this zone, and this is where you really need to start a fresh chat session. Now, before showing you the refresh skill that helps you continue the conversation in a fresh context window or fresh chat without having to re-explain the rules and the context again, you first need to be able to know how many tokens you actually used in each chat and when you actually should switch to a new chat.
02:36 Now if you use Claude Code in the terminal, you actually have a setting to show the amount of tokens you used inside of one chat, so it becomes really easy to track. But if you're like me and often work in Claude desktop because you prefer the UI, especially for knowledge work, there's no way to directly see the amount of tokens you've used inside of one chat window, not in Claude Cowork, but also not in Claude Code desktop.
02:58 But there is actually a skill that many people don't know about, which is the slash context skill that you can use at any time in any chat in Cowork or Claude Code to know how many tokens you've spent and what on. So first of all, I highly recommend starting to use this slash context regularly and get in the habit of using it, because using it will not only instantly give you the insight in where you're at in the chat so you know when to start fresh, but also a nice benefit of this is that the more you start using this, the more you start understanding what burns tokens and what not, and you start to develop a better feeling of when you're at sort of this cutoff limit and should start a new chat.
03:35 For example, I've learned by using this more and more that doing research and using connectors and MCPs is probably by far one of the biggest token spenders, so I know I have to switch to a new chat a lot sooner. Now in order to form that habit of using slash context, it's good to look out for a few signals that indicate that you probably should be using slash context to know where you're at in the context window.
03:57 Now the first one, of course, the obvious one is in really long sessions, you wanna start using it. Secondly, if you're anything like me and you keep coming back to similar chats because you already have primed a specific chat for specific tasks. For example, here I have a really long chat where I usually review all of my new newsletter copies, and I keep coming back to this one because, uh, this one already has all my specific rules and instructions on optimizing the copy for my newsletters.
04:23 I usually have these very long chats too for every YouTube video I create where I go from title ideation to research to outline preparation, everything in one chat because it already has all the context in there. Then thirdly, you especially wanna start doing this earlier in chats, uh, when you use a lot of connectors or do a lot of research because these tend to spend a lot of tokens, much more than you think.
04:45 And, fourthly, of course, the biggest signal to look out for is clearly seeing the outputs getting worse, you having to re-explain things, AI maybe drifting a bit, or you getting frustrated, and this is probably your strongest signal. And instead of screaming at AI, you have to get in the habit of using slash context because you'll notice that often it is simply because you're already past that 400k mark.
05:07 Now once you have identified a session in the dumb zone, you need to start a new chat, but how do we actually carry over the context into the new chat so we don't have to re-prompt and give all of that context again? Now some of you might already know that Claude has a built-in skill called slash compact, and this basically aims to help you do exactly this.
05:25 It basically makes a summary of everything you've done in that chat, the chat history, the context files, and everything else, and compacts this into a summary of around 50 to 60k tokens so you can continue with the same task in a fresh context window while still having relevant context from your previous chat session. And if you use it in Claude, for example, here slash compact, you see that it starts compacting the conversation and making a summary.
05:50 And once it's compacted the conversation, you can actually continue in the same chat window. But what happens in the back is it actually starts a fresh context window with that summary.
06:00 But in my experience, this slash compact has some real flaws, and it's why I haven't been using it that much and often preferred to just keep going in a longer session even with some context rot. So this is exactly why I built the refresh skill to get around some of the flaws of the slash compact. So in essence, this refresh skill is very similar to the compact skill.
06:21 They both summarize the chat and the context. But, firstly, the refresh skill actually works in Cowork because slash compact only works in Claude Code. And for many copywriting or YouTube ideation or knowledge work type tasks, I just prefer Cowork.
06:36 But the three bigger problems with the slash compact that I tried to resolve with the refresh skill is that, firstly, in my experience, the slash compact, the summary is just not comprehensive enough. It basically makes a short summary of everything, which in my experience just means re-explaining and re-prompting a lot of the same stuff in that new chat session again.
06:55 So firstly, the refresh skill just makes a far more comprehensive summary to avoid that re-explaining. It does mean it spends more tokens, probably around the 100,000 mark, but again, in my experience, it's just worth the time and the effort you gain by spending a bit more tokens in that fresh context window. It also actually asks you what's the goal of your next session before creating this summary so you can actually put in the relevant context for what you're trying to do in the next chat.
07:22 Because you don't just wanna use this to continue on the same task in a very long session. You also wanna do this when you're veering off your original task and trying to do a different task that still needs that same context. For example, I wanna write a newsletter based on the context in my YouTube ideation chat, then I'd also wanna use the refresh skill to make sure I'm doing the iteration on the newsletter in a fresh chat so it doesn't pollute the context window of the YouTube ideation chat.
07:49 Secondly, the reason these recurring chats or long chats are often so valuable and why you keep coming back to them is because you've accumulated a lot of little rules and instructions and dos and don'ts around the specific task while going back and forth with AI, and I noticed that those little rules and instructions get completely lost with the slash compact.
08:08 And this is exactly what the refresh skill also does. It looks at, first of all, the status, where are we in the process in this chat. It looks at what worked, what didn't work, and all the rules.
08:17 And it writes down the specific rules and instructions and the dos and don'ts that the user gave inside of that chat and takes them into the new chat session. And lastly, which is the biggest one for me, because when I'm using AI, for example, for my YouTube ideation or my copywriting tasks or many other things, I often feed a lot of context docs to my AI to get a full understanding of me, my business, my ICP before giving me outputs.
08:41 For example, you can see here in one of my YouTube ideation chats, I let it read my strategy, my ICP, my brand, some of my old video transcripts, and some more, uh, documents because it just improves the outputs, and they get far more relevant in my experience. And the problem with the slash compact skill is it basically makes a very tiny summary of each of these files, which means, again, I have to re-prompt to read all of these files or re-feed this context each time I start a new chat.
09:05 And this is the next thing that the refresh skill does. It basically lists out all of the context docs that were fed inside of this chat and instructs the next session to first read all of these docs before actually continuing with the task, which for me just creates so much less friction in the process of transferring this into a new chat that I actually start doing it and actually can get far better outputs because I'm not, you know, prompting back and forth in a context window that's actually in the dumb zone.
09:32 So let me show you some examples of how this skill works in practice. So here, I used it, for example, in one of my YouTube preparation chats where I do anything from title ideation to research to outline generation to script reviewing and more. But because I do that, of course, it reaches that 400k mark pretty quick.
09:47 So after a long session here, I use that refresh skill. And what the skill always does is it first asks you about what you wanna do in your next session because depending on what I wanna do, it can then actually save the relevant context based on that goal. Now in this case, I said I wanna continue working on the same video, and what the skill then does is it first starts creating a few files and adding them into the folder that I selected at the start of this chat.
10:10 One around the outline that we confirmed for this video in this chat, one around the intro, one about the research findings, uh, one around the decisions and rules. Basically, a very comprehensive summary with all of the relevant context from this chat and the history that we need the next session to know, and then it creates a prompt for the new session in Claude where, first of all, it states the goal for this task.
10:32 It then directly gives an instruction of read all of these files first. So basically tells Claude in that new session to first read all of these files that it just created, and it also points to all of the other context, um, we used inside of this chat, like my strategy, my ICP, my brand, and some YouTube transcripts.
10:50 Now this is, of course, especially powerful and relevant if you already use a lot of context in your chat or are using a second brain. And then lastly, it adds a few sections on, uh, where things stand and all of the specific rules and instructions that I've given in this chat, what worked in this chat, and what didn't work, and what corrections were made.
11:08 We can then just copy this, go to a new task, make sure that we select the same folder as we worked in in the previous chat. That's why you always wanna select a folder when you work in Cowork or Claude Code.
11:19 It's just a habit you wanna get into even if you haven't set up a second brain or use a lot of context yet. And once you paste that in, you see that it starts reading all of the context files before doing anything else. So you can see by reading all of this relevant context around the last chat and the context that we used in our last chat, it is immediately primed to continue on that same task.
11:38 It knows the status. It knows the specific rules I've given in the last chat. I've also been using this on these chats where I keep coming back to, like my, uh, newsletter, uh, email copy optimization chats, because as you can see, this has just become way too long over time.
11:52 So same thing here. I used my refresh skill to get a fresh session while still having all of those specific instructions, the context docs, and examples, uh, still saved.
12:00 There's a second use case you wanna start using this for, which is when you're actually veering off your original task and start using that same chat window for a new type of task. I know this is also an issue and a trap many people, uh, fall into, including myself. For example, if I now, in a YouTube ideation chat, want to create a newsletter out of the context that I already have inside of this chat, which is great because, of course, it already has lots of context around this topic, I shouldn't actually start doing that in that same chat because, of course, I start polluting the context window.
12:32 Claude completely drifts off from the original tasks, so it's far better to use the refresh skill because you still keep that same context, but, um, use it for a new type of task in a fresh window. So you can see for this YouTube video, I used the refresh, but in this case, I told it I want to create a newsletter based on the context in this chat.
12:50 It then did the same again, created some files, created the prompt, pasted that into a new window. It read all the relevant files, and then actually started using my newsletter skill right away. Again, you can download the skill for free in the first link in the description below.
13:04 It is a little bit of a habit you have to get into, but I highly recommend it. Because first of all, if you start tracking your context window with slash context, if you get in the habit of that, it is really gonna improve the way you use AI and the outputs you'll get from AI.
13:17 And second, once you start doing that, highly recommend starting to use this refresh skill more and more and maybe earlier and earlier in your context windows because you'll be surprised how much better your outputs become in general. Also, if you want access to all of the skills that me and my team are building out, including, uh, workflow specific ones for sales, marketing, operations, and general business skills.
13:37 You can check out my AI accelerator in the second link in the description below. We also have unlimited one on one live tech help, full Claude courses, and courses on setting up your own AI OS and second brain, a community of serious professionals and business owners. So if that's interesting to you, you can check it out in the second link in the description below.
13:54 Also, credits to Matt Pocock who came up with the handoff skill, uh, which inspired me to actually build this refresh skill. The handoff skill works in a similar way, but it's more of an engineering skill. I built the refresh skill more for knowledge type work.
14:07 So, again, credits to Matt Pocock. Definitely follow him if you don't know him yet. He's a great creator.
14:12 Thank you so much for watching. And if you wanna learn eight more skills that I use almost every day, you can check out the video here above.
The Hook

The bait, then the rug-pull.

Most people keep reusing the same long-running Claude chat because re-explaining context every time feels wasteful — but that habit quietly makes every answer a little worse. The video calls this context rot, and it happens long before the chat's context window is technically full.

Frameworks

Named ideas worth stealing.

01:45 · model

Context window zone map

  1. 0-200k: the sweet spot
  2. 200-350k: starts drifting
  3. 350-400k: significantly impacts outputs
  4. 400k+: dead zone / 'you are right to push back' zone

A rough, self-reported heuristic for when Claude output quality starts to decline within a 1M-token context window.

Steal for: Deciding when to start a fresh AI chat instead of continuing a long one.
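The zone map above can be expressed as a small lookup, sketched here in Python. The thresholds are the video's own rough, self-reported numbers. The function name `context_zone` and the choice to treat each boundary value as the start of the next zone are assumptions, since the video does not say which side a boundary belongs to.

```python
# Upper bounds (exclusive) in tokens, taken from the video's zone map.
# These are a heuristic from the video, not documented model limits.
ZONES = [
    (200_000, "sweet spot"),
    (350_000, "drifting"),
    (400_000, "degraded"),
]


def context_zone(tokens_used: int) -> str:
    """Return the zone name for a token count reported by /context."""
    if tokens_used < 0:
        raise ValueError("tokens_used must be non-negative")
    for upper_bound, name in ZONES:
        if tokens_used < upper_bound:
            return name
    # Anything at or above 400k falls into the last zone.
    return "dead zone"


if __name__ == "__main__":
    for count in (50_000, 250_000, 375_000, 420_000):
        print(count, context_zone(count))
```

You would read the token count off the /context output by hand and pass it in; the sketch does not call Claude itself.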
03:55 · list

Four signals to check /context

  1. Long sessions
  2. Recurring/reused sessions
  3. Sessions with heavy connector, MCP, or research use
  4. Visibly degrading output quality

Practical triggers for when to run the /context command and check token usage.

Steal for: Building a habit of checking AI context usage before it becomes a problem.
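One way to turn the four signals into a habit is a tiny self-check, sketched below in Python. The `SessionSignals` class and `reasons_to_check_context` function are hypothetical names made up for this example; the video lists the signals but does not describe any code for them.

```python
from dataclasses import dataclass


@dataclass
class SessionSignals:
    """Yes/no answers to the four signals from the video (hypothetical type)."""
    long_session: bool = False
    reused_session: bool = False
    heavy_tool_use: bool = False      # connectors, MCP tools, or research
    output_degrading: bool = False    # vaguer answers, re-explaining, drift


def reasons_to_check_context(signals: SessionSignals) -> list[str]:
    """List which signals currently suggest running /context."""
    reasons = []
    if signals.long_session:
        reasons.append("long session")
    if signals.reused_session:
        reasons.append("recurring or reused session")
    if signals.heavy_tool_use:
        reasons.append("heavy connector, MCP, or research use")
    if signals.output_degrading:
        reasons.append("visibly degrading output")
    return reasons


if __name__ == "__main__":
    current = SessionSignals(reused_session=True, heavy_tool_use=True)
    print(reasons_to_check_context(current))
```

Any non-empty result is a prompt to run /context; the video singles out degrading output as the strongest signal.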
05:08 · model

/compact vs /refresh comparison

  1. Summarizes your chat
  2. Works in Cowork
  3. Comprehensive
  4. Carries your rules/instructions
  5. Carries the context it read

A 5-point comparison table showing /compact passes only the first criterion while /refresh passes all five.

Steal for: Evaluating any chat-summarization or handoff tool against a clear checklist.
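The same checklist can be reused for other handoff tools. The Python sketch below records the pass marks exactly as the video claims them (one of five for /compact, five of five for /refresh); they reflect the creator's experience, not an independent test. The names `CRITERIA`, `PASSES`, `score`, and `missing` are assumptions for illustration.

```python
# The five criteria from the video's comparison.
CRITERIA = [
    "Summarizes your chat",
    "Works in Cowork",
    "Comprehensive",
    "Carries your rules/instructions",
    "Carries the context it read",
]

# Pass marks as claimed in the video (self-reported by the creator).
PASSES = {
    "/compact": {"Summarizes your chat"},
    "/refresh": set(CRITERIA),
}


def score(tool: str) -> int:
    """Count how many criteria the tool passes."""
    return sum(1 for criterion in CRITERIA if criterion in PASSES[tool])


def missing(tool: str) -> list[str]:
    """List the criteria the tool fails, in checklist order."""
    return [c for c in CRITERIA if c not in PASSES[tool]]


if __name__ == "__main__":
    for tool in PASSES:
        print(tool, f"{score(tool)}/{len(CRITERIA)}", missing(tool))
```

To evaluate another tool, add an entry to `PASSES` with the criteria you have verified yourself.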
CTA Breakdown

How they asked for the click.

VERBAL ASK
13:00 · product
You can check out my AI accelerator in the second link in the description below.

Soft pitch woven into the closing minute after delivering the free skill; not pushed at the top of the video.

FROM THE DESCRIPTION
Storyboard

Visual structure at a glance.

hook · open · 00:00
value · context window zones · 01:45
value · compact vs refresh table · 05:08
value · refresh live demo · 09:32
cta · accelerator pitch · 13:50