Modern Creator Network
Mark Kashef · YouTube · 1:02:57

How Claude Code Actually Works (What the Top 1% Know)

A 63-minute plain-English teardown of every layer of Claude Code — tools, context, sessions, skills, hooks, and the live build of a PDF-offloading skill that cuts token usage by 98%.

Posted
3 months ago
Duration
Format
Tutorial
educational
Channel
MK
Mark Kashef
§ 01 · The Hook

The bait, then the rug-pull.

Mark Kashef spent hours asking Claude Code to explain itself, then rebuilt the whole map from scratch. The promise is direct: watch this once and you will understand what is actually happening behind the terminal better than 99% of people. The class-marker hook in the title lands before the first sentence is done.

§ · Stated Promise

What the video promised.

stated at 00:15If you watch this video till the very end, then you will understand Claude Code better than 99% of people out there.delivered at 59:00
§ · Chapters

Where the time goes.

00:0001:45

01 · Intro + cheat code

Hook, @claude-code-guide sub-agent trick as the research method, sets the 9-element agenda.

01:4503:10

02 · TLDR architecture diagram

Four-layer overview: UI, Orchestration, Tool Execution, Security. Claude API is the intelligence; everything else is open-source scaffolding.

03:1005:00

03 · Open-source foundations

React/Ink, TypeScript, Node.js, ripgrep — none invented by Anthropic. The magic is marrying them with the Claude API.

05:0008:00

04 · Gather Act Verify loop

The single mental model governing every task. Gather = understand. Act = edit/create/bash. Verify = test and loop.

08:0012:00

05 · Bug-fix walkthrough

Concrete login-bug example tracing all three phases. YOLO/bypass-permissions mode and agent browser verification.

12:0014:00

06 · Permission modes intro

Default ask mode vs YOLO mode. settings.json breadcrumbs permissions over time.

14:0019:00

07 · Claude toolkit overview

Six tool categories: file, search, execution, web, orchestration, extensions. How Claude decides which tool to use.

19:0024:00

08 · File tools deep-dive

Read, Write, Edit (surgical string-replace), Glob (wildcard file finder). PDF token-cost warning.

24:0028:00

09 · Grep and ripgrep

Three search modes: files_with_matches, content, count. ripgrep runs in parallel. Full search-to-read workflow.

28:0032:00

10 · Bash explained

What bash means for non-developers. Short commands queue; long commands background. Git crash course.

32:0036:00

11 · GitHub crash course

Branches, commits, main branch, sandbox environments. Lovable/Bolt vs Git discipline.

36:0042:00

12 · Context window the bucket

CLAUDE.md + system prompt pre-fill 20-80%. /context demo. PDF nukes the window. Quality degrades past 40-50%.

42:0046:00

13 · Compaction strategy

What compaction produces (sparse summary). Write your own summary first, then /compact so Claude weights it higher.

46:0049:30

14 · Session management

Stateless sessions. What persists: files, CLAUDE.md, Git commits, installed packages, stored conversations.

49:3052:00

15 · File snapshots

Before-edit snapshot saved in memory enables rollback. 5-step flow shown on screen.

52:0056:00

16 · 5 ways to extend Claude Code

CLAUDE.md, Skills (just-in-time), MCP (cap at 1-3), Hooks (before/after events), Sub-agents (parallel workers).

56:0059:00

17 · CLAUDE.md best practices

Routing source not instruction dump. Trigger words for freestyle commands. Dangers of global CLAUDE.md.

59:001:02:00

18 · Skills vs MCPs

MCPs auto-inject context; skills are on-demand. Convert rarely-used MCPs to skills.

1:02:001:03:00

19 · Permissions hierarchy

Org > User > Project. Most restrictive wins. Allow/block/ask states with concrete examples.

1:03:001:02:57

20 · Live demo and CTA

Builds read_large_doc skill live. Raw PDF = 1.8M tokens; markdown extraction = 22K (98% reduction). CTA to community.

§ · Storyboard

Visual structure at a glance.

hook/intro
hookhook/intro00:00
architecture TLDR
promisearchitecture TLDR03:10
Gather/Act/Verify
valueGather/Act/Verify06:20
toolkit overview
valuetoolkit overview14:00
context bucket
valuecontext bucket36:00
5 ways to extend
value5 ways to extend52:00
CTA
ctaCTA1:02:32
§ · Frameworks

Named ideas worth stealing.

06:20model

Gather Act Verify

  1. Gather (read/search/understand)
  2. Act (edit/create/bash)
  3. Verify (test/check/loop)

The atomic task loop. Every Claude Code session runs this until complete or context runs out.

Steal forAny tutorial on AI-assisted coding or agentic workflows
28:20concept

The Context Bucket

200K-token window as a physical budget. CLAUDE.md + system prompt pre-spend it. PDFs are budget-killers. Quality degrades past 40-50%.

Steal forAny explainer on context limits; premium-tier sales angle
44:00concept

CLAUDE.md as routing source

One-line pointer to a playbook file rather than inlining all instructions. Use trigger words for behavior changes.

Steal forCLAUDE.md optimization content
39:40concept

Skills vs MCPs

  1. MCPs: always-on, auto-inject, bloat context
  2. Skills: just-in-time, only loaded when invoked

Convert rarely-used MCPs to skills to reclaim context budget.

Steal forClaude Code setup content
51:00model

Permission tiers Org User Project

  1. Org: company-wide policy
  2. User: personal settings
  3. Project: project-specific rules

Most restrictive tier wins. settings.json global vs settings.local.json project-specific.

Steal forTeam deployment content
§ · Quotables

Lines you could clip.

00:52
The actual engineering is unbelievably simple. It is just the way that everything comes together where you have this harmony of orchestration that makes these agentic workflows possible.
Disarms complexity anxiety; instantly shareable takeTikTok hook
10:00
The goal is that you are a human in the loop. It was not designed to just go off on its own.
Counter-narrative to autonomous-agent hypeIG reel cold open
32:30
You can nuke your entire laptop, you can nuke different services if you let it run wild without understanding exactly what it is doing.
High-stakes warning delivered casuallyTikTok hook
1:01:00
The raw PDF was showing 1,800,000 tokens. But proper extraction gives us only 22,000 tokens. So we went down by 98%.
Concrete before/after number for context-management contentnewsletter pull-quote
32:40
You can think of this context as a bucket. And in the bucket, as you have a longer conversation, that bucket fills up until it gets to the very top.
Clearest mental model in the video — instantly repeatableIG reel cold open
§ · Pacing

How they spent the runtime.

Hook length105s
Info densityhigh
Filler8%
§ · Resources Mentioned

Things they pointed at.

28:13toolGitHub
17:21toolCursor
41:07toolAWS MCP
30:06productAnthropic Economic Index Report
§ · CTA Breakdown

How they asked for the click.

1:02:32product
Check out the first link in the description below for my early AI adopters community. A beginner to intermediate course is coming out in the next couple weeks.

Soft sell after demonstrating real value. Two-tier: free diagram download (second link) and paid community (first link). Well-earned by runtime.

§ · The Script

Word for word.

metaphoranalogystory
00:00I spent hours interviewing Claude Code on exactly how it works. I wanted to understand all the nuts and bolts because one of my many goals is to be a top 10% user of Claude Code across the board. And if you clicked on this video, then I assume that you have the exact same intention.
00:16So whether you're a technical or nontechnical, I'm gonna walk you through every single thing that you need to know to understand Cloud Code better than 99% of people out there.
00:24And here's the thing. When I was a kid, I used to buy computers, break them apart, and rebuild them just to understand how each and every part kind of came together. So after breaking apart and mapping each and every part of the CLOD code system, I not only have a deeper understanding, but I know exactly what's happening behind the curtain.
00:41If you watch this video till the very end, then you will too. Let's dive in. So like I said in the intro, we're gonna be diving deep into how each and every part of Claude code works.
00:50And the goal is that you understand what's happening behind the curtain, but also the fact that the actual engineering is unbelievably simple.
00:57It's just the way that everything comes together where you have this harmony of orchestration that makes these agentic workflows possible. So we're gonna explore nine different elements of Cloud Code, and I will do my absolute best to keep things in as plain English as I possibly can.
01:12There will be some software engineering concepts, and I'll do my best to break those into analogies so it fully lands. Before we even dive into the details, I wanted to show you how you could run your own interview with Claude Code, and this is what helped me get into the nuts and bolts and ask it for advice on how memory works, how sessions work, how context works.
01:32So if you go into Claude code, most people don't know this, they've come up with a series of sub agents. And if you've watched my last video, I talked about some easy ways to use sub agents. So as a refresher, if you ask Claude straight up, hey.
01:45What agents do you come with out of the box, specifically sub agents? And we send it over, I'll show you which one we care about right now. And you'll see in the response, the one I really care to show you is this Claude code guide, which is literally designed to answer questions about how the Claude code CLI, the agent SDK, and the Claude API work.
02:03So all you have to do to invoke it is write at Claude code guide, and now you are basically tagging the coding agent.
02:11So if you ask it something like, can you explain to me, like, I'm in grade 10, how the memory function and context management of Claude works, and can you create some form of ASCII art so I can visually see how the system works? And you send this over, you'll now get a glimpse of how I started my deep dive into each and every nook and cranny of Cloud Code.
02:31You'll see right there, it comes back with a breakdown of how the short term memory works, how it is structured in terms of different messages and how it summarizes messages, how long term memory works, and the full picture.
02:44So it breaks down each and every part of the system with a series of key takeaways. So this is your cheat code. So if you watch this video and you have even deeper questions or you have a particular use case and you wanna see if you can harness deeper power in Claude code that you might be putting on the shelf, then this would be the best way to do that.
03:02Now back to our learning journey. This is the TLDR of the Claude code system. So you have the Claude code CLI, the interface itself, where you have the terminal, you have the session manager, you have the tool executor, and you have what's called the permission layer.
03:16Now if you're a complete novice and noob to Claude code, the permission layer is essentially when it's asking you, hey. Can I edit this file? Can I get your approval for this?
03:24Can I install this package? And if you want, you can go into, what I usually do, YOLO mode or bypass permissions mode to allow it to do whatever it wants. Now as you'll see when we walk through the components of the CLI, a lot of this stuff exists in the wild, and it's even open source.
03:39The real magic is how it gets married to the Claude API and more importantly, the Claude models themselves, especially OPUS 4.5. The rest of the system is really how it interacts with your local machine, which is why it's become so useful for personal applications like organizing files, creating folders, and doing a lot deeper analysis, looking for system memory where there might be leaks, doing anything you want on your local computer, as well as interacting with things like GitHub, which you're not familiar, is like version history for code that you can always go back and pull the latest one that you had and then push the latest code that you wanna commit there.
04:14So if we were even to summarize and bring Claude code to the highest layer, you have a user interface, and then you have what's called the orchestration layer. This layer is where the Claude code brain lives. This is where the intelligence lives.
04:26And everything below it is how it executes tools, which tools it executes based on what scenario is popping up. And the last part, like I alluded to before, is the approval and security layer where this takes control to make sure that Cloud Code is designed at least to be your coding companion, not some autonomous agent, even though people are obsessed with using it in that way.
04:46Now here's something that most people don't realize. Now there are lot of words on screen here like React, Inc, TypeScript, Node. Js.
04:54All of this stuff, all of this core infrastructure is open sourced and has been used for years. Anthropic didn't actually invent any one of these singularly.
05:02What Anthropic did in the Cloud Code team is marry this, like I said before, with the Cloud API. So these are things that can be abstracted, is why you have things like open code.
05:13You have antigravity because a lot of these principles, like the way you search a code base, the way you traverse different files, all has existed before.
05:21It's all about them putting them together modularly like a series of Lego blocks and combining them with actual intelligence. So ClaudeCode is a combination of smart engineering standing on the shoulders of open source giants.
05:33So another key concept that's made Cloud Code viral is how simple it is. Most people, especially nontechnical people, if you're watching this and you hate the idea of a terminal, Cloud Code made something like the terminal easy to use and much more powerful than ChatGPT.
05:47All it is is the same box. You can change the color of the IDE you're using. It could be antigravity.
05:53It could be cursor. It could be whatever you wanted. But behind the scenes, all of the thinking is happening from the model, orchestrating all the bits and pieces, the files, the options, and the tools.
06:03And if you wanted one single diagram to understand what Cloud Code does behind the scenes is step one, it gathers the information. It reads your prompt. It reads the Cloud MD, which is the command center of Cloud Code that it looks at as the first thing when you initiate a brand new terminal session.
06:21And then it acts to make changes based on what it's read and understood and ideally has planned out and you've approved that plan. And then it verifies whether it did the right thing, it did it correctly. If there's a test that it can run, it will run that micro test, and you as the user can encourage it to write even more tests and make sure that they're not fictitious tests that are passing according to some weird criteria, and that's how the loop keeps on going.
06:44So if you have those three words running in the back of your head, gather, act, verify, we can dive deeper to see how it does those three things. So when it comes to the life cycle, in the gather stage, like we said, it understands before acting. So understanding means it reads files, it searches code, it explores the structure of a folder, a code base, whatever you give it, and then it asks you proactive questions, or at least it should ask you proactive questions.
07:11Especially in the later versions of Cloud Code, you have this tool. It's called the ask user input tool, and it basically pops up this multiple choice question that could be multiple tabs long asking you for preferences, direction, especially if your prompts are on the vaguer side.
07:27When it comes to acting, this is where things are make or break. So acting could be physically editing files. It could be creating brand new files and folders, shifting one file to the next or one folder to the next, running commands called bash.
07:41And for all intents and purposes, if you are nontechnical and you're a nondeveloper, bash just lets it do things on your computer that you can do through a terminal.
07:49And just to give you a very tangible example, let's say when you open a folder or a brand new browser session in Safari, Chrome, whatever, you are double clicking on some EXE file that makes basically executes the browser itself.
08:03And just to give you an analogy on this, imagine you are opening a brand new browser, let's say Google Chrome, and you double click on that icon. Behind the scenes, you could write some code that could also open that application from the backdoor, from the actual system itself.
08:17Bash allows you to do this, meaning it can take control of your computer, change settings, change permissions. It can do whatever you want, and that's where you have to be careful because you can nuke your entire laptop, you can nuke different services if you let it run wild without understanding exactly what it's doing.
08:33And the last thing it can do, which is super helpful, especially as someone who comes from the data scientist world where I had to live in Python quite a bit, is if you need a specific package. And for all intents and purposes, think of a package as either a program or a bridge of functionality that can take your application or your use case from a to b without you having to intervene.
08:54Packages are essentially compressed code that allow you to execute all kinds of functions without you having to figure it out from scratch. So instead of you having to figure out how to convert something like HTML to a PowerPoint x file, there's a library that's existed and has already done this well, eighty twenty, that you can just take from the web, install on your computer, and run.
09:14And the verify stage is where Cloud Code runs automated checks to see what did it do, how well it did, and go down the path if it didn't do the right thing. So if everything goes well, then you won't have to go to the next step. But if it doesn't go well, so if it does not work, you'll see right here, it will try something, check the result.
09:33If it doesn't work, it'll keep looping, which is why if it goes down this rabbit hole and you haven't given it the best instructions and you haven't given it the best code base or the best ClotMD to run and do what it needs to do, it will endlessly go in a loop. And if it fails, it will keep going, meaning your context window or as you'll see my analogy for it, the bucket will get full.
09:53The one key thing though, the ingenuity, is once it does figure it out, Claude can learn from each failed attempt. Meaning if it's tried to access some form of API, let's say, make an image for you or a video using Gemini's Nano Banana or VEIO API, and it keeps failing to either send the request to create the video or image or retrieve it and actually display and render it for you.
10:14Once it figures it out, it can commit its understanding of what to do and when to the Claude MD file, or you can even make a markdown file called a playbook or commit it elsewhere as a skill. So if you wanted a real example of a Claude code flow, let's say you're fixing a bug and you were vibe coding an app and you're adding some form of authentication and you create a login, and in the login when you click on a button it's not working.
10:37So you can tell it fix the login bug. Ideally, you tell it what that bug is. When I click it, for some reason, it doesn't reroute me to another page.
10:45It doesn't let me do Google authentication, whatever. It first gathers information, finds the authentication code, reads any error logs in the application itself, and then based on that, it can either create a plan or if you put it into YOLO mode or bypass permissions mode, it goes to act to edit the associated files.
11:04So instead of trying to YOLO and go through the whole code base and pick a file and randomly change it, once zeroes in on what files might be affected and could be the culprit of this bug, it then goes to act on this and maybe fixes it if there's a typo. Once in a while, as soon as you understand how it changes code and augments code, it can forget certain syntax.
11:25That syntax could be the difference between a functioning button and a non functioning button. And once it verifies and now we live in a world where you have agent browsers where it can click into the browser, spin it up on what's called local host, meaning you run the application locally on your computer, and it can click around and see did it resolve the issue, and then you set it off on its own new feedback loop.
11:47But devoid of that, you as the user can also verify that as well. But devoid of that, you as the user can verify that as well. And like I said, the core goal is that you're always in control.
11:56And I didn't mean for that to rhyme, but it does so happy days. But the goal is that you are a human in the loop. It wasn't designed to just go off on its own.
12:05Many developers are working with these things called Ralph Loops or Ralph Wiggum, which I think is a lot of hullabaloo and a waste of time and basically brute forcing Claude Coat to do something that it it wasn't designed to do, comma, yet. We'll probably get to the point where it can go fully autonomous if you want it to, but it was designed in mind to be annoying on purpose.
12:25So if we go back into Claude code and we go into the terminal, by default, a lot of you will see the question mark mode. Or if I open up a brand new session, let's say I open one of these.
12:36Let's click on any one of these just so I can see my bar, and I click on this, and I open this up, and I close this. Right?
12:44So you usually will see something like ask before edits. The whole point of this is as you're starting out, especially if you're a noob and you have no clue what you're doing, then it will ask you before it does things. And as you give it permission, it will create a file, a settings dot JSON file that will store okay.
13:00It looks like Mark is okay with me editing a file in this way. It looks like he's also okay with me installing this package. So if you wanna be able to bread crumb your way to competence, even though it's annoying, I understand, it's helpful to do this because it helps you be more aware of what's happening and gives you a chance to intervene if you need to.
13:20Expert tip, if you use something like this extension, which if you're not familiar, I like using cursor because it's been around the block for a while. Some people like using anti gravity.
13:29Maybe one day we'll use it, but not for now. This is what it's called the Claude code for Versus code. If you want to go on YOLO mode, you wanna graduate from there, you can go on to settings, and then you can go on to the bottom here, and you can make it so it always is on YOLO mode.
13:46But once again, I'll tell you, don't do this until you know what you're doing. Now Claude has different categories of tools, and you can think of them as specialized workers. So you have file reading tools for observing and checking your code before it decides to act and do something.
14:01You then have very importantly, we'll touch on this in-depth, search tools. This allows it to not only search the entire code base, but also look within a file and not necessarily bloat your limited context window by loading the entire code at one shot.
14:16Now search is unbelievably important not because it just lets you search your entire code base, but it also has the judgment powered by the intelligence to not necessarily take a whole file and ram the entire file of code, which could be tens of thousands of lines of code versus a snippet where it thinks the issue or the opportunity lies.
14:35And then like we said, we have execution tools that can run commands on your computer. You have web tools to search the web, especially do deep research. You have orchestration in general, which allows you to manage different workflows.
14:46And one example of a micro workflow is a skill MD file that I'll touch on later, and then you have extensions or plug ins. So the TLDR of the process is Claude in the cloud Claude in the cloud, decides what tools it needs to use, and then it checks whether or not has permission to use said tools. If you are in bypass permission mode, it will then just go and execute it and retrieve and analyze the results.
15:10If you wanna get into the mind of Claude code, when it's assessing what it should do based on your request, it looks at the following different functions. So it tries to understand what does this person need? What does Mark need?
15:21Does he need to read something? Then great. We'll use the read tool.
15:25Does he need to find certain code? Then no problem. We'll use this thing called the grep tool.
15:29And if you see Cloud Code running sometimes and you are nontechnical and you're seeing weird words like glob and grep, well, grep allows it to do that search. And I'll tell you how it does that search shortly. If it needs to change a file, then it will use the edit tool.
15:42If it needs to run a command, then like we said before, it will use the bash tool. And finally, if it needs to search the web, its plan a is using internal web search. But you can do things like add skills that use perplexity.
15:54You can add different plugins from the marketplace. You can do whatever you want. But out of the box, this is what it has at its disposal.
16:00And once again, once it executes a task and it goes from Claude running a tool to getting the result of that tool, then that tool result creates brand new context, a brand new state, if you will, of understanding of reality. And then if it's something that needs to remember, it'll commit that and it creates brand new persistent context.
16:19Now if we dive deeper into the tooling of Cloud Code, let's take a look and see what do the read, write, edit, most importantly, glob l s actually do. And, again, if you're nontechnical, stick with me. This is not gonna be very painful.
16:31So these are essentially Claude's hands if you wanna think about them. So the read tool, pretty straightforward, but it's nuanced in the way it works sometimes. So let's say you have a document.
16:41It will then read that document, and then it will see lines with numbers because if you see a PDF, it will see different lines of numbers of that PDF. And PDFs, by the way, are a great example because if you tell it to read a 50 page PDF file, you'll be shocked at how quickly your context fills up because PDFs in their essence are full of noisy tokens behind the scenes that you don't see with the naked eye, but to create the visualizations, to create the file itself in the way it's smooth and buttery.
17:12There's a lot of tokens that get rammed into memory for no reason. So expert tip here is if you see Cloud Code attempting to read a huge file, ideally, tell it to create a Python script where you use any of the cheapest APIs possible.
17:25Let's say the Gemini 2.5 flash API that has a million context window. Tell it to go use that API, create a Python script to read it, and then report back the TLDR, the summary, or the most salient points.
17:39This helps you offload the read tool, shoving all of this unnecessary text into your context window and preventing you from taking the next step.
17:47Now the right tool is helpful because it creates everything from scratch, from your code to brand new markdown files. And for me, whenever I go through a session and we've had to traverse back and forth into errors and feedback loops, when we finally get a crystallized understanding of what has worked and what's worked well, I always tell it to either commit its understanding as a compressed TLDR in the CloudMD file, or I tell it to create a playbook of exactly what we went through, what decisions were made, and what the final outcome was, and how we could go directly from a to the final outcome next time.
18:22So in a way, is a glorified version of reverse meta prompting, is something I used to talk about a lot on this channel back in the ChatGPT days. But now that we've leveled up, you can still adopt and migrate this concept over here. Now the edit tool is not mind blowing, but the one nuance here is ideally it tries to do pinpoint edits.
18:41So instead of refactoring and recreating the whole file, which would take many more output tokens, are more expensive and take more usage from your account, then it tries to find the exact old string. If you don't know what a string is, it's essentially text. You can think of it as text or anything that's converted into text format, and then it tries to directly replace it with a brand new string.
19:01Now we're getting to words you probably haven't heard of, but you see quite a bit in Cloud Code. And one of them is glob. And if you look at the word glob, probably looked at it and you're like, okay, whatever.
19:11But it's actually doing something really important. When it goes and looks for files, it's looking for file patterns. So if you say something like go and search all the files associated with JavaScript or whatever it is that's powering my front end of my user interface, it will go and search for, let's say, the ending.
19:28This asterisk is a wild card. This could be anything. It could be any name, but it's primarily looking for anything that is dot t s, which stands for a language called TypeScript, which is used heavily in a lot of these vibe coded apps that you see nowadays.
19:40Nowadays. And if we zoom in just a bit, you'll see that we have asterisks here located in different parts. So if it knows exactly what folders or subfolders to look for, then it will try to also narrow its search to preserve the number of tokens being used just for exploration.
19:56Because ideally, the system is designed so it can focus on the building and satisfying your requirement more so than prepping on how to actually tackle it. So if you're still with me, this is typically the full flow of tool calling. You have glob go and find files ideally by their suffix, and then you have read to understand that code.
20:15You have edit if needed to make any changes ideally with surgical precision to said code, and then it reads it to verify the changes, and then we go back into that feedback loop. Now this is probably another word you haven't heard before, which is grep. So we have glob that finds patterns.
20:31We have grep that does actual search, kinda like you would search on a website or if you go on a retail site and search for socks and you wanted to do some form of elastic search there, that's where GREP comes in. So if you say, wanna go and edit how we do x y z thing on the profile part of my app. Let's say you wanna go so that someone can edit their profile picture, and it doesn't exist.
20:53You just have first name, last name, and email. So it will go and look for auth, and then it will see every single file that contains the word auth. If that doesn't work and it finds nothing, maybe it will brute force finding profile or first name as a variable because it knows, okay, if you have profile and there's only first name and you wrote that in your prompt, then there's likely some form of variable that someone has to fill out.
21:16Let me go and find that. So this is what it's doing behind the scenes to go and narrow down not only because that same thing can appear in multiple files, by the way, depending on what app it is, the dependencies, the use cases, etcetera. And behind the scenes, it's all powered by this thing called ripgrep.
21:33And the reason why this has been adopted in Cloud Code is it is incredibly fast. So it can quickly, like I said, traverse your whole code base, do pinpoint searches, and then behind that, do surgical accuracy to edit it if needed. And the reason why rip grep, which powers grep, is fast is it runs things in parallel.
21:52So if it wants to search a keyword, it won't search one file, then queue it up, then do the next file, and queue it up. Otherwise, your cloud code sessions would be that much more painful. What it does is if it's isolated four or five files, it will go and search them all in parallel and come up with its own understanding of which one needs to be changed or whether or not it needs to edit multiple files.
22:12So regular search is not only slow, but it could take many tokens as well. So that's another added benefit, which is that it's memory efficient as well. So if you wanted to think in three different search modes, you have files with matches.
22:24So where is this specific type of file? You have content based matching, so where is this specific context or series of functions. And then you have count, so how many of x.
22:34Maybe it sees off five times, and then it uses that to zero in and go to the next step. So the tilde r of search to read workflow is you have grep going and finding the files, then it searches everything before it actually reads it to see what is worth reading.
22:50Here's where the nuance here and the engineering ingenuity comes in. And once it narrows down what's worth reading, then it reads it, and hopefully it's reading only the part that matters. Alright.
23:00So next up is bash. And for you nontechnical folks out there, this is the part where your eyes might start to glaze over. So I'll do my best to make it as painless as possible.
23:09The TLDR of what bash is once again is it allows Cloud Code to run commands on your computer, create, run tests. If you wanna do something like committing your code to git, which again is a proxy for version history, it's literally version history on steroids, then this allows you to do all of that.
23:27So Claude can run NPM install to install packages like we said. It can create tests for different things that it's created. It can go and organize files and test whether it did it correctly according to your requirements.
23:39It can do all kinds of things, and it can spin up a local server. This especially becomes very helpful when you're running things like local host, but also apps that might need some more firepower. And instead of you deploying it right away on the cloud, like on Vercel or on render, it will allow you to do what's called docker build, which will let you create this containerization of whatever you've put together and run it in a very isolated manner.
24:04So you can also limit the blast radius if things go wrong. So bash is fundamental to how Claude code works and more importantly, why it's become so quote unquote famous, especially when it comes to local based tasks. And the way it's designed is that you always have control by default unless you forego that control.
24:22So Claude will quickly just check whether or not it has permission to take control of your computer in the way that it's looking to do so. If it doesn't have control, this is where it will ask you for approval, and then it will run on your machine, return an output.
24:35And like we said before, it will go through this feedback loop to test what it did to make sure it did it the right way and the best way. So there are two core scenarios that can happen with Bash depending on the tasks at hand. If there are a series of small and quick wins, you'll have to wait until it queues them all up and executes them.
24:52Otherwise, if it detects that there's a longer command that will take five, ten minutes to run, it will push that into the background and run different tasks in parallel. And this is fundamental because this allows you to run and continue all kinds of other work. So you can even open another terminal and have the other terminal take care of whatever it is it started on, and this bifurcation of tasks and task management makes Cloud Code that much more potent.
25:16Now in terms of the common bash patterns that you might see on your screen even if you don't recognize them is when it comes to version control, you might see things like this. So git status, git commit, git push. And if you're still watching this video and you have no clue how to even get started with GitHub, easiest thing honestly is going to github.com.
25:35You create a brand new account. Once you create an account, then you can use it for free. And then once you have it ready to go, you can go into Claude.
25:43And let's say you're just purely on the terminal. I'm gonna make that assumption. I will just do slash right here, and I'll do slash install.
25:52You can see right here. You could do Slack or GitHub. If you walk through this, you'll have a wizard that walks you through exactly what you need, the keys you need to grab.
26:02And once you go through this slightly painful process for fifteen, twenty minutes, you'll be good to go, and then you can use Git wherever you want. Meaning, you can use the words I want you to commit this or create a new repository for this to store all the code. Every time we make a change, I want you to commit it, and a commit is like a checkpoint in version history.
26:20This will allow you to do that that much more easily. So we do something like git status, the crash course here is it will just see what's going on in your different branches. You can think of GitHub as this tree.
26:32It's literally called there's one part called the main tree, and then they are called branches. Branches let you build things without touching your main tree, meaning if you're building an app to take that example and you implement a feature and that feature's on a branch, you have the ability to audit whether or not you can safely merge it to your main tree or will it break everything.
26:51So it gives you that extra roadblock or that stop sign. Commit means literally committing or pushing whatever code you have and storing it as another checkpoint and usually pushing it to your main branch, which means it goes straight production.
27:05If you've ever used any of the browser based tools, let's say the lovables, the bolts of the world, usually, anytime you make a change and you click publish, it takes effect. There's no intermediary.
27:15There's no sandbox. And when it comes to versioning, you can add more and more layers the way you would as a developer where you can create a sandbox where you go from committing small changes to having them on branches to then either merging them or testing those branches separately in a sandbox environment, but this allows you to have more flexibility to build responsibly.
27:34And when it comes to building, any of these commands here will allow it to do what's called compiling. So let's say you're building a React based app. If you don't know what React is, 90% of vibe coding apps that you see nowadays are built using this framework called React.
27:48Then when it's doing a build, it's trying to compile all the code in a way where it can render it as the web page or the app that you end up seeing on your screen. Okay. So we have four more sections to go.
27:59So hold on if you're still with me. We will get to the promised land of milk and honey where you can say you understand how Cloud Code works better than 99 of people. Now when it comes to context management, this is arguably one of the most important sections for you to understand what your limitations are in dealing with Cloud Code.
28:15So you can think of this context as a bucket. And in the bucket, as you have a longer conversation, that bucket fills up until it gets to the very top, and this is usually where the average person compacts the conversation and keeps going. Now there are implications for how compaction happens, and I'll touch on that shortly.
28:34But for now, out of the box, behind the scenes, if you ever open a brand new session, and if you wanna see this visualized, all we have to do is go into Cloud Code and send and submit this called slash context. This will show you your overall context window. So you could see right out of the gate, in this case, this project is very bloated.
28:52The Cloud MD is very bloated. So we start off at a huge disadvantage. We have the overall system prompt that tells Claude how to act in all conversations, and then we have our Claude MD behind the scenes that is polluting our context window.
29:06So we're already at a 80% disadvantage. So if we go back, these are taking a lot of my personal bucket. As you push claw to go do glob and grep, if you remember, glob is searching for patterns, grep is searching for actual searches within files, and you ram all kinds of context and make it read it, this bucket really fills up here.
29:27So the eighty twenty of filling it up is usually associated with reading files or reading code bases, especially larger ones. So on top of that, if you have MCP servers, especially if they're very interactive, let's say you're using the Supabase MCP.
29:42If you don't know what Supabase is, it's a database that has a back end that allows you to more easily create what are called edge functions and tables. If you allow the agents to run autonomously and build these tables and test them out, that feedback loop of the results of those tools, which is usually in JSON format, will take many tokens because JSON is very token heavy.
30:02And if I'm saying the word token is still not resonating, as of this recording, Opus 4.5 has a 200,000 token window of which let's call that a 150,000 words even though it's not one to one.
30:15So you can imagine how quickly things can escalate, especially if you throw something like a PDF. So if I were to open a brand new session here, and I have this huge PDF just to give you a visual.
30:26Let's open this up.
30:30Reveal and finder. Open this up. You could see this is a very large index report.
30:36And because it's full of characters, like I said, it's full of tokens, and you can see that right here. Look at all these lines that to you, the naked eye, you don't know this exists in a PDF, but these are taking tons of tokens yet they offer zero value to you as the user for Klau to know about it. So if you were a complete noob and you said, uh, read this and you said economic index, watch what's gonna happen.
31:01It is going to fill that bucket immediately. We're gonna get to a 100% because we're already at 22%.
31:07Remember, this will take us to a 100% because it's so thick in tokens that it will completely max everything out, and you will be pushed from the beginning to compact the conversation against your will. And you can see right here, it did manage to quote, unquote read it, but look what happened to our context window. It's a 100% used and have yet to do anything.
31:28Now there are many solutions around this. One of them is, like I said before, you can have a script go and read this PDF and then break it down, create a markdown file out of it.
31:37Markdown files won't be full of all that garbage you saw, all that garbage metadata, and then you can feed that in, or you can ask the script to use something like Gemini or Claude or OpenAI and then summarize it and then feed that to Claude, especially depending on whatever it is that your use case covers.
31:55So with that, you can quickly see that you can easily take up space for absolutely no reason. Now there are quick fixes on top of scripts, for example. You could have spun up a sub agent to have a virgin 200,000 contacts window to go and read that PDF and report back on what it found or the summary of that PDF.
32:13Totally an option. So you have to be very smart and nimble on where you wanna use Claude's abilities. It's not worth it just send it blindly to read a PDF and just have it max out everything from beginning.
32:25And one thing that I've observed and many others is as soon as you pass the 40 to 50% threshold, Cloud Code is still usable, but it starts to really degrade in the quality of the answers, the quality of the code, and how lazy versus proactive it is.
32:40So at the beginning of a conversation is where you wanna preserve and be as frugal as possible with that context window. So you can even think of it as a physical budget where you have 200,000, and every single time you send a request, you're sending $10,000, and you keep keep working your way towards that overall window.
32:57If you take that mentality, then hopefully, we'll push you to be a lot more thoughtful with the next steps you take, especially if you're not on the max plan, especially if you're on the $20 plan or the $100 max plan. It's smarter to plan out everything. Use plan mode, especially since they've recently upgraded plan mode to create the plan, explore the code base, then you have the option to clear your context window and then start a brand new session to execute said plan.
33:23Now this part is super important, and this is when it comes to compaction. So if we go back here, I have to naturally do compaction.
33:31This will go look through our conversation, which really isn't a conversation at this point, and summarize what it thinks are the most salient points. And you can see right here, this is the version of its compaction. So it's created the session summary.
33:46It's gone through and walked through the different tiers of requests, the actions conducted, the response, which honestly is kinda useless.
33:54I don't need an understanding of this. We need to carry over the actual context. So we have some minimal information, but you could see it's really devoid of a lot of the details that we'd need to truly carry this conversation forward.
34:07So if we go back into this diagram right here, you don't know what's happening at every point of compaction. Especially if you're five or six compactions deep, you can have a part of the old conversation, then a part of the recent conversation, and then a part of the results of some tools that were executed.
34:25But you as the user, especially if you're nontechnical, are less likely to really audit what's happening. So you want to design your sessions so that you have the lowest likelihood that you run into overly compacting your conversation, which is why spinning up multiple terminals can make sense, and I even record a whole video on how I like to create and spin up different terminals depending on the mutually exclusive tasks that I can identify, which is why I made a whole video on how I like to use different terminals for different tasks if I can make it so that they're mutually exclusive.
34:54So I'm gonna show that up on screen here. You can click on that if you wanna learn more. But the TLDR is one you can always use compaction, but one thing I really like to do is I like to ask Claude to give me a plan or write a plan for itself before I compact on everything that should be summarized from the conversation, all the most salient points, and tell it exactly what to worry about.
35:14So then when I do slash compact, it will take that most recent summary more heavily into account. It'll weight it more when it creates its own compaction.
35:23But ideally, I can avoid it as much as possible. And if you're running into issues where Claude forgets nonstop, then in session one, let's say you discussed x.
35:31And in that session, Claude knows everything. And then when you compact it, especially multiple times, that context is gone. So when you go to session number two and you have a fresh start, Claude basically has amnesia.
35:43Well, again, if it's something that needs to be understood and it's something that you can compress in a couple sentences of understanding, this is where it makes sense to update your CloudMD because this will be your persistent comprehensive command center brain that will persist across sessions, which is why you wanna be so careful when it comes to taking care of your CloudMD because it takes the center of the beginning of your session where, again, it's the most suggestible and the most helpful and at the same time might help you from having to overly compact over and over again.
36:13Now when it comes to session management, this is essentially where you open multiple terminals at the same time. Behind the scenes, you can see it all loading up.
36:21And what's happening while it loads up is it starts and it runs clogged behind the scenes, and then this is where the conversation would happen. And then at the end, you either quit or close it. But the cool thing is is that even when you close it, it persists during conversation.
36:37So most people don't know that all the conversations that you have with Claude code in your terminal are actually stored in the root folder of dot Claude. And all you have to do to retrieve them is ask Claude to go to its original folder and pull a markdown file of all your last conversations. So if you need to, you can ask Claude to search and traverse through different conversations to pull any gold nuggets where you wanna persist that in your memory or your Claude MD file.
37:02So when you finish a particular session, you lose the context window at that point in time. You lose the file snapshots.
37:09You lose any in memory states, meaning if it's read something, let's say that PDF I showed you on screen not too long ago, it will forget that existence of that file. And it's because it's called stateless sessions.
37:21And what stateless means is that outside of the CloudMD and your system prompt behind the scenes, every single session is a blank slate of context. And what persists are the files you've created, like I said, the conversations, if you dig for them, your CloudMD, any Git commits, and installed packages.
37:37And one thing I wanna mention on Git commits is once you get more at ease with GitHub in general, you can create what are called GitHub issues. And a GitHub issue is essentially a series of to do list tasks that can live in GitHub, meaning it lives in the cloud, so you can always refer to it at the beginning of a session as well.
37:56So if you don't want to bloat your CloudMD file, but you're looking for a way to have a to do list that's also not a markdown file in your Cloud Code session, you can level up to using GitHub issues to act as that to do list task. Now when it comes to editing your files during the session, it takes what are called snapshots.
38:13And the whole point of the snapshot is it knows exactly what the file looked like before. So let's say you change a series of things, and these series of things is not deleting your whole drive.
38:23You're changing a file, and it changes it in the wrong way. If you tell it to go back, it has enough context, and it can retrieve that in memory snapshot to go back to the original factory default settings of that particular file.
38:36So to break this down more tactically, before it even edits the file, it looks at the current file state, what it looks like at the moment, then it takes a snapshot, then that snapshot is saved in memory, and then Claude edits the file.
38:50And if something goes wrong, it goes back to restore the original snapshot from here. And then assuming everything is good to go, it updates the new file state. So does these temporary file saves so that you don't have to?
39:02Now like I said at the beginning, Claude code was designed to be hackable, and the five core things that you can use and customize to make your own, like we said at length, are the Claude MD, but also things like skills.
39:14If you're less familiar with what skills are, they're essentially a series of metadata where it's a long form explanation to Claude on how to use and invoke different Python functions. So it's not fully deterministic, meaning it's not fully predictable and it won't work the same time every single time, but it's definitely a lot more reliable to execute a small workflow in a particular way.
39:38So you can think of skills as mini any then workflows or make.com or Zapier workflows where it's not a 100% predictable, but for the most part, you know exactly the entire line of reasoning and the path that input a will take to get to output b. And another reason why I like them is they're injected just in time, and just in time means MCPs usually, when you load a brand new session, they're auto injected in your context.
40:05So if you have a huge MCP server or an MCP server with tons of micro tools, you can start off at a 50% context window with you doing nothing, which absolutely sucks from a user experience. Skills are only invoked when Claude feels that they're needed.
40:21So as long as you very well delineate when those skills are needed, then you should be good to go. Now with MCP servers, they used to be the absolute hottest thing last year until people started poking holes in the fact that, one, many MCP servers are built once and never maintained, or they have security issues, many of them have, and some of them need to have more malleable ways of ignoring certain tools versus others.
40:45Because let's say you wanna use one tool out of a 100 that loads the entire 100 all at once. So I used to use like 10 MCP servers and then over time, especially as they started bloating your context window, now I use one to three. On average, I'll use ones that make deployment of MVPs easy and creation of MVPs easy.
41:04So the Supabase MCP, Vercel MCP, or the Amazon Web Services MCP.
41:09Now when it comes to hooks, can think of them as mini automations that are tethered to different actions that Cloud Code takes. So if Cloud Code reads a file, you can tether a hook to that particular event. And anything we mentioned in passing, whether it's editing, whether it's writing, anything where there's a specific action or a tool call that's documented, you can attach some form of hook to.
41:32And hooks are useful because you can also change the way that cloud code behaves. You can literally have a hook that pops up on screen whenever your contacts window is passing a certain limit as a warning, and all you'd have to do is tell it to enable it in your terminal, make sure that your terminal can show notifications on your laptop, and that could be one example of infinite hooks that you can use.
41:53And when it comes to sub agents, this is something that I've recently gone over in a video that I'll show on screen now. And the TLDR is sub agents allow you to have digital pseudo employees that have their own prompt, that can have their own tools, that can work in tandem, in parallel, and most importantly, if you design it correctly, you can design it so that you don't step on each other's toes and don't have what's called agent collisions.
42:17An agent collisions is essentially when two different agents, let's say, a UI validation or UI improvement agent clashes with a back end agent because they both have to change the same file, the file that maybe powers both of the front end and the back end. So if you're nontechnical, you wanna avoid that as much as possible, which is why typically my gateway drug for you is to use sub agents for nontechnical tasks.
42:42Now CloudMD, I've already spoken about it at length, but in terms of specificity, you wanna think about this as your instruction manual, your project manual for your Cloud Code repo. Can you have multiple CloudMDs?
42:54Yes. You can have one per folder. If you have one repo full of folders, I personally like to create one folder, one repo, one CloudMD, and I separate them all out.
43:04It makes it easier for me. So I have a social media command center. I have a YouTube command center.
43:10I have a financial command center where I manage all the finances of my business and my agency and the community. So I like to isolate each task so I can really tailor and groom the CloudMD for that particular ecosystem. If you work in an environment where it makes sense to share a singular CloudMD, then what you wanna pay attention to is this.
43:29So you wanna be as hyper specific as possible. Your CloudMD should be as compressed and concise as possible.
43:37You don't wanna overdo examples. This isn't some huge prompt that you're throwing into, like, Gemini where you have a million contacts window. You wanna use this as your quick onboarding cheat sheet for each and every session.
43:48So one thing I like to do is let's say I create a playbook for how to do x. Instead of bloating my CloudMD by teaching it how to do x, I just tell it that whenever I, let's say, wanna write a LinkedIn post, I want you to go look at the LinkedIn playbook markdown file in this folder.
44:05So I'll create that one line in the CloudMD. So I use it as a routing source where if it needs to look at this, it knows where it needs to onboard itself more in detail on that particular area. And the other thing you can do is freestyle commands, and freestyle commands is saying, you know what?
44:21Every single time I say reverse, go through our entire conversation and update your CloudMD to learn from anything that might have changed in terms of the way we structure x, y, and z.
44:32So you can change its behavior. You can have trigger words. You can do whatever you want, which is why CloudMD is a blessing and a curse, and it's only a curse if you don't know what you're doing.
44:40Now some people ask me, should I do a global CloudMD across all projects? What I would say to that is if you're just starting out or you're even intermediate, if you have CloudMDs that are pretty different across all your projects, I would stick to one CloudMD per project.
44:55But if you find a way to unify everything, then it can make sense to have a universal CloudMD. So maybe you have playbooks where some of the values of those playbooks deserve to graduate to a global CloudMD.
45:08Me, personally, I'm very careful with anything global because you never know when you forget it, and then you're starting a project. And for whatever reason, something isn't behaving as expected, and you end up troubleshooting for hours.
45:20You could tell I've been through this before, and it turns out that it's just a global CloudMD setting that clashes with one of your prompts. So I know there are many opinions on how to structure a CloudMD, and what I'll say is there's no predefined absolute right way, but there are definitely wrong ways.
45:37And the wrong ways is, again, to overload it to have a twenty, thirty, 40,000 token CloudMD that's loaded each and every session.
45:45So skills, like I said, are on demand workflows, and these workflows can do anything from creating PDFs to creating docx files to reviewing code to committing code. It can be whatever you want. And the beauty of this is you can have technically a 100 skills, but because they're only injected when they're needed, as long as Claude knows when and where to use those skills, then you're good to go.
46:07And it helps you save on space, and one thing that I'm finding is you can convert a lot of MCP servers that can bloat your contacts window into skills. And you'll notice some companies are moving from maintaining their MCPs and updating them to creating skills that are way more powerful at using their API.
46:25Because an MCP server, especially for those of you that are nontechnical, is a layer of abstraction. And what that means is it's an extra unnecessary layer on top of what's called an API, an application programming interface, which means a way that you can interact with the back end of any service.
46:41And because it's not needed and it was a layer that was thrown on top of the AI stack to make it easier for people to connect agents to different services and have two way communication, we don't technically need it, but skills basically capture the functionality of the API service with some metadata explaining how to use it and when to use it.
47:00So in a way, if you're not always using the same MCP server each and every time, you don't need it to work all the time. You can also have it on demand as a skill. Now in the future, I see that skills will roll up naturally into agents.
47:14I think they'll have swarms where they come prebuilt with a series of skills. Now that they're all out there, I can imagine a world where you not only have a predefined marketplace where agents can shop skills on demand, and they are skills that you might not have to create yourself, but they will maybe learn a skill you teach it over time.
47:32So this concept, I expect to evolve, and I'm excited for it. Now one additional note I wanted to make on hooks is you can make hooks fire before or after a tool runs.
47:43So before we write something, you can make a hook occur with that predefined mini automated workflow, or after something happens, you can have it commit and do some form of action additionally. So hooks can happen before and after.
47:59One thing I would say as a point of warning is make sure that once you implement a hook, you decide whether it should be on the project level or the global level. So let's say you wanna experiment with something. The other day, I wanted to build something with hooks where I could stop Claude from making me compact conversations, and I wanted it to become self aware where it could count the number of tokens being exerted through the session and auto compact without me actually physically doing it myself.
48:25So I went down a rabbit hole for hours, but I made it project based. I eventually broke Cloud Code.
48:32I've stopped Cloud Code being able to create the next thought or the next action by adding so many micronuances to the hooks that I was thankful that I did it at the project level.
48:43Because had I done it at the global level, I'd be pretty annoyed and I have to find some way to undo it. The good thing with a project level hook is if it's not working, if it breaks things in that project, I can just blow away that project. It will blow away any associated local settings associated with it, and then we're good to go.
49:00The last part for this section is a double take on sub agents. So I see these as increasingly more important in the future as context window explodes. So once we have a 5,000,000, 10,000,000 context windows, this will be infinitely more powerful, especially as you add skills to each one of them because you don't just have to attach an MCP server.
49:21You could have a set of skills that that each sub agent is correlated or associated with. Once you have that, it's beautiful because you go and you set off an agent, let's say an explore code based agent. You can explore three or four agents at the same time and say, you go look at the front end, you go look at the back end, and you look at any overlap between them.
49:41Then you have preserved your contacts window. You've used their focus to focus on one core task of that part of the code base, and they bring back the TLDR to the main agent or the main session. And it's beautiful because you can continue this flow state and this lucid train of thought while you bifurcate and you delegate all of the additional pieces that you don't want to initially have to create a brand new terminal session and a brand new blank slate with.
50:08So one last concept that's important to understand are extensions, and extensions might vary depending on whether you're a solo DOLI user, you're part of an agency or a company, you're an individual contributor in a larger organization. So if you are an org, then you might have company wide policies on permissions. Maybe you don't let any of your developers go on bypass permissions mode.
50:28Maybe you have a very specific set of settings because you can share settings dot JSON with all of your team, and settings dot JSON would have a list of different bash commands that are white listed. So you could live in a world where you don't have to live in either approval with edits or bypass permissions.
50:47You can have a very specific set that you make uniform across the board. As a middle user, you might have some personal settings, and then if you are really isolating a project where you wanna be able to do whatever you want, you can make all those rules and permissions specific to that project again because you'll have something called a settings local dot JSON, which is different from the global settings dot JSON that would make permissions global across every single project that you pursue subsequently.
51:14And in terms of how this flows, so you would make some form of tool request either in passing or through your prompt implicitly, and it will then check the allow list. If it's allowed, then it's green.
51:25You're good to go. It executes it. If it's unsure or if it's never happened before, that's why if you get started with Cloud Code and you go into the standard mode, it'll ask you permission a million times before you finally get to the point where you're doing something.
51:39Because it's trying to ask, okay. I'm trying to run this bash command. Now I'm trying to do this bash command.
51:44Now I'm trying to install this library. So the first time it sees it, it has to get your permission for each and every micro action, which is better to have by default than not. Now if you say no, then it also updates the settings local dot JSON to say, looks like Mark is never okay with doing x.
52:03So moving forward, every single time that this is about to happen, we either change course or we ask if we can enable it or tell him that we can't do this thing unless you whitelist this action. So examples where allowed, blocked, and asked would make sense is allowed would be auto execution, which is giving constant permission to Claude to be able to read whatever file you want.
52:24Now in terms of blocked, it makes a lot of sense, especially if you're doing something more on the organization scale. They deny things like deleting system files, especially if that's a part of a workflow. This will make sure that you would have to manually go and override this to delete a file, or you would go delete the file yourself as the user slash developer.
52:43And when it comes to ask, it makes a lot of sense to keep ask whenever you work in an organization where security is first and foremost, and you have something live in production with actual customers and users. Because if you install a random library that cloud code thinks is amazing as of 2024 or wherever its last training was.
53:02But for whatever reason, that library or that NPM install is written with injections or SQL injections or bugs or whatever, viruses for all intents and purposes, wanna be able to always intervene and see what is getting delivered into your ecosystem.
53:18So the yellow ask here is helpful in case you wanna be able to micromanage and rightfully so each and every package that's entering the rest of your stack. So TLDR of the TLDR is reading and editing code files usually makes sense to keep that on YOLO. Deleting files makes sense to block, and installing things as well as pushing things to production directly without putting it in some form of a branch or a pull request.
53:42If you don't know GitHub, then close your ears for what I just said. And then when it comes to removing things entirely from your system, from files, also usually too dangerous, especially if you, yourself, your organization, or your team is just getting started with Code.
53:57And if you made it this far, you deserve to pat yourself on the back. This is a big accomplishment. You understand more than 99% of people about Cloud Code even as a nontechnical person.
54:07So before I leave you and I give you some resources as post homework reading to get this really synthesized in your brain, let's just execute one basic action. And now that we can identify everything, it's nice and a beautiful thing to just watch Cloud Code and identify and understand each and everything that's happening.
54:26So if you remember our original conversation, all I wanted to do was read this PDF file without nuking my entire context window. So let's open a terminal, and I'm doing this versus the extension just so that we see more verbosity, some more breakdown of what's happening.
54:42And let's just ask it to create a Python script that uses a very cheap language model, let's say, Gemini 2.5 Flash that has a million context window, and it reads the entire file, loads it into context, and gives us a TLDR. That's all we wanna do.
54:56So let's go, and I'll just go to Gemini 2.5 flash API documentation.
55:05This should pull that up. Let me zoom into the right page, then we'll come back.
55:10So I've located the right model on the right part of the website, and I'll take this link right here. We'll go into Claude, and I'll say, go read this page and understand how to implement and use this model.
55:24So I'll just give it the URL. It can go and search this, and it'll come back with a TLDR understanding of how to use it.
55:32So now that we better understand how Cloud Code works, you'll see first it executed a fetch tool to go and look at the website. It realized that it wanted to go a little bit deeper, so it fetched another part. It basically pulled on the thread.
55:44It got the model overview over here. It told us exactly how it's going to work, and now we can tell it exactly what needs to happen for us to use it. So let's say the following.
55:54Yeah. So I wanna be able to create maybe, like, a function we can invoke, maybe even a skill if it makes sense. Let's call it read large doc.
56:03And then what happens is when we invoke this skill, I want you to use the Gemini 2.5 flash API. Obviously, create some form of environment file, a dot n file, make it available in the main folder so I can put my Gemini API key. And anytime I tell you that I have a large document, I invoke this skill, you use this to take the entire document in memory.
56:24So make sure that we have available context window, especially input tokens that are around 500,000 or 400,000 large so we can actually give it a pretty large file.
56:36And the goal is the system prompt of the request to this Gemini API should be to summarize and synthesize this file so we don't have to ram this in your context window. So a bit of a mouthful, but this should now start its thinking process to come up with all the microparts that needs to do this.
56:56So, again, let's see here. So now it searches for a pattern. It's looking for the root Claude folder.
57:00You can see here we have some wild cards, and then it's looking for a specific pattern for environment. It's looking to see does an environment file to put your API key already exist. Now it's reading the Cloud skills.
57:13It's reading 253 lines. So, you know, as you're stacking the lines, you're stacking your context window, that bucket is filling up.
57:20It's now using the right tools to create a new file, then create the Python file, then create a skill associated with it called read large doc, and that skill should be located in the skills folder right here.
57:36Read large doc. If you take a peek, here's the document put together.
57:40Let me see. Does it include Gemini? There we go.
57:43Yeah. There we go. That should be good to go.
57:47If I go to the very bottom, it's using Bash now, which is taking control of our terminal to see if it can basically send a test request to Google, and it's setting up what's called a virtual environment.
57:58This allows it to execute requests, and it realizes it needs to update its software.
58:06There we go. So it's updating the skill dot m d file. I'm only doing the play by play here, not because you can't read or you can't understand the concept.
58:14But now that we understand this lens, we can better audit exactly what's happening, and more importantly, stop it in its tracks if we absolutely have to. Alright.
58:23So it says everything's set up, and here are the files that were created. So an environment file, a Python file, a git ignore, a virtual environment.
58:30We have a Gemini key that already pasted mine in, and now it even created a slash command. So let's take it for a spin.
58:39I'll open it maybe in a new terminal. Let's do that. I'll spin this up, and hopefully, it should work.
58:46So let's do slash, and I already forgot what I named the skill. That's a skill issue, pun intended.
58:53We'll do read large. Okay.
58:57There it is. Read large doc. Now I'm not sure if the skill is smart enough to ask me for where the doc is.
59:03It's checking the environment file. I think it's just onboarding itself. Okay.
59:08Cool. Now if I say use this file, make sure you don't read it, but make the skill read it.
59:17Now I probably would have done this anyway, but I'm just being careful so I don't nuke my session.
59:24So hopefully, the next action should be to read said skill. Okay.
59:30It's using bash to execute the function right away. If I saw read, I would actually should be worried because they would read the file itself, and then we'll come back to the result.
59:40Well, well, well. So it's so large that it's 1.8 tokens.
59:44So what I'll do is I'm gonna ask it. Cool. Can we add to this skill the ability to take a PDF, break it down to a markdown file, and remove all the unnecessary characters that exist in a PDF so it's hopefully less than this many tokens?
59:58We'll just see if this works. Alright. And that did the trick.
1:00:01We were able to adjust the skill. I'll show you the steps that it took to get here. You'll see it did a lot of changes.
1:00:08So first reads the Python file to see what's missing, then it executes bash to see whether or not any of these libraries exist, any of these packages are installed. It realizes none of them are installed in this project, so it installs them.
1:00:23And the point of those is to break down a PDF and convert it into what's called a markdown file. And then the reason why I love the terminal is I can at least audit, especially as a technical person, what's happening on the code side. If it's over removing things or over adding things, then it updates a skill.
1:00:40It reads the current skill first, then it updates, uses the right tool, the current skill. And once we get to this point, it then tells me the Eureka moment.
1:00:50So the proper PDF extraction made a huge difference. The raw PDF was showing 1,800,000 tokens because it was reading the binary PDF data.
1:00:59That's all the junk I showed you earlier in this video. But proper extraction gives us only 22,000 tokens.
1:01:06So we went down by 98% just by making this one change, and now it's solidified in a skill. So the next time I have a huge PDF and I wanna bring it in, I'm not gonna make Claude code read 1,800,000 tokens when it only has 200,000.
1:01:21I will use a skill, invoke that, keep my beautiful bucket in my context as is, and just bring in the summary which we have right here. So if we go to summary, uh, if I close this out, it walks through everything that Claude would really need to know about this file for us to actually do something with it.
1:01:38Alright. So with that small example, hopefully, shows you a glimpse of what Claude code could look like through your new lens that you now understand through the tools it uses, how it manages its context, and how it functions overall. And if you've been on the fence that you're not technical enough and you made it this far, trust me, you are better off than the majority of people, so this shouldn't stop you anymore from hopping in, getting your hands dirty, and building whatever you want in a terminal.
1:02:03Now if you want access to the diagrams I showed a full guide walking through everything I explained in plain English, I'll make that available to all of you in the second link in the description below. But if this video really unlock things for you and you wanna take things to the next level and upscale yourself, then I would strongly recommend you check out the first link in the description below for my early AI adopters community.
1:02:23I personally manage it every single day, and we have a beginner to intermediate brand new course coming out in the next couple weeks along with the existing systems that I've made available and the new ones we're coming out with soon. And on top of that, we hire all kinds of coaches whether it's for n eight n or Claude code or cybersecurity to help you go to the next level.
1:02:42And for the rest of you, I would truly appreciate if you could just leave a comment and a like on the video. If it was helpful, share it with someone who's learning Claude code. It would really help me, the video, and the channel, and stuff like this really takes hours to put together, so I genuinely appreciate it.
1:02:55I'll see you in the next one.
§ · For Joe

Steal the bucket and the loop.

Claude Code operator playbook

The Gather/Act/Verify loop and the bucket analogy are the two mental models that make every Claude Code session legible — build your content and your CLAUDE.md around them.

  • Use @claude-code-guide to interview Claude Code about your specific use case before building anything.
  • Treat your CLAUDE.md as a routing table — one line per playbook, not the playbook itself.
  • Convert MCPs you use less than daily into skills; you will reclaim 20-50% of your starting context.
  • Before /compact, write your own summary of what matters as the last message — Claude will weight it higher.
  • The PDF-to-markdown extraction skill (Gemini 2.5 Flash offload) is a direct steal: 1.8M tokens to 22K, reusable on every future session.
  • The @claude-code-guide interview format is a ready-made tutorial series: one concept per episode, interview Claude, show the ASCII diagram.
§ · For You

What this means if you actually use Claude Code.

For builders who are not watching to study the format

You do not need to understand Rust or TypeScript to use Claude Code well — you need three words: Gather, Act, Verify.

  • When Claude seems stuck in a loop, it is filling the bucket. Open a fresh terminal and give it a clean start.
  • Never tell Claude to read a large PDF directly. Ask it to create a script that summarizes the PDF — one sentence saves tens of thousands of tokens.
  • Keep your CLAUDE.md under 2,000 tokens. More than that and you are burning your own budget before the first message.
  • If Claude starts giving lazy or wrong answers mid-session, run /context — you are probably past 50%. Compact or start fresh.
  • Skills are free context. Every workflow you repeat more than three times is worth turning into a skill.
§ · Frame Gallery

Visual moments.