Big Idea

The argument in one line.

Claude Code's real power is in its harness layer, and the gap between casual users and high-leverage practitioners comes down to whether they treat CLAUDE.md, agent orchestration, and iterative research loops as configurable systems rather than optional extras.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…

You have already shipped at least one Claude Code project and want to understand the harness layer at a systems level, not just how to prompt it.
You run a freelance or agency business building AI automations and need workspace organization that scales across client projects without colliding contexts.
You are spending too many tokens or getting inconsistent outputs and want a systematic CLAUDE.md optimization process.
You want to run unattended research or automation loops and need a principled framework for when to use HTTP versus browser use versus full computer use.
You are curious about multi-agent architectures and want a practitioner's honest assessment of which org-chart patterns actually work versus which are overhyped.

SKIP IF…

You have never used Claude Code before - the presenter explicitly points to a separate 4-hour beginner course and this course will not re-explain fundamentals.
You want a single end-to-end build of a specific app; this is a systems-thinking course with live demos but no one deliverable project.

TL;DR

The full version, fast.

The course argues that Claude Code's real power surface is its harness layer and that most users underuse it. It builds the configuration stack systematically: compressed CLAUDE.md files at global and project level, multi-agent fan-out using Opus as orchestrator and Sonnet as researchers, Karpathy auto-research loops for any measurable goal, and a three-tier browser automation framework. The security section covers eight practices that block 90% of real attack vectors, and the closing chapter frames people who internalize harness-layer thinking as holding asymmetric productivity leverage as model intelligence accelerates.

Chapters

Where the time goes.

00:00 – 00:57

01 · Introduction

Prerequisites, tool setup in Antigravity/VS Code, and course roadmap across 10 sections.

00:57 – 26:47

02 · Advanced System Prompts and CLAUDE.md

Four functions of CLAUDE.md: knowledge compression, preferences, token conservation, meta-learning. How to build global vs. project-level files and use Claude itself to distill past conversation insights into high-density rules.

26:47 – 42:07

03 · Agent Harnesses

Definition of a harness as everything wrapping the LLM. Comparison of Claude Code against Droid, Pydantic AI, Crew AI. Security implications of harness choice.

42:07 – 1:22:36

04 · Parallelization and Agent Teams

Fan-out architecture, stochastic consensus, debate patterns. Parent-researcher-QA system versus lean developer-QA loop. Skills vs. sub-agents structural comparison.

1:29:26 – 1:53:35

05 · Auto-Research

Karpathy hypothesis-execute-assess loop. Live demo improving leftclick.ai Lighthouse score unattended. How to configure the agent and measurement script for any measurable goal.

1:53:35 – 2:07:51

06 · Browser Automation

Three tiers: raw HTTP requests, browser automation (Browser Use, Computer Use), OS-level computer automation. Reliability, detectability, and terms-of-service risk tradeoffs.

2:07:51 – 2:24:17

07 · Performance Fluctuations and Model Diversification

Monoculture risk of single-model dependency. How to blend Claude, Gemini, GPT-4o, and local models. MCP server integration patterns.

2:24:17 – 2:39:16

08 · Workspace Organization

Personal, business, and client project directory structures. CLAUDE.md hierarchy across global, project, and task levels. Directory hygiene and temp file policies.

2:39:16 – 3:00:28

09 · Security

Eight practical security rules: API key centralization, RLS enforcement, dependency injection defense, prompt injection via web content, OAuth basics, and never touching credit card numbers directly.

3:00:28 – 3:18:23

10 · The Future of Claude Code

Three predictions: decreasing human involvement, tooling commoditization, accelerating pace of change. Productivity-divide thesis and the William Gibson uneven-distribution quote.

Atomic Insights

Lines worth screenshotting.

A Claude Code harness is everything wrapping the model - system prompt, hooks, tools, memory, parameters - not the model itself; improving the harness compounds faster than waiting for the next model version.
CLAUDE.md does four things: compresses workspace knowledge, encodes user preferences, enforces token conservation rules, and seeds a meta-learning loop that improves itself from past sessions.
Stacking a global CLAUDE.md over a project CLAUDE.md over task-level inline context injection is the minimum viable configuration for any serious Claude Code project.
The parent-researcher-QA pattern - Opus orchestrates, Sonnet researchers run in parallel, a fresh context-free Opus QAs - is the leanest multi-agent setup that measurably improves output quality.
Every additional delegation step in a multi-agent chain introduces compound probability of diverging from the original intent; most elaborate org-chart agent frameworks produce worse results than a 3-node architecture.
Karpathy's auto-research loop works for any task with a measurable outcome: define the metric, let the agent hypothesize and execute changes, log results, and loop overnight without human involvement.
There are exactly three levels of web automation - raw HTTP requests, browser automation, and computer use - and matching the level to the job is the only decision that matters.
Relying on a single AI model for all production work is monoculture risk; routing research tasks to cheaper models and QA to fresh instances reduces both cost and cognitive lock-in.
Centralizing all API keys in an env file and never letting the model handle them in plaintext is the single highest-ROI security habit for any Claude Code production project.
Row-Level Security disabled on a Supabase database means anyone with the public key can read or delete every row; it is off by default and has caused the majority of high-profile vibe-coded app breaches.
Dependency injection attacks (the course's term for what is more commonly called typosquatting: malicious packages named close to legitimate ones) are a real and underappreciated threat vector for AI coding tools that auto-install packages.
Skills and sub-agents are structurally near-identical - both store name, description, and tools schema - differing mainly in context volume; the distinction will likely collapse as the field matures.
People who understand agent harnesses today are in a sub-1% productivity cohort that will hold asymmetric economic leverage as model intelligence accelerates.
The intelligence gap between current models and human-level intelligence is proportionally small and will compress faster in the next 12 to 24 months than it did in the prior five years combined.
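
The compounding claim about delegation chains can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that every hand-off preserves the original intent with the same independent probability p. The 0.9 figure and the name chain_fidelity are assumptions for this example, not numbers or code from the course.

```python
def chain_fidelity(p: float, hops: int) -> float:
    """Probability that intent survives `hops` hand-offs, assuming each
    hand-off independently preserves it with probability `p` (an assumption)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return p ** hops


# Illustrative values only: p = 0.9 is assumed, not measured.
for hops in (1, 3, 10, 26):
    print(f"{hops:>2} hops -> {chain_fidelity(0.9, hops):.3f}")
```

Under that assumption a 3-hop chain keeps roughly 73% fidelity while a 26-hop chain keeps under 7%, which is the shape of the argument for staying near a 3-node architecture.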

Takeaway

The harness layer is where Claude Code actually compounds.

WHAT TO LEARN

Every quality and efficiency problem in Claude Code traces back to how well the harness is configured, not how smart the underlying model is.

02 · Advanced System Prompts and CLAUDE.md

CLAUDE.md does four things: compresses workspace knowledge so Claude avoids re-reading every file, encodes user preferences, enforces token conservation rules, and seeds a meta-learning loop that can improve itself from past sessions.
Running Claude over your own conversation history and asking it to distill high-density behavioral snippets is faster than writing CLAUDE.md rules manually and produces more accurate rules.
Global and project-level CLAUDE.md files must not directly contradict each other - auditing for conflicting rules is a required maintenance step as files grow.
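
The conflict audit in the last point can be partly mechanized. The following is a minimal sketch, assuming rules are written as bullet lines that start with "always" or "never"; it only catches that one naive pattern, and the function names and sample rules are hypothetical.

```python
import re

# Matches lines such as "- Always use X." or "never do Y".
RULE = re.compile(r"^\s*[-*]?\s*(always|never)\s+(.+?)[\s.]*$", re.IGNORECASE)


def extract_rules(text):
    """Map each normalized action to its polarity ('always' or 'never')."""
    rules = {}
    for line in text.splitlines():
        match = RULE.match(line)
        if match:
            rules[match.group(2).lower()] = match.group(1).lower()
    return rules


def find_conflicts(global_text, project_text):
    """Return actions that one file marks 'always' and the other 'never'."""
    global_rules = extract_rules(global_text)
    project_rules = extract_rules(project_text)
    return sorted(
        action
        for action, polarity in global_rules.items()
        if action in project_rules and project_rules[action] != polarity
    )


# Hypothetical file contents for illustration.
GLOBAL_MD = "- Always return absolute file paths.\n- Never commit API keys.\n"
PROJECT_MD = "- Never return absolute file paths\n- Always run the test suite.\n"
print(find_conflicts(GLOBAL_MD, PROJECT_MD))
```

Real contradictions are usually phrased less neatly than this, so a pass like this is a first filter before asking Claude itself to review both files side by side.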

03 · Agent Harnesses

A harness is everything wrapping the LLM - tools, memory, parameters, hooks - and improving the harness compounds faster than waiting for the next model version.
Different harnesses make meaningfully different security tradeoffs; some will execute destructive commands when prompted by injected content while Claude Code's default permissions mode blocks this class of attack.
Understanding what a harness is prevents the category error of attributing model-level intelligence gains to what is actually a harness configuration improvement.

04 · Parallelization and Agent Teams

Every additional delegation step in a multi-agent chain multiplies divergence probability from the original intent; elaborate 26-agent org-chart frameworks are almost universally worse than a 3-node architecture.
Skills and sub-agents are structurally near-identical - name, description, tools schema - differing mainly in context volume; this distinction will likely collapse as the field matures.
The parent-researcher-QA pattern preserves Opus context for high-level decisions while using Sonnet's cheaper context for parallel fan-out, producing the best quality-to-cost ratio of any tested architecture.
A context-free QA agent that has never seen the project catches a different class of bugs than a development agent that has been building it - the lack of prior context is the feature, not a limitation.
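
The parent-researcher-QA shape described above can be sketched in a few lines. This is an illustration of the control flow only: call_model is a hypothetical stand-in that returns a canned string instead of calling any real model, and the role labels stand for the Opus orchestrator, the Sonnet researchers, and the fresh QA instance.

```python
import asyncio


async def call_model(role, prompt):
    """Hypothetical stand-in for a real model call; returns a canned string."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"[{role}] {prompt}"


async def fan_out(task, angles):
    # 1. Researchers run in parallel, one per research angle.
    findings = await asyncio.gather(
        *(call_model("researcher", f"{task}: {angle}") for angle in angles)
    )
    # 2. The orchestrator sees only the consolidated summaries.
    draft = await call_model("orchestrator", "synthesize: " + " | ".join(findings))
    # 3. QA receives the draft alone, with none of the earlier context.
    review = await call_model("qa", "review with no prior context: " + draft)
    return {"findings": list(findings), "draft": draft, "review": review}


result = asyncio.run(fan_out("pricing page audit", ["copy", "performance"]))
print(result["review"])
```

The design point is in step 3: the QA call is handed the draft and nothing else, which is what makes its lack of context a feature.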

05 · Auto-Research

Karpathy's loop works for any task with a quantifiable outcome and requires only three components: a metric measurement mechanism, a change-execution agent, and a log file.
Auto-research is not suited for open-ended creative tasks; its power is specifically in optimization problems where success is measurable, the change space is bounded, and rollback is cheap.
Running auto-research overnight turns idle compute into a compounding log of what works and what does not, which itself becomes input for future CLAUDE.md improvement rules.
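
The three components named above (a metric, a change-executing agent, a log file) fit into a short loop. What follows is a toy sketch under stated assumptions, not Karpathy's code: the "project" is a single integer, measure and propose are hypothetical callables, and a change is kept only when the metric strictly improves.

```python
import json
import os
import tempfile


def auto_research(measure, propose, state, rounds, log_path):
    """Hypothesize, execute, assess; keep a change only if the metric improves."""
    best = measure(state)
    with open(log_path, "a", encoding="utf-8") as log:
        for i in range(rounds):
            candidate = propose(state)   # hypothesize and execute a change
            score = measure(candidate)   # assess it against the metric
            kept = score > best
            log.write(json.dumps({"round": i, "score": score, "kept": kept}) + "\n")
            if kept:                     # otherwise roll back by discarding it
                state, best = candidate, score
    return state, best


def demo():
    """Toy run: the state is an integer and the metric peaks at 7."""
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "research_log.jsonl")
        state, best = auto_research(
            measure=lambda x: -(x - 7) ** 2,
            propose=lambda x: x + 1,
            state=0,
            rounds=10,
            log_path=log_path,
        )
        with open(log_path, encoding="utf-8") as log:
            entries = [json.loads(line) for line in log]
    return state, best, entries
```

In a real run, measure would wrap something like a Lighthouse script and propose would be the agent editing the codebase; the rejected rounds in the log are as useful as the accepted ones, since they record what not to retry.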

06 · Browser Automation

Raw HTTP requests are the correct starting point for any web automation task because they are fastest and cheapest; move up the tier stack only when JS rendering or session handling makes HTTP insufficient.
Browser Use and Computer Use are meaningfully different tools - browser automation operates the browser API while computer use operates the full OS GUI - conflating them leads to using a sledgehammer where a scalpel is needed.
All three tiers of web automation are against the terms of service of most platforms; understanding the detectability and rate-limit risk of each tier is necessary before deploying any at scale.
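
The "start at HTTP, move up only when forced" rule reads naturally as a small decision function. This sketch is one possible encoding; the flag names are assumptions chosen for the example, not terminology from the course.

```python
from enum import Enum


class Tier(Enum):
    HTTP = 1          # raw HTTP requests
    BROWSER = 2       # browser automation
    COMPUTER_USE = 3  # full OS-level GUI control


def choose_tier(needs_js_rendering=False, needs_browser_session=False, needs_os_gui=False):
    """Pick the lowest tier that can do the job."""
    if needs_os_gui:
        return Tier.COMPUTER_USE
    if needs_js_rendering or needs_browser_session:
        return Tier.BROWSER
    return Tier.HTTP


print(choose_tier())                         # a plain JSON endpoint
print(choose_tier(needs_js_rendering=True))  # a page rendered by JavaScript
print(choose_tier(needs_os_gui=True))        # a native desktop app
```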

07 · Performance Fluctuations and Model Diversification

Over-relying on a single model is monoculture risk: a quality dip, rate limit, or pricing change cascades across every workflow with no fallback, exactly as a crop disease cascades through monoculture farmland.
Routing research sub-tasks to cheaper models while reserving Opus for orchestration and final evaluation cuts cost without measurably reducing quality, because research tasks do not require the same reasoning depth as synthesis.
MCP servers let you connect external tools to Claude Code's harness without writing custom integrations; treating MCP discovery as a first step before building bespoke tooling saves significant development time.
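
A fallback chain is the simplest hedge against the monoculture risk described above. The sketch below assumes each provider is wrapped in a plain Python callable; flaky_primary and steady_backup are hypothetical stubs standing in for real model clients.

```python
def call_with_fallback(prompt, providers):
    """Try each (name, callable) pair in order and return the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def flaky_primary(prompt):
    """Stub that simulates a rate-limited primary model."""
    raise TimeoutError("rate limited")


def steady_backup(prompt):
    """Stub that simulates a working secondary model."""
    return f"backup answer to: {prompt}"


used, answer = call_with_fallback(
    "summarize the changelog",
    [("primary", flaky_primary), ("backup", steady_backup)],
)
print(used, "->", answer)
```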

08 · Workspace Organization

Separate workspace roots for personal, business, and client projects prevent cross-contamination of CLAUDE.md rules and the common failure mode where a client-project convention leaks into personal work.
CLAUDE.md files should never contain temporary debugging notes or one-off context; that content belongs in task-level inline prompts that do not persist between sessions.
A never-create-temp-files rule in CLAUDE.md pointing Claude toward a designated scratch directory eliminates the project pollution that makes long-running projects progressively harder to navigate.
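
One way to picture the separate roots and the scratch-directory rule is a small scaffolding script. This is a sketch under assumptions: the folder names (personal, business, clients, scratch) and the starter rule text are illustrative choices, not a layout prescribed by the course.

```python
import tempfile
from pathlib import Path


def scaffold(root, clients):
    """Create one workspace per context, each with its own CLAUDE.md and scratch dir."""
    root = Path(root)
    workspaces = [root / "personal", root / "business"]
    workspaces += [root / "clients" / name for name in clients]
    for ws in workspaces:
        (ws / "scratch").mkdir(parents=True, exist_ok=True)
        claude_md = ws / "CLAUDE.md"
        if not claude_md.exists():  # never overwrite hand-written rules
            claude_md.write_text(
                f"# {ws.name}\n\n- Write temporary files only to ./scratch/.\n",
                encoding="utf-8",
            )
    return workspaces


def demo():
    """Scaffold into a throwaway directory and list what was created."""
    with tempfile.TemporaryDirectory() as tmp:
        created = scaffold(tmp, ["acme"])
        return sorted(
            ws.relative_to(tmp).as_posix()
            for ws in created
            if (ws / "CLAUDE.md").is_file() and (ws / "scratch").is_dir()
        )
```

Because each client folder carries its own CLAUDE.md, a convention written for one client never loads in a session opened in another.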

09 · Security

Centralizing all API keys in a single env file and never allowing Claude Code conversations to handle them in plaintext eliminates the most common credential-leak vector in AI-assisted development.
Enabling Row-Level Security on every Supabase table before deploying anything publicly is non-negotiable; its absence by default is the root cause of the majority of publicly reported AI-vibe-coded app breaches.
Prompt injection via web content, OCR output, or document ingestion is a real attack vector specific to agentic tools; treating all external input as untrusted and restricting execution based on that input is the correct architectural default.
Dependency injection attacks, more commonly called typosquatting, target AI tools that auto-install packages; auditing any unfamiliar package before Claude installs it and preferring pinned version numbers over latest is a practical first-line defense.
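
Two of these habits are easy to check in code: reading keys from the environment instead of pasting them into a conversation, and flagging requirements that are not pinned. The helpers below are a minimal sketch; their names are hypothetical, and the pin check is a simple text test for "==" rather than a full requirements parser.

```python
import os


def require_key(name):
    """Read a secret from the environment; fail loudly if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"missing environment variable {name}")
    return value


def unpinned(requirements_text):
    """Return requirement lines that are not pinned with '=='.

    Comments and blank lines are ignored. Option lines such as '-r other.txt'
    are also reported, since this check only looks for '=='.
    """
    flagged = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line and "==" not in line:
            flagged.append(line)
    return flagged


# Hypothetical requirements file; 'reqeusts' mimics a look-alike package name.
SAMPLE = "requests==2.32.3\nreqeusts\n# a comment\nnumpy>=1.26\n"
print(unpinned(SAMPLE))
```

An unpinned, unfamiliar name such as the misspelled one in the sample is exactly the kind of line worth auditing by hand before anything gets installed.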

10 · The Future of Claude Code

Human involvement in agentic engineering will continue to decrease; the trajectory from vibe-coding to agentic engineering to fully autonomous research-based production is already underway, not speculative.
Software tooling will cease to be a competitive moat because AI can generate best-in-class software at the limits of human reasoning; the moat will shift to who knows how to direct these systems toward economically valuable outcomes.
The intelligence distance from current models to human-level is proportionally small and compressing faster than the prior five years; practitioners who internalize harness-layer thinking now hold asymmetric leverage in the window before that gap closes.

Glossary

Terms worth knowing.

Agent harness: Everything that wraps around a language model to give it tools, memory, and parameters. In Claude Code this means the system prompt, hooks, bash access, file tools, and context compaction settings.
CLAUDE.md: A markdown file that Claude Code reads at session start to load compressed workspace knowledge, user preferences, and behavioral rules - analogous to a persistent system prompt at the file level.
Fan-out: A multi-agent pattern where an orchestrator dispatches the same research or processing task to multiple sub-agents in parallel, then consolidates their summaries before acting.
Auto-research loop: An unattended agentic loop formalized by Andrej Karpathy where the agent proposes a change, executes it, measures a metric, logs the result, and repeats without human involvement.
Stochastic consensus: Running the same prompt against multiple model instances and taking a majority or aggregated answer to reduce the variance of any single model run.
Computer use: A browser or desktop automation mode where the AI controls the full OS-level GUI via screenshot-and-click, rather than operating a browser API or sending HTTP requests.
RLS (Row-Level Security): A Supabase/PostgreSQL feature restricting which rows a user can read or write based on identity. It is off by default on tables created through raw SQL; when it is off on an exposed table, any holder of the public anon key has full table access.
Dependency injection attack: The course's term for a supply-chain attack, more commonly called typosquatting, where a malicious package is published under a name close to a legitimate library; AI coding tools that auto-install packages are specifically vulnerable.
Prompt injection: An attack where adversarial instructions hidden in web content, documents, or OCR output are read by the agent and override its intended behavior.
Monoculture risk: Over-reliance on a single model or vendor such that a quality degradation, rate limit, or outage cascades across an entire workflow with no fallback.
LCP / FCP / TBT: Google Lighthouse web performance metrics: Largest Contentful Paint, First Contentful Paint, and Total Blocking Time - the three numbers targeted in the auto-research demo.
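
The stochastic consensus entry above reduces to a majority vote over repeated runs. A minimal sketch, assuming the answers are short strings that can be compared after trimming and lowercasing; the function name is hypothetical and an empty answer list is rejected rather than guessed at.

```python
from collections import Counter


def consensus(answers):
    """Return the most common normalized answer and its share of the votes."""
    if not answers:
        raise ValueError("need at least one answer")
    normalized = [answer.strip().lower() for answer in answers]
    winner, votes = Counter(normalized).most_common(1)[0]
    return winner, votes / len(normalized)


# Three hypothetical runs of the same prompt.
print(consensus(["Yes", "yes ", "no"]))
```

The vote share doubles as a rough confidence signal: a narrow majority is a reason to rerun or escalate rather than accept the answer.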

Resources

Things they pointed at.

00:57 · tool · Antigravity (VS Code fork by Google)

1:35:00 · link · Karpathy autoresearch repo

46:20 · tool · Droid by Factory AI

47:00 · tool · Pydantic AI (py.dev)

1:08:20 · tool · Crew AI

2:30:00 · product · Maker School community on Skool

Quotables

Lines you could clip.

27:00

“An agent harness is just everything that wraps around the LLM that is not the actual LLM itself.”

Clean one-sentence definition that cuts through jargon. Suggested use: TikTok hook.

1:11:20

“Every step along the chain that is further from you, the results and the quality is a little bit more diluted.”

Counterintuitive warning against over-complex agent chains. Suggested use: IG reel cold open.

3:15:22

“You are the 1% right now. You are that group of people that other people will be raising their hands about and shaking their fist at.”

High-energy motivational pull quote that stands alone. Suggested use: TikTok hook.

3:16:03

“The future is here, it is just unevenly distributed.”

William Gibson quote used as the course landing note. Suggested use: newsletter pull-quote.

The Script

Word for word.

Hey. This is the definitive Claude Code course for advanced users. I use Claude Code and AI agents in my own business every day to generate over $4,000,000 a year in profit.

I also teach around 2,000 people how to use Claude Code and other tools to improve their lives, both personally and in business. Okay. So this course is gonna assume a foundation of Claude Code experience.

It's not for total beginners, but if you are a total beginner and you happen to stumble on this course, that's okay. Just look over my left shoulder here, click that button, and then I have a four hour guide that will walk you through everything you need to get to the point where you understand what I'm about to say. Assuming you're still here, no fluff, here's what we're gonna cover.

We'll start with an advanced look at CLAUDE.md files and system prompts, and learn how to optimize these to actually improve quality, which is simpler than you think. We'll then cover agent harnesses and how to build larger projects with Claude Code. After that, we'll chat agent teams and other examples of extreme task parallelization.

Then we'll do skills, sub-agents, and other forms of organization. After that, I'll cover Karpathy's auto-research approach for improving stuff progressively over time, and a few actual use cases you can apply this to, not just fancy demos. We'll then talk browser automation.

The major players: we'll do computer use, browser use, and which tools to apply to different use cases depending on what you want. I'll then cover how to deal with performance fluctuations in Claude Code because they do happen, as well as some alternatives that you guys could use and ways to bundle multi-agent orchestration into your workflow.

We'll then cover workspace organization. So for personal, business, and then even client projects, assuming you're selling this sort of thing as a service. Security for larger projects, we'll chat stuff like the recent auto mode.

We'll talk a little bit about OAuth. And at the end, I'll finally round it out with a discussion about where I think Claude Code is going and the future of work more generally. Hopefully, you're as excited as I am to level up your Claude Code skills.

Please use the bookmarks and chapter headings as needed to jump around the course. Subscribe to the channel and let's get into it. So for most of the course, I'm gonna be building directly using the Claude Code extension inside of Antigravity.

That's this over here. If you don't have Antigravity installed, this isn't an installation tutorial, but get that from Google's official antigravity.google website.

Then head over to extensions, click on Claude Code for VS Code, give that an install, and then everywhere you go, you'll have this little Claude logo that you can use to spin things up. After a brief login, you'll have more or less the exact same layout that I do.

I want you to know though that the Claude desktop app is also getting better and better by the day. And because Claude is attempting to get you, obviously, on their infrastructure as opposed to on your own, they're just continuously adding new cool features that allow you to do things like mobile development and so on and so forth.

So everything I'm gonna show you today works in both the Claude Code tab of the Claude desktop app, and also works natively inside of Claude Code's extension within Antigravity or some other, you know, IDE-like thing. So if you're intimidated at all by the way that I've laid things out, what all these different folders mean and how they collaborate in order to improve your workflow, I'm gonna cover all that in this course.

First though, we're gonna cover CLAUDE.md and other advanced system prompts. Basically, how to set up your system prompts in a very efficient and effective way, both to save you financially, but also to improve the quality of your outputs and significantly minimize the amount of time it takes to build anything.

So what is a CLAUDE.md really? Well, as far as I could tell, it's four things. The first is, it's a form of knowledge compression.

Okay? And when I say knowledge compression, what I mean is, instead of Claude having to read through your entire workspace, you know, file by file, like for instance over here, instead of having to open every single folder here, every single one here, read through all of the files and so on and so forth to be able to reason and then make high level declarations about your code base or folder.

What your CLAUDE.md does, okay, is it basically just compresses all of that down into a highly succinct summary of what the heck is going on in your freaking folder. So that the next time you say, hey, what was that file I made a couple of weeks ago about x, y, and z?

Claude doesn't have to look through every single file in your code base. You don't have to spend a tremendous amount on tokens, and you also don't have to wait a long time.

It's just sort of baked into the CLAUDE.md, or at least a reference to where the file lives is baked into the CLAUDE.md. So you can actually like reason with it at a superficial level, at a bird's eye level as opposed to actually going down through the weeds. So that's sort of like the very first thing that I'd say, you know, a CLAUDE.md is.

The second thing that a CLAUDE.md is, is it's obviously your own preferences as a user.

And what you'll find is, you know, more or less every time Anthropic updates Claude Code, you have better and better baked in native preferences and conventions for things like, you know, delivering you file paths or how to deal with like documentation or debugging or how to update itself and so on and so forth.

But obviously, Claude Code lags behind these preferences a little bit because they have to see what users are actually using it for and, you know, like, collect that information and figure out what ways to make things more effective.

So if you're an advanced user as I am, you'll have a list of these preferences and conventions that improve your user experience. And, uh, advanced users will always have just some better preferences that kind of adapt their own workflow, as well as, you know, programming conventions, ways to organize information, structures, and and that sort of thing.

Okay? So it's both a form of knowledge compression, but it's also preferences and conventions that are not natively baked in that you get to decide on.

The third thing that CLAUDE.md is, is it's a declaration of capabilities. Now, I don't know how many times this has happened, but if you do not have a substantiated enough CLAUDE.md, and then you have, let's say, a skill somewhere in your workspace where you have just some knowledge that's sort of floating around in a few files.

And you say, hey, Claude, do x y z thing for me. Go, you know, find some knowledge on x y z person or go do some research or, you know, compile a plan using x y z framework. Half the time, okay, if it's not in your CLAUDE.md, Claude will just look at you, metaphorically, obviously.

It doesn't have eyes yet. And it will say, like, oh, like, I don't have a built in way to do this. Sorry.

What were you referring to? Do you want me to build something from scratch? I'm happy to do it.

And this sort of slowdown loop is completely unnecessary. And so what CLAUDE.md allows you to do is it basically allows you just to itemize. Okay?

You know, everything that your agent can currently do within your workspace, and you can make that really clear. You could say, hey, you currently have access to this functionality. You can do this.

Hey, uh, you know, you can build a full step plan that lasts ten or fifteen minutes and then execute it autonomously. In fact, that's my preference or the convention that we're using. You know, you can call this API.

You can call this database. You can retrieve all this information. You can act autonomously using browsers and so on and so forth.

The reason why that's important is because as agentic as Claude is, hopefully, we're all still on the same page here about this fact, Claude still lacks a lot of agency. Okay? If you ask it to, you know, help you do something, or if you ask it how long it'll take to do something, it'll often significantly underestimate or overestimate because it's not really factoring in its own agentic capabilities.

Like, I asked the other day, hey, you know, how long would this x y z thing take to build? And then it was like, about three months or so because you would have to build this, you'd have to build that, you'd have to build that. And it's obviously like, no, I don't have to build that.

I'm asking you to build it. You could build it in five seconds, so why don't you just go ahead and do it? Or, you know, you're having to do some API stuff and then it sends you a little command line interface prompt and it's like, hey, just pump this into the terminal. It sort of needs reminders that, no, I don't have to do this.

That's why I'm asking you to do it. So you can actually do all of this stuff, Claude. Declaring capabilities in this way, whether it's your own personal, like tooling or workflows or whatever, or it's, you know, Claude understanding that it has the ability to do things that it might not realize at first glance is pretty important.

And then finally, the fourth thing that a CLAUDE.md is, is it's a log of failures and successes. What I mean by this is, as you accumulate various files, as you accumulate, you know, bits of code through your project and stuff like that, every single one of these things is hard won. You didn't get them for free.

Realistically, you spent tokens and then your time, which are soon to be two of the world's most valuable resources. And so because you spend all this time and energy, it is more efficient for you to take all of the learnings basically from every single piece of development, or every single action Claude does, and then insert it in its next system prompt, than just have it restart kind of from scratch every time.

You know, viewed another way, mathematically, if this is the total space of all of the different possible things that Claude could do when you say, hey, do x. What this log of failures and successes is doing is it's basically carving out big chunks of this theoretical solution space.

And it's saying, hey, no, you you don't do anything over here because we've already tried all this stuff over here. It's kinda looks like a planet. Meaning, the only things that you can actually try, the only things that you should try are kind of the things that exist in between.

Okay. So basically, what this log of failures and successes does, is it just allows you to immediately cross out like 80% of all possible things Claude could do because it knows it's actually tried that in the past.

And then in that way, focus its time, effort, your tokens, your money, and then your energy on the 20% that actually matters. So these four will exist in different sections in your CLAUDE.md.

They'll also exist at different levels, both global and local. So what I'm gonna do next is run you guys through high ROI ways to combine these four sort of principles behind system prompts, and then apply them, um, both in global, local, and then also give you guys sort of like a a little workflow loop that you can use in order to understand how to update this effectively.

And this isn't just gonna be some big long system prompt that I'm giving you guys, like I think we've probably all seen floating around various sources on the Internet. The reality is, like, CLAUDE.md files are highly personal devices. But these are gonna be a list of short principles that will almost certainly help you design better projects and then get more done, whether economically or otherwise.

So the way that all this is organized within Claude Code is using two different scopes, global and local. And if you didn't already know, basically, there are a variety of different places that Claude Code upon initialization will look to get the prompts that are injected at the very top of its context window. Okay.

The two big ones for us are the user over here, which is equivalent to your global. And then also the project over here, which is equivalent to your local. And so basically, what this means is if you have a file called CLAUDE.md, all caps, that exists within this folder on your computer somewhere, it'll load that up on every Claude Code session whether or not you're working in the same workspace or another one.

Now if you have a CLAUDE.md, capital CLAUDE, lowercase .md, located within a .claude folder within your specific repository directory, then it'll also be loaded. And in this way, you know, you sort of have like a global precedent that's always injected at the top of every single thing. Okay?

No matter what. And then you also have sort of a smaller little, you know, local CLAUDE.md that's also injected. And collectively, when I say, you know, system prompts from here on out, really what I'm referring to is I'm referring to both of these.

I'm not just referring to one of these. And because global is injected on every single run, there's sort of like different strategies in order to divide the four things that we just talked about.

Basically, on your global CLAUDE.md, it makes more sense to put high level reasoning and then your own personal beliefs. And then in local CLAUDE.md, it makes more sense to insert local low level knowledge.

So stuff like I just talked about with the workspace itself. So, you know, if I were just enumerating all of these things up here, okay, you'd put your preferences, like your global preferences.

These could be things like, hey, you know, when you return a file, make sure you return the absolute file path so I can click on it, because whatever editor I'm using doesn't really take that into account. You know, could be things like programming conventions. Hey, I want you to program using, I don't know, object oriented programming or hey, I want you to do like functional programming in Rust.

Hey, when I ask you to develop a new project, I always want it done in Rust as opposed to, you know, Python or or something like that. Alternatively, it could be stuff like, hey, you know, if I ask you to do something using a tool you're unfamiliar with, always go and read the API documentation first before attempting to start.

Because every other time that you've attempted to do something without the API documentation, you typically run out of loops, you waste x y z tokens. So make sure to load the API docs. By the way, if you can't load the API docs through, uh, you know, HTML, then make sure to, like, load up a Chrome DevTools MCP server to go and get that stuff even if it's dynamically loaded through JavaScript.

Okay. So these are high level reasoning strategies. These are your own preferences.

These are your own conventions. And then also, these are going to be just sort of like agency capabilities.

So stuff like, hey, Claude, you can actually do x y and z. If you believe that you can't for whatever reason you're wrong, you can absolutely, you know, go and do whatever you want. The local low level knowledge.

Okay. This is gonna be stuff like /init, which I'll show you guys in a second. And so it's gonna be like a compressed version of all of the knowledge in your workspace.

Instead of Claude having to, in the future, go through every single file, it'll just be able to read the CLAUDE.md and get a loose understanding, like, okay. What's where? Why have we built this?

What's the purpose of this workspace? And so on and so forth. Some additional things you can do are things like context about you and your goals and your own reasoning strategies, your own communication styles.

So I'm gonna give you guys examples of my own CLAUDE.md in a moment, where you'll see that I actually give it a lot of context about who I am and why I want what I want. I'll run it through reasoning strategies that I personally use that have yielded me a lot of success in the past, which may not necessarily be the optimal reasoning strategies, but which I tend to understand.

And because I'm communicating with this thing every five seconds nowadays, I'm better able to understand what it's putting across if we use those principles. And then, yeah, those high-level preferences, and then generally good token conservation strategies. Whereas with the local file, it's a description of the project, where everything is, and low-level preferences like specific API docs and usage.

If you are using, you know, the GoHighLevel API to do some project or whatever, you can actually just have the whole GoHighLevel API documentation living within your project. That'll minimize the number of tool calls that Claude has to make, for example spinning up some sort of research sub agent to go and do the thing for you. Instead, it can remain local, reduce total token usage, and also just be faster and more accurate. And then, lastly, capabilities within the project.
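To make the global/local split concrete, here is a minimal sketch of how the two files could be combined into one system prompt, global first and local second. The function name `build_system_prompt` and the section labels are assumptions made for illustration; Claude Code does its own loading internally and may well do it differently.

```python
from pathlib import Path


def build_system_prompt(home: Path, project_dir: Path) -> str:
    """Concatenate the global and local CLAUDE.md files, global first.

    Hypothetical helper for illustration only; this is not Claude Code's
    actual loading logic.
    """
    candidates = [
        ("global", home / ".claude" / "CLAUDE.md"),  # preferences, reasoning, conventions
        ("local", project_dir / "CLAUDE.md"),        # project layout, API notes
    ]
    sections = []
    for label, path in candidates:
        if path.is_file():  # a missing file is simply skipped
            body = path.read_text(encoding="utf-8").strip()
            sections.append(f"# {label} instructions\n{body}")
    return "\n\n".join(sections)
```

Calling `build_system_prompt(Path.home(), Path.cwd())` would return whatever both files currently contain, which is a quick way to see how many tokens are being spent before the first message is even sent.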

And then that takes me to workflow. So there are two workflows here that I wanna talk about.

There's the local workflow, and then there's the global workflow. The local workflow is gonna be responsible for updating our local CLAUDE.md.

And then the global workflow is gonna be responsible for updating our global CLAUDE.md. Like, it'd be nice if I could just give you on a silver platter a bunch of stuff to put in your CLAUDE.md. Right? I think that's what a lot of people want.

But you're gonna end up a much better developer and a much more productive person if you understand the principles at play here and develop your own. So initially, to start, anytime you're developing anything in Claude Code or whatnot, obviously, you need to plan your feature.

And I say feature here loosely. You know, I use Claude Code as basically my business assistant nowadays. And so I use it to do anything from reading my emails, to grabbing me news summaries in the morning, to communicating with x, y, z people, to designing me websites, and so on and so forth.

So feature here is really loose. I'm not just talking about, like, a vibe-coded project. I'm talking about anything.

But what you do is you start by planning a feature. Right? And then if you think about it logically, what Claude does next is it instantiates the feature.

However, over the course of planning and instantiating, okay, it will fail a bunch. It'll also succeed a bunch of other times.

And ultimately, there'll be a giant list of learnings between step one and step two. And so what you do after you instantiate is you actually compile all those learnings into some efficient, high-information-density thing that doesn't consume a lot of tokens, then use that to update the CLAUDE.md.

And so this is your local workflow for managing your system prompt. And you basically just do this every time.

You plan something, it'll hit a bunch of failures along the way, then you'll instantiate it, you'll take all those learnings, and update your CLAUDE.md. That way the next time you plan a feature, it'll already have all the benefits of the failures plus any additional things that were learned along the way. And so the first time around this loop, it might take, I don't know, let's say x time to develop a feature.

The second time around this loop, maybe it takes like 0.9x, because now you've shaved off 10% of the search space and it's a lot faster. The third time you go, maybe it takes 0.8x. Okay.

And so the time will just get faster and faster every time, until eventually you develop things using Claude in a similar way that you would develop if you were not using Claude. Now here's where it differs from the global workflow.

What happens is, as you accumulate a variety of failures, successes, learnings, and so on and so forth, your current local CLAUDE.md gets really, really good. After all that's done, after hundreds of these runs, you can either use the /insights feature or you can run that yourself, which I'll show you guys how to do.

What this will do is compile, not at a local level but at a global level, all of the things that Claude attempts pretty consistently and then struggles with pretty consistently. You know, it's like, oh, hey, I noticed that not only on that one project, but also in more or less every project, Claude consistently goes down silly rabbit holes it doesn't need to, and then tries coming up with its own stuff instead of just consulting the docs.

And so, you know, after this is done three or four times, obviously, there's a trend. Right? So what you can do is you can take that information and then you can pump that in your global.
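The "trend across projects" idea can be sketched as a small filter: an issue only gets promoted to the global file once it has shown up in several different projects. Everything here is illustrative; the function name, the threshold of three, and the sample issue labels are assumptions, not output from the /insights command.

```python
from collections import Counter


def recurring_issues(issues_by_project: dict[str, list[str]],
                     min_projects: int = 3) -> list[str]:
    """Return issues seen in at least `min_projects` distinct projects."""
    counts = Counter()
    for issues in issues_by_project.values():
        counts.update(set(issues))  # count each issue once per project
    return sorted(issue for issue, n in counts.items() if n >= min_projects)


# Hypothetical per-project learnings, for illustration only.
logs = {
    "site": ["skips docs", "20 sequential edits"],
    "scraper": ["skips docs", "stale chrome profile"],
    "crm": ["skips docs"],
}
print(recurring_issues(logs))  # ['skips docs']
```

Issues that appear in only one project stay in that project's local file; only the repeat offenders are candidates for the global one, and those still go through manual review.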

After that, I'd recommend you manually review, because Claude is an agent at the end of the day. And the more AI steps you have, the more you compound probabilities and the less likely it becomes that Claude itself is making the right call. You know, if Claude is independently 90% successful on a task, and then you give it to another Claude, which is 90% successful on its task, and then you give it to another Claude, what you're really doing mathematically is 0.9 raised to the power of three.

And if you just do a little bit of math there, that's not 90%. Right? 0.9 to the power of three is about 73%.
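The arithmetic above is easy to check. This sketch assumes each step succeeds independently with the same probability, which is a simplification; the function name is made up for illustration.

```python
def chain_success(per_step: float, steps: int) -> float:
    """Probability that every step in an unreviewed chain succeeds,
    assuming the steps succeed independently of one another."""
    return per_step ** steps


for n in (1, 2, 3, 5):
    print(n, round(chain_success(0.9, n), 3))
```

Three chained steps at 90% each come out to 0.729, and the number keeps falling with every extra step that has no human check in between.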

And so I guess what I'm trying to say is, the more steps you have without a human in the loop here, the lower the likelihood that your total determination will be correct. And because this is your CLAUDE.md, your global preference and convention file, it will be applied to every future project.

Meaning, if there is a place you should spend human time on, it is this exact step here. So I'd recommend manually reviewing that. Once you manually review that, then you can add some high-ROI bullet points to your CLAUDE.md, you know, just like a high-information-density version, and then you can actually update the CLAUDE.md. And then you can repeat this loop a few times if you'd like before finally going back to the local loop. And so, I mean, it's kind of like, I don't know what you wanna call it, an infinity sign.

K? Kinda starting here, you're going kinda like this and then you're kinda looping back and then you're just doing this over and over and over and over and over again.

Obviously, you're gonna spend a lot more time in this loop, but eventually, you're gonna go down to this loop. And this is how I personally develop using CLAUDE.md. This is why my workspaces are super tight, as opposed to me using a vanilla version, asking it, hey, go do x, y, and z, and then it stumbles around, uses 20,000 of my tokens and God knows how many of my dollars.

When I say, hey, I'd like you to do x, y, and z thing, I'd like you to scrape some leads or whatever, it already has all that stuff baked in, while still being flexible enough that I can change it anytime I want. Okay.

So next, I'm gonna show you guys basically my workflow every time I start with a new project. Assuming that I've already done a little bit of work in the project, I don't have a CLAUDE.md, and I don't really have any of that advanced tooling or system prompt harness stuff set up, this is exactly what I would do, step by step.

So first of all, you need to open up a folder. I was just learning about Tomatillo's earlier. That is sort of embarrassing.

But anyway, in Antigravity, just go Open Recent, and then I'm just gonna open up something. Why don't I do, you know, Antigravity example right over here? And when I'm in this folder right over here, obviously, there are a bunch of different files and configurations.

This one's been using Gemini for a while. So what I'd like to do next is open up Claude Code. And so I'll click on that button over here.

Let's close out the agent window because I'm team Claude, at least for the moment. Thank you, Space Invader. And really, like, the first thing that you do is, you know, you you develop on your own.

I always recommend, like, don't try baking any opinions into a CLAUDE.md until you've at least developed without a CLAUDE.md or some sort of advanced system prompt for a little bit. And the reason why is because you'll find Claude's actually really good out of the box.

As mentioned, they're incorporating more and more of these features natively within it. And so, like, it's great. It's not like the harness makes the intelligence.

It's obviously the intelligence inside of it that communicates with your system prompt to get good. But right now, it's already fantastic. Anyway, say you've done some developing for a while. This is obviously some sort of website here.

It's like a template using Vite. Just go /init, just like that. And basically, /init will go through and read every single file in your workspace, which I'm currently doing with fast mode, if you're wondering why this is probably faster than what you're seeing.

And then at the end of it, it'll come up with basically a highly optimized CLAUDE.md file that succinctly and effectively summarizes the placement of everything here. And you can see it just generated one called CLAUDE.md.

So it comes with the build, dev, and lint commands, a note that no test framework exists, some architecture overview, key dependencies and their roles, and then some style conventions as well. So now I'm gonna open up this CLAUDE.md.

Okay. And why don't we just move this over to the main window, so it's a little bit easier to see. And you can see that, more or less, at a very high level, it takes every single line in my entire workspace and summarizes it, significantly increasing the information density at the cost of total comprehensiveness.

So what I have now is a summary of everything. What that means is the next time that I ask Claude anything about my workspace, the next go-around, I don't actually have to have it run through every single thing in the folder.

Like for instance, what I'm gonna do here is I'm just gonna call this like, I don't know, xyz.md. Or actually, you know, why don't I just delete this for now?

You know, if I had asked this Claude version something about dark mode, hey, what are my opinions on dark mode? It's gonna check its memory for notes on the preference.

It's not gonna find anything. And notice how it's just gonna say there's nothing at all. So what I could say is, read through the whole project and find my preferences.

And now what it'll do is essentially launch some sort of agentic search through READMEs and so on and so forth until it finds something about dark mode. In this case, it was in the GEMINI.md.

But I want you guys to know that, whether you have it in a GEMINI.md or it's just written somewhere, it'll eventually figure it out. Now, the issue is, what sort of usage did we just incur in order to get that?

If I just scroll all the way up here and type /context, you know, the system prompt was 0.6%, and messages is 0.9%. So that last message chain there, with the tool calls and everything like that, might have realistically taken like five or six thousand tokens.

I don't need to do that sort of thing ever again. You know, if I bring that back as CLAUDE.md, and then if I just open up a new instance and I say, hey, what are my opinions on dark mode?

Obviously, it's gonna read the CLAUDE.md. And instead of me having to use God knows how many tokens, if I go back to /context, you'll see that I've now only used 0.2%.

So I basically saved myself, what's that? Like, 6,000? And let me tell you, these Claude tokens ain't free, man.

Anthropic's increased the price pretty aggressively, especially recently, when they realized 99% of the world is now operating using their infra. So I guess what I'm trying to say is I'm spending, like, literal money, but I'm also spending time. And to me, the bigger thing is time.

But what are some other things it might be asked? I mean, think about deployment. If you have any sort of front-end or full-stack experience, you'll know that usually the flow is you start with a dev server.

You use that dev server via npm run dev or equivalent to develop various features and so on and so forth. Then you'll build, you'll do some sort of linting, and then once you're done, you'll actually preview it.

You'll push to staging and verify that, and then eventually you push to production. Right?

Like, obviously, this is something that it could have learned just by going through the folder structure, seeing src, public, node_modules, all these things. But I'm just listing them out over here so that instead of Claude having to actually read any of those files or that tooling, it can get it in, what's that?

Five tokens, six tokens, or something, immediately. Likewise, I see where things are laid out. So in this case, this is obviously a single-page application.

The entire app lives in a single component: the Nav, Hero, Services, Projects, and Footer sections, markup, and logic are all here. It is evident, if you were to actually click on app.jsx and scroll through, that that is the case. But look at how many more tokens app.jsx is versus just that brief little description in CLAUDE.md.

If I were to copy and paste the entirety of this into something like a word counter, you could see it's 827 words, approximately 1,100 tokens. Okay. If I go back to my CLAUDE.md, like, how long is this?

It's 22. So that's a, what? 45x compression ratio?

That sort of compression is how you ultimately get a significantly better and more effective Claude, because you are not shoving a tremendous amount of tokens at the beginning of any query. And as we hopefully know, token length tends to scale inversely with the quality of the output.

The more tokens in a context window, not only the more money you are spending, but typically the lower quality the results are. So just avoid all that by initializing and then storing a bunch of information about what the project is. You'll be much happier for that.
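For anyone who wants to run the same back-of-the-envelope check on their own files, here is a rough sketch. It assumes about 1.33 tokens per word (which is what 827 words to roughly 1,100 tokens implies) and treats the 22 as a word count; both are assumptions, and a real tokenizer will give different numbers.

```python
def estimate_tokens(words: int, tokens_per_word: float = 4 / 3) -> int:
    """Rough token estimate from a word count (assumed ~1.33 tokens per word)."""
    return round(words * tokens_per_word)


def compression_ratio(source_words: int, summary_words: int) -> float:
    """How many times shorter the summary is than the source, by word count."""
    return source_words / summary_words


print(estimate_tokens(827))                  # 1103, i.e. roughly 1,100 tokens
print(round(compression_ratio(827, 22), 1))  # 37.6
```

By word count that works out to a bit under 40x, the same order of magnitude as the figure estimated live in the demo.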

But, you know, /init isn't the only thing that I would do. From here, I'd actually start importing a couple of my preferences and then things that it's tried. So I don't know.

Let's just say I'm gonna remove the GEMINI.md for simplicity. Let's just say I'm developing a new feature, and actually, why don't we just say: visualize app, IDK what it looks like.

Let me actually take a look at this thing. So it'll run the dev server so I can see it in the browser. And immediately, I'm thinking, hey, you know, this is actually kind of inefficient.

When I say visualize app, I basically just want you to launch it. So: store in your CLAUDE.md that when I ask you to run the dev server or open the app, I want you to open it in my Chrome instance as well. I don't just want you to run the dev server. You know, basically, next time I say this, I don't want it to just say, hey, the dev server is here, give it a click, and then I'm like, okay, can you just open it, because I'm already here.

I just wanted to open it automatically. Right? Okay.

Cool. And I see, you know, it's kind of over here. So that's nice.

Definitely not a fan of the design. I don't like how it scrolls through. I'll just say, hey, I want you to significantly improve the perceived visual quality of the application.

Go and look up, you know, the Apple website and then compare that to our website. Make some changes that just improves both the perceived visual cohesiveness, quality, etcetera.

Must have been a Gemini website. Anyway, let's see what it does here. It's fetching Apple site for design reference, reading the current app code in parallel.

And now it's just doing a bunch of updates, editing the CSS, nav dash link, hero dash background, and so on and so forth. And you can see that it is actually updating the site. I mean, it's doing it currently in real time, but it's looking significantly better.

It's also picked up some new colors and so on and so forth. Now, what's cool is it actually just opened up a second project for me right over here because earlier on, I'd stored my preferences that I don't want it to just give me the link. I actually want it to, you know, open open stuff up.

So that's nice. Obviously, we have better designing and stuff like that. But the key part here is when I say, okay, great.

Nice job. How could you have arrived at these conclusions and done everything I just asked you to do faster? Okay.

And now look, we're already at the point where we're capable of optimizing a fair amount of these design issues. Instead of 20 edit calls, which is what it did before, what it could have done is just one write call. So the reason why that took like thirty seconds or so is because it was editing the CSS file line by line across 20 sequential tool calls.

"I should have read the file, rewritten the full thing in my head, and done a single write to replace index.css in one shot." Yes. I'd like you to save this in the local CLAUDE.md.

Do it as a user preferences section. So asking it questions like, how could you have arrived at those conclusions and done everything I just asked you to do faster and for fewer tokens, is pretty powerful. Doing this consistently as you develop and design a project, and then having a running log of changes to the CLAUDE.md, is also quite valuable.
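The "one write instead of 20 edits" lesson can be illustrated outside of Claude entirely. In this sketch, every change is applied in memory and the file is written once; the function name and the replacement pairs are hypothetical and stand in for whatever edits the agent would otherwise have made one tool call at a time.

```python
from pathlib import Path


def apply_edits_in_one_write(path: Path, replacements: list[tuple[str, str]]) -> int:
    """Apply every (old, new) replacement in memory, then write the file once.

    Returns the number of write operations performed (always 1), in contrast
    to one write per replacement when edits are applied sequentially.
    """
    text = path.read_text(encoding="utf-8")
    for old, new in replacements:
        text = text.replace(old, new)  # all changes happen in memory
    path.write_text(text, encoding="utf-8")  # single write at the end
    return 1
```

With 20 replacements, the sequential approach costs 20 round trips; here the cost is one read and one write regardless of how many changes are batched.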

Another thing you can do is you could set a meta prompt in the CLAUDE.md, and that's personally what I always do. That basically says, like, when you've made a mistake, I want you to update the CLAUDE.md with a running log of things not to try next time.

Essentially, I want this to be almost like a mini experimenter's or researcher's notes that show what a future Claude instance should not do while working on this project.

Update the CLAUDE.md to reflect what I just said, at the very bottom. Okay.

Now it has a section called lab notes, what not to do. This is going to show a bunch of failures, as well as learnings and successes and so on and so forth.
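As a rough picture of what that running log amounts to, here is a sketch that appends one-line notes under a lab-notes section at the bottom of a CLAUDE.md. The heading text and function name are assumptions for illustration, and the sketch assumes the lab-notes section is the last section in the file, as in the demo.

```python
from pathlib import Path

LAB_NOTES_HEADING = "## Lab notes: what not to do"  # hypothetical heading text


def append_lab_note(claude_md: Path, note: str) -> None:
    """Append a one-line note under a lab-notes section at the bottom of the file."""
    text = claude_md.read_text(encoding="utf-8") if claude_md.exists() else ""
    if LAB_NOTES_HEADING not in text:
        # Create the section once, separated from existing content by a blank line.
        separator = "\n\n" if text.strip() else ""
        text = text.rstrip("\n") + separator + LAB_NOTES_HEADING + "\n"
    if not text.endswith("\n"):
        text += "\n"
    text += f"- {note.strip()}\n"
    claude_md.write_text(text, encoding="utf-8")
```

Keeping each note to a single bullet is deliberate: the log is read on every run, so it should stay high in information density and low in tokens.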

And we're already, honestly, like halfway down the loop. Now, this is a very contrived example because I'm literally just building a website. But imagine that, instead of just a website, you're building a workspace that is meant to contain all of your business, basically, entirely.

All of your SOPs, it's meant to contain all of the work that you do on a daily basis. It's meant to contain your to dos and so on and so forth. Having information like what I just showed you for this project would be invaluable across more or less all levels of both development and then also productivity.

And that's personally what you should ultimately be working towards. So, anyway, we can make this as complicated as we want, obviously, but hopefully, you guys see that loop at work. We plan a feature.

So we just did this. It was simple enough that we didn't need to use a dedicated plan mode, but obviously, I still one shotted it. After it implemented the feature, along the way, it did a few things that realistically could have done better.

So what do we do after? We take those learnings, we compile them, and then we update the CLAUDE.md. And this was sort of a meta example since I literally was doing it while I was building the CLAUDE.md. But hopefully, you guys at least understand conceptually what you do.

After four or five of these runs, there's probably a fair amount of stuff here that you can take advantage of. And that's where an insight run would make sense. So let me actually zoom in and then just delete this so you guys could see.

In case you didn't know, /insights is a simple slash command that basically runs a bunch of sub agents across all of your Claude conversation history. The benefit to that is now, not only are we running and changing our local CLAUDE.md, we're also evaluating all of the patterns in communication that we've had with Claude over the course of the last, I don't know.

Could have been a few days, could have been months, could have been, I mean, years, depending on how soon, or late rather, you are watching this video. So, just like we optimized our local CLAUDE.md, now we can start optimizing our global one. And while it's chewing away, because /insights does take a fair amount of time, I'm just going to create a new file here.

I'm gonna call it global CLAUDE.md. And I'm just gonna give you what I would consider to be, at least as of the time of this recording, probably some of the higher-ROI principles to make sure to include.

I include this in my own global CLAUDE.md because I think it's just very, very valuable. So I'll say: global CLAUDE.md. This is inserted at the beginning of any conversation with Claude across all of the user's workspaces.

So first, I have a profile section. So this is a bit about Nick. So, you know, I don't know what it'd be like.

Nick is a 30 year old INTJ, high performing Internet entrepreneur.

He runs a YouTube channel at 350, better be 350 by the time I publish this video, 350,000 subs, an Instagram channel, and so on and so forth.

Okay. And so I have a bunch more information which I've taken from just a couple of other systems I've built. This one here is: Nick is a 30 year old INTJ.

Here's his revenue, so here are all the different things that contribute to my revenue. Here's some churn math, some of the companies that I'm currently owning, some teams.

Right? So it's me. It's an editor.

It's a LinkedIn newsletter person. It's a bunch of AI agents. There's a bunch of information on YouTube as well as my goals, and then ultimately some on Instagram as well.

And you're thinking, like, Nick, this is crazy. Why would you insert all this information in your global CLAUDE.md? Well, the reason why is because I want it, on every conversation that I have with it, to understand who I am and to take that into consideration when discussing things with me.

I can't say how many times I'm having a conversation with Claude and, because it doesn't have context like this, because I'm in a naive session with no personal system prompt in the context window, I say something along the lines of, hey, what's the best solution for x, y, and z? And then it says, oh, you're gonna wanna do this solution.

And then I say, why? And then it'll say, oh, because it's the cheapest. Right?

It only costs 0.2¢, whereas the other solutions cost $5. And I'm thinking, well, if you knew a little bit about who I am, you'd know that money is not the primary bottleneck right now.

I'd prefer you to exchange my money for my time. So just giving it some high-level principles like that is very important. Anyway, while I was doing that, the actual shareable insights report got ready.

So I'm just gonna tell it to open it so I can take a look at it with you guys. And now you'll see there's an HTML page basically that runs through everything about Claude, all of the insights across all of the sessions. Looks like 1,849 messages across 200 sessions.

I don't know where this chooses the cutoff. It looks like it's about a month or so. Although, keep in mind that this is Claude Code specific, and I don't know if this encapsulates all the conversations I've had with it on the desktop app, but pretty good.

And you can see here that there's a bunch of context about what I work on, how I use it, and all this stuff. So the important thing to look at is the existing features to try section. You can just copy this into Claude Code and add it to your CLAUDE.md.

So for instance: when using Chrome DevTools MCP or browser automation, always kill stale Chrome processes and clear the profile before starting. If the MCP tools fail twice, stop and ask the user before continuing to retry. Never waste tokens on repeated failing browser calls.

This is actually quite valuable just given how many times I have tried to have it run Chrome DevTools MCP and it's failed. Same thing over here. Same thing over here, you know, with some face swap information and stuff like that.

You can copy all this into Claude and it'll set it all up for you, which is pretty valuable. It can even go and build new skills based off of things that you consistently ask. So that's more or less what I'm doing here.
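The "fail twice, then stop and ask" rule from the insights report is a plain failure budget. Here is a minimal sketch of that pattern in ordinary code; the names `call_with_failure_budget` and `NeedsUserInput` are made up for illustration and are not part of any MCP library.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class NeedsUserInput(Exception):
    """Raised when the failure budget is spent and a human should decide."""


def call_with_failure_budget(tool: Callable[[], T], max_failures: int = 2) -> T:
    """Call `tool`, retrying on errors until it has failed `max_failures` times."""
    failures = 0
    while True:
        try:
            return tool()
        except Exception as exc:
            failures += 1
            if failures >= max_failures:
                # Stop burning tokens on retries and hand control back to the user.
                raise NeedsUserInput(f"tool failed {failures} times: {exc}") from exc
```

The point of the rule is the hard stop: a second failure is treated as a signal that something about the environment is wrong, not as a reason to try a third time.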

Anyway, the value with this is basically to copy the entire thing, go back here, paste it in, and say: this is my Claude insights file. It describes at a high level a few of the obvious design patterns in my thinking, and then a couple of the issues that I've had communicating with you and other versions of you.

I'd like you to distill this into a list of high-information-density snippets that I can paste into a global CLAUDE.md to be both token conservative and also avoid most of the mistakes that you typically make. And I'll just press enter, and it's going to give me some information about that.

And over here, I actually have the changes, and this is very high information density. Right? It basically took a bunch and said: don't over-explain, over-engineer, or add unrequested improvements.

When making widespread changes to a file, use one write instead of many sequential edit calls. Speed matters: don't fetch well-known websites. Again, a rerun rule for browser automation, and so on and so forth.

Just some just some high level stuff. It looks like it just inserted that in here, which is quite nice. So now, what do we have?

We have, if you remember, some context on me in the global CLAUDE.md. We also have some high-level reasoning rules and principles. And really, what we're just missing is some token conservation strategies.

And you could see this by you know, you can go back, rewind the video if you'd like some more on that. But basically, you want context about you, your goals, and your reasoning strategies, some high level preferences about, you know, what it is that it is currently doing that is wrong that you would like it to fix, and then some good token conservation strategies like docs first.

So what I'm gonna do is, underneath interaction rules, I'll also just say... oh, and what's really interesting that I'm seeing: one of my rules is actually directly contradicting some of the other rules.

No fetching well-known sites. I'll actually just remove that. That's the human-in-the-loop part.

Right? Just look to see if any two rules directly contradict each other. Then I'll say: when a user asks you to use a nontrivial platform, one for which you do not have context, always look up the documentation first. You can do so by searching for "API documentation" plus the platform name.

After that, if for whatever reason you can't access the docs for JavaScript reasons, launch a Chrome DevTools MCP Chrome instance so that you can still copy and paste all that data. No matter what, if you're working on a project for which API documentation is available, you should always go through the API documentation to avoid 99% of the errors.

The tokens we spend reading the docs will save us a lot of tokens in trying to use things that don't work. Cool. So I'm gonna copy that.

And now I have my global CLAUDE.md. And I could obviously just have Claude insert that into the global CLAUDE.md. I could also just go and find it in Finder.

So I'm gonna go to Finder on Mac. Basically, you can find your global CLAUDE.md just by going to your Mac, then Users, then, in my case, my own user folder. And then there's a hidden folder here, which you can't actually see right out of the gate.

You should be able to go Shift-Command... I think it's comma or period.

There you go. Shift-Command-period. Once you're done with that, you can scroll all the way down to where it says .claude.

And then over here, you'll see that there is a CLAUDE.md that lives within that .claude folder. So what I can do now is just reveal this folder in my Finder, compare it to that folder in my Finder, and actually just drag and drop this in. I have the global CLAUDE.md, and I can just remove this CLAUDE.md and replace it with this one.
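The drag-and-drop step can also be scripted. This is a minimal sketch that copies a drafted file into the hidden `.claude` folder under a home directory and keeps a backup of whatever was there before; the function name and the `CLAUDE.md.bak` backup name are assumptions for illustration.

```python
import shutil
from pathlib import Path


def install_global_claude_md(new_file: Path, home: Path) -> Path:
    """Copy `new_file` to <home>/.claude/CLAUDE.md, backing up any existing file."""
    target = home / ".claude" / "CLAUDE.md"
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.exists():
        # Keep the previous version so a bad update is easy to roll back.
        shutil.copy2(target, target.with_name("CLAUDE.md.bak"))
    shutil.copy2(new_file, target)
    return target
```

A call like `install_global_claude_md(Path("global-CLAUDE.md"), Path.home())` would do the same thing as the Finder steps, where `global-CLAUDE.md` is a placeholder name for the draft file.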

Awesome. So now all future conversations that I have with Claude across all of my workspaces and all of my folders will include the information that I just provided. And hopefully, you guys see how simple it is to run that loop.

Granted, this is an informal loop. I'm not really showing you guys a formal, streamlined process, but hopefully you see how easy it would be to build that in again as, like, a meta Claude.

Let's talk a little bit about agent harnesses. So agent harnesses, the term anyway, has gotten a ton of interest over the last couple of months because it's sort of new and exciting, but very few people actually understand what it refers to and what it means. An agent harness, to be clear, is just Claude Code.

Claude Code is the harness around the model Claude that enables it to do things like call various tools and get actual economically valuable work done.

For those of you that don't know, all that AI models are is text interfaces. Right?

It's just text in, text out. A harness is what turns something that can only communicate in text into something that is ultimately capable of, like, controlling our computer. So the way that I personally think about the question, what is a harness, is: a harness is just everything that wraps around the LLM that is not the actual LLM itself.

So in our case, it's Claude Code. It's the system prompt. It's the hooks.

It's the tools that it has access to, and it's the parameters therein that control things like when the memory auto-compacts, how many messages you can send in a turn, what the total token limits are, and so on and so forth.

For the purposes of this demo, let's pretend that this server here is our Claude space invader. And so this is sort of like the large language model itself. This is actual Claude.

And so Claude is obviously like a galaxy brain intelligence. It's been trained on God knows how many books and blog posts and encyclopedias and so on and so forth.

But, you know, Claude sort of exists in this boundary where it can't actually do anything out in the real world unless it's given the tools and the ability to do so. And so, one example of things that Claude has access to is a set of tools.

So that's things like, I don't know, the ability to use bash, like, use a terminal. The ability to use, I don't know, grep, which is how it finds things around your computer, and so on and so forth. Another thing that it has access to, kinda going back and forth, is some form of memory.

Right? What it can do is read, so it can read things that are stored in this memory, and then it can also write, so it can add or update things as needed. You know, there's obviously also a variety of other things here that it has access to.

And, you know, if it didn't have access to all these things, again, it would just be like an agent, or a model, sorry, that exists in a box. And so that's really the difference between, you know, LLMs and agents. Agents are LLMs plus a harness, whereas LLMs by themselves, large language models, can't really do anything.

They obviously operate entirely in the domain of knowledge. So just given the fact that it's called a harness, you can kind of think of it as, you know, I'm gonna draw a really crappy dog here.

Put another way, here's a really crappy rendition of what I initially wanted to be Canadian dog sledding and what ended up looking more like Santa with a big fat beard riding a questionable reindeer. But basically, you can imagine that this right over here, this is your LLM.

This is the actual model intelligence. And then you over here, okay, this is your harness. This is actually like the code part of Claude Code that sort of controls it.

And so the LLM wants to go in a bunch of different ways and wants to do a bunch of things. What the harness does is it just sort of narrows down its direction. And, you know, you can kind of almost think of it like the barrel of a gun or something like that.

Right? Back in the day, you might have had, like, cannons, and you might have loaded those cannons with big, massive cannonballs.

And what you'd do is stuff some gunpowder underneath and stuff like that. And those cannons, despite the fact that they were operating off the same fundamental technology, which is gunpowder, might not really be able to shoot so far. I don't know.

Let's just say 50 meters. Nowadays, obviously, we have guns (this is my really crappy gun drawing) with, you know, more or less the exact same technology.

You put some sort of bullet in there. Right? But then because of the technology that surrounds the core thing, which is the gunpowder, you know, the bullet can go a lot farther.

So maybe instead of 50 meters now, it can go, I don't know, 250 meters or so. So this is how I think about harnesses. Okay?

And I don't mean to just show you a bunch of silly grade school analogies, but it is important to realize that that is what Claude Code really is. And because Claude Code is a harness, obviously, there are a bunch of other people that have tried making their own harnesses as well. Just like we have frameworks like React and Vue and then Next.js and Nuxt, we also have a bunch of different harnesses that have been developed that supposedly work on and improve specific aspects. What are some of those aspects? Things like security.

Right? Automatic permissions. So plan mode versus default mode versus the new enable auto mode and then bypass permissions mode.

You know, there's some harnesses out there. Okay? There's some agentic SDKs and stuff like that.

I'm not gonna name any names, but there are some of them that are probably a little bit less secure than others, such that if they were to read a Twitter thread that looked like this, maybe they would actually execute sudo rm -rf and delete your entire hard drive. Right?

There are a bunch of examples of people screwing around with this. This is an example from Codex, which, you know, being an extraordinarily competent model, I can't really talk down on too much. But this is an actual conversation that it had with somebody, which I found on Twitter.

You know, the model basically tried running something like rm -rf, which, to make a long story short, in case you didn't know, just deletes everything. And here it says, well, the shell policy actually blocked the raw rm -rf, so what I'm doing is removing those generated directories with a Python cleanup instead.

Same effect, less policy friction. Right? It's just gonna go end up deleting the entire thing.
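The failure mode in that anecdote can be sketched in a few lines. This is a deliberately naive, hypothetical policy check (`naive_policy_allows` and `DENIED_PROGRAMS` are made up here, and this is not how any real harness's shell policy works); it only inspects command strings and never executes them. It shows why blocking a command by name does not block the effect.

```python
import shlex

# Hypothetical deny list: block by the name of the program being launched.
DENIED_PROGRAMS = {"rm"}


def naive_policy_allows(command: str) -> bool:
    """Return True if the first word of the command is not on the deny list."""
    argv = shlex.split(command)
    return bool(argv) and argv[0] not in DENIED_PROGRAMS


# Two commands with the same effect; they are only inspected, never run.
raw_delete = "rm -rf build"
python_delete = "python -c \"import shutil; shutil.rmtree('build')\""

blocked = not naive_policy_allows(raw_delete)
slipped_through = naive_policy_allows(python_delete)
print(blocked, slipped_through)
```

The raw delete is blocked while the Python one passes, which is the "same effect, less policy friction" problem: a policy keyed to command names says nothing about what the command actually does.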

You know, the harness impacts a model's ability to get things done. It also ultimately impacts the safety. It impacts the memory, and so on and so forth.

And so, hopefully, at least now you guys understand what the harness is before I show you some examples of different versions of it. Obviously, Claude Code is the major harness today, but there's a great blog post over here by LangChain that more or less describes a way to create different harnesses.

The model gets a certain type of context injected: prompts, memory, skills, or conversation. Then you also have orchestration, things like Ralph loops, which were really big a while back. That was a different type of harness.

You know, there's a certain persistence of data, actions, and then the ability to both observe and verify, say with screenshots and stuff like that. One harness that a lot of people are using now is this sort of Droid idea, which was built by Factory AI.

So Droid is a publicly available harness that you can download and run today. pi.dev is also exploding in terms of popularity. So whereas Claude Code, you know, obviously needs to run on Claude infrastructure,

right, Claude being the model underlying Claude Code, this Pi coding agent is sort of like the open source version of it.

You can feed in more or less anything that you want, including Claude, and then just have it operate inside of this harness. And, you know, what this does is it just changes the way that we store memories. It changes the way that we store certain files.

It sort of modifies things. It's almost like an alien or bizarro version of Claude Code, insofar as it changes a few of the fundamental constants, like how long before context compaction, how we try different types of solutions, and stuff like that.

It changes various baked-in behaviors of Claude Code and so on and so forth. And the reason I'm covering this is because, you know, this is something that was very fundamental to Anthropic. Back on November 26, 2025, they wrote a big long blog post called "Effective harnesses for long-running agents", which at the time kind of changed the game.

And I would say this was the kickoff of Claude Code's superiority over most other harnesses. And so, you know, here it describes various different ways to work on long-running coding projects and manage environments and stuff like that. And so obviously, this is something that's very fundamentally baked into Claude Code.

If you wanna understand Claude Code at an advanced level, you can't get better than getting it at a harness level. Okay. So, you know, obviously, this is a Claude Code course.

It's not another harness course, but you should at least know what agent harnesses are before we proceed to the rest of the course because, you know, the more understanding of harnesses you have, I think the better you'll be able to appreciate and then digest and ultimately execute on what I'm about to show you.

Next, I wanna chat a little bit about parallelization, about things like agent teams, about sub agents, and a couple of other ways of distributing work to minimize the amount of time and effort that goes into things, while also increasing the quality of the output.

Okay. So I say agent teams here, but let's start with parallelization. A big question that I think a lot of people have is, well, first of all, what the heck is parallelization?

Which is just doing multiple things simultaneously instead of waiting for sequential things to finish. And then the second one is like, Nick, why the hell should we parallelize our agents to begin with?

And to that, I say, have you ever, you know, sent a long-running task request to Claude Code and actually had Claude execute on something for more than a few minutes? For the vast majority of the time, you're just sitting there twiddling your thumbs. Twiddling your thumbs is not very economically productive, so if I have ways to not twiddle my thumbs, I will do so.

What I really mean, I guess, is that autonomous agents just take a long time to finish tasks. You know, when we started with this stuff, or at least when I started with this stuff last year, you know, Claude could realistically work on things for thirty seconds. The other day, I had Claude work on something for over fifteen minutes.

And so if all I'm doing is just sitting there waiting for it to do this fifteen minute task, you can imagine that my productivity is basically going to be punctuated by me just sitting around watching it. It does something. I get the result, make some minor changes, wait for another fifteen minutes and so on and so forth.

That's not very efficient. So, one, parallelization allows us to reduce the total amount of time by a factor of at least a few, from fifteen minutes to maybe a couple of minutes, because each agent is able to work on smaller, more self-contained things. But two, you'll also just get higher quality.

Another thing is that many tasks feature independent steps that can be broken down. So for instance, let's say I'm doing some sort of task. Okay, and this is just like how long it would normally take if we go serially.

And so option A is just to do what most people do, which is they'll do step one, then step two, then step three, then step four. So that's one, two, three, four. This task over here takes five minutes.

This task here takes five minutes. This task over here takes five minutes, and this task over here takes five minutes. What's the total amount of time, kinda collectively?

Well, it's twenty minutes. Right? So that's sort of a, you know, the serial way that most other people do things.

Well, guess what? Turns out a lot of tasks don't necessarily need to be like that. Say I just copied all of this stuff over.

K? And then instead ran a couple of these in parallel. So I actually ran, I don't know, three of these simultaneously and then kinda combined all of them.

If I did something more akin to this instead, hopefully you guys can see that now, instead of everything taking five minutes, five minutes, five minutes, and five minutes, maybe what I'm realistically capable of doing is this: step one takes five minutes, the parallel block takes five minutes, and then the integration step for those three, which were steps two, three, and four, only takes two minutes.

So what I'm doing is basically converting a task that previously took twenty minutes into one that takes twelve minutes. If you just did a little ratio, 12 over 20 is equal to three over five.

And so what I'm capable of doing is getting it down by about 40%, to about 60% of the total. Hopefully you guys see that when you have tasks that can actually be broken down in this way, aka tasks that you can expand and run simultaneously through some form of parallelization, it just makes more sense to do all three of these things simultaneously.

Rather than one parent agent being responsible for everything, doing one, then doing two, then doing three, then doing four, what we can do is take two, three, and four, stack them on top of each other, add an additional step five called a synthesizer, and then get the combined results in a fraction of the time.
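The arithmetic from the whiteboard can be written out as a quick check. This is just the example's numbers (four five-minute steps, a two-minute integration step), with variable names chosen here for illustration; the parallel block costs as much as its slowest member rather than the sum.

```python
# Durations in minutes, taken from the example above.
step_minutes = {"step1": 5, "step2": 5, "step3": 5, "step4": 5}
integration_minutes = 2

# Option A: everything runs one after the other.
serial_total = sum(step_minutes.values())

# Option B: steps 2-4 run at the same time, so the block costs
# only as much as its slowest member, then results are integrated.
parallel_block = max(step_minutes[s] for s in ("step2", "step3", "step4"))
parallel_total = step_minutes["step1"] + parallel_block + integration_minutes

ratio = parallel_total / serial_total
print(serial_total, parallel_total, ratio)
```

With these numbers the serial path takes 20 minutes and the parallel path 12, so the parallel version takes three fifths of the time.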

Another big reason is that agents are what are called stochastic. Okay. They don't always return the same answer.

So if I ran, you know, Claude five times on basically the exact same thing, every single time I have a slightly different response. Okay? Every time I have a slightly different response.

And just to show you guys what I mean by that, I'm gonna open up my Claude code over here and I'm actually gonna open up three different tabs. Let me just visualize this, stick this right in the middle. Okay.

And then over here, let me just make sure that all these are operating the same. I'm gonna say, I'd like you to determine five ways to improve this code base. I'm just gonna paste this across all three of these.

I'll paste and I'll paste. Now, I'm just gonna run all three of them. And I just want you to notice sort of what's going on here.

Obviously, the first thing that it's gonna do is try reading the key files, but check out the different solutions that it's coming up with on all three of these runs. So in the first run, k: broken image paths, missing title and meta tags, nav links hidden on mobile with no replacement.

Project cards aren't actually links. No keyboard focus styles or skip to content link. The second was broken image paths, missing meta tags, no mobile nav.

But now look, placeholder links everywhere, typo in footer. K.

And you can see that, you know, basically, the more times we run these, you know, agents and then the further away they get from the beginning, the more they tend to diverge. And there's a statistical reason for that.

Right? Like at the very beginning, this is sort of like, I don't know, the total answer. At the very beginning, you know, red, it's pretty similar to black, but eventually it diverges a fair bit.

Green similar to red, and it diverges a fair bit. Blue similar to all these, but it diverges a fair bit. And I guess the point that I'm trying to make is like, you know, over here let's pick another color, so it's pretty obvious that these are all a bit different.

We'll do purple. Over here, this this is sort of like the zone of similarity. Right?

But then after you make it to a certain point, because of the multiplicative nature of how large language models work under the hood (they're basically multiplying statistical probabilities of one token after the other after the other), you have massive divergence in the end result. And so, you know, this might go a b c, this might go b c d, this might go a b e, this might go a b z, and this might go a c q or something like that.

What you can do is actually just run it five times. And now notice, if I ran this once, I'd only get a b c. But because I ran this another time, I got all the way to d.

I ran this another time, I got all the way to e. You know, if you would just count up all of the different unique answers here, I have a, I have b, I have c, I have d. I also have e.

I even have q, and then I have zed. So you can see here that I'm getting roughly 2.3 times the total number of possible answers (seven unique answers instead of three) by running things multiple times and then just pooling all the unique outputs. Right?

That's really the principle of stochasticity. Because they don't always return the same answer, if you parallelize your agents, you can actually run multiple times with the same or similar queries. And then you can actually have different answers given to you that just sort of live outside of the distribution or average run, which is pretty amazing.
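As a small sanity check on that example, pooling the five runs and counting distinct answers looks like this. The letters are the placeholder answers from the drawing, not real model output.

```python
# Each string is one run's answers, using the letters from the example.
runs = ["abc", "bcd", "abe", "abz", "acq"]

single_run = set(runs[0])      # what you get from running once
pooled = set().union(*runs)    # every distinct answer across all runs

gain = len(pooled) / len(single_run)
print(sorted(pooled), round(gain, 2))
```

Seven distinct answers against three from a single run is a bit over twice the coverage, for the price of four extra runs that can all happen at the same time.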

So I'm gonna show you guys how that works, specifically with debate and stochastic consensus models. If anybody's seen my agentic AI course on that, you'll know more or less what I mean by that. I'm also gonna show you some fan-out, fan-in research flows as well as some sequential pipeline handoffs.

But really, the fourth and final reason is that model performance degrades as context increases. So the shorter and cleaner your context windows are, typically, the better the results are as well. What I mean by this is, you know, because the parallelization aspect typically involves sub agents, which I'm gonna show you guys a little bit about, you get to avoid the problem where an increasing number of tokens leads to poorer performance.

And so, you know, if on average this is more or less the relationship between the number of things in your context window and the performance of the model, we're gonna end up almost always staying right around here, which is the zone of good. By the way, I just made that up.

It's not actually called the zone of good. Hopefully, you guys understand the distinctions there though. When you parallelize and then feed tiny chunks of a problem to multiple agents, they can all be at the zone of good.

You don't actually have to like go all the way down here. It's not just one agent that's doing all the work. Okay.

So what are examples of how to parallelize in the first place? Well, there's a built-in feature called agent teams now in Claude Code, which does a fair amount of this. So I'm gonna be showing you guys some ways to do that.

But I just wanted to chat a little bit more generally without even going into agent teams first, before I show you some demos of different ways that I personally approach problem solving, and that I've seen some of the best and the brightest use Claude Code for this sort of parallelization. And I'm gonna call them common team patterns.

Okay? Essentially, there are three main things I wanna cover.

The first is the ability to fan out and then fan in. And so that's where you actually spawn a bunch of different research sub agents, and then you have a synthesizer sub agent, which takes all of their outputs. And then based off of the outputs of that synthesizer, you can do either more fan out, fan in flows, or you could do some form of final synthesis step.

Okay. So what I mean by that is like, let's say before you have a query and it's, you know, I want to find the best.

Okay. Absolute best APIs for my feature.

Whatever the feature is. It's x feature. I don't know.

It's like some app that generates things, whatever. So I wanna find the best APIs out there for this feature that, you know, allow me to very quickly and easily do the things that I wanna do. We can imagine that if you were to do this in the old school linear path, what would happen is Claude Code would spin up, k, in the same thread, research on site number one, and then go on to site number two, and then site number three, and then site number four.

Right? And what would be occurring the entire time that we're going through all these different websites? Well, the length of our total context would increase, meaning our performance on average would decrease.

Okay. In addition, it's taking time. So it's five minutes here, it's five minutes there, it's five minutes there, it's five minutes there and so on and so forth.

Then at the end, it would have a final synthesis step, which I'm just gonna call s, which would basically combine one, two, three and four together, which could take a certain other amount of time, maybe another five minutes, before finally giving you your answer. And so the cost of the answer, okay, if you think about it almost like a line item, is, you know, first of all, twenty-five minutes, which is obviously not preferable to instant.

And then, you know, a fair amount of tokens on poor quality outputs.

You know, you're probably gonna end up spending a similar amount of tokens regardless, but you're spending those tokens on poor quality outputs because you're kind of down here as opposed to up here. Right? You're here, where you don't wanna be.

Now, what fan out and fan in is is very similar to what I showed you guys earlier. You have a research query and that's, you know, find best APIs. And so what it does is Claude Code basically goes in and then immediately spawns,

k, let's just say, four research agents.

And so now we have research agent one, research agent two, research agent three, and then we have research agent four. K. And so what we're doing here is we're fanning out.

These all operate totally independently accumulating their own context windows. Because they're new agents, they're almost always in the zone of good. Maybe they'll push a little bit farther beyond that, but they're still pretty good.

Once we're done with that, what we do is we do the opposite, which is the fan in, and we feed all of those into a final synthesizer agent. That synthesizer agent now is a different prompt.

The prompt is not, hey, go do this research. The prompt is, hey, here's a bunch of context from a bunch of other models that have already done the research. Meaning, the prompt gets to be shorter.

We then apply high level reasoning strategies and principles to make that synthesizer as smart as possible and say things like, we want you to integrate anything that overlaps as well as any outliers and then score them slightly differently.

And so, you know, rather than being all the way over here with our big thing, you know, probably we're somewhere over here in the middle, which means the performance is gonna be a little bit better. And then obviously, the synthesis step can occur in approximately the same amount of time as the actual research because you can spawn almost an infinite number of sub agents to go do research for you.

And so really what happened now is you have five minutes here. You have five minutes here. You know, just add these up.

It's ten minutes. And so not only are we significantly faster, we're also a lot higher quality because now we have all the data and information laid out for the synthesis agent.
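The shape of fan-out and fan-in can be sketched with plain `asyncio`. This is only an analogy for what the orchestrator does, not Claude Code's sub agent mechanism: `research_agent` and `synthesizer` are hypothetical stand-ins that sleep briefly instead of calling a model, and the minute figures are the ones from the example.

```python
import asyncio


async def research_agent(site: str, seconds: float) -> str:
    """Stand-in for one research sub agent with its own fresh context."""
    await asyncio.sleep(seconds)          # pretend to browse and take notes
    return f"notes from {site}"


async def synthesizer(notes: list[str], seconds: float) -> str:
    """Stand-in for the synthesis agent that only sees finished notes."""
    await asyncio.sleep(seconds)
    return "summary of " + ", ".join(notes)


async def fan_out_fan_in(sites: list[str]) -> str:
    # Fan out: all researchers run concurrently; gather keeps input order.
    notes = await asyncio.gather(*(research_agent(s, 0.05) for s in sites))
    # Fan in: one synthesis step over everything they returned.
    return await synthesizer(list(notes), 0.05)


sites = ["site 1", "site 2", "site 3", "site 4"]
report = asyncio.run(fan_out_fan_in(sites))

# Wall-clock comparison using the minutes from the example.
serial_minutes = 5 * len(sites) + 5   # four research passes, then synthesis
fan_out_minutes = 5 + 5               # one parallel research block, then synthesis
print(report, serial_minutes, fan_out_minutes)
```

The research block costs one pass no matter how many researchers run, so adding a fifth or sixth site changes the token bill but not the wall-clock time in this simplified model.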

More importantly, there are different models that are better at different things. And so within Claude, you have not only your, you know, heavy lifter, which is usually the Opus models, but you also have, you know, your Sonnet models. And then although not a lot of other people use them these days, you also have your Haiku models.

And so what you can do now is, for the research, which consumes a massive number of tokens but realistically doesn't usually need a ton of reasoning, since it's more like data extraction, you use something cheap like Haiku or Sonnet.

And then for the synthesis, use something like Opus. Because you're applying different models at different steps, not only is it going to occur much faster, because Sonnet works faster than Opus.

So maybe instead of five minutes here, it's actually, I don't know, three minutes. But then the cost is going to be a small proportion of the money that you normally would have spent, just because of the way that pricing on Claude works. Right?

Pay attention here to the fact that Claude Opus, in this case 4.6, is $5, and Sonnet 4.6 is $3. So we immediately save 40% right there.

And that's just your base input tokens. That's not taking into account the massive difference in output token cost and so on and so forth. And obviously, things get even better if you go down to Haiku.
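Working the input-token numbers through: the two prices are the ones quoted above (dollars per million input tokens), while the two-million-token research workload is a made-up figure purely for illustration. Output-token pricing is left out, as in the discussion.

```python
# Input prices quoted above, in USD per million input tokens.
OPUS_INPUT_USD_PER_MTOK = 5.0
SONNET_INPUT_USD_PER_MTOK = 3.0

# Hypothetical workload: total input tokens consumed by the research agents.
research_input_tokens = 2_000_000


def input_cost(tokens: int, usd_per_mtok: float) -> float:
    """Cost of reading `tokens` input tokens at the given price."""
    return tokens / 1_000_000 * usd_per_mtok


opus_cost = input_cost(research_input_tokens, OPUS_INPUT_USD_PER_MTOK)
sonnet_cost = input_cost(research_input_tokens, SONNET_INPUT_USD_PER_MTOK)
saving = 1 - sonnet_cost / opus_cost
print(opus_cost, sonnet_cost, saving)
```

At these prices the Sonnet researchers cost 60% of what Opus researchers would on input, which is a 40% saving before output tokens are counted.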

And so you can formalize this as a skill if you would like. K? I'm not going to.

I'm just going to feed it in a simple prompt, but this will illustrate what I mean. Let's say I'm right over here in my project. K.

Let me just delete this global CLAUDE.md because we don't need that anymore. Then I'm going to essentially, let me just go back here and then copy the actual text.

Use a fan-out, fan-in researcher and synthesizer approach to research the question: how best should I optimize this codebase?

Minimum five sub agents. Use Sonnet to do the research and individual contemplation, Opus to synthesize.

So now what's going to occur is, rather than us just waiting nonstop for all of these, it'll fan out six Sonnet research agents. Each is going to investigate a slightly different optimization axis.

They're all gonna focus on slightly different things, and then it's gonna synthesize all of those results back together with Opus. If I zoom out, you can actually see all six of them running simultaneously, despite the fact that we're not using the agent teams feature. We're just using the sub agent feature right now.

You know, all of these things are basically generated immediately. Their contexts are quite short. So, I mean, in the grand scheme of things, this is a much shorter context than we would ultimately accumulate in our main agent.

All of them are focused on slightly different things and are obviously autonomously managed by that orchestrator. And then finally, these six agents can finish in about the time of a single run, as opposed to, you know, six runs back to back. So this just finished the architecture research.

It's gonna wait for the remaining five agents now. Alright. And it looks like it just finished all six research runs.

So now it's going to synthesize all the findings with Opus. It's then going to also be able to take advantage of things like its planning features and so on and so forth before synthesizing. And here it is.

Okay? High impact, easy fixes, gives us a big list. It's also writing the high to medium impact, easy to medium effort.

And so, I mean, you know, obviously, I'm not just pulling this out of my ass here. Anthropic has done a lot of research on the best way to solve problems.

And, you know, Opus with a bunch of Sonnet sub agents massively outperforms Opus both on time, but then also quality, specifically because of, you know, Sonnet's longer context window as well as just like general usability. That's what I care about.

I just care about my own usability here. I could spend as much money as I want on these things at this point. What I care about is, how can I extract the maximum quality in a minimum amount of time?

And that's the design pattern that you wanna use. So I mean, like, use this anytime you're contemplating problems. And you don't just have to contemplate specific API problems or development problems, either.

Like, I use stuff like this anytime I'm designing business systems, anytime I'm designing process optimizations. I mean, I did this the other day when I was doing product differentiation, basically coming up with different ways to price and package products for a company that I now own that does this sort of thing.

The opportunities here are basically limitless. You could do this for competitor research. You could do this for whatever the heck you want, and I commonly apply it.

Okay? So that's fan out and fan in, where you basically spawn N researchers, usually using a cheaper, dumber model like Sonnet. And then you have a larger synthesizer model that actually combines the results.

That's how you get, you know, some of the best quality and then also the best quantity.

Next, I wanna chat about debate and stochastic consensus because it's kind of similar, but, you know, it's also a little bit different.

I use debate and stochastic consensus to basically, like, hammer out nuanced arguments and nuanced quality discussions. You know, earlier how I said we had one agent come up with a b c, another one come up with c d e, another one come up with like a b q, and so on and so forth. Well, basically, with stochastic consensus and then later debate, what we're doing is we're having different sub agents come up with different lists of solutions.

And then we have something else go through and identify the mode, which means counting the number of times that each solution pops up.

So let's say solution a pops up twice. K. This synthesizer agent would say, okay, there's two a's.

B pops up twice, so we go two b. C pops up twice, we go two c. D pops up how many times?

One, so we'd go d, then count e, then also count q.

And so in this way, you can see, statistically speaking, you know, a lot of agents think these three are great solutions. One agent thought this is a good solution, another agent thought this is a good solution, and finally, another agent thought this is a good solution.

Basically, the votes of confidence here are fewer. And then what you can do is you could use this. It's almost like like a weighted average to tell you what approach to take.

You know, if it's like an equation where, like, my final, I don't know, decision, which we'll just say decision, is kinda like this. It would equal two a plus two b plus two c plus d plus e plus q.

And I know this is math, but don't get scared here. The point is not to actually calculate the final solution. The reality that I'm attempting to convey to you is that because so many models came up with a, and others came up with b, and e, and q, and so on and so forth, you can quickly determine consensus between a number of agents that come up with ideas.

And then you can also determine which ideas are genuine outliers, insofar as, you know, only one out of three models actually came up with that thing, or one out of twenty-four models suggested you should do x, y, and z. And so you get to farm both the statistically most likely solutions and also the massive outliers, which can make you quite, I wanna say, competent at solving problems in a very short period of time.
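The tally described above is just a frequency count. Here is a minimal sketch using the three example lists (a b c, c d e, a b q); the agent names and the threshold of two votes for "consensus" are illustrative assumptions, not part of any skill shown in the course.

```python
from collections import Counter

# The three example agents and the solutions each one proposed.
proposals = {
    "agent_1": ["a", "b", "c"],
    "agent_2": ["c", "d", "e"],
    "agent_3": ["a", "b", "q"],
}

# One vote per agent per idea (set() guards against an agent repeating itself).
votes = Counter(idea for ideas in proposals.values() for idea in set(ideas))

# Assumed threshold: an idea backed by at least two agents counts as consensus.
consensus = sorted(idea for idea, n in votes.items() if n >= 2)
outliers = sorted(idea for idea, n in votes.items() if n == 1)
print(dict(votes), consensus, outliers)
```

This reproduces the "two a, two b, two c, plus d, e, and q" weighting: a, b, and c form the consensus, while d, e, and q are the single-vote outliers worth a second look.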

And this works in a really similar way to what I talked about earlier with like the total solution space. Right? You know, if there are really a fixed number of ways to solve something, and the reality is there are a fixed number of ways to solve something, And there are also a certain number of ways not to solve something.

Well, what you wanna do is you just wanna, like, cover that ground as quickly as possible. And in reality, what you could do is quickly spin up an agent to figure out all the ways not to do something. Okay?

And then you could have, you know, sub agents each slowly figuring out: no, this doesn't work; no, this doesn't work; no, this doesn't work, all simultaneously.

And then what you end up with is this beautiful field of highly differentiated green, which tells you what you can actually do. And I understand this is more conceptual, but just bear with me here. I'll show you guys an actual example in a moment.

Now, stochastic consensus is cool. It's sort of like a first go, but debate is even cooler. Because now what you do is you basically take all of these points, Okay?

And then you feed them into an open, like, conversation or chat room where all other models can weigh in on solutions that might not actually be very obvious. So now, okay, if I just recreate the solution, we have agent one come up with a b c.

Agent two come up with b c, I don't know, let's just say e. Agent three come up with a b q. Okay?

What we do is we divide this into time steps. And so this is time one, this is time two, this is time three, and this is time four. What we do at every time step is we allow all other agents to look at all of the conversations and and all the thoughts that all the other agents have had.

Okay? And what occurs as we move through is agent one gets to see agent two and agent three's responses, and so it gets to differentiate. Maybe now it goes a b c e zed because it come comes up with some additional solution by comparing its two, you know, two and three.

Maybe this one comes up with b c, but then it eliminates e because it just doesn't think that makes much sense, and then it comes up with an f. You know, this one comes up with with, I don't know, two different letters, and then ends up, you know, also identifying some of the previous solutions, but then combining them in new ways and stuff like that to come up with better ones.

And so what we do with the debate is it's not really a debate in the practical sense. It's not like, hey, your job is to try and convince other people why a, b, c are the best solutions. What it is is every model has access to all of the other models.

And so because they have access to all of the other models and they don't have to spend all that time reasoning, they can just see the results. They can then incorporate those and come up with increasingly nuanced solutions and, you know, ultimately, span a large search space in a very short period of time.

And so we can just proceed with this all the way down. You can run as many of these, like, steps as you want until ultimately you have like a list of solutions provided by a bunch of different models that are just way more complex, way more nuanced, and also just like way more interesting than the initial ones that, you know, one agent might have come up with.
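The time-step idea above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's actual script: `run_debate` and the three stub agents are hypothetical names, and each stub stands in for a real model call. The letters match the A, B, C example from the whiteboard.

```python
from typing import Callable, Dict, List, Set

# Hypothetical agent signature: (own ideas, ideas from all other agents) -> revised ideas.
Agent = Callable[[Set[str], Set[str]], Set[str]]


def run_debate(
    agents: Dict[str, Agent],
    initial: Dict[str, Set[str]],
    rounds: int,
) -> List[Dict[str, Set[str]]]:
    """Return the ideas held by each agent at every time step (index 0 = start)."""
    history = [dict(initial)]
    for _ in range(rounds):
        previous = history[-1]
        current: Dict[str, Set[str]] = {}
        for name, agent in agents.items():
            # Everything the other agents held at the previous time step.
            others: Set[str] = set()
            for other_name, ideas in previous.items():
                if other_name != name:
                    others |= ideas
            current[name] = agent(previous[name], others)
        history.append(current)
    return history


# Stub agents standing in for model calls (assumptions, for illustration only).
def agent_one(own: Set[str], others: Set[str]) -> Set[str]:
    return own | (others & {"e"}) | {"z"}  # adopts e from the others, adds a new idea z


def agent_two(own: Set[str], others: Set[str]) -> Set[str]:
    return (own - {"e"}) | {"f"}  # drops e, adds f


def agent_three(own: Set[str], others: Set[str]) -> Set[str]:
    return own | others  # recombines everything it has seen


agents = {"one": agent_one, "two": agent_two, "three": agent_three}
start = {"one": {"a", "b", "c"}, "two": {"b", "c", "e"}, "three": {"a", "b", "q"}}
steps = run_debate(agents, start, rounds=1)
final_pool = set().union(*steps[-1].values())
print(sorted(final_pool))
```

Each agent only reads the other agents' results from the previous step, so no agent has to redo anyone else's reasoning. Raising `rounds` corresponds to adding more time steps.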

Alright. So I'm back on my business workspace here and we're still doing research on tomatillos, but I thought this is actually a pretty good example.

Why don't we use stochastic multi agent consensus to come up with all of the different ways you can make a sauce using a tomatillo. Use stochastic multi agent consensus to determine all of the different ways that you could make a nice tasting sauce using tomatillos. I want every agent to come up with at least 10 independent responses, then have them synthesized and turned into just a giant list of all of the possible things you could do.

So what the stochastic multi agent consensus skill does, if I open it up, is basically, it breaks down a query into n other queries. That's where it says spawn n agents with the same or a slightly different prompt to independently analyze a problem, then aggregate results by consensus, which you use for decision making, ranking options, strategic analysis, or any problem where you wanna filter hallucinations, and then surface what are called high variance ideas.

So anytime I use the word consensus, poll agents, stochastic consensus, spawn n agents, so on and so forth, it'll go and it'll do the thing. So just scrolling down here, you could see that it read through the skill and it spawned 10 agents all looking at slightly different angles here. And, you know, these are very similar prompts.

Brainstorm all the different ways you can make a nice tasting sauce using tomatillos. This one's here, brainstorm all the different ways you can make a nice tasting sauce using tomatillos. This one here, brainstorm all the different ways you can make a nice tasting sauce using tomatillos.

But the idea is, you know, one is a conservative tradition minded chef, the other is an adventurous boundary pushing chef, the other challenges conventional wisdom, the other reasons from first principles and so on and so forth. Now because, you know, it's a pretty simple and not very intellectually difficult exercise, all 10 agents have actually already finished.

And you can see that I was able to scan a massive search space in a very short period of time, despite the fact that this problem was pretty simple. So what it's doing is similar to what I showed you earlier with those n researchers and then, um, having some sort of synthesizer model. What this is now going to do is deduplicate the outputs and then give me a list of pretty nuanced answers that realistically scanned most of the search space in a very short period of time.

I'm sure you can imagine you could scale this up if you had, like, some sort of dedicated infrastructure, whether it's a local model or something like that. You could theoretically have stuff like this running all the time just ideating and coming up with new approaches to solve long standing problems. This is actually the exact way that, I don't know if you guys have seen, you know, they're throwing Opus now or GPT or other models at like these big math questions and asking them to solve them.

This is exactly how they're doing them all under the hood. So as you guys could see, we polled 10 agents. There are 119 raw ideas.

Accounting for duplication, there are 52 in total that are unique. So what we're gonna do is we're actually gonna look at this consensus report and then ultimately, its answers.
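The aggregation step (many raw ideas in, fewer unique ideas out, split into consensus and outliers) can be sketched like this. It is a simplified stand-in under stated assumptions: in the demo a synthesizer model does the deduplication, whereas this sketch only normalizes case and whitespace, and `aggregate`, `normalize`, and the 50% threshold are hypothetical choices, not part of the actual skill.

```python
from collections import Counter
from typing import Dict, List


def normalize(idea: str) -> str:
    """Collapse case and whitespace so near-identical answers count as one idea."""
    return " ".join(idea.lower().split())


def aggregate(responses: Dict[str, List[str]], consensus_share: float = 0.5) -> Dict[str, object]:
    """Deduplicate ideas across agents and split them into consensus and outliers."""
    votes: Counter = Counter()
    for ideas in responses.values():
        # Each agent votes at most once per idea, however often it repeats it.
        votes.update({normalize(idea) for idea in ideas})
    raw_total = sum(len(ideas) for ideas in responses.values())
    threshold = consensus_share * len(responses)
    return {
        "raw": raw_total,
        "unique": len(votes),
        "consensus": sorted(i for i, n in votes.items() if n >= threshold),
        "outliers": sorted(i for i, n in votes.items() if n == 1),
    }


# Same question, different personas (the answers here are made up for illustration).
responses = {
    "traditional chef": ["Salsa verde cruda", "Enchilada sauce"],
    "adventurous chef": ["salsa verde  cruda", "Tomatillo beurre blanc"],
    "first-principles chef": ["Salsa Verde Cruda", "enchilada sauce"],
}
report = aggregate(responses)
print(report)
```

Ideas that most agents agree on are the low hanging fruit. Ideas that only one agent proposes are the high variance ones worth a second look.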

Alright. We have the consensus report opening it up here. You could see there are 52 total.

The first is salsa verde cruda. The next is tomatillo avocado crema. The third is aguachile verde, and so on and so forth.

I could work my way all the way down here, a bunch of different types. You know, I could have had one agent come up with all of these. I could.

Okay? But, um, the probability that it would have been able to, one, come up with, like, a highly differentiated list like this, and two, scan as much of that search space in the same amount of time is very low. And so I'm sure you can imagine you can apply this to any business problem that you guys are currently having to just come up with a bunch of low hanging fruit solutions as well as, like, unique and outlier solutions.

We even have, like, Indian influenced sauces, Persian influenced sauces, Caribbean Latin fusion sauces, and so on and so forth. An outlier that I'm definitely not trying anytime soon is tomatillo beurre blanc, which is a French butter sauce using the tomatillo's pectin as a natural emulsifier. No, thank you.

So what would debate look like? Debate is more or less the exact same idea. In my case, I've just turned this into a skill.

It's called model-chat. Basically, what occurs is we spawn five Claude instances in a shared conversation room where they debate, disagree, and converge on solutions. We use round robin turns with parallel execution within each round, and it triggers on terms like chat and so on and so forth.

So what I'm gonna do here is I'll say, great. This looks awesome. I'd like you to rerun this, but with model-chat.

Make sure at least 10 agents are having conversations about this. And then, you know, if any of the sauces just sound insane or terrible or crazy, then obviously have them discuss that as well.

Just like our stochastic multi agent consensus took advantage of, like, time basically and traded it off against total tokens, we're doing the same thing. So what we're gonna do is we're gonna start by extracting from the user's message the topic or problem, the mode, the number of agents, and the number of rounds.

It's then going to run an actual script that I've set up here that automates the process of like having each of the agents look at each of the other agents responses before finally doing a synthesis. Speaking of which, I just read through a couple of those.

I'm actually just gonna make some tomatillo sauce right now, so I'll be right back. Okay. So me looking at the conversation over here, just asking it to like give it to me.

You actually see that all the agents are doing some thinking and the contrarian is starting with 15 ideas. It'll immediately challenge the ideas that deserve it.

They're now listing their disagreements, so does this actually work? Is it a structurally sound technique or a restaurant stunt with an unacceptable failure rate?

Is tamarind redundant or complementary? You know, does Tomatillo chocolate belong on the list? If so, where?

Should Mole Verde be in tier one or tier two? So they're having discussions on an ongoing basis, which is always really fun to watch that we can monitor and then obviously synthesize into an answer. Okay.

And then finally, we have the tomatillo synthesis over here. Tomatillo's pectin content is underappreciated. Tomatillo husk tea, unfortunately, is not cool.

The foundational tier is settled and non negotiable. And if I actually look at the foundational tier, you can see we actually have a bunch of different highly recommended sauces.

Again, some of these are very, like, nuanced. Lacto fermented tomatillo hot sauce, taqueria squeeze bottle drizzle, enchilada sauce, tomatillo aguachile, and so on and so forth. And, you know, this is just a really shitty example.

But hopefully, you guys understand that you can take this to more or less anything that you want. Whether it's, you know, designing a new computer programming approach to a particular problem, whether it's choosing the right framework to approach or tackle a task with, or something else.

Okay. So I just did all of the previous example using a pretty straightforward, you know, like dietary or chef sort of example.

But now I wanna use this on an actual app and really just have all of these different models discussing things and doing so in a very short period of time. What I have here is I have like an algorithmic art example. And this is actually something that Claude develops.

It's part of their algorithmic art base skill, which I think is actually, like, supplied, I should say, in the Anthropic skills directory.

You can adjust some things like the stroke weight and, like, the damping and so on and so forth, and actually have it, like, come out with very unique designs. You can then just, like, save the image and then boom. Now you have a cool, like, wallpaper or something like that.

It's kinda neat. But I wanna improve this as much as humanly possible. And the reason I'm doing it like this is because I also wanna show you guys how to apply the same approaches that I just showed you to agent teams instead, which are obviously a much more streamlined version of doing the exact same things that I've done so far.

It's just streamlined in the sense that, you know, it is built out of the box to do everything, but it does so at the cost of some tokens. So I'm just gonna go back over here and then I'm just gonna look at synaptic-drift.html within art. I just need to make sure to, you know, remember what folder that's in.

Then I'm just gonna open up another Claude instance. Now, a lot of the advanced stuff as we know is actually only available in the terminal and I think agent teams are a lot better managed in the terminal. So I'm just gonna open up the terminal.

I'm going to full screen it here as well. Let me delete that and then go full screen. And, you know, I could do it in here.

I could also do it in like Ghostty, which is probably my favorite like terminal to use with Claude. But for now, you know, I have my agent teams idea.

So I'm basically now going to say, hey, I'd like you to optimize synaptic-drift.html and turn it into a full fledged application. However, rather than just do this all naively yourself, I want you to take advantage of stochastic multi agent consensus.

I want you to take that skill and then apply it using the agent teams feature. You'll orchestrate a team of agents that do all of this stuff. Don't just use what's in the skill itself because I'd be running it a little too simply.

I actually want you to read through the whole skill and then use that to spawn agent teams. Okay. So it's gonna start by reading the skill def and then the HTML file itself, which is found in art.

It's then going to go and read through the agent teams tooling and everything that it needs in order to basically spin this up easily. So it'll start by creating a team for the consensus workflow, spawning 10 analyst agents with different framings, and then finally aggregating their recommendations and implementing the winning features.

So the very first thing it's going to do is spawn the analyst agents. And you could see now the UX has changed a little bit. You see down at the bottom where I have these different analysts that are running?

So if I go shift down, I can actually see all of their different stochastic multi agent consensus threads. So now they're all spawning and running in parallel, which is pretty neat. At any point in time, I could press enter to view sort of the conversations and what they're doing.

And I should note that, you know, stochastic multi agent consensus applied to agent teams is basically just the debate built in because the agents actually can communicate. The team lead can also orchestrate that communication too.

So, you know, it's not actually really independent, which is neat. You could spawn all of these in, like, different windows if you want to. You can also just continuously hold shift and then go up and down to select.

What I'm doing is I'm just reading through a bunch of different threads and conversations. And it's clear that they all start by just reading through synaptic-drift.html. Finally, you know, this is now returning a bunch of agent conclusions back.

And more importantly, it's also coming up with consensus, which is nice. Alright.

What it's gonna do is just take all these now and close them down while also looking at the consensus, the bugs, the divergence, and then ultimately outliers. So the consensus recommendation of our next feature is high res exports, a preset system, URL state, and shareable links.

The bugs are a race condition in regenerate, download saves mid render, and PG height not checked. On divergence:

Six out of 10 agents suggest debounced regeneration versus a live preview. Then the outliers have also come in. Mobile responsive layout, live animation mode, seed history, web worker offload, mouse attractor/repeller, and kill sidebar overlay.

So this is all really cool. You could see now it's coming up. It's actually just deleting my old Tomatillo stuff.

Guess we happen to be using the same file or something. Instead, it's coming up with this giant list of different conditions and features that it can build. Okay.

Now it's actually shutting down all the agents, implementing it. Just because I want this to do so faster, I'll say use agent teams to do the implementation. And you can see it's actually gone through here and then added all of what we needed in order to implement the tool, the features that the model suggested.

In addition, it's also spawning review agents to see if we can improve the quality of the generated code, spot problems and stuff like that. So if I go shift down, I could see all those. So we have now reviewer bugs, reviewer features.

Let's just see what reviewer bug says. Okay. It's now sending the review to the team lead, so it's communicating that back.

Taking a look at what the reviewer is saying. Now that it's opening it up, you can see we now have a ton more features. We have different presets, ocean drift, ember storm, ink wash, neon plasma, neural fire.

We have the ability to modify colors. We have one x, two x, and then four x downloads, which I don't think you guys could see because my face is in the way. But if you just look down over here, you'll see that there's significantly more functionality.

Um, we can download a PNG at four x as well. We have simple, like, space bar shortcuts to reload and change things. We could change the speed and so on and so forth.

Um, ultimately, this is just a better app. Right? And so we did this by basically just exchanging a couple of my dollars and tokens for, you know, a bunch of different agents, all coming up with their own ideas and then ultimately executing on them.

Hopefully, you guys could see you can apply the same approach to more or less anything. There are obviously optimal token trade offs, but when you spawn sub agents that are a little bit less capable, like Sonnet versus Opus, typically that math works out and you end up being able to do just as much if not more in a shorter amount of time for less money.

Alright. And then finally, pipeline, which is sequential handoff between specialists. I mean, I just showed you guys a little bit of that earlier with agent teams sort of spawning reviewer bugs and stuff like that.

But basically, that's more or less it. You have task A done by some agent, which is specialized for task A. You then pass that off to agent B, which is specialized for task B, and then ultimately, agent C, which is specialized for task C.
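A pipeline in this sense is just a chain of functions where each specialist receives only what the previous one handed over. The sketch below assumes hypothetical names (`Handoff`, `run_pipeline`, and the three stub agents), and the stubs stand in for separately prompted agents.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Handoff:
    """The only thing passed between specialists: the artifact plus short notes."""
    artifact: str
    notes: List[str] = field(default_factory=list)


Stage = Callable[[Handoff], Handoff]


def run_pipeline(stages: List[Stage], start: Handoff) -> Handoff:
    current = start
    for stage in stages:
        # Each stage sees only the handoff, never the previous agent's full context.
        current = stage(current)
    return current


# Stub specialists standing in for separately prompted agents.
def dev_agent(h: Handoff) -> Handoff:
    return Handoff(h.artifact + " -> built", h.notes + ["dev: implemented feature"])


def bugfix_agent(h: Handoff) -> Handoff:
    return Handoff(h.artifact + " -> fixed", h.notes + ["bugfix: patched race condition"])


def qa_agent(h: Handoff) -> Handoff:
    return Handoff(h.artifact + " -> tested", h.notes + ["qa: all checks passed"])


result = run_pipeline([dev_agent, bugfix_agent, qa_agent], Handoff("spec"))
print(result.artifact)
```

Keeping the handoff small is the design choice that matters here: agent B starts with a short artifact and a few notes rather than everything agent A read and tried.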

And so, I mean, like, you could just have A do all three of these things. The issue with having A do all three of these things though is one, if you guys remember earlier, good lord, this is getting a little messy.

You know, we're no longer in the zone of good because odds are it has like tons of context from literally everything that it's done before. So, you know, like it would have started off over here and that would have been okay, but now it's over here and then now it's over here. And then, two, like, fast and good development is often at odds with like really in-depth testing, let's say.

And so, if you think about it conceptually, like a developer agent will have different incentives than like a testing agent. The developer agent will be incentivized to like build things that work really quickly using, you know, whatever is available to it. Whereas the testing agent will be incentivized to try and like spot all of the issues.

And so like building new things is sort of at odds with like repairing the old things. And in that way, if you try and have one agent do everything, the probability that it will be able to do it as well as possible is definitely lower than if you just spun up specialized agents that were like highly tuned for that thing, assuming their intelligences are all held equal here.

I'm talking about like all Opus calls, not a mix of Opus, Sonnet, and so on and so forth. So my recommendation would be, you know, like, what I would do is I'd have a dev agent for A, like I just did.

Then I'd have some form of, like, bug fix for B, then I'd have some sort of, like, test, maybe bug and QA, for C. And I'm not gonna redo that example because one, I wanna be respectful of your time, but two, I just showed you that exactly with the agent teams example. I guess the meta example here is you combine all three of these and then just have all of them interacting constantly for best results.

Like, you have, you know, debate and stochastic consensus to come up with, like, the best ways to, you know, improve on a product. Then maybe you do some fan out, fan in, and researchers to go look at, like, different APIs and different design patterns that you could use to fulfill that before finally handing that off to some sort of like bug reviewer QA or tester.

But hopefully, it's clear that, yeah, all of these things do not exist in isolation. They all exist together. Next, let's talk context management, which put really simply is just all of the files and folders and organizational methods that you put into a workspace to allow Claude Code to effectively manage whatever work you have.

Now I'm seeing a lot of people try and delegate work right now, sort of like human companies do with CEOs, you know, CTOs, CMOs, Claude Code agents, and software engineers, and stuff like that.

And I think initially, when I looked at this, this one's called Paperclip specifically. It's got a pretty interesting repo that you could check out right over here. It's all about running your whole business with an agent team.

I think initially, it's really easy to look at these and be like, hey, this is stupid. You know, I mean, that's what I did. I made a couple of videos and I talked ad nauseam with a couple of my friends and I was like, this is dumb.

Why would we try and fit agents, which think very differently than human beings into the exact same organizational hierarchies we've been using for the last hundred fifty years? It just doesn't make sense. Human brains are different than agent brains.

The latter is obviously a lot more spiky and good at certain things while sucking at others. But anyway, as quick as I was to initially dismiss this idea, what I've come to realize is that sub agents as these org charts and SKILL.md files, which as we know are self contained SOPs that exist within a file capitalized as SKILL.md,

are actually just two flavors of the exact same thing. What they are is they're just different ways of organizing your markdown files. And so just like in my case, we ran a model-chat skill earlier for me to show you guys how, you know, models debated and stuff like that.

K. We had a SKILL.md within it that stored a bunch of information that was like hyper specific to that skill. We had model-chat.py, which was a tool that the skill could use.

So too are our sub agents organized in basically the same way. I guess what I'm trying to say is like, okay. If we take sub agents on the left hand side, what was one of the main reasons why we like using sub agents?

Okay. It's because it's a clear or fresh context window.

Right? Alright. Awesome.

So that's one. How about the fact that it's specialized? Awesome.

That's another. How about the fact that the sub agent is probably more reliable at sub agent specific tasks? Right.

That's another one. And then how about the fact that it's written in, you know, markdown format with tool use?

Well, fantastic. That's another one. If we look at, like, how that equates to skills, honestly, the only thing that's missing is the fact that the context window is not entirely clear or fresh.

But, you know, what you can do with these is because skills are so efficiently written, they're basically a form of compression that pushes you towards a shorter context window anyway.

So basically, the only real difference, if I'm honest, and keep in mind, when you instantiate a sub agent, you're giving it, you know, a little prompt. Right? Kind of similar to the way a skill works.

The only real difference between the two is just the amount of context in the sub agent versus the skill. But I want you guys to know that sub agents are honestly basically skills and skills are basically sub agents.

They're just slightly different ways of storing information. So why am I bringing this up? Um, just because I'm coming to realize that the two are very similar and they're, I'm sure, in the future going to be, like, merged even more so into a similar concept.

Um, all these two point at are just different ways of organizing your context and basically organizing the way that you get tasks done. One delegates via CEO to CTO, CMO, CTO, all the stuff. Right?

I don't know why there's two CTOs now that I'm looking at that. It's kinda weird. Whereas the other one stores things in a SKILL.md.

Like, just going back to Antigravity right over here. Right? Like, I could go to this skills folder and then I could go and find that model-chat.

And I mean, like, the way that this is written is basically the exact same, you know, schema, basically, that a sub agent is written in. If I go over here to Claude Code's actual documentation page on sub agents, I mean, you actually have basically the exact same structure.

See how here it says the name, code reviewer, then description, prompt, tools, model? K. And over here, what do we have?

We have the name. K. We have a description, and we also have the tools.

I mean, like, the model is sort of baked in here because it's in our main thread. It's gonna be Opus 4.6. But hopefully, you guys are seeing, like, skills and sub agents actually are really similar.

They're just slightly different ways of organizing information. So I'm making this big point because I think that's important to realize. As we continue moving forward with Claude Code and other tools and we get more and more advanced with them, the shapes of how we're transmitting information to our models will likely end up being quite similar.
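To make the "same shape" point concrete, here is a small sketch that reads the header of a skill file and a sub agent file with the same parser. The field names (name, description, tools, model) are the ones pointed at on screen; the file contents, the `split_frontmatter` helper, and the flat `key: value` parsing are assumptions for illustration, not the official loader for either format.

```python
from typing import Dict, Tuple


def split_frontmatter(markdown: str) -> Tuple[Dict[str, str], str]:
    """Split a markdown file into flat 'key: value' frontmatter and the body."""
    lines = markdown.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, markdown
    # Find the closing '---' delimiter after the opening one.
    end = next((i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"), None)
    if end is None:
        return {}, markdown
    meta: Dict[str, str] = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


# Illustrative file contents (made up for this sketch).
SKILL_FILE = """---
name: model-chat
description: Spawn agents in a shared conversation room to debate a topic
---
Extract the topic, mode, number of agents, and number of rounds, then run the script.
"""

SUBAGENT_FILE = """---
name: code-reviewer
description: Reviews code for quality issues
tools: Read, Grep
model: sonnet
---
You are a code reviewer. Read the code you are given and report problems.
"""

skill_meta, skill_body = split_frontmatter(SKILL_FILE)
agent_meta, agent_body = split_frontmatter(SUBAGENT_FILE)
shared_keys = set(skill_meta) & set(agent_meta)
print(sorted(shared_keys))
```

Both files reduce to a few metadata fields plus a markdown body of instructions, which is the sense in which the two are different ways of organizing the same information.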

Whether one person decides to use a Paperclip style, big fleet of agents that does x y z, which maybe, you know, just a couple of months ago, I might have looked at, scoffed, and said, like, well, that doesn't do anything, or whether they use skills, it's basically the same thing. So the model intelligence is growing more and more capable within the harness, which is what allows the development of these really interesting organizational hierarchies.

So what are some of these organizational hierarchies? We've already shown you Paperclip here. And the way that Paperclip works, or rather is supposed to work, is this is like a dashboard, which somebody developed, that, you know, I think just preys off of, maybe preys isn't the right word, but it uses people's misunderstandings of how agents work.

It anthropomorphizes them so that they seem really similar to humans, and then it puts this in front of you so that you feel like you're running a whole team. And so in this way, clearly, it's broken down by role. Right?

Whereas the average skill is not broken down by role, the average skill is broken down by function. Also, skills typically don't delegate to other skills. That's really the main difference.

But Paperclip isn't the only one that's like this. Here's another good example, CompanyHelm. This one over here is a very similar sort of idea, where you basically have an AI studio.

Within the AI studio, you define a bunch of different roles for your agents and so on and so forth, and then that's ultimately what allows you to manage your projects. This instead of being left to right is obviously, you know, organized a little bit differently. The front end builder, a QA runner, and so on and so forth.

How about OpenGoat, which is the AI autonomous organization of OpenClaw agents? Again, you know, it's doing this with like a CEO, head of sales, customer support based organization, which I don't really believe is ideal.

I don't really think you should have this level of direct reports. I mean, like, think about it, why? All of these could just be Opus 4.6, they could be way smarter, they could pull from some sort of shared context pool, and I think you really wouldn't leave that much out.

But it is an interesting approach. This one over here is called the system, which is obviously using some sort of AI generated diagram here.

But it's 26 specialized agents, which we thought about that do architecture, design, product development, release, operations, and so on and so forth.

This one over here, I think, is called Gas Town, which is basically where you have a mayor, which is your AI coordinator, a bunch of different crew members, and then also polecats or worker agents.

You guys may have heard of CrewAI. It's the same sort of idea. It's a fast and flexible multi agent framework, which supposedly delegates things.

K. Where you have crews that have different agents within them, each with their own segregated tool calling and stuff like that.

And, you know, it's another way of organizing information. This one over here, Swarmclaw, is CEO based, with developer, researcher, and again, you have delegation. So all these are different attempts by different groups of people to try and determine the best organizational hierarchy of agents.

And I think pretty much all of them suck right now, to be clear. But I just want you guys to know and level with me that these are just different ways of organizing information. Just like you have skills and skills are highly, you know, specific to you.

It's just a collection of markdown files with names, descriptions, allowed tools, and then like SOPs. Sub agents are basically the exact same thing. So as the field continues to mature and there are better and more novel context management strategies out there, multi agent orchestrators essentially, you know, these things will grow differentiated.

Now in terms of what I would consider to be actually valuable delegation, k, there are two main design patterns. The first is the parent, researcher, and QA system, where essentially you have a parent model, which is usually a smart one.

So this would probably be like your Opus model that communicates with researchers, plural.

These will be dumber models like Sonnet that typically do research better and more economically. And then some QA agents like Opus, which are basically just tuned to QA and nothing else. And the idea here is this is a good balance versus those super bloated org charts that we saw earlier, while still allowing each type of agent to do the things that it is inherently better at.

The parent agent is obviously the orchestrator. Anything that is up at the top, you can always consider to be an orchestrator. Now, what you have here is you have multiple, you know, Sonnet researchers.

And this takes advantage of that fan out idea. K. Where when Opus needs something, it doesn't just do the research itself because that'll pollute its context window.

It delegates a bunch of research, fits quadrillions of tokens into the context windows of these Sonnet agents, then takes summaries of that and uses those to make decisions. And then basically, the way that it works is, and I'm just gonna sort of draw like the logic flow.

Opus will decide to do something. It delegates down here. K.

That information comes back to Opus. Opus then builds something kind of on its own. After it's done building something, it goes and gives the product of its building over to the QA agent.

The QA agent returns some changes that it suggests Opus make. Opus then goes through, makes those changes. Again, gives it to the QA agent.

QA agent returns. This loop continues until basically everything is done. If there's research that's necessary, it'll go down, do some research here and then continue developing.

And then finally, you have whatever the final product is that you're building, whether it's like a business system, a development system, or whatever. In this way, you're maximizing the incentives of each individual agent while also allowing, I wanna say, like, the leanest possible setup that still recognizes that different types of agents are better at different types of tasks.
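The logic flow drawn above (parent delegates research, gets summaries back, builds, then loops with QA until there is nothing left to fix) can be sketched as follows. Everything named here is hypothetical: `orchestrate` and the four stub functions stand in for an Opus parent, Sonnet researchers, and a QA agent, and `max_rounds` is an added safety cap that the walkthrough does not mention.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def orchestrate(
    goal: str,
    questions: List[str],
    research: Callable[[str], str],
    build: Callable[[str, List[str]], str],
    revise: Callable[[str, List[str]], str],
    qa: Callable[[str], List[str]],
    max_rounds: int = 5,
) -> str:
    # Fan out: every researcher reads as much as it likes in its own context
    # and hands back only a short summary, so the parent's context stays small.
    with ThreadPoolExecutor(max_workers=max(1, len(questions))) as pool:
        summaries = list(pool.map(research, questions))

    draft = build(goal, summaries)
    for _ in range(max_rounds):
        feedback = qa(draft)  # QA sees only the draft
        if not feedback:
            break  # nothing left to fix
        draft = revise(draft, feedback)
    return draft


# Stubs standing in for model calls (assumptions, for illustration only).
def research(question: str) -> str:
    return f"summary of {question}"


def build(goal: str, summaries: List[str]) -> str:
    return f"{goal} using {len(summaries)} summaries"


def revise(draft: str, feedback: List[str]) -> str:
    return draft + "".join(f" +{item}" for item in feedback)


def qa(draft: str) -> List[str]:
    return [] if "+tests" in draft else ["tests"]


result = orchestrate("faster page", ["image formats", "caching"], research, build, revise, qa)
print(result)
```

The parent never sees the raw research material, only the summaries, which is what keeps its context window clean while the researchers absorb the token cost.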

You know, we could make this bigger, of course. We could have like a testing agent. We could have a design agent.

We could have a development agent. We could have a back end agent. But, you know, the more complicated you get with this stuff, again, as mentioned, like typically the worse that it gets.

If you wanna go even leaner than that, then the second system is developer and QA, where you literally just have a smart parent. K? And then you have a smart QA, and then you just go back and forth between the two.

And what happens is every time that you wanna test something, you sort of have like a CLAUDE.md or just like a prompt that's baked into your parent that legitimately says, hey, after you've done every development, run it through a new QA agent. The idea here is the QA has like literally no prompt other than, you know, you're a QA agent with no context, read this code, and apply the following whatever, like design principles to it.

And basically, what occurs is this QA agent, since it doesn't know what the heck the project is on, is not going to be biased like the parent agent will be in the development of the feature. The parent agent will have feedback from the QA agent and so it'll be able to incorporate that into its own thread and take advantage of all of the preexisting list of failures and successes and things it's tried and so on and so forth.

But the QA agent is like new and it's newly spawned every time. And so typically, the way it'll work is the parent agent will go and it'll develop a feature. And then at the end of the development, there'll be something in the CLAUDE.md or system prompt that says, okay, now that you're done, make sure to check it with the QA agent.

So we'll spawn a QA agent. The QA agent will then give feedback. K.

The parent will design. Feedback, the parent will design. Feedback, the parent will design.

No feedback because it's now good. Parent's done. And so now we have the final product.
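The leaner developer and QA loop differs from the previous pattern mainly in who keeps memory: the parent accumulates every round of feedback, while each QA reviewer is spawned fresh and gets nothing but a fixed prompt and the code. A minimal sketch, where `develop_with_fresh_qa`, `spawn_qa`, the prompt text, and the stub behaviours are all hypothetical:

```python
from typing import Callable, List, Tuple

QA_PROMPT = "You are a QA agent with no context. Read this code and list problems."

Reviewer = Callable[[str, str], List[str]]


def develop_with_fresh_qa(
    task: str,
    develop: Callable[[str, List[List[str]]], str],
    spawn_qa: Callable[[], Reviewer],
    max_rounds: int = 4,
) -> Tuple[str, List[List[str]]]:
    history: List[List[str]] = []  # lives in the parent thread only
    code = develop(task, history)
    for _ in range(max_rounds):
        reviewer = spawn_qa()  # brand-new reviewer: no memory of earlier rounds
        feedback = reviewer(QA_PROMPT, code)
        if not feedback:
            break  # no feedback because it's now good
        history.append(feedback)
        code = develop(task, history)
    return code, history


# Stubs standing in for model calls (assumptions, for illustration only).
def develop(task: str, history: List[List[str]]) -> str:
    fixed = [item for round_feedback in history for item in round_feedback]
    return task + "".join(f" [fixed: {item}]" for item in fixed)


def spawn_qa() -> Reviewer:
    def reviewer(prompt: str, code: str) -> List[str]:
        issues = ["missing input validation", "no error handling"]
        return [issue for issue in issues if issue not in code][:1]
    return reviewer


code, history = develop_with_fresh_qa("export button", develop, spawn_qa)
print(code)
```

Because the reviewer is rebuilt every round, it cannot inherit the parent's assumptions about what the feature is supposed to do. It can only judge the code in front of it.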

Obviously, because the parent has to do its own research and stuff like that, I personally think this is not as ideal, but it is even simpler. And keep in mind that there is always a time cost every time you spin up a sub agent. It's a fixed time cost, but there are also compounding probabilities you're multiplying because, you know, you're having an agent delegate something to another agent.

Basically, there's no human in the loop. The more independent steps that an agent has to do without a human being in the loop, the higher the probability that it will diverge from its sort of intended goal or intended task.
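To put a rough number on that, here is a back-of-the-envelope sketch. It assumes each delegated step stays faithful to the original intent independently with the same probability, which is a simplification, and the 90% figure in the usage note is made up purely for illustration.

```python
def chain_fidelity(per_step: float, steps: int) -> float:
    """Chance that every step in a delegation chain stays on task.

    Assumes steps succeed independently with the same probability,
    so the per-step probabilities simply multiply.
    """
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability between 0 and 1")
    return per_step ** steps
```

With an assumed 90% per-step fidelity, `chain_fidelity(0.9, 3)` comes out to about 0.73, so three unattended hand-offs already lose roughly a quarter of the runs.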

So when your parent agent in the previous example generates a bunch of research queries for the Sonnet sub agents and they go and run them, there's no guarantee that the research the Sonnet sub agents are doing is actually 100% faithful to what your initial query was.

With every step along the chain that is further from you, typically the results and the quality are a little bit more diluted. So, I mean, it'd be either one of these for me: developer and QA, or some sort of parent, researcher, and QA. That'd basically be it though.

Personally, I find right now with all the org charts and stuff like that, we're just going a little bit too far. We definitely don't need, I don't know, 700 layers of CEOs and customer success agents and lead engineer agents and stuff like that.

Now, I wanna talk about something that's gotten a lot of attention recently and does genuinely have the potential to significantly improve many business and programming functions.

It's called auto research. Essentially, what I have in front of me is a research lab that I've spun up to improve the load speed of one of my websites. Now, the way that you gauge whether or not a website is loading quickly is based off of a few main metrics.

The first is called LCP, largest contentful paint. Then there's FCP, first contentful paint. Then there's TBT, total blocking time.

And then finally, there's the performance score. This is a standardized assessment called the Google Lighthouse score that you've probably seen before. And basically, it measures, you know, when I type in a URL and press the enter button, how fast does literally everything on the page load?

It also checks for very minor things like, you know, when I load this website, does the content on the page shift around? So my website here, leftclick.ai, is just one of many that I own.

And essentially, it's just a little bit too slow right now. And it's slow for a variety of reasons. We've got this cool, like, glassmorphism animation on the page.

You know, there's, like, stuff moving around and lots of images of my team and and so on and so forth. So, you know, what I've decided to do is I've decided to basically take all of the load off of me to make this website faster, and then just give it all to that fleet of agents to do so instead.

Auto research is basically perfect for use cases just like this, where we have a very defined goal, in my case to decrease or increase a couple of metrics; a very defined change method, which is how you actually make the impact, so in my case just modifying the website code; and then a very standardized assessment, which in my case is that Lighthouse score.

In case you have never seen this before, basically, Andrej Karpathy, who is one of the founding members of OpenAI and was also the head of AI at Tesla for quite a while, was doing a bunch of research on his own for one of the models he was running, and he was just like, you know, do I have to do this stuff myself anymore?

I feel like I'm at the point where I could have AI actually run most of my research for me. Let me make a quick hypothesis. If I just gave all of my changes to AI, would it be able to do the same thing that I do while I slept, such that when I wake up, I'll have a big list of improvements?

And it turns out, you know, he can. And it's not that AI agents are better than human beings at determining these research changes; it's that the process is actually quite standardized conceptually.

You're basically just looking over a bunch of different possible things you could do, making one tiny change, and then just evaluating, hey, did that actually improve my score? Did that make things better? If so, I keep it, and I just move on to the next thing.

You go over and over and over again until finally, you know, you make it hundreds of iterations later. So, in my case, I just reran the test because I wanna start this from scratch to show you guys how this works. It's actually fairly straightforward.

And what I'll do next is I'll run you guys through the original way that auto research works, and then how to download the repo and set it up on your end for whatever use cases you particularly have. So it all started when Andrej Karpathy, a researcher who was the head of AI at Tesla and one of the founding members of OpenAI, asked himself, you know, all this work that I'm doing, all this research stuff that I'm doing, is there any way to automate it?

And he found that if he just broke down step by step what it is that he actually had to do, it more or less always went like this. You know, he he just had a little loop setup where, you know, he would make a hypothesis.

And the hypothesis would be like, hey, if I change x, y, and z, I think my system will run faster. Then he'd actually execute the change.

So he'd actually go and he'd adjust x, y, z. Then finally, he'd assess. And then if the assessment was good, aka it made an improvement, then he would just go back to this and then make another one.

Then if the assessment was bad, aka it failed, then he would just get rid of it and then not change anything and then, you know, kinda start from scratch. And all along the way, k, what he would do is he would update this little document, which you and I could just call like a research log.

And, you know, basically, the first change would be like, oh, you know, this worked. It was great. Second change, oh, no.

It didn't work. Then here's why. Third change, okay.

It worked. That was great. And eventually, over time, you end up with this massive, massive log of all the different things you could do to an AI model, or whatever your task is, including all the things you've tried in the past that didn't really change anything.
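The hypothesize, execute, assess, keep-or-discard loop with a running research log can be sketched in a few lines. This is a minimal illustration of the idea, not the code in the autoresearch repo: `propose` and `assess` are hypothetical callables, and the sketch assumes a higher score is better.

```python
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")


def auto_research(
    state: S,
    propose: Callable[[S], Tuple[S, str]],
    assess: Callable[[S], float],
    iterations: int,
    log: List[str],
) -> Tuple[S, float]:
    """Try one change at a time; keep it only if the score improves."""
    best_score = assess(state)
    for i in range(iterations):
        candidate, note = propose(state)  # hypothesis plus the executed change
        score = assess(candidate)         # standardized assessment
        if score > best_score:
            state, best_score = candidate, score
            log.append(f"{i}: kept ({note}), score={score}")
        else:
            # Failed experiments are discarded, but still written to the log.
            log.append(f"{i}: discarded ({note}), score={score}")
    return state, best_score
```

Logging the discarded attempts is the part that is easy to skip and worth keeping: it is what stops the loop, or a later run, from retrying ideas that already failed.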

Okay. So this is made of three files. There's a prepare.py, which in our case is kinda pointless.

And there's a train.py and then a program.md. The reason why the prepare.py is pointless is because it's just about AI research specifically.

It's fixed constants, downloading the training data, training a BPE (byte pair encoding) tokenizer, and a bunch of other stuff that just isn't really relevant. The relevant stuff for us, though, is that we wanna improve our programs.

We wanna improve our websites. We wanna improve some of our business functions. These two files here, train.py and program.md, basically underscore how the entire thing works.

Okay. So the super important one here is called program.md. What you do is you basically just tell it what you want it to do.

So for instance, hey, here's what you can do as an AI agent. Modify this file.

K? Every time you do, I want you to print a summary of the scores and then log it to this file. And that's literally it.

It just goes through that loop over and over and over again. Then the actual train.py, in the original repo, is the AI model setup itself, with all the layers and stuff like that.

In our case, right, the example that I was just showing you a moment ago, the equivalent is just my website, basically.

And so basically, it has a loop set up in its prompt. You tell it what it can change or what it can't change. You give it some sort of log file that it dumps everything to, so you have a big list of changes in progress.

And then, you know, after that, you are you're basically done, honestly. You just fire it off and let it go. And when you do, you know, you can make some pretty cool changes.

So, you know, I just reran the thing, and we're already seeing some pretty substantial improvements. Not all of these improvements are the same ones I was showing you guys before. With this research lab, I'm just resetting it over and over again to see if I can find anything more interesting.

Okay. So hopefully, that's pretty straightforward. So how do we actually do this?

The simplest and easiest way is to just head over to github.com/karpathy/autoresearch, and then you just copy this link.

I'll just open up Antigravity. I'll click open folder. I'll just make a new one called auto research test.

Okay. And then I'm gonna open it. And I'm going to click on Claude Code.

Zoom way in so you guys can see. And I'll actually just paste this and say, clone this into our current folder, auto research test.

Just so that it doesn't do this in my root folder, which it's done a couple of times. Alright.

So it's gonna start saying, hey, I want you to clone this. So it's gonna give it a quick try. It's just gonna dump all the files in here.

So now we basically have the exact same thing we had before. Right? We have the program.md, prepare.py, train.py, the progress, and, you know, even a README that explains everything.

So now all we need to do if we wanna, I don't know, train this on a site or something is, well, first of all, why don't we just make a quick site? Hey, build me a simple one page portfolio site for Nick Saraev. And obviously, it doesn't know what my name is.

So it's now going to build a simple one page portfolio site. I just wanted to do it here, so it's going to do this inside of this file. First, it's gonna ask me some questions. I'll say, just add demo information for everything.

And my goal is I just wanna build a brief little website here for us, and then I just wanna run auto research on it, show you guys how easy it is to optimize things. In our case, we're gonna do website. There are a million different things you could apply auto research to.

I'm gonna go through a quick and easy framework, but first, I'm just gonna show you guys what you need in order to actually set this up. Alright. Now what I'm gonna say is, excellent.

I'd like you to create a dashboard for auto research and then set up the auto research framework to optimize the Google Lighthouse page score for index.html.

I want you to run this on a local loop and basically just make index.html as fast as possible across LCP, FCP, TBT, and then also performance score.

Then give me some sort of live dashboard view so I can watch it actually working. Cool.

And then I'm just gonna press enter. And basically, what it's gonna do is it's gonna read through all these files right over here. And then it's going to use all of the information here in order to set up the dashboard for me.

And while it's working, I just wanted to explain a little bit about where we are and where we're going. The initial stage of AI coding was, quote, "vibe coding". This is 2024, 2025 stuff where a human being, okay, us, prompts.

Then an AI writes some code, and then a human being reviews. So in this way, our roles were basically relegated to writing. We would write the prompts.

We would make minor changes where necessary, and in that way build a website or something. Well, nowadays, most of us do agentic engineering, and this is sort of what the advanced part of our course deals with. So this is where instead of just dealing with one AI, we're actually orchestrating agents.

And these agents are doing multiple things for us all the time, and then basically, like, returning the results so that we could see and then, like, assess and make slight little recommended changes. So in this way, our role is more of a director. But auto research represents sort of the the next jump from agentic engineering to actually full independent research.

Where now all we do is we're no longer, like, actually even directing the agents. We we let them handle their own direction. What we do is we just say, hey, I have a goal and I'd like you to achieve this goal.

Here's how you can modify x y and z, and here's an assessment. And so in this way, we set the direction. The agent just runs completely autonomously.

And then what we are is basically a principal investigator, like a researcher at a lab somewhere.

We just say, hey, you know, I want you to do x, y, z, and then we farm it out to a bunch of, you know, RA monkeys, research assistants, to go and do the experiments and so on and so forth for us. And so this is along a spectrum of decreasing human involvement. And I'm not really sure what comes next after independent research, but I do not imagine it will require human beings in the loop essentially at all.

This is the same sort of thing that big research labs are currently using to optimize their setups. So Anthropic is almost certainly doing this all day long for Claude Code to make things faster, to make things more performant. OpenAI is probably doing this behind the scenes to make Codex not only better, but even to adjust the architecture of the AI models and so on and so forth.

They're probably doing it across all their web properties. Right? Anyone that's really worth their salt at this point has probably been doing something like what I'm showing you guys with auto research for at least a little while.

It's just that auto research is Karpathy's way to democratize that and allow people to do this even with paid providers like Anthropic's Claude. Okay. So if I go back here, you can see it has actually set up the auto research loop and it's actually doing the research, which is not exactly what I wanted it to do.

I wanted to actually see the dashboard. So what I'll say is, show me the dashboard, because I actually wanna watch it work live.

And then it's just paused the optimization loop. Now it's going to show me said dashboard. It's restarted that, and then I guess it's going to actually show it to me in a second.

Cool. We have it right here. Awesome.

So here is our dashboard, and we are running multiple experiments. Obviously, this looks a little bit different from the dashboard I showed you guys earlier from my LeftClick auto research, but that's okay. I don't want this to look the same.

I wanna show you guys that you can apply this to whatever you want. Our very first experiment had an FCP of four six four seven five two and a size of 12.9.

What we ended up doing is minifying the CSS, making a bunch of changes to the code basically, and that took it from 12.9 down to 10, which technically makes our website even faster. But in reality, it doesn't actually influence things because our scores are basically the same, at least speed wise. Okay.

So this is just gonna continue operating. Just say continue. Now in my case, what this is doing is it's currently occupying the main thread.

Right? So this is why it's gonna be writing and making changes and stuff like that. At any point in time, I could say, hey, just go run this in the background.

Or, hey, I just want you to run this in a loop using, like, the Anthropic agent SDK or something like that. I'd supply my API key and then it would go.

And what it's doing now is it's actually making the changes. I guess, I should probably also like open the website itself. That'd probably make more sense.

Let me actually take a look at what that looks like. Right. So here's here's the actual website itself.

And you can see that, like, for the most part, you know, it's very basic and simple. But what we're doing is we're just optimizing it. We're making it faster and faster and faster.

This may break the website in some cases. Sometimes some minor changes like this do. But as you can see here, we've actually, like, improved it by a whole whopping two milliseconds.

Right? Whatever change we made that made this a little bit slower has now been fixed, and we're a little bit faster, and then it's just keeping each of these. So, you know, these things will go down very, very slightly.

They'll increase very, very slightly. But, you know, if you let it go for enough loops, then eventually you can get to the point where you're legitimately making pretty large improvements to the largest contentful paint, first contentful paint, and so on and so forth.

And just know that we can discard any runs that don't actually do anything. So, you know, in my case, the one requirement I had for my LeftClick perf auto research run was that you can't visually change the website at all.

So it should take a screenshot, and it should be pixel perfect compared to the initial one, which is why it's not adjusting the font or whatever. But it can make more or less any other change aside from that, and it is doing so, which is pretty neat. Okay.
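A constraint like that can be written as a gate that every candidate change has to pass before it is kept. This is a hypothetical sketch: it assumes the screenshots have already been decoded into raw pixel buffers of the same dimensions, and it says nothing about how the actual run captured or compared them.

```python
def accept_change(
    baseline_pixels: bytes,
    candidate_pixels: bytes,
    old_score: float,
    new_score: float,
) -> bool:
    """Keep a change only if it is visually identical AND scores better."""
    # Hard constraint: any pixel difference rejects the change outright.
    if candidate_pixels != baseline_pixels:
        return False
    # Soft goal: among visually identical versions, keep only improvements.
    return new_score > old_score
```

Checking the constraint before the score keeps the two concerns separate: a big speedup can never buy its way past a visual regression.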

So now you're probably wondering, Nick, so how the hell do I actually use auto research for my own business aside from the demo that I just showed you? And like, what else could I apply it to? And my rule for auto research is that in order for you to meaningfully make any changes, you need to have three things.

The first is you need to have a metric that you want to optimize for. So in my example, what is the metric that I am optimizing for? Well, I'm obviously optimizing for my Lighthouse score.

And so it's a very standardized metric. It's really simple and it's very objective. There's no real negotiation about what a Lighthouse score is.

Google invented it. It is what it is. That's what I'm looking to assess.

The second thing that you need is you need a way to change that metric. So you need a way you can influence an outcome that modifies the metric itself. So if you think about it in terms of Lighthouse page score, the direct way to modify your Lighthouse score is just to change your website.

And the direct way to do that is just like alter the code a little bit. So in my case, not only do I have the metric, which is a Lighthouse score, I have a direct way I can immediately change the metric. And then the third thing that you need on top of that is not only do you need a metric, and then you need a way to change the metric, you also need a way to assess what it is that you just did.

And so because it's kind of like in the name, right, this is sort of a contrived example. But like the Lighthouse score has a Lighthouse test, and the Lighthouse test just tells you what your Lighthouse score is.

So I have the thing I'm trying to improve, which is, you know, all the metrics I just showed you guys. I have a way to improve it, which is modifying the website, and then I have a way to assess that, which is my Lighthouse page score, which I can run in a loop basically immediately after the changes. It takes me just a few seconds.
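For the assessment step, one plausible shape is a helper that pulls the four numbers out of a Lighthouse JSON report that has already been loaded into a dictionary. The key names below follow the Lighthouse report format as I understand it (audit IDs such as `first-contentful-paint`, with the performance category scored from 0 to 1), so treat them as assumptions to check against a real report. The function name is made up for this sketch.

```python
from typing import Any, Dict

# Assumed Lighthouse audit IDs for the timing metrics discussed above.
AUDIT_IDS = {
    "fcp_ms": "first-contentful-paint",
    "lcp_ms": "largest-contentful-paint",
    "tbt_ms": "total-blocking-time",
}


def summarize_lighthouse(report: Dict[str, Any]) -> Dict[str, float]:
    """Extract FCP, LCP, TBT (milliseconds) and the 0-100 performance score."""
    metrics = {
        name: float(report["audits"][audit_id]["numericValue"])
        for name, audit_id in AUDIT_IDS.items()
    }
    # The category score is assumed to be stored as a 0-1 fraction.
    metrics["performance"] = float(report["categories"]["performance"]["score"]) * 100
    return metrics
```

Reducing each run to one small dictionary makes it easy to append a line to the research log and to compare a candidate against the current best.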

And so those are the three things that you need. If I were to formalize this, k, and I will because I just want everybody to know and and be able to visualize it.

The three things you need in order to do auto research, k, are number one, a metric.

Number two, a way to influence or the, I don't know, change method, let's call it, which allows you to influence the metric.

And then three, some sort of assessment. And with the change method and the assessment, the most important thing, at least in in my view, is that you can do both of these things pretty fast. Like, if your change method takes a really long time to do, it takes like an hour or whatever, and then your assessment takes another hour.

If you think about it, your your experiment will only be able to run as fast as basically once every two hours. And that's still like light years ahead of like a, you know, a human experimenter. But if you really wanna see like those crazy vertical lines in the graph as things just get better and better and better, sort of recursive self improvement, you know, you need to have a pretty short change method.

So ideally, this would take, I don't know, let's say, thirty seconds or so.

And ideally, the assessment would also take maybe thirty seconds or so as well. Because combined, what we have here is a loop that can run 60 times per hour. Or if you multiply that out, what's 24 times 60?

A lot. 1,440 times a day.

I mean, if you could run an experiment 1,440 times a day, you know, even if only 2% of these are actually good, that's, I don't know, about 30 changes that improve things.

And if every change improves things by 1%, what you've just done, to be clear, is you've gone 1.01 raised to the 30th power, which is roughly a 35% improvement per day, at least in the first day.

If you had, I don't know, let's say 90 of these changes be good, then this math ends up working out way better for you. It's 2.4x. If you had 180 of these changes, you'd be at 6x, and so on and so forth.
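The arithmetic above is easy to check with two small helpers. The function names are made up for this sketch, and the model is the same simplification used in the talk: every kept change multiplies the metric by the same factor.

```python
def runs_per_day(change_seconds: float, assess_seconds: float) -> int:
    """How many change-plus-assess loops fit into 24 hours."""
    return int((24 * 3600) // (change_seconds + assess_seconds))


def compounded(gain_per_win: float, wins: int) -> float:
    """Total multiplier after `wins` improvements of `gain_per_win` each."""
    return (1 + gain_per_win) ** wins


if __name__ == "__main__":
    daily = runs_per_day(30, 30)
    print(daily, "runs per day")                # 30 s change + 30 s assessment
    print(round(0.02 * daily, 1), "good runs")  # if only 2% of runs help
    for wins in (30, 90, 180):
        print(wins, "wins ->", round(compounded(0.01, wins), 2), "x")
```

The same helper shows why loop speed matters so much: `runs_per_day(3600, 3600)` is only 12, which is the one-hour-change, one-hour-assessment case.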

This is gonna go basically as high as you let it. And so going back to my Antigravity here, just looking at a couple of the changes, it looks like the biggest change it has made that has actually improved things was this jump between forty five and six twenty seven.

So it made some change here: content-visibility: auto, and it removed scroll-behavior: smooth, and that actually significantly improved the load speed. And so that's what it did here.

And we've gone from six forty six at the top to a first contentful paint here of, at the lowest, six nineteen. It looks like the largest contentful paint did not change at all.

Meaning, if this currently loads in, I think, six hundred milliseconds or so, it's pretty dang good. Now, this is kind of a contrived example since I just had AI build me the simplest website ever. But, you know, you could see with a more complex website, one that I built for the most part, at least initially, one that AI didn't really have a lot of time to optimize and that was a lot more complex with animations and stuff.

We've actually improved that by 20%. To give you guys some more context, there are some people out there that have applied this to projects and improved metrics by, like, 50%. So Tobi Lütke pointed this autonomous AI research system at, and by the way, this is the founder of Shopify.

Right? Big guy, or CEO of Shopify, I should say. He ran auto research on the entire Shopify Liquid code base.

Now that's responsible for running more or less everything about Shopify. It's their templating language, Liquid. It's a lot of freaking code.

And he found that after running this however many times, he had 53% faster combined parse plus render time, which is his main metric, and 61% fewer object allocations, another metric.

And things are just freaking printing for him. I mean, you know, what's that like? Twice as fast, essentially?

To think that you could just point this at something and go twice as fast in, I don't know, like, 30 runs or something like that is nuts to think about. I don't know how long this took. Maybe it was like an evening.

Maybe he went to bed, woke up the next morning, and his freaking whole code library was twice as fast. I don't I don't know. But I mean, like, the fact that he he has done this and he can do this is obviously very impressive to anybody that has any sort of software that they wanna optimize.

So what are, like, the practical takeaways? You can optimize basically anything you want. So in my case, I'm optimizing a website.

Say you guys make a SaaS app. Well, you can actually optimize the SaaS app. You can optimize not only the front end of the SaaS app, you could optimize the back end.

You could say, hey, here's your server.

Here's the whole setup. I want you to make this load as fast as possible. I want, like, the request to come in instantly.

Do whatever the heck it takes to do it. Here's a quick little test method. You know, we we time how long it takes for one request to come in when you click a button.

You could just tell it that. Even if you just gave it literally the exact transcript of what I said a moment ago, it would probably do a pretty good job so long as you're using the auto research framework. You could optimize random tiny things in your business.

I mean, there are probably some interfaces, random little modules, and stuff like that in your company that, you know, could be way faster and way better. You can actually optimize that. You could optimize things like customer support queries.

You could, I don't know, have a prompt, let's say, that an AI agent uses in order to handle customer support. And maybe you're running some big enterprise, or maybe you're plugged into a big enterprise and you have the ability to collect this data. You could actually just test modifying the prompt and then waiting, I don't know, an hour and then seeing the changes.

And, you know, it's an hour, which is kind of a long loop, but it's still 24 changes a day. You could meaningfully modify that and move it in the direction of your goal. You could do cold email.

That's personally what I'm using this for. Cold email is kind of a special case because again, you need a fair amount more time, but I'm still capable of doing something like six to 10 tests a day at like over 500 to a thousand emails per test, which is pretty dang good. You could optimize a bunch of other things as well.

You could optimize like your ad creative. You could optimize your copy. You could optimize your conversion rate by making minor changes to a page.

You could really have agents optimize whatever the heck you want as long as you have the volume of data necessary to construct the test. So hopefully, I made it really clear how all this stuff works. All you really have to do is just head over to, you know, that Karpathy autoresearch repo over here.

Okay. And then just copy that puppy in, clone it inside of your folder, and then just point it at whatever task you have. The simplest and easiest ones for you guys to see how things work are obviously the website ones.

But, yeah, just know that you can apply this to more or less anything, as long as you have those three points that I've mentioned: you need a metric to optimize, you need a change method or a way to influence that metric, and then ultimately you need an assessment.

Next, I'd like to talk about automation, specifically automating things on the Internet.

We're gonna start with HTTP requests, then we're gonna move up to browser automation. And then finally, we're gonna round it off with computer automation. And I'll talk about a bunch of different platforms you could use and ways to do more or less all of these things.

So HTTP requests are probably the simplest and easiest form of, you know, Internet automation. And Claude Code does this natively.

In case you guys didn't know, HTTP stands for hypertext transfer protocol. And essentially, every time I try and load a website, what I'm doing is sending an HTTP GET request to the server on which that website is located.

And then my browser will take the response and then mark it up and make it look all pretty. So for instance, let's just run through that one more time. My browser, the client, decides it wants to access leftclick.ai on account of I just typed it into my freaking page.

The second I press enter, what we're doing is we're actually sending a request over to their server, k, which is located at some IP address. And that server is configured to automatically respond to requests of that kind by just dumping the whole website and giving it to you. And so then my browser takes that whole website and then it like marks it up and now I could see it.

Right? Now you might be wondering what exactly is it marking up? Well, if you view the source of the website, which is pretty easy to do.

You can go to any website, just right click, press view page source, and you'll see all the HTML. You can see that what a website is actually sending and receiving is not like the pretty images and stuff like that. It's it's usually just sending references to those images.

And this is actually the content of the website. My browser just has mechanisms inside of it that just know how to turn this into that. Okay.

So case in point, "the definitive AI growth partner for fast moving B2B companies." This didn't just come out of nowhere. It's not like this is an image.

This is actual text on a page. Right? If I search for "the definitive", you can see that it's actually being represented in the code of the page that is being sent from the server every time I make an HTTP GET request.

The definitive AI growth partner for fast moving B2B companies. Alright. So why is this relevant to us?

Well, because the first aspect of any sort of browser automation, doing things on the Internet, I should say, not browser automation, but, like, automating network tasks, is this hypertext transfer protocol.

Claude and other AI models now have the ability to use web tools to basically make HTTP requests of the kind that I just showed you. And that allows it to do a tremendous number of things, Not all things, but a tremendous number of things if you know how to use it right.

So the simplest and easiest way for me to demonstrate that is you can actually just like scrape any website you want now with Claude or any other agent. Hopefully, it's pretty clear and obvious how. What we do is we just take the URL.

We go back to our agent, which in my case is this auto research one. Then I'm just going to say, retrieve contents of this, just the text.

What this is going to do next is obviously send the HTTP request, using the web fetch tool, over to https://leftclick.ai. And now, what it will have gotten back is exactly what I just showed you a moment ago, which is all of this. And because I said just the text, if I go back here, you can see that it has stripped out all of the code and is returning basically just the stuff that you could actually see.
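To make the "fetch the page, then keep just the text" idea concrete, here is a rough standard-library sketch. It is not how Claude's web fetch tool works internally; `TextOnly`, `html_to_text`, and `fetch_text` are names invented for this example, and the extraction is deliberately crude (it only skips script and style blocks).

```python
from html.parser import HTMLParser
from typing import List
from urllib.request import urlopen


class TextOnly(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Strip the markup from an HTML document and return the readable text."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch_text(url: str) -> str:
    """Send an HTTP GET request and return just the text of the response."""
    with urlopen(url, timeout=10) as response:
        return html_to_text(response.read().decode("utf-8", errors="replace"))
```

Scraping a list of sites is then just a loop over `fetch_text`, though pages that render their content with JavaScript will come back mostly empty, which is the limitation the next part of the talk runs into.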

So what did it say? Navigation: case studies, about, services, reviews, let's talk.

The definitive AI growth partner for fast moving B2B companies.

It says it right over here. You know, worked with Anthropic, Notion, Wix, HeyGen, V, Lightricks, Durable, and so on and so forth. Right?

So I guess what I'm trying to say is like, this is a simple way that I can get data. And so one of the first and most elementary uses of, you know, any sort of coding agent is just you can automate website scraping really easily. So I could give it a simple list of tasks and I could say, hey, I want you to scrape like 400 different websites.

I could literally just give it a big array top to bottom. It it could go and it could do the scraping. Now the issue is a lot of the time, k, you wanna go further than just scraping, than just reading a website.

What you wanna do is actually dynamically interact with a website and change things. So for instance, let's say what I'm doing is getting a big list of all of the agencies out there, the AI agencies like LeftClick, and I wanna send them all messages. Well, you know, I could just scrape every single website to see if there's an email address.

Right? But in my case, maybe there's no email address. So what do I wanna do?

I wanna take that next step. The way that I do so is usually through some sort of form or whatever. How do I automate the clicking of a specific button?

It's kind of difficult to do. Right? I can't just automate the clicking of a specific button through an HTTP request because, you know, this is something more than HTTP.

It's kind of JavaScript. I could try. In some websites, I'll be able to.

So I'm asking this: hey, extract the cal.com link for me and then open it in Chrome.

Now going one step further. Okay. We're gonna open this link in Chrome.

So we actually have this link available. And there are some services out there where you can actually just send an HTTP request to, like, book a meeting on a page. You might think that in order to do that, you have to click on this button and then type this in and then enter a bunch of information and so on and so forth.

Turns out I can actually just use an HTTP request. So I'm just gonna say: book a meeting for 3:30 PM tomorrow. First name, Test. Last name, Test. Email, nick@test.com.

And without any more information, what it's gonna do is go and find the API documentation. Then it's gonna check the availability using the API documentation, and then finally it's going to ask to book.

So I'm gonna say 3:30 PM, March 30, and then it's going to go and actually do the booking. But you notice how many issues and errors there are with this?

This obviously isn't perfect. Now I could theoretically figure out the exact schema and format that I need to use in order to send requests like this every single time that I try and book like a cal.com. But the reality is, like, not everybody's gonna have a cal.com.

What I'm doing here is building a very particular solution that solves my one particular problem with HTTP requests. And even then, you know, there's just gonna be some back and forth. It's not gonna be perfect.

And this is taking forever. I mean, I've been sitting here for ten, fifteen minutes. It's trying its best.

It's trying to book through a variety of different means and, I don't know. Who knows?

Maybe it'll actually go and do the booking. Okay. There we go.

We actually did end up doing the booking. Thank goodness. That said, that took forever and was obviously a very fragile solution that only works with, like, particular cal.com pages.

Right? And so that's where we move to the next level of automation. That's where we go from simple HTTP requests, which work because, you know, most services out there will have some sort of API, an application programming interface, that you can actually communicate with.

But, you know, they're super fragile. They require very particular formats. And as you could see, they can take a really long time, and they're very narrow.

That's where we move from the first level of automation, HTTP requests, all the way to full-scale browser automation, which is where Claude actually fully controls your browser. And, you know, there are a couple of built-in tools for this now, but typically the best way to do this is using one of two tools.

At least at the time of this recording, that's Chrome DevTools MCP, or there's also the Browser Use platform, which is actually pretty new, pretty recent, but it costs a fair amount of money.

And so what this does is instead of just sending HTTP requests under the hood, what this does is it actually loads up a whole browser for you and then goes through the process of doing a booking. So you see how hard it was for me to do this, you know, sort of simple task of, like, booking a meeting on a calendar even though I gave it the exact time, the exact information, and so on and so forth.

That might have taken a human being one second. It took me something like five minutes of back and forth and probably, like, $40 of tokens. So meanwhile, I can open up a page that has Chrome DevTools MCP, and I could basically say: go here, book a thirty-minute meeting for, I don't know, March 30 at 3 PM.

Nick Test, nick@test.com, answer a bunch of demo stuff for any booking questions. And I just want you to look at what's going on.

I was just using Chrome somewhere else, so it's just gonna kill the preexisting instance. But now it's actually gonna open up a new one. I want you to notice that, like, this is actually, like, opening up a freaking instance on my browser.

And then it's scrolling through and it's clicking on buttons and navigating through the page for me. It's literally doing this by running brief little JavaScript commands against the page in order to communicate and go through things. So it's filling in the phone number, what made you wanna contact Nick's team, what's the project budget, share anything that'll help us prepare, and so on and so forth.

I think the project budget in this case might not actually be 5 or 10K. I don't even think that's an option because we don't go that cheap. As you can see here, it's finding the options for the budget, selecting 25 to 50K, and then it actually goes through and does it. So what are we learning from this experience?

This is much more general. K? It works way better for a much wider variety of use cases, but it's also a lot slower.

Right? This is something where, previously, I could have just sent one HTTP request once I knew the format, and it would have booked in, like, 0.2 seconds. Right?

But now, you know, we're kinda going through the page one step at a time. Every single one of these actions realistically takes almost the same amount of time that a single HTTP request would take. Now what it's doing is actually deleting my numbers and trying to reformat them in order to make it a valid phone number.

And, you know, after a little bit of finagling, it it actually ended up finishing it, which is nice. So it actually went through. It confirmed it.

It then went through the booking process and so on and so forth. And it actually took screenshots the whole way through of the process. So why am I showing you this now?

Because basically, this is a gradient. It takes more setup time to do any sort of automation via HTTP request, but once it's set up, it's faster and usually cheaper.

And so there's a spectrum where we go from more setup time, but faster and cheaper, to basically always works, but more expensive and slower.

And so what does that mean? That means for any sort of, like, prototyping business application on a browser. I typically use browser automation or even computer automation, which I'll talk about.

And then once I've sorted out that it works, I'll actually go and I'll see, hey. Can we do this via an HTTP request? Because if so, it'll be way cheaper, then we can just run a bunch of HTTP requests in the background.

And it's important because, like, most of the time, the cool stuff that you can do with Claude is actually just automation. Right? So understanding this trade-off between pure HTTP requests, which typically function off of, you know, hidden APIs or whatever, and then browser automation and full computer automation will let you control a lot of things much better. So that's just one example of browser automation.

I could I could use browser automation for anything. Hey. I'm considering renting in Vancouver, BC, looking for $3,000 a month max one bedroom rental somewhere in the downtown core.

Are in buildings that have cool amenities like pools and stuff, and then the bottom two are sort of like our budget options. I could stick that puppy in there, and then it'll actually go through and, you know, navigate to a rentals.ca page. I couldn't do this via HTTP requests without spending a lot of time sorting all this stuff out.

Even then, it would be very fragile because the way that these websites work is they actually, like, explicitly try and go anti automation. They make it, like, really, really difficult to do anything. But, um, you know, in this case, what can I do?

I can actually just open it up. I can change a couple of filters, I can actually go and, like, zoom in on the page. It can do whatever the heck.

It can use the stuff on the right-hand side. It can use stuff in the middle. It can thumb through things.

It can get me like a big list of apartments and so on and so forth. And I mean, like, the trade off here is this is gonna take a fair amount of time. Right?

Like, as you see, it's like one action every five seconds or so. But it's so general that I could just give it a task and it'll go and do it. You know, if I were to try and do this by saying, hey, go scrape the rentals.ca web page or whatever, that would take so much time to build to the point where it doesn't just error out.

And then most websites are also very anti HTTP request automation because it's the simplest and easiest kind. You end up just getting, like, error, error, error, error. This actually uses my browser, which is kinda neat.

Right? Anyway, I'm just gonna let all this stuff go. And in the meantime, talk a little bit about browser use, which I think is probably like the the next level up.

It's just called Browser Use, "the way AI uses the internet." I don't know how long this is gonna end up being the way to go. But basically, this is like the next level up from Chrome DevTools MCP, where you give it some very simple instructions, like fill out my loan application, and it'll actually go through the form using something very similar to what we did.

Maybe it uses Chrome DevTools MCP under the hood. I don't know. And you pay for it with, you know, like a bulk one-time payment of $100, plus pay-as-you-go via credits.

So in my case, I'm not affiliated with this company at all, to be clear. So I'm not gonna touch on it too much, but obviously, it's a pretty cool product. The big draw, I would say, for most people here is just that HTTP requests can be blocked because, you know, platforms are being scraped all the time.

They try and stop you. So too can Chrome DevTools MCP be blocked, like any sort of automated browser instance. With this platform, basically, the whole point, you know, just to kinda cut past the pricing page and all that stuff:

Like, 99.9% of the reason you would wanna use this is because it is completely undetectable. You could make HTTP requests, sort of the old-school way, and then try proxies and stuff, and maybe that'll work, and maybe it won't.

But if you go Chrome DevTools MCP and that doesn't work, this is what you do, and it's basically, like, 99.9% perfect. It does this because it fingerprints, a.k.a. gives every one of your AI-controlled browser instances, like, a hyper-custom profile.

So it seems like it's a request that's made by a real person. And in that way, it just obfuscates it all. So for most purposes, I still use Chrome DevTools MCP, and that's my main pick.

But if I have anything that I need to do in sort of a sneaky way, and when I say sneaky way here, I mean this is great for stuff like social media. So if you wanna do, like, Facebook scraping or Instagram scraping, or if you actually wanna interact with and leave posts and comments and stuff, that's pretty tough to do right out of the box with a vanilla Chrome DevTools MCP.

But this is really, really good at, like, posting, sending DMs, x connect requests, what whatever the heck you wanna do.

So, yeah, not affiliated with that company at all, but it is pretty sweet. And I think they're probably gonna remain the market leader there. But anyway, so just like HTTP requests had a lot of setup time, but they were faster and cheaper once you set them up.

Browser automation is kinda like a good, like, middle ground where it's like, oh, you know, like, this actually has some some basic browser functionality built in and, like, it's pretty obvious how to, like, click a button or whatever. Computer automation is sort of like on the far end of the spectrum where basically no matter what you throw at it, it will always work.

The downside is it's very expensive, takes a tremendous number of tokens at least right now, and it's very, very slow. And the way it does this is, you know, whereas HTTP requests manipulate like APIs and curl requests.

Curl is actually lower case. Browser automation manipulates JavaScript and, I don't know, like page clicks, like button clicks.

Computer automation literally controls your mouse and your keyboard. And because it controls your mouse and your keyboard, you can do more or less whatever the heck you want.

Like, I could literally take my mouse, and then I could go all the way up here, and then I could close that tab. It can move this all the way to the left. It could close that tab.

Like, basically, it it can do anything on the computer that I can do. Now the way you do this right now is you gotta use the Claude desktop app. So I'm gonna head over to Claude, and then I'm gonna open that up.

And then I think it's currently available in both Cowork and Code, but I'll just move over to the Cowork tab. And I'll say: have computer use scan through my downloads, find the image called maker school 26 or something, and then rename it to weekly community call picture.

And the reason why I'm doing this is because every dang week I have a weekly community call, and I always just lose track of the image that I use as the thumbnail. And what it's gonna do to start is actually whip up computer use. So it's gonna request access to my Finder.

And now, as you could see here, it's actually whipped up like a computer use thing. So now it's gonna go through and actually like type in my downloads folder or whatever. Navigate over there, and it's just gonna start typing a bunch of different things like maker school and maker school 26 and probably try multiple variations of like maker school, maker school underscore, and so on and so forth.

Because it's using my mouse and my keyboard, you know, I can actually scroll through and do things too. Now, this is local automation. It's actually literally exactly what I want, which is nice.

I could have done this in thirty seconds, but it's nice that it's figuring this out. It's using local automation here to click through, scroll down, and stuff like that.

If at any point in time I wanna correct it, I'll say: no, you had it. It's the cover 26.

I'll put that in just so that it knows what it's doing. Alright. I just went to grab a coffee, and I got back and it has now found the Maker School icon 26 and renamed it to exactly what I wanted.

And, yeah, I guess I screwed up on the name, but that was what I wanted, which is pretty cool. So hopefully you guys can see it's pretty straightforward to use computer automation. It just takes a lot longer.

It also consumes a lot more tokens because it is literally controlling my mouse as it moves across the page and taking screenshots of everything as it does so, and the amount of fidelity that it requires in order to do that is pretty high. But yeah, I mean, eventually, if you put it on a loop, this sort of thing will work. It might just take a tremendous amount of time. Just give it a task.

Say, keep going until you solve it, and it will do it. It will just probably burn a hole through your wallet while it does so. Realistically, the core play that I repeatedly fall back on, as somebody who designs these systems for real businesses that earn hundreds of thousands to millions of dollars a month, is that I'll start with some form of browser automation for the most part, since we're usually just doing this in a browser.

I'll usually try Chrome DevTools MCP first. If that doesn't work because it's, like, a stealth application or it's something that requires social media access, I'll use Browser Use. Then, once I have that flow down, unless it's a Facebook or something like that, because those are just notoriously difficult to automate over HTTP as well,

what I'll do is have Claude Code build a custom utility based off of the data that it gets from Chrome DevTools MCP, because it has access to network requests and can actually see the requests that are being sent and received. Once we have all that, I now have, like, the API internally.

I write a bunch of docs and have Claude Code embed that within my workspace. And then the next time around, I can just use HTTP requests. Although, keep in mind that when you do it this way, simply because of the volume that you're able to hit and the fact that HTTP is typically a lot more regulated than browser automation, there are some risks to that as well.

You could get rate limited. You could get throttled. You could also get shadowbanned.
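To make that "capture it once in the browser, then replay it over HTTP" step a bit more concrete, here is a minimal Python sketch. Everything specific in it is an assumption for illustration: the captured dictionary stands in for whatever a network request recorded during a browser session looks like, the example.com URL and the field names are placeholders, and build_replay_request is a hypothetical helper, not something Chrome DevTools MCP or Claude Code provides.

```python
import json
from urllib.request import Request


def build_replay_request(captured: dict) -> Request:
    """Turn one captured request (method, url, headers, json body) into a
    urllib Request that can be sent later without opening a browser."""
    body = captured.get("json")
    data = json.dumps(body).encode("utf-8") if body is not None else None

    headers = dict(captured.get("headers", {}))
    if data is not None:
        # Only set a content type if the capture did not already include one.
        headers.setdefault("Content-Type", "application/json")

    return Request(
        captured["url"],
        data=data,
        headers=headers,
        method=captured["method"],
    )


# Hypothetical capture; the endpoint and payload shape are placeholders.
captured_booking = {
    "method": "POST",
    "url": "https://example.com/api/bookings",
    "headers": {"Accept": "application/json"},
    "json": {"start": "2026-03-30T15:30:00", "name": "Test Test", "email": "nick@test.com"},
}

request = build_replay_request(captured_booking)
# Sending is left out on purpose: urllib.request.urlopen(request) would do it.
```

Because a replayed request can be fired far faster than a browser can click, it makes sense to add a delay between sends when running many of these in the background.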

Okay. So that's the three levels of automating economically valuable knowledge work through Claude. It's really just HTTP request, browser automation, or computer automation.
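One way to write that decision down is as a tiny Python sketch. It only encodes the rule of thumb described above (use HTTP once the request format is known, otherwise browser automation if the task lives in a browser, otherwise computer automation). The Task fields and function names are hypothetical and invented for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    HTTP_REQUEST = "http_request"                # most setup, fastest and cheapest
    BROWSER_AUTOMATION = "browser_automation"    # middle ground
    COMPUTER_AUTOMATION = "computer_automation"  # most general, slowest and priciest


@dataclass
class Task:
    runs_in_browser: bool        # can the whole task be done inside a web page?
    request_format_known: bool   # has the underlying HTTP request been captured and documented?
    needs_stealth: bool = False  # e.g. social media or sites that block automated browsers


def pick_tier(task: Task) -> Tier:
    """Pick the cheapest tier that is already known to work for this task."""
    if task.request_format_known:
        return Tier.HTTP_REQUEST
    if task.runs_in_browser:
        return Tier.BROWSER_AUTOMATION
    return Tier.COMPUTER_AUTOMATION


def pick_browser_tool(task: Task) -> str:
    """Within the browser tier, fall back to Browser Use only when stealth is needed."""
    return "Browser Use" if task.needs_stealth else "Chrome DevTools MCP"
```

A new booking flow would start as Task(runs_in_browser=True, request_format_known=False), and flip request_format_known to True once the request has been captured and documented.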

Whatever way you decide, just know that doing that sort of automation is against the terms of service of a lot of the platforms that you work with. So I'm not condoning this. I can't really explicitly recommend it.

Just making sure that you guys understand what's available and what other people are doing as well. Next up, I wanna talk about Claude Code performance fluctuations and what to do if and when they happen.

I don't know if you guys have ever watched that movie Interstellar, the one with Matthew McConaughey. It's one of my favorite movies ever.

And in it, there is a major problem that has plagued the world and that has, you know, set the events of the movie in motion. And that's basically this idea of the blight.

Now, what the blight is, is some disease that started affecting a bunch of plants. And as a result, something like ninety percent of all of the food in the world is now just corn, a specific type of corn. That's why they've got these big cornfields and stuff, and then, you know, the main character's family just does corn farming all day.

So in history, this idea is referred to as monoculture farming, essentially. And it's where, you know, one particular crop is just so damn good.

It's just so freaking productive. Right? Has the highest yields and so on and so forth.

Then over the generations, the farmers learn, well, this is the best crop ever. Why don't I just replace all my crops with this crop? Then I can make a bunch of crops, and then I'll just trade this crop for other crops as necessary.

Every time that happens, usually, productivity or yields will go up, and they'll go up for sometimes a long period of time, sometimes literally generations. And then all of a sudden there ends up being a problem with that crop. The problem is either in the soil, or maybe a bug has developed that really screws with that crop specifically, or something else.

And because all of the farmer's eggs were in that one basket with that one crop, what ends up happening is this blight or this disease or this circumstance ends up destroying all of their crops at once. That's led to some of the biggest famines throughout history, I believe. And it's one of the reasons why, you know, farmers nowadays do a bunch of things, namely crop rotation.

They have multiple different crops that occupy the same piece of land. They usually don't do just one crop. They have multiple crops going, whatever types of crops they are.

Just so that if, you know, one type of harvest fails, they'll at least get something from something else. Well, the reason why I'm bringing up this analogy, and I think I've really hammered it home here, is because I think this applies to Claude Code.

Claude Code's really good. I don't think there's a better coding harness out there. I don't think there really is anything better than Claude Code, at least as of the time of this recording, and I don't know if there ever will be.

This is me just being honest with you guys. I think at a certain point with AI, you know, an agent's ability to program the next model, k, just gets better and better and better. And so the people that have the better agents, if if they apply their resources effectively, just end up with, like, this impossible advantage due to exponential growth.

So what that logically means is that, you know, it's the best crop ever. Right? It gives you the biggest yields ever.

Because it's so productive and because it makes you productive, you're probably just gonna wanna use it all the time. The downside to that is there are a lot of things here outside of our control in terms of Claude Code performance. Sometimes Claude Code performance goes up and it goes down, and other times it's just completely gone.

So the reality is we're probably all gonna be using Claude Code a lot because Claude Code, as mentioned, is freaking awesome. But if you grow to rely on it to the point where Claude Code is basically a monoculture crop, you end up with situations like this, which actually just happened yesterday. Just one of many occurrences.

To make a long story short, Claude went down. You know? There was a big issue with Opus 4.6, and I think it lasted, like, maybe an hour or so.

And basically, developer productivity plummeted by something like 95% the second that Claude was gone. The reason why is because, you know, Claude was everything.

They stored all their files in, you know, the Claude desktop app, with simple skills that were just made in Claude's format and nobody else's. The second that Claude was down, all the prompts that they had saved in specific places and stuff like that were very difficult to access, and they weren't good to use with other models.

Whole codebases that had been designed by Claude were not interpretable at all. There were no comments. They tried using other models and other agents, and, like, that didn't really work.

And then ultimately, Claude is just the best. The intelligence of these other agents just doesn't work the same. So, you know, it just led to a bunch of issues, essentially.

This isn't the first time that this has happened. This has actually happened a number of times. You know, this is Adam from earlier today talking about, like, major outages with Claude and how different types of platforms are operational, whereas other ones aren't.

There have also been a bunch of Claude Code performance degradations. You know, I just looked up an old post from, I think it was Derek here, who's one of the lead guys on Claude Code. He, like, drops Claude Code updates and stuff all the time.

Well, anyway, you know, there were degradations historically. This is 12/17/2025, a degradation of Opus 4.5 in Claude Code, where basically, because of some runaway, either garbage collection or some sort of memory issue, you know, Opus just got worse and worse and worse every day for a certain period of time, which led to, like, massive performance decreases.

Literally, on planet Earth, at least in knowledge work. So okay. Hopefully, at this point, I've convinced you guys why Claude is nowadays probably already pretty monoculture-y, and likely, as it continues to dominate, to just become more and more monoculture-y over time.

The question obviously is what the hell can we do about it? And so there are a couple of solutions, and most of them revolve around this idea of diversification. Where basically, you know, instead of just putting all of your eggs in the Claude basket, this is my cute little basket, sticking it chock full of, you know, nice Claude eggs,

what we do is instead of putting all 10 of our productivity eggs in this Claude basket, we put, like, seven, eight, or maybe nine in there. Okay?

So maybe, like, seven out of 10 in Claude. And then what you do with your other three out of 10 is you just distribute them. You distribute them such that, you know, I don't know, one out of the 10 is in Codex.

You know, another one out of 10, my god, I'm gonna get really good at drawing these,

is in, I don't know, like, Antigravity, like Gemini. Right? And maybe one out of 10 is in some other type of coding harness, like a Pi or something, that maybe also uses some form of local models or whatever.

The point that I'm making is, obviously, we're being pragmatic here. Like, you should probably predominantly use the best model out there because, you know, it's not a linear thing. If a model is, like, 1% better than another model, that 1%, once you get smart enough, is like a gulf.

Right? Einstein is, like, 1% smarter than a normal human being or something like that, and he was able to come up with the theory of relativity or something along those lines. Obviously, don't take me at face value there.

I'm sure his IQ was through the roof. But the point that I'm making is, when you get to this point with these weird galactic intelligences, even a small little increase in the intelligence of the model may lead to a big downstream difference. Right?

So if you have the ability to use the best model, just use the best model. But don't put all your eggs in that basket because if that occurs, then what'll basically happen is, like, as the performance of Claude over time goes up, assuming Claude is orange.

Your total productivity, in blue here, will also go up basically in lockstep. And so if the performance of Claude goes down, so too does your entire productivity.

If the performance of Claude goes up, so too does your entire productivity. Instead, diversify. Okay?

Instead of just this, like, yellow one, which is Claude, maybe you have a green one here, which is Codex. And what occurs is, you know, Codex maybe is a little bit more like this. And so what ends up happening is the performances of both of these sort of average out.

And then instead of being super reliant on Claude, what you get is, you know, this black line, which is like you, ends up being a lot more stable. It's the same thing in investing. Have you guys ever invested in, like, I don't know, an ETF or some sort of index fund?

You know, basically, the way that all of these stocks work is there'll be a stock that does this. There'll be another stock that does that. There'll be another stock that does this.

There'll be another stock that does this. Do you see how volatile okay. That stock probably doesn't go back.

Do you see how volatile all these different stocks are? Well, rather than tie your literal life savings to any one of these stocks, you just tie them to all of them simultaneously. Such that, you know, over time, maybe your thing slowly goes up, and that's a lot more reliable and dependable.
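The eggs-in-baskets arithmetic can be checked with a few lines of Python. This is a toy sketch, assuming productivity is simply proportional to how many of your 10 "eggs" sit in tools that are still up. The function name and the basket labels are made up for the example.

```python
def remaining_capacity(allocation: dict[str, int], down: set[str]) -> float:
    """Fraction of your productivity 'eggs' still usable when some tools are down."""
    total = sum(allocation.values())
    if total == 0:
        raise ValueError("allocation must contain at least one egg")
    available = sum(eggs for tool, eggs in allocation.items() if tool not in down)
    return available / total


# All 10 eggs in one basket versus the 7/1/1/1 split described above.
monoculture = {"claude": 10}
diversified = {"claude": 7, "codex": 1, "antigravity": 1, "other": 1}

print(remaining_capacity(monoculture, {"claude"}))  # 0.0
print(remaining_capacity(diversified, {"claude"}))  # 0.3
```

Under that assumption, an outage of the main tool takes the monoculture setup to zero, while the diversified one keeps three tenths of its capacity and, more importantly, a working fallback you already know how to use.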

Okay. So the way that you diversify your models in practice is you use platforms that have the built-in ability to orchestrate or juggle multiple different types of agents.

Or you use things like MCP servers or whatever that allow you to do that sort of thing within Claude Code or within, you know, some other coding agent. And so obviously, like, right now, if I'm just being pragmatic with you, there's Claude Code, and that's sort of the big boy. It's fantastic.

Then there's, you know, Codex. And some people will swear on their mother's life that Codex is way better than Claude Code, but I don't really think so. And then there's, you know, Gemini isn't really the right term.

It's sort of like Antigravity, like the agent chat within Antigravity.

K. And this is sort of like my little personal tier list. But basically, you know, use other models in conjunction with the harnesses and stuff that you might have set up in Claude Code for best results.

Okay. So yeah. Anyway, there are two main major ways of doing this right now.

The first is using a platform like Conductor. If you've never seen a platform like Conductor, what this does right now is it allows you to create a bunch of parallel Codex and Claude Code agents inside of isolated workspaces on your computer. You can then, just like with Antigravity or, you know, the Claude Code desktop app, see how they're performing and what they're doing sort of in real time.

And because you are just the conductor up at the top, if, you know, the Claude Code chunk of these doesn't end up working but the Codex ones do, that's perfectly fine. It doesn't really change anything for you. You're just gonna momentarily allocate most of your time and energy to the Codex ones.

It's on the exact same interface. It's very straightforward. You just do it all, you know, through this Conductor interface.

Super easy. And then, you know, this is used by a lot of real big people all over the place to basically average out minor statistical fluctuations in models, and to take advantage of the parts of different models that are slightly better or slightly worse than each other.

Like, for instance, a lot of people think that Codex is actually quite cracked at, you know, the sort of deep contemplation required to make big back ends, and that it's better than Claude Code. I don't know if I entirely agree with that. And I think even if that were correct today, it probably would not be correct in a few weeks, because things change so quickly.

But, you know, this allows them to take advantage of Codex's ability to build the most cracked back end ever and then have Claude Code do some other thing that Claude Code is great at. Okay. So Conductor's pretty sweet.

I'm not gonna worry too much about setting it all up. It's actually quite self-explanatory, and I don't wanna just make, like, a seven-hundred-hour YouTube video that's me, you know, setting up a bunch of different platforms.

There's no real value to this. These guys have laid out the documentation really plainly and really intelligently here. You can just click that download button, set it up, and you'll be good to go.

Okay. So that's number one. Right?

Number two is you can use something like MCP servers to distribute your load across multiple different models.

So for instance, there's this Codex MCP server, which, you know, technically lives in Claude Code. So if Claude Code does go down or something like that, you won't necessarily be able to use it. Keep that in mind.

But, you know, if it's just one of the Claude models or whatever, it's a little bit different. Basically, what you do is download an MCP server that allows you to communicate back and forth with Codex. And that one's very straightforward and easy.

There's a Git repository right over here. It's very straightforward. All you do is literally install the Codex CLI using npm i -g @openai/codex.

You just give it your OpenAI API key. Then you add it to Claude Code, and then you can actually just have a conversation with it. So for simplicity's sake, I'm actually just gonna do that because that's a lot faster.

I'm just gonna go back to my anti gravity instance, which is just right over here. You can see I got a search back a little while ago from something that I was working on. I'm just gonna open this up and I'll say install this.

I'll say, keys in .env. Don't share. This is a demo.

Let me know when done so I can restart. And what it'll go through is it'll go and install the Codex MCP server. And then I can just go here and I could say, hey, ask Codex how it's going.

So now what it's going to do is, rather than just, you know, kind of operate in its own thread, it literally just ran through, like, a thing, pinging Codex and saying, hey, man, what's going on? It echoed back the message successfully. Okay.

I want to chat with Codex. Yes. And let's just hear what it has to say.

So, codex-cli codex. This is just a ping, I guess, to make sure that it's online. This one is now saying, hey, I'm running in Codex on GPT-5 in your local coding workspace.

I can do all this stuff. The file system's currently restricted and so on and so forth. So, I mean, this will work in the cases where you want Claude to, like, orchestrate a conversation with Codex rather than actually have me go into Codex.

And that can be quite good when, you know, you don't really wanna, like, upset your local workflow. You still wanna work within Claude Code and do everything that you're normally doing. But then for whatever reason, Claude Code performance has been degradated.

Degradated. Degradated? Degradated.

But I should note that, you know, if Claude Code itself goes down, let's say there is some widespread Anthropic outage, you know, your next best bet is to literally go and download probably, like, the Codex desktop app here.

Download it for macOS and either get a subscription or at least know how to get a subscription and know how to use the app, such that if there are major issues with any one of these platforms, you know, at any point in time, you can just jump right back in. That's personally what I do. I actually have Codex up and running.

I know how to use Codex. I'm very familiar with Codex. You know, the way that I set up my workflow is not only do I have, like, a .claude with the skills and, you know, so on and so forth, but at any point in time, I can just duplicate this whole workspace such that it's, like, generally accessible by any agent.

I can actually go over here and then say, hey, for whatever reason, Claude Code is down, so I'd like you to duplicate this whole business workspace and change anything that is Claude specific, like the .claude folder, the CLAUDE.md, etc., to the usual agent specification.

You can find all that at agents.md. And in general, just make sure all of this stuff works for Codex. Now what you can do is you can either run some sort of, like, synchronization flow, or you could just, like, manually do this every now and then.

And then you can send that off to Codex, however necessary. Cool. Now it's actually going through this process of syncing the workspace to the exact same type of folder, /business-codex, and then it's just changing my AGENTS.md and stuff.

What you could also do is, inside of the same workspace, you could just, like, duplicate this and make this, like, .agents or whatever. You could have this all just go under an all-caps AGENTS.md. You'd probably just need some line in your CLAUDE.md that says, hey.

When you update your CLAUDE.md, also update your AGENTS.md, because the whole purpose of this workspace is to work with anything. In my case, you know, this is just very Claude specific. I'm making courses on Claude, so I can't really just mess this up, and I don't want the workspace to get any messier than it already is.

But hopefully, you guys see how easy it would be realistically to do some form of diversification. Okay. So just to make it super clear, there were three main forms that I was recommending here.

Right? The first form was I recommend downloading and then installing a tool like Conductor. What Conductor does is allow you to run a team of different coding agents right off the bat using, like, the native CLIs for Codex and Claude Code.

And so you're actually having multiple agents just, like, operating in parallel. They're just doing so in one workspace that is not branded or tied to any individual model provider. The second one is using something like the Codex MCP server, which is great to use when, like, Claude Code is up, but individual Claude models are degraded or there's some issue that is preventing it from operating the way that you want it to.

In that way, you could still take advantage of whatever Claude model you do have access to, and also, like, your own Claude interface, let's say, in Claude Code's desktop app or maybe, like, an Antigravity plus Claude Code extension setup like I have. And then the third is just operating in a different agent platform entirely.

My recommendation, at least as of right now, is to use Codex, because every test that I've run with Gemini is nowhere near as good at anything except for front end design. Perhaps their new model will come out and that'll be way better or something like that, but I'm not gonna hold my breath for that at the moment because, as mentioned, I think Claude is really just the dominant player as of right now.

K. And all of this is because we do not want the monoculture crop.

We do not want all of our eggs in one basket. We can have most of our eggs in the Claude basket for sure. But if you put all of them in, then you're going to suffer the exact same situation this present guy did where, you know, the second that Claude went down, he just couldn't do anything.

Okay? So hopefully, that makes sense. I personally am about 70% Claude Code and maybe 30% spread across Codex and then, like, a couple of open source models.

And then I use agnostic, you know, coding harnesses like Pi in conjunction with things like Conductor in order to make sure that I'm good to go. Alright.

Now let's chat workspace organization. I'm gonna show you guys the way that I personally organize my workspace. I'll discuss a couple of alternative ways.

And then I'll also just talk about, like, the hierarchy of information and then how to maintain, like, a really clean root file space. So this is the structure that I basically have set up. And I'm gonna go through my actual Antigravity setup in a second.

I actually just had AI generate me a bunch of diagrams for this, so that's pretty meta. But to make a long story short, I store all of my business stuff in a business workspace. K.

Now, my business workspace includes a bunch of additional folders that you don't really need in order to have my structure. They're very specific to the platforms that I use and and whatnot.

Really, the folders that you need, if I just cross out all the stuff that you probably don't actually need. K? And like, you probably don't need this either.

Some people have virtual environments, some don't. But really, the stuff that you actually do need is going to be, like, a .claude, which is where you're gonna store all of your, you know, Claude-specific files.

So it's where you're gonna store your skills. It's where you're gonna store your agents, etc. Then an active or a temporary folder or whatever the heck you wanna call it.

This is basically just gonna store everything else. So all the generated files and so on and so forth.

A .env where you're gonna put, obviously, your environment keys. So any sort of API keys, credentials, anything like that. And then finally, your local CLAUDE.md, which is just like your local system prompt.

And if you guys remember, we store the global system prompt in, like, a ~/.claude folder, where, you know, the rest of your global stuff is.

And this is somewhere else. This is usually, like, your home folder, wherever that is. On a Mac, you know, in my case, it's like nicksaraev.

So if I go to my nicksaraev folder and then I show hidden files, I can actually see the .claude folder. I can click on it, and I can see what's in there. If you're on, like, Windows or whatever, it's gonna be different. So you're gonna have to look for it. Okay.
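The minimal layout just described (a .claude folder for skills and agents, an active folder, a .env, and a local CLAUDE.md) can be sketched as a small scaffolding helper. This is only an illustration under stated assumptions: the function name `scaffold_workspace` is hypothetical, and the course sets this structure up by hand or by asking the agent, not with a script.

```python
from pathlib import Path


def scaffold_workspace(root: Path) -> list[Path]:
    """Create the minimal workspace pieces that are missing and return them.

    Layout (taken from the transcript): .claude/skills, .claude/agents,
    active/, .env, and CLAUDE.md. Existing files are never overwritten.
    """
    created: list[Path] = []

    # Folders: Claude-specific files plus a single dump location for output.
    for folder in (
        root / ".claude" / "skills",
        root / ".claude" / "agents",
        root / "active",
    ):
        if not folder.exists():
            folder.mkdir(parents=True)
            created.append(folder)

    # Files: credentials and the local system prompt start out empty.
    for name in (".env", "CLAUDE.md"):
        file_path = root / name
        if not file_path.exists():
            file_path.touch()
            created.append(file_path)

    return created
```

Calling `scaffold_workspace(Path.home() / "business")` a second time returns an empty list, so it is safe to rerun on a workspace that already exists.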

So mine obviously looks a little bit different from that, but I just want you to keep in mind those, you know, the .claude, the active, the .env, and then the CLAUDE.md. That sort of structure that I showed you a moment ago.

That's the one that I'm gonna be assuming that you're gonna be building. Okay. So I separate things into, and I also have a personal version of this, but for now, we're just gonna stick with business.

A business workspace. And so I literally have, like, a folder on my computer, you know, nicksaraev, and then it goes /business.

And it's within this business folder, where I currently am, that I do all of my work. So what do you have inside of business? You have your .env.

You have your Claude skills, which is sort of like the intellectual capital that you accumulate over time as you do various SOP-able things. You have your CLAUDE.md. Then you also have, you know, like, your active folder.

And the way that I personally organize this, as somebody that not only uses Claude Code and other agents in my day-to-day life, but also sells clients on the implementation of these sorts of things, and then is also responsible for using Claude Code to fulfill that implementation, is I separate it such that my main business needs, all of, like, my stuff, are in this business folder.

And then anything that I do on behalf of my clients lives in specific client folders. So let's say I have a client called client A. Well, client A actually has their own .env with the client's API keys.

They have a .claude/skills with the project skills, skills that are highly specific to the needs of that particular project. You know, say I work with some sort of digital marketing agency and I have a skill that I use on their behalf to connect to some service that they use to print out a report.

I would put that skill inside of the client folder. Then I also have a CLAUDE.md on that, which essentially, you know, I just generate with /init, and that also just describes a little bit about the client.

In the same way that I showed you guys earlier, I have my own CLAUDE.md that describes a bunch of stuff about me. So, who am I? Nick Saraev, you know, I'm 30 years old.

I'm an n dash j. I currently live in X Y Z area, here are all my businesses, here's how much money I make, here's all this, like, highly relevant contextual information. I also have similar contextual information for my clients and then for their businesses, as well as anybody on their team.

So that, you know, if I say, hey, send a message over to Jane, let her know x y z, it's literally just, like, one message and then it's sent. K.

So I duplicate that across all my client base. Client A, client B, client C, however many clients I have, that's how many project folders I have. And the key here, and the reason why I think this is, like, the most solid organizational scheme I've stumbled on after several years of working with this stuff, is you can actually call client skills while still being in the business workspace.

You know, it's not the exact same, because you're not technically loading them inside of the actual context, if I just go /context here. K?

You only get the ones that are, like, sort of local here. But you can still call skills that are not local simply by putting in your CLAUDE.md a one-line thing that says, hey, there's some skills that we reference that aren't all going to live inside the .claude/skills folder.

These are client specific skills. If you wanna reference those, then you actually have to go inside of the client folder that I'm referencing and then, you know, pull it out that way. And so in my case, you know, the business workspace is sort of like top level and the client workspace is sort of underneath.
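The lookup rule that one-line CLAUDE.md instruction describes (check the local skills first, then fall back to the named client's folder) can be written down as a short sketch. The `clients/<name>/` subfolder and the function name `find_skill` are assumptions for illustration; the transcript only says client folders sit underneath the business workspace, and in practice the agent performs this lookup itself from the instruction.

```python
from pathlib import Path
from typing import Optional


def find_skill(workspace: Path, skill: str, client: Optional[str] = None) -> Optional[Path]:
    """Return the folder holding `skill`, preferring the top-level workspace.

    Assumed layout (hypothetical):
      <workspace>/.claude/skills/<skill>/
      <workspace>/clients/<client>/.claude/skills/<skill>/
    """
    # 1. Skills local to the business workspace are loaded by default.
    local = workspace / ".claude" / "skills" / skill
    if local.is_dir():
        return local

    # 2. Client-specific skills are only reached when a client is named.
    if client is not None:
        candidate = workspace / "clients" / client / ".claude" / "skills" / skill
        if candidate.is_dir():
            return candidate

    return None
```

The design point is that client skills stay out of the default context and are only pulled in on request, which matches the token-saving motivation given above.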

So what's up with this "don't pollute root"? Always store in active or subdirectories. You know, earlier I said I have an active folder.

The reason why is because if you start polluting your root, it just ends up being like a total nuclear bomb waiting to happen. You just have so many files.

Your files are stored all across one giant folder. Not only is it like visually insane to look at because it's like, this is always open essentially, and it just pushes all the way down to the bottom. But it's also a little disorganized for your agent as well.

Better instead to specify the locations that you dump files to, okay, using the skill spec itself.

So for instance, inside of model chat, if I go over to my skill, you'll see that it actually specifies where to put the actual model chat. It literally says dump it inside of active/model-chat and then name it in this particular way.

So in that way, this model-chat skill is, like, hooked up over here to this model-chat, you know, conversation thread. I can open that up and I can actually, like, see the conversations that we have been having.

It's also much more organized for the skill because I'm not just dumping everything in the same place. It's super easy to do, and then I don't actually have to do any sort of, like, agentic search or agentic lookup, which I think is pretty valuable because agentic lookups are just more things that consume tokens. So what I'm trying to say is I just store everything inside of a folder I can toggle called /active, and then I store any specific information as to where these things will go inside of the actual skills themselves.

So, you know, there's a bunch of leads from my own CRM. That's where they live. There's, like, some config files for other things.

This is where they live. If I do research, this is where they live and so on and so forth. I would never store random scripts directly in root.

Neither would I do temp files or data files. If you want, like, temp files, files that you know are only going to be used for a short period of time or in the course of a process being executed, personally, I actually store these in, like, active/.tmp, inside of a hidden .tmp folder. So they don't even mess up my active.
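The convention above (each skill writes to its own folder under active, and short-lived files go to active/.tmp) boils down to one path rule. A minimal sketch, assuming a hypothetical helper named `output_path`; in the course this rule lives as a sentence inside each skill file rather than in code.

```python
from pathlib import Path


def output_path(workspace: Path, skill: str, filename: str, temporary: bool = False) -> Path:
    """Resolve where a skill should write a file, never the workspace root.

    Durable output -> <workspace>/active/<skill>/<filename>
    Temp output    -> <workspace>/active/.tmp/<filename>
    """
    base = workspace / "active"
    folder = base / ".tmp" if temporary else base / skill
    # Create the destination on demand so callers can write immediately.
    folder.mkdir(parents=True, exist_ok=True)
    return folder / filename
```

For example, `output_path(ws, "model-chat", "2024-notes.md")` points into `active/model-chat/`, so there is nothing to search for later.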

And you're probably thinking, like, well, won't I lose stuff if everything's super nested? No. You won't lose anything nowadays.

You're trading off the amount of time it would take you to like scroll through your root thing for the amount of time it would just take you to pump it into your agent to ask it, hey, can you find x y z? But you'll find that if you just like allow the agent to organize your workspace, it it tends to do so in a pretty consistent and then reliable way.

So long as you expressly give them a structure where you're like, hey, make sure to always put stuff in active. And remember earlier, we talked about diversifying away from just Claude Code. Well, what's really cool is, you know, when you run a business workspace like this and then you have your client workspaces and so on and so forth sort of underneath it.

What you can really easily do is just duplicate your CLAUDE.md into an AGENTS.md and then a GEMINI.md. You can just have all of these in all of your workspaces simultaneously.
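Keeping those duplicates in step can be sketched as a tiny mirroring function. This is an assumption-laden illustration: the name `sync_system_prompts` is hypothetical, the files are copied byte for byte with no rewriting of Claude-specific wording, and the transcript achieves the same thing with an instruction to the agent rather than a script.

```python
from pathlib import Path

# File names taken from the transcript; the source of truth is CLAUDE.md.
MIRRORS = ("AGENTS.md", "GEMINI.md")


def sync_system_prompts(workspace: Path, source: str = "CLAUDE.md") -> list[Path]:
    """Copy the workspace system prompt into each mirror file.

    Returns only the mirrors that were actually created or changed.
    """
    text = (workspace / source).read_text(encoding="utf-8")
    updated: list[Path] = []
    for name in MIRRORS:
        target = workspace / name
        # Skip the write when the mirror already matches the source.
        if not target.exists() or target.read_text(encoding="utf-8") != text:
            target.write_text(text, encoding="utf-8")
            updated.append(target)
    return updated
```

Running it over the business folder and each client folder would give every workspace the same prompt under all three names.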

Such that, if at any point in time you wanna use, I don't know, Cursor for something, you wanna open it in Antigravity, you wanna do it directly in Claude Code, like, you never really run out of the system prompt design pattern. Like, you know, if you have the same thing written in CLAUDE.md, the same thing in AGENTS.md, and the same thing in GEMINI.md.

You basically just have that on 24/7. Now I haven't needed to do that personally in quite a while, and I've actually been very lucky to have not been affected by some of the recent outages. But I remember back, I don't know, like a month and a half ago or whatever.

I actually had, like, a specific line that said, hey, I want you to synchronize the CLAUDE.md with the AGENTS.md and the GEMINI.md all the time, just in case, you know, we have an outage and need to drop this into a different coding platform. Now another thing that'll happen reasonably often is, you know, because we're not dumping stuff into our root, we're gonna end up dumping a lot of stuff into active.

Right? And so I have like just a bunch of stuff here, dub video links, CA dentist, auto research, Hindi source, you know, when I was dubbing my stuff. Bunch of different screenshots and stuff like that.

You wanna periodically clean up this workspace. So you periodically wanna say something along the lines of, hey, clean up my active/ folder. Anything inside of subfolders is fine, but anything that's just loose in the folder, like any TXT files, PY files, JPEGs, and related.

I want you to clean up by deciding if it's necessary. If it's just a temp file, just get rid of it. Otherwise, store it in a folder that makes sense.

You're gonna wanna run something like this reasonably often. The reason why is because you just don't wanna have to, you know, scroll again through, like, a quadrillion different things. And you also wanna make sure that any future model that comes around can just very logically look at some sort of organizational hierarchy and then make decisions based off of that.
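The cleanup prompt above has a simple shape: leave subfolders alone, delete loose temp files, and file everything else somewhere sensible. A rough, rule-based stand-in is sketched below. The suffix tables and the names `plan_cleanup` and `apply_cleanup` are assumptions; in the course the agent makes these calls by judgment, not by file extension.

```python
from pathlib import Path

# Hypothetical rules: which loose files are disposable, and where the rest go.
TEMP_SUFFIXES = {".tmp", ".part", ".crdownload"}
FOLDER_BY_SUFFIX = {
    ".txt": "notes",
    ".py": "scripts",
    ".jpg": "images",
    ".jpeg": "images",
    ".png": "images",
}


def plan_cleanup(active: Path) -> tuple[list[Path], list[tuple[Path, Path]]]:
    """Dry run: return (files to delete, (source, destination) moves)."""
    deletes: list[Path] = []
    moves: list[tuple[Path, Path]] = []
    for item in sorted(active.iterdir()):
        # Subfolders and hidden entries (such as .tmp) are left untouched.
        if item.is_dir() or item.name.startswith("."):
            continue
        suffix = item.suffix.lower()
        if suffix in TEMP_SUFFIXES:
            deletes.append(item)
        else:
            folder = FOLDER_BY_SUFFIX.get(suffix, "misc")
            moves.append((item, active / folder / item.name))
    return deletes, moves


def apply_cleanup(deletes: list[Path], moves: list[tuple[Path, Path]]) -> None:
    """Execute a plan produced by plan_cleanup."""
    for path in deletes:
        path.unlink()
    for source, destination in moves:
        destination.parent.mkdir(parents=True, exist_ok=True)
        source.rename(destination)
```

Splitting the plan from its execution means the list can be reviewed before anything is deleted, which is worth having for any automated cleanup.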

So that's what's going on here with all these docs for iClosed. Right? It's deciding what to do here.

It's gonna download them into different folders. It's actually going to get rid of a couple files here like, hey, this is a file. This is an incomplete download.

This is a bunch of unnamed temp snapshots. Right? And and what you'll find is within, like, two seconds, it just does the whole thing.

So now my active folder is much, much cleaner, and I don't have to worry about this sort of thing ever again, which is nice. And, you know, in my case, I also have a couple of these web design projects. Enumerate all the web design projects in active.

These are things like Volta, Aura, and so on and so forth. Find similar projects and then store all of them within a web-design folder.

And, you know, you might be thinking, Nick, why the hell are you spending time and energy doing this? If your workspace is clean, the work that you do within that workspace tends to be a lot cleaner as well.

And so, I mean, in my case, I just found what? Like, one, two, three, four, five, six, seven, eight, nine, ten, eleven or something like that, different things. I've just sorted all these out now.

Anything here that is more personal than business, let me know and I'll move it into the personal workspace instead. I just let that go, but I obviously don't wanna show you because there are some personal things in there. And that takes me to the next point of workspace organization, which is that everything that I just talked to you about, when it comes to, like, organizing with a business at the top level and then having various client folders within it, you can do the exact same thing with personal.

And so I don't actually just have a business sort of workspace setup. Claude has now gone beyond just my business partner. K?

And it also assists me with a lot of personal stuff. And when I say personal stuff, I'm not referring to, like, I don't know, relationship troubles or whatever. I'm talking about, for the most part, you know, things like my citizenship paperwork, important documentation relating to my identity, personal projects that I have that are, I don't know, related to, like, learning piano, that sort of thing.

And so, like, I have, like, a business one over here. K? But just because I want this to be really, really clean, I'm also gonna show you guys a personal version of this.

K? Which is basically the exact same thing. And then instead of doing this via clients, which, you know, I mean, obviously, if it's a personal project, it's not a client project anymore, and then you can't really do it that way.

But instead of doing things based off of clients, I now recommend doing things based off of like domain and or, you know, like a particular field of your life.

So I haven't found the best way to organize this yet. But for instance, I have one right now on citizenship because I'm currently proving my citizenship to, you know, a particular country in Europe. And as a result, I'll be able to be an EU citizen.

It's gonna be pretty fun. Likewise, I have a sub one called health. This contains a couple of skills that I use to, like, visualize my genetic libraries and stuff like that.

And hopefully, you guys are seeing the point. What you do is you just sort of you enumerate the clients of your personal life, which tend to be projects like citizenship, you know, your health, uh, I don't know, your skincare and whatnot.

And then you list those underneath your personal workspace. Then you also have skills related to your personal workspace like, hey, you know, can you clear out all of my, I don't know, personal emails for x, y, and z. In this way, you have a good separation, at least in my mind, between your business life and your personal life, and then also just a logical grouping of each of the different things that you can do within them.

So I also have, as mentioned, you know, that personal folder. I can open just that personal folder anytime I want. It was just right back up here.

And that'll just contain, you know, specific personal conversations I've had with, you know, Claude in Antigravity to do things. And I'm happy to, like, pay token costs, stuff like that, to absorb that because my personal life isn't, like, personal personal.

It's just stuff that is not business. Right? If I can improve the productivity there, I might as well.

One more thing you'll notice is that when I open up this personal one, the colors are a little bit different. I do that on purpose. I do that because, you know, if I am working on business stuff, I want it to be very clearly, like, accessible and visible to, like, my monkey brain.

Like, I instantly wanna know I'm in my business folder. Whereas when I'm in my personal folder, that's different. And so what I've done is I've I've made the outline of this green.

I do that by creating this .vscode settings folder, and then I just have sort of like this config that VS Code reads at the beginning of every run to, like, actually change the header bar. This isn't like a super big unlock or anything, but I do find just, like, having a slightly different color always just makes my mind go, hey, this is my personal folder, so I have access to, like, personal information here, and I can actually have a conversation about whatever. I don't need to re-prompt it with a bunch of stuff. You'll also notice that, you know, this doesn't have, like, the Netlify or a bunch of those other sections because this personal folder only stores stuff that is, like, for me.
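The per-workspace color trick can be sketched by writing a workspace settings file. `workbench.colorCustomizations` and the `titleBar.*` color keys are standard VS Code settings; that Antigravity honors them the same way, the exact green used, and the helper name `set_workspace_color` are assumptions. The sketch also assumes any existing settings.json is plain JSON without comments.

```python
import json
from pathlib import Path


def set_workspace_color(workspace: Path, color: str) -> Path:
    """Tint the title bar for one workspace via .vscode/settings.json.

    Existing settings are preserved; only the two title bar colors change.
    """
    settings_path = workspace / ".vscode" / "settings.json"
    settings_path.parent.mkdir(parents=True, exist_ok=True)

    # Start from the current settings if the file is already there.
    if settings_path.exists():
        settings = json.loads(settings_path.read_text(encoding="utf-8"))
    else:
        settings = {}

    colors = settings.setdefault("workbench.colorCustomizations", {})
    colors["titleBar.activeBackground"] = color
    colors["titleBar.inactiveBackground"] = color

    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings_path
```

Something like `set_workspace_color(Path.home() / "personal", "#2e7d32")` would give the personal workspace a green header while the business one keeps the default.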

It's not for Netlify. Okay. So hopefully that gave you some insight into at least how I organize my workspace, but this is by no means the only way to do so.

There are a bunch of other ways to do it as well. One candidate way is instead of having, like, a business workspace, what you do is you just enumerate all the projects in your business. So I don't know.

You might have a a project, for instance, that's like website overhaul. What you do is you have, like, a top level folder. K?

Your top level folder might be business, or it might be whatever the name of your company is, say, LeftClick Incorporated. Then inside, you have a projects folder. And underneath your projects folder, you have, like, website design.

You have conversion rate optimization. You have lead generation and so on and so forth.

If you're running a business, you can actually now have your CRM entirely within Claude Code as, like, a .json file. And then, periodically, on a daily basis, you can synchronize it using some sort of cron job or something like that to, I don't know, some events that are pulled in from a calendar. You could store stuff that way. I've seen people host everything on GitHub as well, do some sort of daily download or clone from GitHub, and then some sort of nightly push so that they always have all their information stored in the cloud.
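A CRM kept as a single JSON file needs little more than a read, an update, and a write. The sketch below assumes a hypothetical schema (a top-level "leads" object keyed by email) and a hypothetical function name `upsert_lead`; the transcript does not specify either, and the calendar sync or nightly GitHub push would be separate scheduled steps that are not shown.

```python
import json
from pathlib import Path


def upsert_lead(crm_path: Path, email: str, **fields: str) -> dict:
    """Create or update one lead in a JSON-file CRM and return the stored record."""
    # Load the existing CRM, or start an empty one on first use.
    if crm_path.exists():
        crm = json.loads(crm_path.read_text(encoding="utf-8"))
    else:
        crm = {"leads": {}}

    lead = crm["leads"].setdefault(email, {})
    lead.update(fields)

    # Sorted keys keep the file diff-friendly if it is pushed to Git nightly.
    crm_path.parent.mkdir(parents=True, exist_ok=True)
    crm_path.write_text(json.dumps(crm, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return lead
```

A daily job could call `upsert_lead(Path("active/crm/crm.json"), "jane@example.com", stage="call booked")` for each new calendar event, with the path and field names here being placeholders.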

You can do that in conjunction with the previous system I told you about, or the business slash personal slash client one that I talked about initially. You can also just ask Claude to set it up according to however you like. If you guys don't like the way that I set up my workspace for whatever reason, despite the fact that I do think it was probably like top 10, you know, by all means, you can just ask Claude, hey.

I wanna have information for this. I wanna have information for this. Can you build me like a strong naming scheme or or system that'll enable me to do that better?

Okay. Hopefully, you guys liked this and it made a lot of sense to you. If you guys have any questions on that, let me know.

But let's move on to the next module. Now on to a topic that I think a lot of people don't like, security. And bear with me, usually, most of the time, when people talk about security, it's sort of divided into two camps.

On the left hand side, you have, like, the accelerationists that are like, Claude Code for everything, baby. I just gave it my DNA and a USB stick with all of my personal private information and passwords. Let's do this thing.

Then on the other side, you have like grubby old folk that used to, you know, program computers by punch cards. And so obviously, there's some irreconcilable difference there.

They're like, what the heck? Why would you even, I don't know, like make something web accessible, man? You should do everything on bare metal.

And then other folk are like, well, you should just have Claude Code do everything. Now, the reality, like most things, is nuanced, and in my opinion, the best case is somewhere in between. So this module and the next are gonna be a lot of talking and a little bit of demoing.

But it's important for you guys to understand as Claude Code ends up becoming more of the predominant generator of productivity in your life. There are a few small security changes you can make to your Claude Code setup that solve, like, 90-ish percent of all of the possible downsides, and there's basically no reason not to do them.

Okay. So I have this Google Doc over here that I'm just gonna walk you guys through. And really, the first point I wanna make is that everything on planet Earth is hackable.

It's always just a question of how hackable. You know, your front door is hackable. Technically speaking, the Department of Defense is hackable.

Everything is hackable. It's just what is the risk and reward involved in securing it to the point where you, you know, dispel 90 ish percent of attackers.

So the way I see things, you should 80/20 your security, take care of most of the low hanging fruit, and then just accept that there's always gonna be some small percentage of people that are gonna hack you anyway or try to hack you anyway. And, you know, depending on how big your vibe coded app or agentically engineered flow ends up getting, obviously, your attack surface is going to increase one to one with that.

You know, just for a reference, like, when I was first starting on YouTube, I had like one login attempt per month and it was always me. Well, now I get like probably 30 to 40 login attempts per day. It's just a bunch of people that are constantly trying to hack my ass.

You know, back in the day, I had nothing, sort of, to lose, so it wasn't a very big deal. Now, it's obviously a lot bigger. And you find this as you kind of go up the chain.

You know, if you become a public figure or whatever, obviously, you're more likely to get that. Can't imagine what Chris Hemsworth's frickin' OpenClaw probably looks like, but that's beside the point. Just know that everything is sort of relative.

And in your shoes, you should just cover the 80/20. Okay. So we're just gonna get to a point where our app or setup takes more time and effort to hack than it's worth.

Anybody could theoretically break into your house right now. Most people don't because there's just a little bit more effort required to break into your house versus, you know, if you just unlocked your front door and somebody could walk right in. So what we're gonna do is we're gonna put the equivalent of a fence and a camera up, eliminate most of these and then we should be good to go.

Okay? So let's just cover some low hanging fruit right off the bat. At the end, I'm actually gonna give you guys a simple security audit that you guys could use to copy and paste through any sort of app or system or or website or or web property that you have to basically minimize the probability of this occurring.

The first thing to know, which I think most people don't, is that you actually leak API keys every time you paste them in plain text into a chat with Claude. Now, maybe they'll fix this in a future version, but right now, it's not fixed. All Claude Code conversations are actually stored in this folder right here on your computer.

Tilde just stands for your home folder, then slash, and then the dot is a hidden-file convention on Mac and Linux, where if you have a dot in front of something, you know, you just can't see it unless you specifically enable, like, the hidden folder view. So what that means is you probably have a long running log of API tokens that are hard coded there, outside of, you know, your .env or whatever.

And just to show you, I'm gonna head over to my Antigravity instance. This one is the same auto research repo that we were doing other stuff on. And I'm just gonna say, hey, I want you to remember the word.

Well, let's not even do that. I'm just gonna say, hey, what are your opinions on quetzacoedals? I don't know.

There's some sort of animal I think called a quetzacoedal.

That's outside my wheelhouse. I'm a coding assistant, so I don't really have opinions on Mesoamerican feathered serpents. Interesting.

So hopefully, I didn't absolutely butcher this. Is it quetzalcoatl? Yeah.

Okay. It's this right over here. Okay.

So I'm just gonna insert this into a chat history. And the reason why is because I want to open this up and then I want to say, search through the .claude folder for any conversation mentioning Quetzalcoatlus.

And what you'll see is there's actually a long running log of all conversations basically right here in this folder. In my case, it's /Users/nicksaraev. That's my home folder.

And now, it's going to actually pull up the conversation files and give it to me word for word. Give them to me line by line, whole convos. And so essentially, you know, if we actually dive into the output there, the way that this information is stored is they're stored in JSONL files, which are like JSON files that are line by line by line.

And you can actually see how they're returned just by doing a search here.
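The search the agent just ran can be approximated in a few lines, since the history is plain JSONL text on disk. This is a sketch under assumptions: the function name `search_history` is hypothetical, the exact subfolder layout under ~/.claude is not assumed (the function simply walks every .jsonl file below the root it is given), and matching is a raw case-insensitive substring test on each line.

```python
from pathlib import Path


def search_history(root: Path, term: str) -> list[tuple[Path, int]]:
    """Return (file, line number) for every JSONL line that mentions `term`."""
    needle = term.lower()
    hits: list[tuple[Path, int]] = []
    for path in sorted(root.rglob("*.jsonl")):
        # Each line of a JSONL file is one standalone JSON record.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if needle in line.lower():
                    hits.append((path, line_number))
    return hits
```

Pointing it at `Path.home() / ".claude"` with a word you typed earlier shows how easily anything pasted into a chat can be found again, by you or by anyone else with access to that folder.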

I mean, I can obviously open it up, but, you know, I probably have API tokens and stuff like that in there, so I don't really wanna. You can see that they're organized into, like, a big JSON sort of structure.

Right? And so you can actually see if it pulls it out, you now have the transcript which says user, title, assistant, user, assistant.

This is the exact same chat that we just had back here. And so I'm sure you can imagine, like, you're gonna have a bunch of API keys that you paste in plain text also available here. And I mean, like, that's not the end of the world.

Obviously, we need to store our API keys somewhere. But a very low hanging fruit in security is just minimizing the number of places that you have the same sensitive information spread out. Like, if you have the same sensitive information, aka an API key to, like, your Anthropic account or whatever, stored in five different places, the probability somebody stumbles across it at some point, if they're hacking you or if it's just some sort of routine data check or whatever, is like not just five times higher. It's something like 500 times higher.

And I think a lot of attackers now are realizing that a good place to look for this sort of thing is in the conversation history. So, you know, you can avoid having API keys stored all over the place, and a really simple and easy way to do that is basically, you know, I'm just gonna make like a fake .env here.

And then instead, I think I'm going to make a new conversation. And instead of me just saying like, hey, axolotl. K.

What I'm gonna do instead is I'm going to store this as ANIMAL_NAME and then we'll do axolotl right over here.

Let's say, hey, I just inserted an animal name in a .env for a future task, you know, very important, we do not leak this name.

K? Now, what it's gonna do is it's just gonna like clarify with me. It can use this in some sort of function or whatever the heck it wants.

And then if I go through, see how it says never read or display the contents of a .env file, never commit .env files to Git. That's another pretty low hanging fruit.

If you have API keys stored in places that are not your .env, a lot of people will mistakenly push that to GitHub and, you know, if you're pushing it to GitHub, now it's on the Internet as well. Right?

Which is even worse. But you know, now if I go over here and I say, hey, can you find me conversations about axolotl in my, and then I'm just gonna go, .claude.

It's gonna search all damn day long looking for this thing and it's not gonna be able to find it because we haven't actually specifically said axolotl in a conversation. And in fact, what's pretty interesting is the only conversation it found was where I specifically asked, hey, can you find me an axolotl? So it's gonna look and see whether or not it can find it in other directories.

It's not gonna be able to, but hopefully, you guys get my point. Okay? Minimizing the attack surface in a really simple way.
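As a rough illustration of that pattern, here is a minimal Python sketch that reads a value from a `.env` file and uses it without ever printing it. The tiny parser and the `masked` helper are hypothetical names written for this example (in practice a library such as python-dotenv does the loading), and `ANIMAL_NAME` is just the placeholder from the demo.

```python
import os
import tempfile
from pathlib import Path


def load_env(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    values = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def masked(value: str) -> str:
    """Describe a secret without revealing it, safe to print or log."""
    return f"<set, {len(value)} chars>" if value else "<missing>"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_file = Path(tmp) / ".env"
        env_file.write_text("# demo only\nANIMAL_NAME=axolotl\n", encoding="utf-8")
        os.environ.update(load_env(env_file))
        # Code can use os.environ["ANIMAL_NAME"]; only the masked form is shown.
        print("ANIMAL_NAME:", masked(os.environ["ANIMAL_NAME"]))
```

The point of the design is that the secret lives in exactly one file, and anything that lands in a chat or a log is the masked description rather than the value.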

Just have all of your API keys in a .env. So that's number one. The number two low hanging fruit is that AI models often hallucinate package names.

In case you guys didn't know, package names are just like dependencies that you have to pull in order for, you know, the usage of any project nowadays, you know, like libraries and stuff like that. And so, you know, there's like NPM, which is typically like the big package manager here.

And I'm just going to make this a little bit more visible for you guys. That stands for Node Package Manager. But basically, like, if you just type npm install.

Okay. Geez. I don't even know.

Like, what are some popular libraries? Anthropic? Maybe I'll just do an npm search for Anthropic?

Okay. I don't know. npm install at composio Anthropic.

Like, basically, what occurs every time you launch a new project or you have AI, like, design something for you is you'll go through this online resource, this big package manager, and then it'll automatically install all of the packages it thinks it needs. And like that's usually not that big of a problem. Right?

Because NPM is like pretty well vetted. But, you know, it's a package manager and so it manages hundreds of thousands, millions of different packages. And every now and then, one of these packages gets sort of compromised.

Now, the issue in the way that this increases the attack surface is that AI models often hallucinate a package name. They won't actually always get it right the first time. Let's say, you know, you want a specific dependency or a package called Acorn.

Okay? Sometimes, Claude, just because of the way that the tokens were sort of baked into it, its various encoding schemes and stuff like that, will actually invent a dependency with like an extra letter, like acorns with an s, or acorn with an e or something.

And a lot of people that are sneaky and terrible and super evil and malicious have sort of known about this for a while because of like various encoding issues and the statistical probability of adding additional letters and stuff. So what they've done is they've actually created new packages, okay, with small little misspellings of the main package.

And they've made those packages contain malware, things that literally say, hey, I want you to go through their .env and then go through all of their, you know, ~/.claude conversation logs and then send it over to me. And so the idea there is, you know, it'll obviously exfiltrate anything that is important to you, then it'll gain basically full control over your account.

It's a form of like, I don't know, prompt injection almost. But, you know, if you're making any sort of live projects or ones that tie to API keys with any sort of unlimited usage, you know, there are gonna be some out there where, I don't know, you just turn the unlimited extra usage thing on, and then whoever holds that key theoretically has access to, like, billing tens of thousands of dollars for a service.

Be very careful with that. You should just audit your dependency list for any unfamiliar package. You should actually ask Claude, like, hey, are there any unfamiliar packages that you don't actually actively use all the time?

Or, you know, hey, before you instantiate this the first time, I want you to take a look at the whole npm run and ensure that the only packages here are, like, legitimate packages that have verified histories and are not, like, inserting malware, I'm kinda concerned. And I'll give you guys like a whole security audit you could use for stuff like that in a moment.

But the point that I'm making is, like, this is another attack vector. Okay? A lot of people don't realize this, but in addition to leaking API keys and getting them all over the place, models also hallucinate package names.
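One way to picture the dependency audit described above is a near-miss check: flag any declared package whose name is almost, but not exactly, a package you already trust. The Python sketch below does this with the standard library's `difflib`; the function name, the 0.8 similarity cutoff, and the idea of a hand-kept trusted list are all assumptions for illustration, not a real npm feature.

```python
import difflib
import json


def suspicious_dependencies(package_json: str, trusted: set[str]) -> dict[str, str]:
    """Map each untrusted dependency to the trusted name it closely resembles."""
    manifest = json.loads(package_json)
    declared = {
        **manifest.get("dependencies", {}),
        **manifest.get("devDependencies", {}),
    }
    flagged = {}
    for name in declared:
        if name in trusted:
            continue  # exact match with a known package is fine
        close = difflib.get_close_matches(name, sorted(trusted), n=1, cutoff=0.8)
        if close:
            flagged[name] = close[0]  # looks like a misspelling of a real package
    return flagged


if __name__ == "__main__":
    manifest = '{"dependencies": {"acorns": "^1.0.0", "express": "^4.0.0"}}'
    print(suspicious_dependencies(manifest, {"acorn", "express"}))
```

This only catches look-alike names. A package that is simply unfamiliar still deserves a manual look before the first install.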

The third main thing has to do with databases, and this is gonna apply mostly to people that are creating full stack apps or apps that, you know, need to call some sort of external data store. A lot of the time nowadays, to be honest, I just store everything with JSON files directly on my computer.

It's a lot easier and simpler for me because I'm not really developing full stack end to end apps as much these days. I'm, for the most part, just designing flows for myself or internal tools for my team. But anyway, assuming that, you know, you wanna go a little bit further than that and actually develop full stack software, essentially, the simplest and easiest way to ensure that, like, 90% of all noted database breaches do not occur on your app is you just use this one little button called row level security.

It's very straightforward and basically nobody does it, which sucks. So Supabase, which most of you are probably gonna be using for any sort of vibe coded app function, does not enable RLS by default.

They'll probably do so at some point. But for now, what that means is if somebody signs up to your app, you know, typically, they're given a key by which they can access their own database table. Well, if they have a public key on a database that does not have RLS enabled, they can read, write, delete every other row in your database.

And so you have a lot of cases where, you know, there is some simple, I don't know. There was a database for like Moltbook, which was like supposedly Facebook for agents.

That was just a few months ago and, you know, everybody was like, god, this is revolutionary or whatever. And then, like, the most elementary of security audits done by some cybersecurity fella showed that, like, they did not have RLS, row level security, enabled on their database.

So he just went in and then he read literally every single AI agent that had ever been created on the platform in, like, two seconds. Then, because he also had write access, he created like 100,000 fake AI agent profiles in like two seconds. Funny enough, Meta, Facebook, actually ended up buying them and hopefully, they understood that a big chunk of those profiles were fake, but who knows, maybe they didn't.

The point that I'm trying to make is this is like very, very low hanging fruit. Takes like two seconds to do. And once you're done with that, you can kind of move on.
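To see why that one setting matters so much, here is a toy in-memory model in Python. It is not Supabase's or Postgres's API (in Postgres the real mechanism is enabling row level security on a table and adding a policy); the `Table` class, the `rls_enabled` flag, and the ownership rule are all hypothetical stand-ins for "a user may only touch rows they own."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    owner_id: str
    content: str


class Table:
    """Toy in-memory table; rls_enabled mimics a per-row ownership policy."""

    def __init__(self, rows: list[Row], rls_enabled: bool) -> None:
        self._rows = list(rows)
        self.rls_enabled = rls_enabled

    def _visible(self, row: Row, user_id: str) -> bool:
        # With the policy off, every row is visible to every caller.
        return (not self.rls_enabled) or row.owner_id == user_id

    def select(self, user_id: str) -> list[Row]:
        return [row for row in self._rows if self._visible(row, user_id)]

    def delete(self, user_id: str) -> int:
        """Delete every row the caller is allowed to touch; return the count."""
        kept = [row for row in self._rows if not self._visible(row, user_id)]
        removed = len(self._rows) - len(kept)
        self._rows = kept
        return removed


if __name__ == "__main__":
    rows = [Row("alice", "a1"), Row("bob", "b1"), Row("bob", "b2")]
    print(len(Table(rows, rls_enabled=False).select("alice")))  # sees everything
    print(len(Table(rows, rls_enabled=True).select("alice")))   # sees only her own
```

The same caller with the same key gets the whole table in one case and a single row in the other, which is the entire difference between the breach story above and a non-event.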

Okay. Be wary anytime you're publicizing a system like OpenClaw, like your little OpenClaw package to the web.

So let's say you have some open URL. Let's say this is my Openclaw. Okay.

And it's nickhappyfuntime.com. I'm kinda curious if I click on this. Is there anybody at nickhappyfuntime.com?

Okay. Thank God. There's nobody at nickhappyfuntime.com because I probably have to sanitize my eyes after that.

Anyway, imagine you have your Clawdbot or Moltbot or whatever the heck it's called now on nick dash happy dash fun dash time dot com. Well, odds are if you have a URL, and it's like a short straightforward URL, and it's on an IP range that is like owned by, I don't know, some virtual private server hosting provider, you are gonna be queried constantly by people that are looking for vulnerabilities.

They will be scanning, okay, all over the place for every single port that's currently open on your computer. There are huge bot farms, for instance, in China, in the Philippines, in some Indonesian countries, and obviously the West as well. I'm not just trying to point a finger over there.

But, you know, that's predominantly where a lot of these attacks come from. And there are huge bot farms that people set up a long time ago whose whole job is literally to send tens of thousands of requests per second to like every URL constantly, scanning to see like, hey, have they patched this one thing?

Hey, do they have this security vulnerability? Hey, do they do this? And the second even one of those things hits, like, you know, allows them access, now they have full access to your freaking machine and box, basically, and then they can do whatever the heck they want with it.

So I want you to know, like, if you set up some sort of public facing server using some sort of VPS based approach on, you know, one of these major hosting providers, know that it is constantly going to be tested.
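A small way to see what those scanners see is to check which TCP ports on your own machine accept connections. The Python sketch below uses only the standard `socket` module; the function name and the short port list are assumptions for illustration, and it should only ever be pointed at hosts you own.

```python
import socket


def open_ports(host: str, ports: list[int], timeout: float = 0.5) -> list[int]:
    """Return the subset of ports on host that accept a TCP connection."""
    found = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising.
            if sock.connect_ex((host, port)) == 0:
                found.append(port)
    return found


if __name__ == "__main__":
    # Only scan machines you own; 127.0.0.1 is this computer.
    print(open_ports("127.0.0.1", [22, 80, 443, 3000, 5432, 8080]))
```

Anything in that list that you did not expect to be listening is worth closing or putting behind a firewall before the box goes public.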

And if you are, like, wild, you're raw dogging this, you're wild westing this, you don't, like, understand some pretty foundational things about, like, firewalls and, you know, RLS and so on and so forth, like, people will find vulnerabilities.

Your stuff will be hacked. And so the idea is just make sure that whatever you are putting in there is not like super extraordinarily sensitive. You know, don't give your OpenClaw agent your social insurance number or like a picture of your passport or whatever.

That to me is like way too accelerationist. And I'm not being the old grubby person yelling at clouds in the sky being like, back in my day, we used to punch card stuff. I'm just trying to be reasonable here.

Right? Just no need to do stuff like that for the most part. You know, if you have like a local Claude instance that's running, that's authenticated through Telegram and then you're using like, I don't know, the Claude channels feature or whatever, the probability that a hack will occur there is much, much lower because you're just running it locally and you're not actually connecting through like an open thing.

You're connecting through a vetted, you know, Telegram kind of connector or plug in. But if you're just like OpenClaw raw dogging it, yeah, be very careful with that stuff.

By the way, this isn't just me ragging on OpenClaw for the four thousandth time. I'm trying to be reasonable about this. I think decentralized autonomous agents are obviously the future at some point.

But, you know, most of what we've seen so far has literally just pissed away people's API keys and credit card information. Speaking of credit card information, never touch a credit card number. So if you guys are designing systems that interface with any sort of credit card whatsoever, don't actually like store that data.

Don't actually read that data. If that data gets read at any point by like an AI agent, hell, even your AI agent, guess what's gonna happen? Well, same thing.

You know, you're gonna leak those numbers the same way you leak API keys. You're gonna stick them in your conversation history. And then for any sort of hacker, or for you at any future point in time if you misconfigure stuff, push stuff to GitHub or, I don't know, like trade in your computer or whatever, there will now be like a big log of all of that information just in plain text, which is easily available.

You know, a lot of people will just like run a regex over your entire computer looking for things like, you know, credit cards that they can access. And then what's a credit card? Well, usually, it's like, what is it, 16 or 20 characters or something?

I have to check my credit cards now, but it's like very, very stereotypical. Right?

You find 16 or 20 characters all connected together, maybe like with a space in between, boom, you got yourself a freaking credit card. Maybe you don't even.

They just look for that then they check to see whether or not it's like a Visa pattern. If it is, you're screwed. So, anyway, I guess what I'm trying to say is, like, don't put that liability on yourself by storing other people's credit cards if you're running, like, some sort of business thing, and then don't put that liability on your own card by storing your own card here.
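That scan is simple enough to sketch, and running it against your own logs is a reasonable defensive check. The Python below pairs a regex for card-shaped digit runs with the Luhn checksum that card numbers satisfy. The 13 to 19 digit range is an assumption meant to cover common card lengths (most are 16), the function names are hypothetical, and the number in the demo is a widely published test number, not a real card.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or dashes.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum used by payment card numbers."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings in text that look like card numbers and pass Luhn."""
    found = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found


if __name__ == "__main__":
    print(find_card_numbers("paid with 4242 4242 4242 4242 today"))
```

If a sweep like this finds anything in a conversation log or a repo, that is the signal to rotate the card and move payments to a processor.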

You know, use services like Stripe. They do everything for you. They are super compliant, PCI compliant, all this stuff.

They have teams that just focus on making sure that stuff that is stored on their servers never gets screwed up, and then you never actually have to deal with the compliance and regulatory aspect of touching credit cards. Alright. Now once you're done sort of understanding this, which should be now because hopefully nothing here is super complicated, although some of these concepts are advanced, I understand.

All you need to do is just run anything public facing through some form of security audit for like maybe the other eighty twenty. And so this is a security breakdown that I created for a vibe coding course where I was showing people how to make full stack apps.

Pretty cool, using Gemini in case you guys are interested. I guess it's Gemini and Claude Code. You can find that on my channel if you want to type like Nick Saraev vibe coding or something.

And essentially, down here at the bottom, what I have is a big security audit prompt where you can actually just feed this into Claude and then have it like point out all of the security issues with whatever your flow is. And so what I'm gonna do is I'm gonna go back here to Antigravity.

And I mean, I don't really have like anything that's public facing here, but I'm still gonna run it through auto research. Then I'm gonna just create a new one and I'll say, apply this to our auto research flow, the one optimizing LeftClick.

Once done the security audit, return me everything we need to fix. I know nothing is web accessible ATM.

K. And so what this is, it's just a big prompt that I developed in conjunction with a bunch of agents. I had to like read a bunch of security blogs and so on and so forth to like look for the biggest low hanging fruit and the simplest minor configuration changes you could make.

And, you know, what it's gonna do is just go top to bottom and then apply this. The reason why I'm spinning up a totally new conversation history is because I do not want any sort of conversation context to bias what's going on here.

I don't want the same agent I used to develop my tool to actually also run the audit because odds are it's going to be biased and it's gonna make specific errors because it's gonna think that its own work is better. Do you see here how it's searching for sk_live, sk_test, sk-, bearer, and so on and so forth.

These are all API token headers, basically. These are like the titles of API tokens. What it just did there, other people are going to do at any point in time if they gain access to your system.
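The search the auditor runs there can be approximated in a few lines. This Python sketch walks a folder and reports where key-shaped strings appear, without ever echoing the key itself. The three patterns are illustrative assumptions (real scanners ship far larger rule sets), and the function and rule names are hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Illustrative patterns only; real scanners ship far larger rule sets.
SECRET_PATTERNS = {
    "stripe_live_key": re.compile(r"sk_live_[A-Za-z0-9]{8,}"),
    "stripe_test_key": re.compile(r"sk_test_[A-Za-z0-9]{8,}"),
    "generic_sk_key": re.compile(r"sk-[A-Za-z0-9_-]{16,}"),
}


def scan_for_secrets(root: Path) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, rule name) for each likely secret."""
    findings = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binaries and unreadable files
        for line_no, line in enumerate(text.splitlines(), start=1):
            for rule, pattern in SECRET_PATTERNS.items():
                if pattern.search(line):
                    findings.append((str(path.relative_to(root)), line_no, rule))
    return findings


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fake_key = "sk_live_" + "x" * 12  # obviously fake placeholder
        (Path(tmp) / "config.py").write_text(f'KEY = "{fake_key}"\n', encoding="utf-8")
        print(scan_for_secrets(Path(tmp)))
```

Running something like this before anyone else does is the cheap version of the audit: every hit is a key that should move into the `.env` and probably be rotated.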

Same thing here with like model weights and same thing here with like bash scripts and stuff like that. Okay? Anyhoo, so we're just gonna read this top to bottom architecture summary, gives me some brief details about what's going on.

It's not a web app. It's a local single GPU ML training pipeline. It's easy.

No hard coded secrets, but the .gitignore does not include .env, .env.local, and so on and so forth. Okay?
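That particular finding is easy to check mechanically. The Python sketch below reads a `.gitignore` and reports which env-style file names no pattern covers. It uses `fnmatch` as a simplified stand-in for git's matching rules (negations and directory-specific rules are not modeled; `git check-ignore` is the authoritative check), and the candidate list, including `.env.production`, is an assumption.

```python
import fnmatch

ENV_FILES = [".env", ".env.local", ".env.production"]


def unignored_env_files(gitignore_text: str, candidates: list[str] = ENV_FILES) -> list[str]:
    """Return candidate file names that no simple .gitignore pattern covers."""
    patterns = []
    for raw in gitignore_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and negations (not modeled in this sketch).
        if not line or line.startswith("#") or line.startswith("!"):
            continue
        patterns.append(line.lstrip("/"))
    return [
        name
        for name in candidates
        if not any(fnmatch.fnmatch(name, pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    print(unignored_env_files("node_modules/\n*.log\n"))
```

An empty result means every listed env file is covered; anything returned is one `git add .` away from being pushed.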

All the stuff that actually applies here is going to be filled in. So in this case, this is an actual failure, but in this other case, it's marked as not applicable because it's not an actual web app. Then you can see that there's also some sections where it fails.

So finding number one, supply chain low popularity package. Right? Supply chain issue.

Let's see. Over here, it's failed on some machine learning specific risks and it's sort of putting that out. It's funny that it's using the term vibing.

I like that. Anyway, so I'm not really gonna go through everything with you, but basically what you do is you finish this and then you just say, okay, great. Fix according to your suggestions.

K. And then once it's, you know, once it's done and whatever, I'm just gonna pretend it's done now even though it obviously isn't. This might take you like three or four minutes if you're running on something that isn't like, you know, fast mode like I typically run stuff on.

What you do is you just go through and then you actually implement it. And just like I showed you a moment ago to use something that is not biased with the conversation history, you spin up another agent to take the recommendations and then actually go through and do it. Because you also don't want that implementer agent to be biased by the security audit kind of overly constrained nature of it.

So in that case, you can use a sub agent or some other model itself like Codex, Gemini, or whatever. And then, you know, ultimately, you can have it reviewed by Claude because I think Claude is the best model. But in this way, you're basically like diversifying.

Similar to how we're diversifying by putting seven out of 10 of our eggs in the Claude basket, but three out of the 10, you know, spread across other models, you're diversifying against any sort of inherent risk or bias that Claude has toward work that is generated by other Claudes versus, you know, Codex or Gemini or whatnot. So the best solution would actually involve multiple runs through all of them.

Okay. Hopefully, that makes sense. I mean, I didn't want this to be a big deal.

Obviously, security, as mentioned, is only as big of a deal as you are willing to make it because of preexisting assets and what you have to risk and stuff like that. So if you just understood what I talked to you about right here, and then if you get, you know, a security prompt like what I showed you here, you should be good.

Just pass something like that through an agent after you've done a project, and it'll like cover most of the low hanging fruit. And by the way, if you want that security audit, then definitely check out that vibe coding full course. Really easy, just type Nick Saraev vibe coding.

I actually give you guys all that information for free there. You can also watch it if you wanna learn how to develop things with other models. Congratulations.

You made it to essentially the end of the informational Claude technical content of the course. And now, I just wanted to reserve maybe ten or fifteen minutes to chat a little bit about what I consider to be the future of Claude.

Not just the future of Claude code, but the future of Claude the model, as well as the future of just agentic engineering in general. And the reason why I talk about this is because it's a topic that's very close to my heart. I've been considering this for probably the last ten or so years.

As a kid, I grew up on science fiction, you know, Foundation from Asimov, tons of Arthur C. Clarke books and Heinlein and so on and so forth.

I've thought a lot about like what the far future would look like in an environment that is controlled by agents like Claude Code. And I've also thought about some of the intervening steps we need to get there. And now that it's sort of being thrust in my face, I think there's a lot that you could realistically learn from even just like fictional representations of this.

I think most people, who probably haven't stuck their head so far in the science fiction bubble, would find value in here.

In addition, I obviously have a lot of exposure to both mid market and enterprise here. Not to mention all the small businesses that I work with through LeftClick. And I think that gives me sort of an edge here to at least give you guys some sort of plausible future that has more than a 10% chance of probably being true.

I mean, like, things are changing so quickly. I obviously can't be 100% sure what is going to occur. But these are some things that I consider to be like pretty low risk bets that, if you make them, you'll probably have some form of alpha.

Okay. So the first main one is this trend of decreasing human involvement.

Do you guys remember earlier when I showed you guys that diagram where it was like vibe coding to agentic engineering to basically, like, research based direction with auto research and frameworks like that coming up? Well, this is still something like we are creating. Right?

It's sort of like open sourced, not necessarily open sourced, but it's something that, like, you know, the community is sort of working on. But all of these approaches are soon to be quite formalized. And it is very likely, in my opinion, that we are going to continue decreasing human involvement in tasks.

This auto research thing is a great example of ways to, you know, democratize sort of like little improvements. I've kept this auto researcher running, by the way, if you guys remember from like a couple of modules ago. And we're now actually at like almost eight thousand millisecond load time from a baseline of 1802.

Imagine if you had this running three thousand days in a row or whatever, or if you had this running at, like, inference capacities 100x this, right, which we are obviously getting to. You guys remember how slow GPT-3 was back in the day, if anybody here is an old head that used that? Well, GPT-5.4 fast or instant or whatever is way faster.

And imagine if you had a model that's 100 times that fast with the same level of intelligence. You can make some major updates to basically anything. And so the idea is, you know, we're probably not going to increase the level of human involvement in, like, direct coding and stuff like that, which is fine.

I'm not making like a value judgment or a normative judgment here. But I imagine you as a developer or a business person or whatever, will actually probably grow less involved in the day to day work of either your company, your research lab, your app, whatever the heck. And so my take is, in the future, we're gonna move towards this sort of thing that a lot of frameworks have tried to formalize, which is that we're each gonna be the CEO of sort of like our own company.

Whether it's an actual company in practice or whether it's, you know, some sort of organization that's like a company, all of us will basically be just like the chief executive officer running teams or fleets of agents that are constantly doing things on our behalf and that have some sort of formalized framework that also, like, helps them optimize and get better.

And so sort of the way that this works, I imagine, is we would go from, you know, like the old school Wright brothers flying the plane ourselves to sort of like modern aircraft engineers, where there's somebody in the cockpit.

But for the most part, you know, an autopilot is taking over the vast majority of the work. Even in, you know, like takeoffs and landings now, there are obviously so many SOPs and so much like process and framework that, you know, you can imagine how a system that was much less developed than ours, much less capable of deep thinking and stuff, could actually probably just execute it entirely at this point. That said, you know, will we ever get rid of a human in the loop to some capacity?

There are just so many regulatory blocks, and I think like ethical issues with that, that we will probably always just have some person like manning a ship. It's just the number of ships that a person will man, the number of discrete agents, will just continue increasing.

Until, you know, rather than have 100 people do a task in some specific company like we used to have, we might have one person do 100 tasks. Leverage will go up. Now, a good example of this is Claude's recent auto mode.

I said auto mode, but I don't know if you guys have seen their recent development, where basically, you now have the ability to run some sort of autonomous mode instead of choosing, you know, bypass permissions down here or ask before edits or edit automatically and so on and so forth.

Well, now we basically have an additional one, auto mode, which I just can't see here right now because I'm using a slightly older version of Claude Code. I don't have that yet. But basically, you know, instead of you actually having to, like, go through this whole process of changing the security, changing the access that it has, you know, Claude just does that for you.

So, like, that's a pretty good example of something that used to require a person, and they were just like, well, Claude's gonna get it right 99.9% of the time. Screw it. I'll give it to Claude. Okay.

So that's a very small microcosm, but, like, imagine the rest of the loop. Like, the planning loop right now, typically, you have Claude develop a plan for you and then you implement on that plan. That whole thing is just like being internalized.

Like, we're not actually doing most of the plan development now. We will not continue to do most of the plan development in the future. Realistically, Claude's gonna do both the planning and the implementation.

Then the QA, it's like right now, we're sort of in the loop. We're sort of like clicking the buttons, running it. Well, they're developing automated testing procedures where Claude actually also does the QA for you and then delivers you the whole thing.

And so some people hate this because they're like, well, they're taking my jobs and whatnot. And I think there's a fair point to that. You know, human beings' leverage will continue to increase, but it depends on like how much work is there really to do.

How many software products are there really to develop? Are we even gonna have, like, the demand for that sort of thing? And I think that's like a reasonable conversation to have.

And then, you know, unfortunately, I don't know the answer. My take is, like, eventually, we're probably gonna have to move to some sort of different economic system because most of the world would be unemployed otherwise. But that's me getting all political.

That's number one. Okay. So the trend of decreasing human involvement is very likely to continue with Claude Code.

They're now at the point where they're developing this so rapidly that like AI is helping AI design products. And, you know, auto mode is just the beginning of like, I think, a massive suite of rollouts that will significantly improve your experience but, you know, make you more hands off.

My second one is more of like an economic consideration, which is that software products and tools, okay, the quality of the things that you build, will no longer be the moat.

So in the past, in the good old days, back when I was on the come up, it was about how good your software was. Think like Windows. Think like, you know, like Mac OS.

How good that operating system was might have been the only thing that distinguished that operating system from another operating system.

And if it was really, really good, then obviously it would be much more popular and then it would get, you know, a bunch of like inherent interest and stuff like that because of the capabilities and you'd obviously use it. So the issue with that nowadays is you can make Netflix in five minutes. Netflix before was this innovative streaming model that, you know, was like, wow, you know, you could just load the thing and then the video loads for you on demand and it's incredible and like the streaming and latency and uptime and all that stuff.

It's like super proprietary technology. Well, now it's like, I can code Netflix in five minutes with, like, you know, three or four agents on fast mode. So it's like, what is the value of Netflix?

What is the moat that differentiates Netflix as sort of like this, like, old school medieval castle from all of the attackers that, you know, could actually take it down? Well, the moat now and this has been something for at least a couple of years. The moat now is no longer the software.

It is the distribution. So in a world where everybody has basically like a I don't know, a nuclear weapon, is the differentiator like everybody has a nuclear weapon?

No. The differentiator moves to other things like, I don't know, the political framework, like the wellness of the populace and stuff like that. What I'm trying to say is like that that skill, that software engineering ability is no longer going to be the moat.

And instead, the moat is going to move to, you know, the connections that a company has to its consumers, the reputation that the company has in the market, the distribution that it has with a bunch of vendors that, you know, are hard won relationships and connections that they realistically built over the course of many years.

You know, Netflix now has a bunch of patents and rights and licenses and stuff like that to air specific shows. It's seen this coming and so it's tried to diversify accordingly. But you're gonna see that in basically every software platform.

The moat will, like, probably move more to the distribution and the legal and compliance aspects than necessarily like how good the software is. Which means you're gonna have like these cracked, probably like fourteen, fifteen year old kids designing like the most incredible amazing software ever. And then that software will be able to reproduce anything that like a major business would do in like a hundredth of the time.

But, you know, because they don't have like the compliance certifications or whatever, you know, it'll probably be more difficult for them to actually go to market with something like that despite it being like objectively superior. And, you know, the way that I see it is we already have AI models that are at the limit of human reasoning capability.

They can run hundreds of times faster than our brains, soon to be thousands of times faster than our brains on basic tasks. So even if they're not, like, better than us at the software design individually, if you run a thousand, you know, 90 IQ models, compared to, like, one 100 IQ human, those will eventually figure out the things that that one 100 IQ human would do.

And not only will you develop more software, like, quality, you'll also develop more software quantity. And so with software as just a market thing, supply and demand, like, economically, the supply will be so damn high that the demand for any sort of, like, purchasable software gets a lot lower. Which means I personally don't think like a SaaS product is really the play here.

I don't think there's gonna be any sort of life cycle for, like, subscription based products. I think you'll have a short window of time where you could actually just monetize, like, a one time buy product. And then most people will just say, well, should I spend $199 on the product, or should I just spend $19 plus 30 minutes of my time on tokens?

And they'll just design it for themselves. And I think that's gonna change the way that we do, you know, like, software more generally. So I'm not very bullish on, like, you know, developing software as a service apps and stuff like that.

I have a lot of people be like, Nick, you know all this stuff? You know how to design all the software? Like, why aren't you making a software app, and why aren't you monetizing your community, let's say, through software?

And I'm like, I'd only really be able to do that for a short period of time. And then even if I were to, like, where's the value in that if anybody could just make it? I'm saving them like twenty minutes and a couple bucks in tokens.

Right? It's not that big of a deal. So, I mean, I would move accordingly, I guess.

Because the third thing that I'm like 99.9% sure of is that the pace of change is not slowing down anytime soon. It will continue to accelerate.

Just as technology has helped us increase the pace of change through our history with things like the printing press, with developments in, you know, communication like the telegraph and so on and so forth. You know, these things don't just improve the quality of life of the average person, they improve the research and development arm of technologists who work on that exact thing.

And so because of that, you know, the pace of change is basically just going up. If I had to graph sort of where we are now, and I will because I freaking love graphs.

Right? Just the best. And if I were to graph the intelligence, which is a very loose term here and obviously means different things to different people, but the intelligence of a model over time, you know, basically, I'd go like this.

Okay? And so this back here was sort of like linear growth from like maybe like the nineteen seventies and stuff with like Minsky, you know, nineteen seventies and eighties and stuff.

Minsky and, like, the first few neural nets and stuff like that. Then this right over here is probably, like, I don't know, 2010, when models started actually doing stuff. Right?

Then this over here is like 2020. You know, this over here is like 2025, and then this over here is 2026.

Do you see how, like, high this is going? How quickly?

And then a point that I wanna make is basically like, this right here is the intelligence of maybe like a like a chimpanzee. K.

This right here is the intelligence of like an average human. And then this right here is maybe the intelligence of like Einstein.

And what we have now is, you know, we're, like, right over here, man. These models, I say as smart as a chimpanzee, not to diminish chimpanzees or whatever.

But, you know, their brains are extraordinarily advanced and developed. They have these cerebella, these sections of their brains that are responsible for calculating, like, millions of movements and so on and so forth every minute. Like, it's a very complicated thing to, like, replicate the intelligence, the distributed intelligence of an organism.

And you don't capture that all just by like, hey, can it write? Hey, can it, you know, reason and do math? Have you ever seen like a chimpanzee's like memory?

Have you seen its like ability to like, you know, move around on a page and like figure out symbolism and then symbols, sorry, and then like counts numbers up in their motor neurons?

Anyway, this is not a course on chimpanzees, so I'll stop talking. God, that's my nerdy side, sorry. But the point I'm making is that the gap between the intelligence of a chimpanzee, if you just count up all the neurons in its brain, the intelligence of a human, if you count up all the neurons in its brain, and the intelligence of Einstein is actually very small.

Actually very close together. They're very clustered. And I'd say, like, we're basically right over here right now.

So I guess what's gonna happen in, you know, the next few years. This is gonna go like up here. And we are going to it's gonna be like, wow.

These things are so dumb. They're dumb. Oh, wow.

Cute. They can do things that a chimpanzee can do. And then, like, six months, it's like, oh, okay.

These things are now, like, you know, freaking galaxy brain intelligences that, you know, can do everything and anything for us. And imagine what happens when, you know, all of this is just humans working on stuff, and then eventually gets to the point where you can actually, like, use human level intelligence, which is right now, to, like, improve its rate of growth.

This thing is just vertical. I mean, this thing would go so vertical it'd go through my roof in two seconds. So that's my take on it personally.

I think, you know, I think we're getting really, really close to super fast paces of change. And if you guys have, like, been monitoring the Claude, or even the Claude Code X page recently, or, like, seeing YouTube, there's new updates coming out every day.

This would have been unfathomable just three or four years ago, to make this level of development and this level of, like, small additions to a software product while also making sure they're testable and reliable. Just because intelligence is making intelligence more intelligent now.

And then the last thing I'm gonna say is that the people that will control, not necessarily control, but have the most, like, power and ability over the course of the next few years are people that learn to use this technology now.

You're part of a very, like, privileged minority, and I don't say that in, like, the political sense of the term because, yeah, I think that's all muddled up. But, like, you're part of a minority of people right now that, like, actually use this technology. Do you know how few people even understand what an agent harness is?

We're talking, like, sub 1% of the population of Earth. The percentage of people that know how to use an agent harness like you are doing right now is even less. It's a fraction, a vanishingly small percentage.

I don't know if everybody that watches this is old enough to remember, but there were, like, some protests back in the day on Wall Street. And the point is that they were like, we are the 99% or whatever. And they were protesting the massive wealth divide in specific parts of America between, like, you know, really, really wealthy people that work on Wall Street and then, like, the populace, the rest of the people that, like, I don't know, manage the service industry and hospitality and basically do everything else.

And they're like, why do you guys get to have like thousands of times more money than us? You are the 1% right now. You are that group of people that I'm sure in the future other people will be raising their hands about and, you know, shaking their fist at.

Because you have an enormous capability to use models like this for just cents on the dollar to do incredibly amazing, economically viable things. It would take that other group of 99%, like, months to do what you could realistically do in a day. It's insane.

You know, I think you could talk all day about the wealth divide, but you can also talk about, like, the productivity divide. And the wealth improves the likelihood that you will be in the positive chunk of the productivity divide. You right now, even if you don't have a lot of money, have access to insane technology and leverage simply because you're in it.

So that's going to increase. Now, William Gibson, one of my favorite authors, said it best: the future's here, it's just unevenly distributed. Meaning that, like, we have access to insane technology.

It's just, like, not all of us do at the same rate. There are small pockets of people like yourself that understand how to use these tools far better than others, and in doing so, you have the ability to reap asymmetric rewards over a small chunk of time. And my take is as the economy shifts to accommodate smarter than human intelligences, the people that understand things like agent harnesses and coding harnesses, people that understand how to use the best models in the world like Claude, you know, Opus or Mythos or whatever the heck we're at now.

People that know how to turn these into economically valuable things are the ultimate people that are going to win this share of the future, whatever small percentage it is. Because given the massive unbounded upside here, like, we're talking, you know, solar panels orbiting the freaking sun in a few years. Like, we have solar panels, but the point that I'm making is the massive potential upside if everything goes right with this technology and things don't go super wrong.

If you own even 0.00000001% of that potential future because of some decisions that you made today to, you know, upskill and start this productivity kickoff, you know, like, the abundance of your own personal life would be huge.

Okay. So I guess that's it.

We made it to the end of the course, and that's really all I have to say on that. Hopefully, you guys appreciated learning everything that I had to give on Claude Code, and you guys have learned some advanced concepts here, whether it's about, you know, initial system prompts and CLAUDE.md files, or it's some of the more obscure and esoteric things like security or the future like I just talked about.

If you guys like this sort of thing, you'd be doing me a big solid to subscribe to the channel. For whatever reason, something like 70% of my regular viewers are not subscribed. I think it's just how YouTube works.

Most people don't sub, but you can sub. That would really help me out. I wanna get this sort of message out to more people and obviously help them be in that small little chunk.

If you'd do me a solid, leave a comment down below with a video idea or something that you want me to cover. I actually get most of my ideas directly from my audience now, so I'd really appreciate that. If there's anything that I didn't cover here, maybe didn't touch on that you would like me to touch on, or maybe anything that I personally made a mistake on, I'd love to hear it because I'm trying to improve my ability to use these tools.

Finally, I also help other companies implement this sort of thing in their own businesses, whether you are a small to mid sized business, mid market, or enterprise. So if you wanna chat with my team, just check down below, somewhere at the top of the description. There'll be a link.

Thank you for making it all the way to the end of the video. I'll see you all soon. Bye.

The Hook

The bait, then the rug-pull.

The course opens on a direct credential: four million dollars a year in profit, 2,000 students taught, and a blunt warning that this is not for beginners. What follows is three hours of systems-level instruction from someone who has made the harness layer of Claude Code into a business.

Frameworks

Named ideas worth stealing.

00:57 model

CLAUDE.md Three-Layer Stack

Global CLAUDE.md (reasoning rules and universal preferences)
Project CLAUDE.md (codebase map and conventions)
Task-level context injection (one-off inline context)

Hierarchical configuration that compresses workspace knowledge and personalizes model behavior without repeating context every session.

Steal for: Any Claude Code project persisting across multiple days of work or multiple collaborators.
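
To make the layering concrete, here is a minimal Python sketch of how the three layers could be stacked, most general first. It is an illustration only, not how Claude Code loads these files internally: the helper name `build_context`, the section labels, and the throwaway file names are all assumptions made for this example.

```python
from pathlib import Path
import tempfile


def build_context(global_md: Path, project_md: Path, task_note: str = "") -> str:
    """Stack the layers in order: global, then project, then one-off task context.

    Missing or empty layers are skipped so they add nothing to the prompt.
    """
    layers = []
    for label, path in (("global", global_md), ("project", project_md)):
        if path.is_file():
            text = path.read_text(encoding="utf-8").strip()
            if text:
                layers.append(f"## {label}\n{text}")
    if task_note.strip():
        layers.append(f"## task\n{task_note.strip()}")
    return "\n\n".join(layers)


# Demo with throwaway files; real file locations depend on your own setup.
with tempfile.TemporaryDirectory() as tmp:
    global_file = Path(tmp) / "global_CLAUDE.md"
    project_file = Path(tmp) / "project_CLAUDE.md"
    global_file.write_text("Prefer small diffs.", encoding="utf-8")
    project_file.write_text("API code lives in src/api.", encoding="utf-8")
    demo_context = build_context(global_file, project_file, "Today: fix the login bug.")
    print(demo_context)
```

Keeping the task layer as a plain string rather than a file mirrors the card's point that task-level context is one-off and should not persist between sessions.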

1:10:00 model

Parent-Researcher-QA Pattern

Opus orchestrator (plans, decides, builds)
Sonnet researchers (parallel fan-out, summarize findings)
Fresh Opus QA agent (no prior context, pure evaluation)

The leanest multi-agent pattern that meaningfully improves output quality by separating research, development, and quality assurance into agents with appropriate context loads.

Steal for: Any complex feature build where context pollution from prior failed attempts is degrading output quality.
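
The control flow of this pattern can be sketched in a few lines of Python. This is a hedged illustration, not the presenter's implementation: `run_pattern` and the three `fake_*` functions are hypothetical stand-ins for real model calls, and no actual API is invoked. The part worth noticing is what each role gets to see.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical shape of a model call: prompt in, text out.
ModelCall = Callable[[str], str]


def run_pattern(task: str, questions: List[str], orchestrator: ModelCall,
                researcher: ModelCall, qa: ModelCall) -> Dict[str, object]:
    # 1. Fan out: researchers run in parallel and each returns a summary.
    with ThreadPoolExecutor(max_workers=max(1, len(questions))) as pool:
        findings = list(pool.map(researcher, questions))
    # 2. The orchestrator sees the task plus the summaries, not the raw research.
    brief = task + "\n" + "\n".join(f"- {finding}" for finding in findings)
    draft = orchestrator(brief)
    # 3. The QA agent starts fresh: only the task and the draft, no prior context.
    verdict = qa(f"Task: {task}\nDraft: {draft}")
    return {"findings": findings, "draft": draft, "verdict": verdict}


# Stand-in "models" so the sketch runs without any API access.
def fake_researcher(question: str) -> str:
    return f"summary of {question}"


def fake_orchestrator(brief: str) -> str:
    return f"built using {len(brief.splitlines()) - 1} findings"


def fake_qa(prompt: str) -> str:
    return "pass" if "Draft:" in prompt else "fail"


result = run_pattern("Add rate limiting", ["token bucket", "sliding window"],
                     fake_orchestrator, fake_researcher, fake_qa)
print(result)
```

Passing the QA agent a freshly built prompt, rather than the running conversation, is what keeps earlier failed attempts out of its context.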

1:29:26 model

Karpathy Auto-Research Loop

Define measurable metric
Hypothesis: propose a change
Execute: apply the change
Assess: measure the metric
Log result and repeat

An unattended iterative improvement loop for any task with a quantifiable success signal, running overnight without human intervention.

Steal for: Website performance, test coverage improvement, prompt quality benchmarking, any optimization with a clear numeric target.
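
The five steps above map onto a short loop. The Python sketch below is a greedy "keep the change only if the metric improves" variant, which is an assumption of this example and not code from the Karpathy repo. The function name `auto_research` and the toy metric are hypothetical; in practice `propose` would be a model call and `measure` a real benchmark.

```python
import random
from typing import Callable, List, Tuple


def auto_research(state: float,
                  propose: Callable[[float], float],
                  apply_change: Callable[[float, float], float],
                  measure: Callable[[float], float],
                  rounds: int) -> Tuple[float, float, List[tuple]]:
    """Hypothesize, execute, assess, log, repeat. Higher metric is better."""
    best = measure(state)                       # define the baseline metric
    log = [("baseline", best, True)]
    for _ in range(rounds):
        change = propose(state)                 # hypothesis: propose a change
        candidate = apply_change(state, change) # execute: apply the change
        score = measure(candidate)              # assess: measure the metric
        improved = score > best
        log.append((change, score, improved))   # log every attempt, kept or not
        if improved:
            state, best = candidate, score      # keep only improvements
    return state, best, log


# Toy problem: nudge x toward 3.0, where the metric -(x - 3)^2 peaks at 0.
rng = random.Random(0)
final_x, final_score, history = auto_research(
    state=0.0,
    propose=lambda x: rng.uniform(-1.0, 1.0),
    apply_change=lambda x, step: x + step,
    measure=lambda x: -(x - 3.0) ** 2,
    rounds=200,
)
print(final_x, final_score, len(history))
```

Logging rejected attempts alongside accepted ones matters for unattended runs, since the log is the only record of what was tried overnight.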

1:53:35 list

Three-Tier Web Automation Stack

HTTP requests (fastest, most brittle, no JS rendering)
Browser automation via Browser Use or Playwright (reliable, detectable, JS-capable)
Computer use / OS-level GUI control (slowest, most capable)

Match the automation tier to the job. Use HTTP for APIs, browser automation for JS-heavy sites, and computer use only when the other two genuinely cannot do the task.

Steal for: Any workflow involving scraping, form submission, or multi-step web interaction.
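
The "match the tier to the job" rule can be written down as a small decision function. This Python sketch is one possible reading of the card: the `Tier` enum, the `choose_tier` name, and the three boolean inputs are all assumptions made for illustration, and a real workflow would likely weigh more factors, such as bot detection.

```python
from enum import Enum


class Tier(Enum):
    HTTP = 1          # fastest, most brittle, no JS rendering
    BROWSER = 2       # JS-capable, slower, detectable
    COMPUTER_USE = 3  # slowest, most capable


def choose_tier(has_api_or_static_html: bool, needs_js: bool, needs_os_gui: bool) -> Tier:
    """Pick the cheapest tier that can actually do the job."""
    if needs_os_gui:
        # Only when the other two genuinely cannot do the task.
        return Tier.COMPUTER_USE
    if needs_js or not has_api_or_static_html:
        return Tier.BROWSER
    return Tier.HTTP


print(choose_tier(has_api_or_static_html=True, needs_js=False, needs_os_gui=False))
print(choose_tier(has_api_or_static_html=False, needs_js=True, needs_os_gui=False))
print(choose_tier(has_api_or_static_html=False, needs_js=False, needs_os_gui=True))
```

Checking the most expensive requirement first keeps the function honest: a task that needs OS-level control cannot be downgraded just because part of it happens to have an API.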

CTA Breakdown

How they asked for the click.

VERBAL ASK

3:16:50 subscribe

“If you guys like this sort of thing, you would be doing me a big solid to subscribe to the channel. For whatever reason, something like 70% of my regular viewers are not subscribed.”

Soft, self-aware ask embedded in the final section after the philosophical close. Paired with a request for comment-based video ideas and a mention of enterprise consulting services.

MENTIONED ON CAMERA

00:57 tool: Antigravity (VS Code fork by Google)

1:35:00 link: Karpathy autoresearch repo

2:30:00 product: Maker School community on Skool

FROM THE DESCRIPTION

PRIMARY CTA

Where the creator wants you to go next.