Big Idea

The argument in one line.

Running AI models locally isn't just a backup plan — it's the foundation of owning your stack, and the same tools that let you self-host a model let you rebuild any SaaS product for free.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…

You use Claude, ChatGPT, or NotebookLM daily and felt exposed when Fable 5 was pulled.
You want to run AI models on your own hardware — even an old laptop — without paying per token.
You're spending more than $200/month on AI subscriptions and want to understand whether local hardware makes financial sense.
You use an agentic coding setup and want to route cheap or private tasks away from frontier models.

SKIP IF…

You need frontier-level reasoning today — local models are honestly 6-12 months behind and the video says so clearly.
You have no interest in the command line or GitHub — the setup requires basic terminal familiarity.

TL;DR

The full version, fast.

Cloud model access can be revoked at any time, and local AI is the only ownership guarantee. The playbook is three steps: check your hardware RAM to know which model size fits, install Ollama and download a model (Qwen 3 for coding, Gemma 4 for general use, DeepSeek for tool calling), then wire it into your workflow via a decision engine that sends private or cheap tasks local, bulk tasks to cheap API models like DeepSeek at 1% of frontier cost, and only the hardest reasoning to frontier models. The open-source ecosystem means any SaaS tool — including NotebookLM — can be cloned from GitHub and run locally for free.

Free for members

Chat with this breakdown — free.

Sign in and you get 23 free chat messages on us — ask for the hook, quote a framework, find the exact transcript moment, generate a markdown action plan. Bring your own key when you want unlimited.

Create a free account →

Chapters

Where the time goes.

00:00 – 00:47

01 · Fable 5 Just Got Banned

Hook: urgency framing + double promise — local models + local software alternatives.

00:47 – 02:00

02 · You Do Not Own The Models

Historical examples of model revocation: GPT-4o killed Feb 2026, Anthropic cut Windsurf, region lockouts. You can be shut down at any time.

02:00 – 02:54

03 · Why Local Is 100% Private

Four local benefits: private, works offline, $0 tokens, runs forever. Your data never leaves your machine.

02:54 – 03:30

04 · How Good Local Really Is

Honest assessment: local is ~6-12 months behind frontier. RTX 5090 runs last year's model at 70-85% quality for $0/token.

03:30 – 04:04

05 · Own The Platform Not The Model

The bigger insight: own the entire SaaS stack, not just the model. Micro-SaaS era — if you can imagine it, you can build it.

04:04 – 06:00

06 · Step One: Check Your Capacity

Screenshot Mac specs → ask Claude to recommend model. RAM table: 8GB → Qwen3 8B, 16GB → Gemma 4, 24GB+ → RTX-grade models. $200/month AI spend = 2-3 year ROI on local rig.

06:00 – 07:56

07 · Installing Ollama

Download from ollama.com, run terminal command to install. Course pitch mid-chapter. Terminal explained as 'a chat window with your computer.'

07:56 – 09:57

08 · Installing GPT OSS via Claude

Ask Claude to generate the install command, run it in terminal. Demo: GPT OSS 20B running locally in Ollama app, answering questions with no internet.

09:57 – 10:57

09 · Connecting Local To Hermes Agent

/model command in Hermes agent → custom models list → select local model. Demonstrates talking to local model through the agentic OS.

10:57 – 12:23

10 · The Best Local Models Right Now

Gemma 4 (vision, 16GB max, good all-rounder), Qwen 3 (best agentic coding), GPT OSS (best small reasoner), DeepSeek (best tool calling). Model intelligence dashboard shown.

12:23 – 14:37

11 · The Decision Engine For Routing

'ONE BRAIN. EVERY MODEL.' Four routing buckets: local (private), cheap API at 1% cost/95% quality, long context, hard reasoning. Routing intelligence prompt offered in description.

14:37 – 15:29

12 · Tagging Models With OpenRouter

OpenRouter key connects agent to any model dynamically. Demo: live model rankings, switching mid-conversation, creating a 'deep reasoning agent' skill with a specific model.

15:29 – 16:22

13 · Verify Work With Codex And Gemini

Never release significant work without multi-model verification: Claude + Codex + Gemini all review the same output. Codex often catches what Claude misses.

16:22 – 17:50

14 · Building Your Own Micro Software

Community member rebuilt a paid SaaS tool with Claude in hours. Open-source contributor model (59 contributors on Open Notebook) means tools improve faster than paid SaaS.

17:50 – 19:01

15 · Cloning Open Notebook From GitHub

Clone the open-notebook repo via Claude: 'clone this repo and open it on localhost.' GitHub explained as 'fancy file storage.' Open source = all files freely available.

19:01 – 21:32

16 · Running A NotebookLM Clone Locally

Open Notebook at localhost:3000. Adds Glaido.com as a source, creates a notebook, switches model to local Ollama model, queries it. All running 100% locally.

21:32 – 22:31

17 · When To Stay On The Frontier

Honest close: local isn't the answer for everything. $20/month gets 90% of results. Use local for the right tasks — high volume, private, offline. In a year local = today's frontier. CTA to next video.

Atomic Insights

Lines worth screenshotting.

GPT-4o was killed off in February 2026, Claude's Windsurf access was cut in 2025, and region lockouts happen constantly — you have no lease on cloud models.
Local models are roughly six to twelve months behind the frontier, meaning next year's local model will match today's best cloud model.
An RTX 5090 runs last year's frontier model at 70-85% quality for $0 per token — the gap only bites on the hardest reasoning.
If you spend over $200 a month on AI, a local rig pays for itself in two to three years and then runs at about $3 per month.
DeepSeek v4 delivers roughly 95% of frontier model quality at about 1% of the price — the strongest cheap-API option available right now.
Ollama is the runtime layer: download it, pull a model with one terminal command, and it runs entirely offline with no rate limits and no data leaving your machine.
Qwen 3 is the best all-round local model for agentic coding; Gemma 4 is the best vision-capable all-rounder; GPT OSS 20B is the best small reasoner.
OpenRouter gives a single API key that connects to every major model, so your agent can route tasks dynamically without you managing multiple subscriptions.
The decision engine has four buckets: local for private/sensitive, cheap API for bulk, long-context window for large documents, hard reasoning for frontier.
Open Notebook is a GitHub-hosted NotebookLM clone that runs on localhost, accepts any model backend, and costs nothing — same feature set, zero SaaS dependency.
The open-source contributor model means tools like Open Notebook improve faster than paid alternatives, for the same reason Wikipedia defeated paid encyclopedias.
Verifying important outputs across Claude, Codex, and Gemini simultaneously catches errors that any single model misses — a multi-model review is now a standard practice.
Free local isn't always cheaper in practice: model performance lags, so the real value of local is privacy and offline resilience, not replacing frontier work.
The micro-SaaS era means any subscription you pay for today can be rebuilt with AI in hours — the question is whether the time cost beats the subscription cost.

Takeaway

Build the routing layer before you need it.

WHAT TO LEARN

The risk isn't which model wins — it's that any model can disappear overnight, and the builders who planned for that already have local infrastructure running.

Cloud model access is not a lease — GPT-4o, Claude's Windsurf integration, and regional services have all been cut without warning, so resilience requires owning at least one local fallback.
The three-step local setup is more accessible than it looks: check your RAM, download Ollama, run one terminal command — even an old laptop with 8GB can run a useful model.
Hardware spend makes financial sense above $200/month in AI subscriptions — a local rig pays for itself in two to three years and the ongoing cost drops to roughly $3/month.
Local models are genuinely 6-12 months behind the frontier, so the decision isn't local vs. cloud — it's knowing which tasks each tier handles best and routing accordingly.
A four-bucket decision engine (local for private, cheap API for volume, long-context model for big documents, frontier for hard reasoning) captures most of the cost savings without sacrificing output quality.
DeepSeek v4 at roughly 1% of frontier cost and 95% of quality is the strongest cheap-API option right now for tasks that don't require privacy or offline operation.
Open-source equivalents now exist for most SaaS tools, including NotebookLM — and they run locally, accept any model backend, and improve faster because contributors outnumber any paid team.
Verifying important outputs across two or three models (Claude + Codex + Gemini) catches errors that any single model misses — multi-model review is a low-cost quality gate worth building into any serious workflow.

Glossary

Terms worth knowing.

Ollama: A local runtime that downloads open-source AI models and runs them entirely on your own hardware, with no internet connection required after the initial download.
Local model: An AI model downloaded to and executed on your own computer, as opposed to accessed via a cloud API. Once downloaded it cannot be revoked, region-locked, or retired.
Decision engine: A routing layer that directs each AI query to the most cost-effective model capable of handling it — local for private tasks, cheap API for volume, frontier for hard reasoning.
OpenRouter: A unified API gateway that provides a single key connecting to virtually every major AI model, allowing dynamic model switching without managing separate provider accounts.
Open Notebook: An open-source clone of Google's NotebookLM that runs locally via GitHub, accepts any model backend including local Ollama models, and has no subscription or usage limits.
Hermes agent: The creator's proprietary agentic operating system that orchestrates multiple models and skills via a chat interface, shown being connected to local models in the video.
Micro-SaaS: Small, purpose-built software applications that individuals can now build themselves with AI assistance in hours, replacing recurring SaaS subscriptions with owned tools.
Model agnostic building: A development philosophy where no single AI provider is treated as permanent — instead, each task is routed to whichever model is currently best for that job.

Resources

Things they pointed at.

06:00toolOllama ↗

14:37toolOpenRouter ↗

17:50toolOpen Notebook (GitHub) ↗

20:00productGlaido ↗

06:25productClaude Code Full Course (Hermes) ↗

Quotables

Lines you could clip.

03:30

“You don't just wanna own the model. You want to own the platform.”

One-sentence thesis that reframes the whole local AI conversation→ IG reel cold open↗ Tweet quote

22:05

“Cheap AI — you can get 95% of the performance quality of the top tier models in the world using the latest DeepSeek v4, for example, for roughly 1% of the price.”

Specific numbers, counterintuitive claim, standalone→ TikTok hook↗ Tweet quote

02:50

“Your model you downloaded can't be retired. It cannot be revoked. It's not region locked to yours.”

Three clean parallel negatives — quotable for the ownership argument→ newsletter pull-quote↗ Tweet quote

The Script

Word for word.

Read-along

Don't just watch it. Burn it in.

See every word as it's spoken — crank it to 2× and still catch all of it. The same dual-channel trick behind Amazon's Kindle + Audible.

analogy

Claude Fable five just got banned with no warning or notice. The best AI model on the planet can be taken away from you at any time. And so there are two things that you need to do right now and barely anybody is talking about the second.

And so in this video, I'm gonna show you exactly how to run the world's most powerful models on your computer that can never be taken away from you even if you're a complete beginner. And how to run world class software like Notebook Lamp completely locally so you can save thousands of dollars and hours of time.

And if you're new, I'm Jack. I built and sold my last tech startup with like a gazillion customers, and now I bought my own AI startups and share here the stuff that actually works. If you haven't already, grab that beautiful coffee, and let's dive straight in.

The US government has asked Anthropic to basically shut down their most powerful model. We had it for seventy two hours, and yet we all feel exactly like this guy. People can't sleep.

Relationships are falling apart before our eyes because we don't have access to the best model. So it'll be a shock for many, but you do not own the most powerful models in in the world.

And in this video, I'm gonna show you some unlocks local. We're gonna cover exactly how to set it up. And at the end of the video, how you get alternative to some of those incredible softwares on the planet.

You have to learn this skill. It is never more important than it is right this second. So here's a key thing.

Just to reemphasize the point, a bit of a context if you're not familiar, that you don't own the models. And a couple of examples of this. In February 2026, g p t four o was killed off.

In August 2025, if you remember, Anthropic also cut OpenAI Claud's access over basically a dispute months after cutting off Windsurf. And then again, in 2024, 2025, different regions were cut off at various different things.

So if you don't own the model, it's not running on your computer, you effectively can be shut down at any time, which is really important to understand. And I'm not saying that local will is currently outperforming the best models.

There are absolutely trade offs, but you want this in your stack. It's so important.

But the one that cool benefits of why local's important, the most people don't realize is a, it's 100% private. So you can talk to it about anything.

You have no limits. You can do it if you're in the sky on a plane or you're 500 feet the ground. Don't know why you'd be 500 feet on the ground.

Maybe you're digging. I don't know. And, also, it's yours.

Your data, your prompts, sensitive health information, private company data, it is completely locked in yours. I'm gonna show you what the best models are for this right now and exactly how to set it up.

It is, as I say, a 100% private, works offline, no meter, no limits. Now the idea here is when you download a local model and it runs on your computer, you can use it essentially infinitely, and it will cost you a massive $0. It's just to compete on your computer.

You don't pay a single dollar or cent for it, and it can effectively run forever. Your model you downloaded can't be retired. It cannot be revoked.

It's not region locked to yours, and it's exactly good for what it is. It's basically just gonna run on your computer. Now in terms of how good are these, the current estimation is that local models are roughly six to twelve months behind the premier front end models.

So in a year's time, we'll probably have a model just like Fable five running on locally. That's, generally speaking, how powerful it is. For example, an RTX fifty ninety, which you can get for a few thousand dollars, runs last year's Frontier model at 70 to 85% of the quality, $0 for tokens, fully private.

The gap only really bites on the hardest reasoning. But this is the part that everybody misses. It's actually way bigger, this whole trend is, than just the model.

You don't just wanna basically own the model. You want to own the platform. What do I mean by that?

You think about any SaaS application that you use, notebook, LM, subscriptions that you have, CRM systems. We can build all of these and run them locally and don't have to pay any dollars whatsoever for any of those subscriptions.

We are living now in the era of the micro SaaS because if you can imagine it, you can build it as you're going to see in this video. Now essentially, there are only three steps to running any model locally, whether you wanna do that with your Hermes agent or ClordCode. We have understanding our capacity, downloading the actual model itself, and then connecting to that, and then we are ready to rock and roll.

Now step one is just understanding the ceiling and the capacity of your model. So more memory means bigger brains that you can actually run. So some people run this on an old laptop.

A lot of people are going out and actually buying it. You might remember the Mac mini phase. So the first thing that we're gonna do is find out the capabilities of our computer.

So if you're a Mac, you click on the Apple icon on the top left, come down to about this Mac, and you're gonna see some information. First thing we're gonna do is we're gonna come over and we are going to screenshot this information. And then we come over to Claude Pacer Imaging and say, hey.

Based on the capacity of my current computer, what would be the best model that I could run on this at a reasonable speed that would be the most powerful given my system requirements? You just ask a very simple question like this, and Claude will come back and recommend a couple of different models.

And actually, remember, we wanna download a couple, try a few, see what works, and then from that, we can effectively do anything. And once you basically ask that question, you'll get some recommendations. So Quen three is fantastic.

It's such a good local model. So this is really helpful for us, and it's fantastic. So what we're gonna do now, we've actually understood what we're gonna go for.

And bear in mind, you need some headroom. So if you had, for example, 20 gigabytes of space, you wouldn't get a 20 gigabyte model. You need some headroom above that as Claude very well knows.

And so now we have a personalized suggestion from Claude. Here's just some figurative examples that might help you understand the better.

You know, you might run the QUAN three eight billion parameter on old laptop with eight gigabytes. If you've got a Mac mini, that's normally like 16 gigabytes. You can do Gemma four, which is a powerful all round model.

We love that. 24 gigabytes for your kind of RTX. Again, that's 64 gigabytes.

It starts to pick up a little bit, and you can use server clusters. And as you can see, basically, guys, the Angles, the bigger the actual computer that you've got, the servers or clusters, the more powerful and faster the local model actually is, which is fantastic. An interesting ROI, if you're spending over $200 a month on AI, a local rig pays for itself in two to three years then runs at $3 per month for years.

So you can treat hardware as an investment. And so now we understand our capacity. The next thing to do is actually install Alama.

Now there's a few different ideas you can do with this. I personally always find Alama really great. So what we're do is come down come over to basically alarmnet.com and just download the application.

And then you're gonna find this cool here. So just come down and copy this also. Then we're gonna come up and open the terminal like so.

And if you're thinking, Jack, what on earth is a terminal? What does all this mean? It sounds like I'm speaking Chinese.

I'm gonna put a link down below for the full Claw Code Masterclass. It will take you from a complete beginner through building websites, power features, memory systems, features I have never shared on YouTube.

It is the most comprehensive course I have ever done. It has 10 modules. It is fully up to date, and you also get immediate access to this incredible Claude code and Hermes agentic operating system that has, like, a gargantillion cool things about it.

That will get you up to speed in no time. Now in the terminal, what we're gonna do is basically enter in that code. Terminal, by the way, is just think of it as like a chat window to have a conversation with your computer.

And, basically, just you can give it commands and instructions. And then we can also open up the Alama app once you've downloaded that. Now what's really cool here is you can have conversations with free models once you've downloaded it.

Alarm is really cool because what it can also do is effectively, like, chat with anything on the cloud as well. Now we can combine this with our Hermes agent. We can do many different things.

But, essentially, once you download a local model, in this app here, when you send us messages, this is running 100% private on your computer, which means if you had zero Internet connection and you opened up this app, you can talk to it about anything. And a lot of them are multimodal, they can understand images and text, and some of them have tool calls as well so they can build apps for you and websites.

They can effectively do everything, which is why we talk a lot about different agent systems. For example, if I look at this operating system to show you Hermes agent, one the things I have as a skill for my Hermes agent is it's connected to local models. So effectively, I can delegate things to which I'm gonna show you in this video, you know, the kind of strategy behind it based on what the task physically is, which is really, really handy.

Now for the purposes of the demo, I'm gonna go ahead and install this GPT OSS. So all I'm gonna do is I'm gonna copy this and I'm coming to Claude. I'm gonna say, hey, there.

I'd like to install the GPT model for demonstration purposes. Could you give me the command that I need to run on my terminal to install this model, please? And then come down.

I'm just gonna paste in that model and let it go live. Now, actually, Claude can do this for you itself, but I just wanna show you the process just so you get more familiar about everything. If it feels complicated, that is completely normal.

I promise if you just follow these steps, step by step, you will have this running in no time. So as you can see, it's got it there and it shown as a command. So all I'm gonna do is come down and hit play, which effectively is just gonna run that command for us in the terminal, and that's going to install.

Now local is also fun to do. Like, it's really, really fun. I had a big conversation about this recently, just that I love the idea that you can have local models.

And it's profoundly epic because you can effectively do it in any situation you want to. And I'm a big believer in what we call model agnostic building, which means that we're not loyal to any particular company.

We bring in the best model for that particular job. And if one day it's not this or something else, I will tell you that and we'll move over there together. Beautiful.

And when that's complete, we can officially use it. Now if you install it like that on Claude, basically, you just have to give it a few minutes. You can ask Claude, hey.

Has it finished updating? And it will give you an overview. So all we're gonna do now in the Alama app is you can click this, and we're looking for the GPT model, OSS 20,000,000,000 parameters, and I can give it a prompt like, hey there.

If I were trying to get a six pack, what are three things that I should do to accomplish that? Because, guys, obviously, we gotta win in business, health, and relationships, three big infinity stands.

So let's just get a bit of a six pack protocol on there. And look at that. Coming back with the thinking.

Now this is running completely locally. I could have no Internet, no anything, and it'll come back. And it's pretty impressive, and all the different models have different trade offs.

Like, of the open source models are really good at tool calling. Like, DeepSeek, for example, is fantastic at that. Obviously, if we're connecting to our Hermes agent, we can do, like, a million things with this.

And then let's say you wanted to now use that with your Hermes AI agent. You could do the following. Hey there.

I would love to connect this model to my Hermes agent. Could you please configure this and confirm when this is complete? And then once you've spoken with Claude or you can chat with Hermes yourself, or you can do in the actual panel itself, when I've got my Hermes agent, let's come down forward slash model, and then we can come down.

And you can see the current model is GBT 5.5. If I click on custom, check this out, I have all of my beautiful custom models. And if I click on GBT OSS 20,000,000,000 parameters, I'll now be having a conversation with that.

For example, I click on this, and now I'm literally talking to it on my computer. And so you can ask a question, hey, which model is this? And it comes back and it tells you which model it's running.

Speed will vary based on the model you download and the size of your computer. But the key thing to understand is that this entire thing is running on my own laptops. If I had zero dollars to spend and I was entering the world, I could actually use my Hermes agent with this model.

You just have to make sure it has at least a context. I think it's, like, 64,000 tokens.

So as you can see, we can drive the entire local model with Hermes agent as well, which is fantastic. So let's talk a bit about the best available models right now. You know, I've done some incredible videos on deep sea and the giant whale and what that looks like.

We're gonna touch on a couple now just so you're aware of what is happening in the world with these beautiful local models. So Gemma four is freaking amazing. This one is actually a Google model, which is very, very cool.

It's best probably for, like, 16 gigabyte max. It's a good all rounder. It has vision, so you can give it images of things, so it can read and analyze those.

It's really freaking powerful. And there's, four or five different versions of it. Some, in fact, you can run on your phone, which is really cool.

I just I love the idea you can run out on your mobile as you're running about. QUAN three is the best all round local model, good for Egentyc coding. Obviously, got GPT OSS, which is the one that I showed you.

This is the best small reasoner. Then what's really interesting actually in the Egentyc operating system, I actually added in the bottom here under models, this full breakdown. So you can see, like, cheapest areas, fastest, most used, smartest, and you'll able to have a look at all these different models and actually go, you know, double click into them.

So I was really interested in the DeepSeek v four Flash, for example, which is number one. On OpenRoads right now. I can get a good little bit of an overview, which I think is good.

So your AI, your Claude system, your DeepSeek system excuse me. Your Hermes system can plug in and understand what the best models are for any given particular time, which I think is super duper important. So I'll let you screenshot and have a look at those if you want But honestly, guys, a lot of it is chatting with Claude about what are the capabilities and trying a few of them out, and you can just download them and have conversations whenever you like.

And so this leads on to what do we actually do now? Because the answer isn't just to run 100% local right now. Because in honesty, they're not the top performing models.

Like, they're just not. They're not as good right now as Opus 4.8 or GPT 5.5. And there are a lot of occasions whether you're building websites or apps or creating copy or something where we want the best model for the job, and local ain't it as Shakespeare would actually explain to us.

So the core idea here is that we use basically the best model for the job. We have a specific task, and we have what we call a decision engine that will dynamically route that specific query to the specific model based on what is physically required.

This is going to be especially important once we regain access to Fable or a Fable level model again so we we can actually get the right token economics nailed down. For example, if it is a private query or something sensitive or maybe you just want a model that's good enough to run the background twenty four seven on the thing, we can use local.

Cheap AI, you can get kind of 95% of the performance quality of the top tier models in the world using, like, the latest DeepSeek v four, for example, for roughly 1% of the price.

So that's pretty amazing. Obviously, don't own the data in that scenario. We've got things for long contacts.

We may want a million contacts window. Generally speaking, we don't want that because we know performance goes down the longer the conversation is. And then you have ones for hard reasonings, your brainiacs.

And there's other things that we can do to increase these numbers. Obviously, we cover a lot of them the channel, like make sure your first prompt is correct. But effectively, we have this decision making matrix, and it decides which of the four groups do we set it down based on the task.

And by way, I'll put down my routing intelligence prompt that you can use with Claude or Hermes that helps you understand where to orchestrate and send tasks dynamically based on what the query is, whether it should be free.

It needs a heavy thinking model, which will help you save a lot of money on tokens and get maximum bang for your buck. And I've got a few decision making heuristics for you down below so you can check it out. If it's private and sensitive, you may wanna go local.

Again, the thing is if you've got GPT in other words, in English, if you have a chat GPT subscription, for example, and using, like, a personal AI system, you can just use that, and that's not running out, and that's fine. Right?

But we can basically tag in the different models. Now if we're using a Hermes agent, we can access these via Grok silence. You you can use your Grok subscription.

You can use a ChatGPT subscription. You can use OpenRooter, which effectively if you give a model an OpenRooter key, you can effectively connect to any model in the world, which is amazing.

I mean, check this out. Litch, if I come down to models over here, then I click on rankings. You can see right now in real time, just like in the dashboard, what are the most popular models.

You give it an open router key so it connects to your your agent, and it can access any models dynamically, and you can effectively decide which ones I want to tag in at which specific times. Even Hermes, for example, right now, when I change the model, I can see what model I'm dealing with, which is really handy.

I know, of course, we can get in create specific skills and say, hey. I wanna create a deep reasoning agent. Okay?

And that deep reasoning agent, I want to use this model and here's a prompt, here's a description. So when I call that skill, it uses that specific model by using something in the in the Pantheon if you're using Hermes agent. And if you're using Claude, for example, what we do is we use something called a command line interface.

So you can say to Claude, for example, hey there. I would like you to connect to Codex via the CLI.

Okay? And it'll basically open up ChatGPT in the new window. You'll sign in, and then you can actually ask it questions to delegators.

So you could say to Claude something like, hey. I want you to check over your work, and then I want you to also spin up a sub agent in Codex to also verify it.

I never release anything unless I verify it with Gemini. I use Google's model. I use Codex as well.

So I have ChatGPT. I have Gemini, Google's model, and Claude all together review my work, and you would be amazed, by the way. The amount of times that Codex catches something that Claude completely missed.

Honestly, make sure you don't whenever you're doing any work of significance, you wanna make sure that you're using that kind of test. But this then leads on to the biggest trend that actually accompanies this local revolution. And we've already covered the fact that local is six, twelve months behind the premier model.

But there's something that is at the available at the best standard today, which people are not discussing enough. And this idea that you can actually build Microsoftware. Somebody in my community on a call on Friday.

We have coffee. We hang out. It's a great time.

Was explaining to me that they had a service they were paying for, a significant amount of money, and they basically rebuilt it himself using Claude in a matter of hours. And this is not an isolated incident. I do it myself on a daily basis.

Effectively, it's so easy now to build beautiful software that you can actually do it yourself. And beyond this, we have the open source community behind us.

What do I mean when I say that? For example, if you look at NotebookLM. NotebookLM is the world's number one design and research intelligence platform.

Really cool. But what if I you know, what if one day Google says, sorry, guys. We're shutting it down, or we're only gonna let you add 10 notebooks unless you pay us a $100 or a thousand dollars a month.

Well, we're what is known as screwed. Right? Well, maybe not because we can use open source versions of this, which is the exact same software.

We could build it ourselves if we wanted to, or we can use open source versions of this and have any model we want to to do anything. This also happened with core design, where with open design, we effectively reskinned the entirety of this, meaning that we could effectively create anything with any model we want to on at unlimited level.

So this is best understood by taking an example. So let's take open notebook as an example. Okay?

What I'm gonna do is come over to this website right here, and we're gonna check this one out to see what it's like. All you ever do is you come down to GitHub. Again, open Notebook.

And the first thing we do when the software is we check to see, does this software already exist somewhere else? I'm gonna come down. All I'm gonna do is you come down to code, copy this code, go straight over to Claude, and say, hey there.

I would like you to clone this repo and open it up for me in a local host. Now if you're not familiar, GitHub is just fancy speak for place we store files. So essentially, all this is is a load of different files that do various different things.

And when you click on this and copy a link, effectively, if it's open source, it just means that all the code and the files are openly available for everybody. And oftentimes with products, you know, in projects like this, you have loads of contributors. There's 59 people that are contributing to improve this, which means it just grows so much faster.

This is one of reasons why Wikipedia grew so big and defeated paid alternatives because it was just running on the generosity of individuals that wanna grow up. It's the exact same thing that's happening with Hermes agent.

Loads of different contributors all coming together to make it epic. And the cool benefit of this is that, like, it basically takes the power and influence of any individual company away in a sense because you can build versions yourself now with AI.

We're only limited by our imagination. Beautiful. And now we have this fullest thing available called Open Notebook, and it's got local host 3,000 just above.

I So can copy this if I want to. And if I put it into a browser, we should see. So local host just means running on your computer.

And, effectively, this is a notebook LM clone. We can make any adjustments I want to. I can shoot from blue to red, whatever I like.

So if we come on the left hand side now to models on the top left, you can see, effectively, we can just connect different models. And so it's really cool, guys. Actually, when you can decide any model you wanna use here, I can use a llama.

That beautiful llama we downloaded. So I could technically use this locally hosted. I'd recommend that you use, like, an API key, but you can do all of these different things here.

We can basically configure it. We can even come back. Like I told this guy here, I said, I do.

Go and connect it to Alama. And, basically, we'll go ahead and do that for you. And then they've got all these different local models.

I can come off, and I can delete ones that I don't wanna be in there. It's got different embeddings. And, effectively, now we could just go ahead and use it.

So now, for example, I can come over. I can click on new. I can click on source, and I can give it, like, a URL.

So let's give it a sample one. I'll ahead and give it glutter.com, which is the dictation tool that I've been using in this video.

Come back over here, drop that in for instance, click on done. There we go. And it's gonna be added in.

The source is now going to be indexed, and then the source will appear. Then I can basically just do notebooks. So I click on notebook here.

I may just say, like, don't know, interesting websites and come down here, click on create new notebook. We go. I can click on this guy, and then we can add in all the sources we want to.

Come down here, click on source, add one of our existing sources so we can use the same sources on multiple versions. Come down, click on this, click on add selected, which is cool, and then I can have a conversation. Right?

So I might say something like, hey there. What is Glido about? And as you can see, we've the model selectors.

I have if I have different models, I can literally change that so I can chat to anything. And, obviously, I can do as many of these as I physically want to without any kind of limitation at all, which is fantastic. And, like, in it, instantaneously, it comes back.

Glatto is a real time voice text tool. And, guys, this was all 100% local.

Now if I wanted best performance, I would go ahead and grab my API key from Anthropic, and I would use the, you know, basically, Opus 4.8 or ChatGPT. But, hopefully, you can see that.

Again, we can create podcast models transformations. And if I wanna build on this, I can do. For example, I can come back and say, hey.

I don't like the color blue. Go ahead and change blue to green. And now instead of blue, it's green.

And just like that, I can basically make any amendment I want to this. This is just how powerful AI models actually are. So we understand now that effectively we can build any of your softwares.

We can use open source stuff, run it on our computer. And again, no Internet, No problem. I can run this local models.

I can do anything I want to. You can use it to bring down the cost of subscriptions. Honestly, though, I don't see this as a compelling use case, simply because I think I would probably opt to use something like DeepSeek.

Again, 95% of the performance, 1% of the price. Free isn't always free in that sense because the model performance can kinda lack.

So you have to use it for the right tasks, not just for any task. High volume repetitive stuff, it's pretty good for and building offline for true independence. And, again, in a year's time, we're gonna have models that are as good as the Frontier models today.

So you have to learn this, Scott. It's gonna be incredible for the future. Now you wanna stay on the Frontier.

Again, if you want the very best reasoning, we use the Frontier models. Again, for $20, you can get, like, 90% of the results, and you may wanna stick using Exclusive Frontier if you don't have the hardware to actually go ahead and do it.

Because remember, the performance is always gonna vary with the hardware, but it is very fun to do, and I highly recommend it. Now running locally is great, but if you don't have an agentic operating system, you're leaving too much value on the table, which is why the next thing that we're going to do is learn how to build one of those together.

I'll do that in this video right here.

The Hook

The bait, then the rug-pull.

When Claude Fable 5 disappeared without notice, the internet collectively panicked — but the real vulnerability wasn't the model. It was everyone's assumption they had a lease on it. This breakdown follows Jack Roberts's 22-minute response: a step-by-step guide to building an AI stack that no company can revoke.

Frameworks

Named ideas worth stealing.

12:23model

The Decision Engine (One Brain, Every Model)

Local — private/sensitive, offline, $0
Cheap API — 95% quality at 1% cost (DeepSeek v4)
Long context — million-token window tasks
Hard reasoning — frontier models only

A four-bucket routing matrix that directs every AI query to the cheapest model capable of handling it, preventing over-spend on frontier tokens.

Steal forAny workflow using multiple AI providers — build this as a system prompt or Hermes skill

03:40list

Three Steps to a Model You Own

Capacity — understand your hardware ceiling
Download — pull the right model via Ollama
Connect — wire it into your agent or tool

The minimum viable path to running a local model for any beginner.

Steal forTutorial video structure, beginner onboarding flows

05:40concept

Local Model ROI Threshold

Spending over $200/month on AI subscriptions means a local rig (RTX 5090 or Mac Studio) pays for itself in 2-3 years and runs at ~$3/month thereafter. Frame hardware as a capital investment.

Steal forPricing page comparison, hardware recommendation posts

CTA Breakdown

How they asked for the click.

VERBAL ASK

22:20next-video

“Now running locally is great, but if you don't have an agentic operating system, you're leaving too much value on the table, which is why the next thing that we're going to do is learn how to build one of those together.”

Soft bridge CTA — teases the next video rather than a product pitch. The Hermes agent and Claude Code course are pitched mid-video around the 6-minute mark.

MENTIONED ON CAMERA

06:00toolOllama ↗

14:37toolOpenRouter ↗

17:50toolOpen Notebook (GitHub) ↗

20:00productGlaido ↗

06:25productClaude Code Full Course (Hermes) ↗

FROM THE DESCRIPTION

PRIMARY CTAWhere the creator wants you to go next.