Modern Creator
Jack Roberts · YouTube

Hermes Agent + Ollama = 100% Private OS

A 19-minute walkthrough for running a fully private AI operating system on your laptop, free and offline-capable.

Posted
today
Duration
Format
Tutorial
educational
Views
1.4K
78 likes
Big Idea

The argument in one line.

Local AI has closed to within one year of frontier performance, making it practical to run a private AI operating system on a personal laptop at zero ongoing cost with speed as the only real tradeoff.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…
  • You pay monthly for AI tools and want to eliminate that cost.
  • You handle client data, health information, or proprietary IP you cannot send to a third-party cloud.
  • You want to use an AI agent offline on a plane, off-grid, or in a regulated environment.
  • You have a modern laptop and want to run open-source models like Qwen or Mistral locally.
  • You are a solo builder or small team who wants a private shared agent without a per-seat cloud bill.
SKIP IF…
  • You need the absolute best reasoning quality for hard coding or complex multi-step tasks where frontier cloud models still win.
  • Your machine is underpowered; running a 32B parameter model on an older laptop will be frustratingly slow.
TL;DR

The full version, fast.

Running Hermes agent locally with Ollama costs nothing and keeps all data on your own machine. Setup is three steps: install Ollama, pull a model whose parameter count fits your hardware (Qwen3-Coder-64K for Hermes compatibility), and point Hermes at the local endpoint. The honest tradeoff is that the best local model today benchmarks at around 74 compared to a frontier model at 88, roughly one calendar year behind, which is acceptable for private tasks and background agents but not for the hardest reasoning jobs where cloud still wins.

Free for members

Chat with this breakdown — free.

Sign in and you get 23 free chat messages on us — ask for the hook, quote a framework, find the exact transcript moment, generate a markdown action plan. Bring your own key when you want unlimited.

Create a free account →
Chapters

Where the time goes.

00:0000:46

01 · Run Hermes For Free

Cold open promise: run Hermes locally at $0, private, no internet required. Host intro.

00:4601:25

02 · Why Local AI Matters

Jensen Huang / NVIDIA framing. The phone-moment analogy: computers will become private AI supercomputers just as phones stopped being phones.

01:2503:05

03 · Your Data Stays Private

Data never leaves home ownership argument. No internet needed, no company watching, works on a plane or underground.

03:0504:05

04 · What The OS Does

Hermes OS live demo: memory, connections, goals, personas, GitHub integration, document view.

04:0505:04

05 · The Ownership Cheat Code

Why local beats VPS. Free forever, no gatekeeper, stop renting intelligence.

05:0406:10

06 · Local vs Cloud Tradeoffs

Best local model is about 1 year behind frontier. Qwen 3 = 74 benchmark vs Claude Opus 4.8 = 88.6. Ollama as the key unlocking open-source models.

06:1007:10

07 · Install Ollama First

Visit ollama.com, click download, run terminal install command. App sits in menu bar.

07:1008:56

08 · Pick The Right Model

Screenshot MacBook specs, send to Hermes desktop app to get a recommendation. Top pick: Qwen 3 32B for speed/quality.

08:5610:49

09 · Download And Run It

ollama pull qwen3:32b in terminal. Ollama app shows the model. Chat demo with color theory question, fast local response.

10:4912:56

10 · Branch Chats Mid Session

Hermes branch-chat feature: fork a conversation into two parallel tracks while preserving context. Demo: strategy vs DM outreach tracks.

12:5613:51

11 · Connect It To Hermes

Hermes requires 64K context window. Download Qwen3-Coder-64K. Select it in Hermes bottom-right model picker.

13:5115:20

12 · How Good Is Local

Benchmark comparison chart. Qwen 3 at 74 vs Claude Opus 4.8 at 88.6. Honest: not the premier model but trades off on privacy, performance, and price.

15:2016:27

13 · Free Forever But Slower

The honest scorecard: free, private, as fast as your machine vs frontier still wins the hardest jobs. Encouragement to experiment.

16:2717:42

14 · Vault Mode vs Cloud Mode

Toggle Your Privacy diagram. Vault = client data, health, IP, offline. Cloud = best answer, phone, fresh web, raw quality beats privacy.

17:4218:46

15 · Local Is The Future

Within one year, Opus-level models will run locally. Compliance angle: SOC2, GDPR, ISO 27000. Local is the future, learn this skill now.

18:4619:04

16 · Build The Full OS

CTA: watch the next video to complete the Hermes operating system setup.

Atomic Insights

Lines worth screenshotting.

  • The best local model today performs at roughly the same level as the best cloud model from one year ago, and the gap is closing, not widening.
  • Ollama is not a model; it is the key that unlocks every open-source model (Qwen, DeepSeek, Gemma, Mistral) and runs them locally for free.
  • Hermes agent requires a model with at least 64K context window; most downloaded Ollama models fall short and you need Qwen3-Coder-64K specifically.
  • The fastest way to pick the right local model is to screenshot your machine specs and ask any AI what fits your hardware.
  • Vault Mode and Connected Mode are not competing philosophies but routing decisions you toggle per task based on sensitivity versus quality needed.
  • Running an AI agent locally means it can work as a 24/7 background agent at $0 per token, not just as a chat interface.
  • Local AI compliance value is concrete: SOC2, GDPR, and ISO 27000 audits are easier when data physically never leaves the building.
  • The phone-moment analogy predicts the arc: today a chatbot, soon a private brain, just as phones stopped being phones.
  • Branching a Hermes chat mid-session lets you fork a project into two parallel tracks while preserving the shared context.
  • The real reason to prefer local over a VPS is trust: your data never crosses a network even to your own remote server.
Takeaway

Own your AI before you build on it.

WHAT TO LEARN

The gap between local and cloud AI is now measured in months, not years, and the models you can run free on a laptop today are good enough for most real work.

  • The best local model today performs at roughly the level of the best cloud model from one year ago, and the gap is closing faster than most people expect.
  • Ollama is a free tool that downloads and runs any open-source model locally, replacing metered cloud APIs for tasks that do not need frontier reasoning.
  • Hermes agent requires a model with at least 64K context window to function; Qwen3-Coder-64K is currently the right pick for connecting a local model to the Hermes memory system.
  • Choosing a local model is hardware-dependent: screenshot your machine specs and ask any AI what fits, since there is no universal best model for everyone.
  • Vault Mode and Connected Mode are routing decisions, not competing philosophies: sensitive data goes local, tasks needing the best answer or fresh web data go cloud.
  • Local AI running as a background agent costs $0 per token indefinitely, which completely changes the economics of 24/7 autonomous tasks.
  • The compliance case for local AI is concrete: regulated industries benefit directly because data physically never leaves the building, simplifying SOC2 and GDPR audits.
  • Frontier cloud models still win on the hardest reasoning tasks; going local is not ideological but a routing decision based on sensitivity, quality needed, and cost tolerance.
Glossary

Terms worth knowing.

Ollama
A free open-source tool that downloads and runs large language models locally on Mac, Linux, or Windows, acting as a local API endpoint AI tools connect to instead of a cloud service.
Hermes Agent
An AI operating system aggregating memory, connected apps, goals, and personas into one interface. Supports local model backends via Ollama.
Vault Mode
A Hermes privacy setting routing all requests to the local model only with no internet. Used for sensitive data like client files, health notes, and proprietary code.
Connected Mode
The Hermes setting that uses a cloud frontier model for tasks requiring highest reasoning quality, fresh web data, or quick mobile one-offs.
Context window
The maximum amount of text a model can read and reason over in a single session. Hermes requires at least 64,000 tokens to function properly with its memory system.
Qwen3-Coder-64K
An open-source coding-optimized model from Alibaba with a 64K context window, available free via Ollama and currently the recommended local model for running Hermes agent.
Parameters (billions)
A rough measure of a model size and capability. Larger models are more capable but require more RAM and run slower on consumer hardware.
Resources

Things they pointed at.

06:10toolOllama
05:50productClaude Code full course (paid)
Quotables

Lines you could clip.

04:05
The cheat code is ownership.
Punchy standalone line, zero setup neededTikTok hook↗ Tweet quote
05:04
Stop renting your intelligence.
Visceral one-liner, works as a cold openIG reel cold open↗ Tweet quote
00:47
Local AI is the direction of trouble. It is the future.
Bold thesis claim, slightly awkward phrasing makes it memorableNewsletter pull-quote↗ Tweet quote
17:55
In one year's time, we will have a model like Claude Opus 4.8 that you can run completely locally on your computer.
Falsifiable prediction with a specific model name and timelineTikTok hook↗ Tweet quote
16:28
We're not ideological with this stuff. We just follow what works.
Trust-building credibility line; cuts against the local AI purist stereotypeNewsletter pull-quote↗ Tweet quote
The Script

Word for word.

Read-along

Don't just watch it. Burn it in.

See every word as it's spoken — crank it to 2× and still catch all of it. The same dual-channel trick behind Amazon's Kindle + Audible.

analogystory
00:00Imagine if you could use Hermes agent 100% privately and free on your own computer.
00:06In this video, I'm gonna show you exactly how to power Hermes agent 100% locally. So you can build and use your Hermes operating system from $0 and have everything completely private with no limitations, rate limits, and work anywhere in the world with no Internet.
00:22And I'm also gonna cover a brand new Hermes update that makes this even easier even if you're a complete beginner. And if you're new, I'm Jack. I built and saw my Let Tech starter with a gazillion customers.
00:32Now I build my own AI companies and I share on this channel the stuff that actually works. So if you haven't already, grab that beautiful coffee and let's jump straight in.
00:41So Hermes Agent is world number one AI agent assistant right now, and with Alarm, we can own our own intelligence. And as somebody who recently scored a perfect 100 on an IQ test, I've been speaking a lot about intelligence recently. Now the idea here is it's a private AI, and we're gonna own it on your own machine.
00:57Now one cool concept that you have to understand before we even get started here is that local AI is the direction of trouble. It is the future. Jensen Huang at NVIDIA conference very, very recently had this to say, that everybody who uses computers today as a tool, every engineer, every creative artist will need an AI supercomputer.
01:16By the way, if you watch the end of this video, you're gonna know exactly how to run Hermes Agent on your phone from anywhere completely private and locally hosted.
01:25Now the idea here with local AI is that it runs on your laptop, your data stays in the room, and you never get a monthly bill because it's 100% private.
01:35You physically own it, everything, and the data is not going to OpenAI or Anthropic. It's completely yours. Now, if you think about the analogy that Jensen used here, is that in a phone in the nineteen nineties, a beautiful decade to be born if you ask me, the whole purpose and concept of the phone was that we would basically just make calls.
01:52And I'm told with great degrees of confidence that people used to have these kind of bricks on the side of their heads walking around. Whereas today, we effectively do everything except make phone calls.
02:03It probably counts, at least in my case, probably 2% of my actual phone use is on calls. And Jensen's philosophy here, which I think is really interesting, obviously Jensen being the CEO of NVIDIA, is that today the one thing you don't do with your phone is make phone calls, you just about do everything else. His direction and philosophy, which he believes very strongly and is backing NVIDIA's future on, is that the same will be true with your computer that we will all have these supercomputers.
02:27So understanding exactly how to go local and how that connects with your agent and cloud code is gonna be an incredibly powerful skill and I'm gonna show you all that with no fluff in this video. So here's the thing.
02:39The idea is we're gonna have a Hermes operating system, which is your entire world in one place. Now we can speak to Hermes, we can run our beautiful Hermes operating system that shows our entire memory system and covers everything that we possibly may wanna use. We can chat with it directly, see our connections, come down here, even look at goals, we can build personas and skills, we can even connect this to GitHub, do beautiful things, see our information, get skills, and even view documents in a beautiful configured way.
03:05This is the power of the operating system which is really cool. Now the idea of why we wanna do a private operating system is gonna be really interesting. So the idea here is that we have one home for every app.
03:16We can see our usage everywhere. We can schedule and run things. We read documents.
03:20It remembers what it learns and it proactively comes back and suggests how you can be better based on the conversations you've had. But crucially, it actually connects everything together.
03:32So it isn't just about Hermes. Sometimes we're talking to ChatGPT, and other times we're talking to Cloud Code.
03:37So being able to see and understand our entire AI world in one location is really, really, really important.
03:45And crucially with private, you're not tied into any vendor. You literally you're basically running everything you want to locally. Now interestingly, homies, I've just released a desktop app and I'm gonna show you exactly how that actually makes one thing easier in this video, when you should use it, when you shouldn't, and what this thing actually physically means.
04:02Now before we build up, we have to understand why we're actually doing this in the first place. So the cheat code essentially is ownership. So the idea here is that your data never leaves the home.
04:12Or if you're a business, all of the data, all the models are running directly on your own computer. That's one of reasons why I really like running it locally, not on a VPS for several different reasons.
04:22I'm not sponsored to sell you some kind of VPS. I tell you what I do, I run it locally. This is what I tell my people, run it locally.
04:28I just think it's a lot better way to do that. The idea here is your data never leaves your home. No internet needed, no company watching, it's all yours.
04:36And basically, whether you're 16,000 feet underground or you're on a SpaceX rocket ship, makes no difference. Okay. Works without the internet.
04:43No company gets your data to the extent to which you think that analyzing it is is completely your call, but they don't get it anyway. No brain limits ever. It's free forever.
04:52There's no gatekeeper and you own it outright, which is fantastic. So idea here is that we wanna stop, so to speak, renting our intelligence under a few interesting trade off of this. I'm gonna get into this video.
05:01But idea here is that you're gonna have it all running perfectly local. Okay? Again, it's complete free to do.
05:06The top labs won't open. I usually typically speaking we see in markets. If you're behind, you just open source a thing and push it.
05:12And a laptop now beats old GP. Now just to put this into perspective, is that the because you might be thinking, Jack, what about performance? The best local model today is about one year behind wherever we're currently at.
05:24So for example, the best local model today is as good as the best model that existed in around mid twenty twenty five. So that would be Claude Sonnet four. Just to give you some perspective, that's how good they are and how close they are behind current models.
05:38Our expectations just changed so quickly. So these aren't exactly, you know, caveman basically running around in our laptops. And then what we're gonna be doing in this video is gonna be using Allama to essentially unlock everything for us.
05:47So we have the metered way of doing AI, which is maybe using ChatGPT, OpenAI, Claude, etcetera in cloud services.
05:55Just fancy way of saying that we're just running it on their service, their infrastructure. But with our powerful, handsome llama over him, he has the keys to unlock Quan, DeepSeek, Gemma, Mistral, loads of these kind of interesting open source models that we can do.
06:09We can download the ones, run them for free forever, and nothing ever leaves our machine. So the very first thing I'd love you to do is head over to this beautiful website. Come over and just click on download.
06:18This actually sits like an app and you can, when you get these things, literally just chat to your models there if you want to. But of course, we're gonna connect to Hermes and we can chat to anything. So we're gonna come down and download that for Mac OS.
06:29And once you've done that, I need you to open up the terminal. That's command space bar and type in terminal and the terminal will appear. And then effectively, all we're gonna do is let you come over here and just copy this information like so.
06:40And then when I bring the terminal up, this will just install the latest version of Alama onto our computer. You can see all the code is gonna go in the background, is fantastic. And if this sounds like I'm actually speaking Icelandic right now, you can go ahead and grab this Clock Code full course.
06:54I'll put a link for it down below. It takes you from foundation setups, building websites, power features, memory systems, and Stuff I have never covered on YouTube. It is the best course I've ever created.
07:03You'll unlock all of this, and you also get the entire cord code, Hermes operating system immediately as well. I'll put a link down below for you so you can grab that if you find that beneficial. Now on the terminal, this is now fully done.
07:15What we wanna do is open up the app. So it's your command space bar and type in Olama and he will appear when we say his name. Cool.
07:22So this is the Olama app. So if you wanted to, at this point, you could just talk to your model privately. And effectively, you can literally just click this download button, like Gemma four, whatever it is, and you'd be ready to go.
07:31But the first thing that we need to do is understand what is the best model for you to look at. And I've looked at alternatives to Alarm and it comes to the free space.
07:39I just find it easiest to use. Now if you click on the app if you're on a MacBook, the top left, you come down to about this Mac, you'll then get some information. All you're gonna do is literally screenshot this guy by coming up like so, and we can literally ask Hermes, hey, what is the best local model for us to run for this thing here?
07:54And the best way to show you that might be actually using the Hermes app itself. So if you come over to this website, which is Hermes Agent News Research, you click on desktop app, come down, and what we can do is download the Mac OS here. What's cool is I haven't done it on this computer, so we can get the entire process together.
08:08You And can just get an overview of what it is and how this fits into the stack. So you go we click on it, and then we double click on Hermes, and it should just pop up. We're happy for that to happen.
08:15Click on open, install Hermes, although we've already got it, which is fine. Now this Hermes app installation step is completely optional. You can just chat to it in Telegram if you want to.
08:23I just wanna show you what Hermes have done so you understand it. Come down and click launch Hermes. This rocket here icon, by the way, is classic when we when you vibe code stuff up.
08:31No. It's kinda like always pops up. But guys, when I downloaded it, it could not launch the desktop app.
08:35If that happens, it is because your Hermes is not up to date. So all we're gonna do is come down and do Hermes update inside the terminal. Man, you feel like you're the matrix sometimes doing a terminal.
08:45It's basically just a way to talk to your computer. That's straightforward. Come down, and we wanna restore these changes, which is cool.
08:51I'll send that one off. Beautiful. So now it's complete.
08:53If I come off this and just rerun it one more time, that should work fantastically for us. Come down and install. Beautiful.
08:57And then we have the Hermes dashboard. So, basically, what we can do is to create a new session here, click on new session, and then I'm just gonna come down and just say, hey. And you'll see, can we start a conversation?
09:06And it's just exactly the same thing as talking to it on your, you know, your actual telegram. The only difference here, it realistically is think of the Hermes desktop app. And the reason I've done dedicated video on it is because for me, basically, at the moment in its current configuration, it's just a less intimidating way of using the terminal, which I think is a wonderful thing and the guys are crushing it.
09:27So huge job to those guys. Now what I'm gonna do is give a message. Hey, though.
09:30Based on the specifications of my MacBook, what do you think would be the best performance model that I could grab from Olama to download and run a model on my computer? Okay. And then I'm gonna send in the image, and then we send that one off and let Hermes work its magic.
09:43Cool thing, usually, if you are gonna be using this desktop app, it should come down to here and you can actually pick the models that you've installed. And I think what is really cool is you can kinda pick like a minimal low, medium, high max, which I think is a quite a nice one then. So top recommendations, QUEN three thirty two b or QUEN five thirty two b is best overall performance right now.
10:00Excellent speed and quality. So let's try that one. Let's say we want something speed.
10:03What and all we're gonna do is actually copy this like so. Okay. Then basically to run it, you're gonna basically open terminal one more time.
10:09So come up to terminal like so, and then you're gonna give it two commands. And basically, can say, hey, what commands do I need to run to install insert the blank? So the first one for us is gonna be Alama basically QAN three, which is gonna be fantastic, and it's gonna pull down the manifest for us.
10:23And you can see the whole thing is literally downloading. If you wanna check out on the website, by the way, and come check out models, you can do that as well. I have a nice little browse.
10:31And it's really cool. I love the competition in the open source market. And again, remember, it's as good as models a year a year behind.
10:37So a year is not that much time when you really think about it. It's just incredible to see how much they're actually developing. So this is downloading this puppy here and she took about three minutes, then we're ready to rock and roll.
10:47And just while I start loading, one cool thing you can do is if I come down here and I say, hey, and I begin a conversation, I can if I want to, what we call branch something out into a new chat. So if I'm building on something in particular, I can come down here, click on this, click on branch in new chat. Okay?
11:01And then now it forks, which means, like, let's say that you're working on a project, like, I'd know to grow your LinkedIn or something, and you think, actually, I really wanna do two things now. One is I wanna build strategy, and the second thing I wanna do is do some DM outreach, fork the chat, same contact, two different windows.
11:16I think, basically, the strategy that building here is essentially to kind of be the use any model platform. That's kind of a direction of travel we're going in.
11:24And it's called the building out the artifacts and you can save it. One of the reasons why I kind of built out our operating system this way, and I do think the future is a 100% configurable, because you can actually just like add different things. Like, I can click on this and see it.
11:36I can actually come down and actually see the images as I'm actually pulling them out as I get it right. Like, this is an overview you did for me and you can edit it. So really interesting.
11:45And as you see here, this is now completely finished. So we just need to give it a second instruction. Now that code basically would be to chat to us.
11:50If I ever wanna chat to it in the terminal, I run this code here. But in reality guys, who's chatting to it in the terminal, unless you've got a huge Windows laptop and you're just like completely nerd now. In reality, we can come over to Alama and you'll see in Alama that this mysterious QAN model has mystically appeared.
12:05So if I come down here, I'll be able to select this here, which is QAN three thirty b just means billions of parameters, which is fantastic. I'm gonna go ahead and just shut this one down so we can chat to it in a cool interface that can say, hey there. Give me three interesting facts about color theory and design.
12:22And immediately, we're getting all the thinking, which is fantastic. And by the way, if you're looking for what I'm doing for speech dictation, just tell you what it is. I'm using something called glider.com, which is a company that we founded, fastest in the world, super private.
12:32So if you wanna check it out, I'll put a link down below with some goodies so you can have a little play around with that. We freaking love it. And look at this.
12:38This is completely guys, how freaking how fast was that number one? And how cool was that? Again, we went for a bit of a faster one.
12:44But now, anything that we talk we talk to Quan about is running on your computer. Think about that. It's literally on your computer.
12:50I have never had so much fun than running something locally, which is sick. But then it leads to the next question, which is how do we get it from the computer into Hermes agent? And so Hermes requires a local model to have 64,000 tokens in its context window.
13:05The one we just downloaded doesn't. So what we effectively then wanna ask the model effectively is what can I run on my computer that has enough context? I asked Claude this exact question, and essentially it said you wanna grab the QUEN three Coder 30 b.
13:17Now the great thing is the thing we just downloaded, we can use on a computer for anything. For tasks that don't require more than 25 to 30,000 words. But due to the nature of the way that the Hermes agent works and its contacts on its memory, it needs that 64 K.
13:31So we're gonna download it and then we can connect it directly to Hermes agent. And now that's complete, we can see the QUEN3Coda64K has arrived and we can chat to it within, you know, Alama.
13:41And also, it's here right here with the Hermes agent and you can see in the bottom right hand corner, QUEN3Coda64K. So now we can literally chat to it and do various different things with Hermes completely locally and private on your computer.
13:54And this then raises a really important question for us to understand here, which is essentially how good is local realistically? Like, how good is it actually? Well, if you think about that we're one year behind, these are the benchmarks to the extent to which you back and believe these.
14:0788.6 for basically called Opus 4.8. And again, you can make the argument that optimizing benchmarks, I get that. QAM, which we're running 74.
14:15So is it the absolute premier model? No. But you are trading off on privacy, performance, and price.
14:23And bear in mind, given the fact we're 12 only twelve months behind the best models, I can tell you in the future, within one year's time, we will have a model like Cord Opus 4.8 that you can run completely locally on your computer. Now, obviously, if you have some of the bigger, more powerful models, your computer might go a little bit slower.
14:40It might take two business days to go from here to here just to bring your mouse cursor across. So it's not for work that we want a snappy answer, especially if we're going for a big model, but it absolutely has a really important place in our ecosystem.
14:53And if you take, for example, if you imagine you split all the work you do, say for example, this laptop right with my beautiful Springer Spaniel on it, represents 100% of the work you do with Hermes agent, and you chop that up into percentages.
15:04There'll be some things where you're like, I actually want a private model that doesn't fall into the hands of any company that I can talk to, and it's amazing but I sort of stuff and it's only getting better. So transparently, it's free forever.
15:16$0 for token. It is though only as fast as your machine. Like I say, like, if it's really slow, it's gonna take ages to move across.
15:24Now, it's total privacy data never leaves your computer. That's it. You could be 16 feet on the ground, you could be high in the sky, you could be on Mars, it doesn't matter.
15:34And I'll tell you, I was flying from Dubai, a beautiful place where I live right now, to LA. And I remember, like, the Internet at one point wasn't working or, like, I hadn't set up yet. And I was just using it on my computer and it felt really freaking cool.
15:46I was like, it's like so fun to be able to ask these questions locally on your laptop without the need for anything else. It's really freaking sick. I absolutely love it.
15:54Now, when it comes to peak performance, the frontier models are still winning. They're still the best ones.
16:00They do the hardest jobs, and they're great for those kinds of things. So the philosophy that I'm progressing, and I encourage you to think about because I'm not so I'm always gonna keep it real with you guys. I'm never gonna say, oh, we're always private.
16:10We're always local. No. Like, we bring in the best thing for the job.
16:14And if something's no longer the best thing, I'll tell you, I didn't. It's no longer the same thing about something else. We, you know, we're not, like, ideological with this stuff.
16:21We just follow what works. That's that's the whole ethos of the channel. Stuff that works and is fun and just you can build anything.
16:27Now the idea with this toggling your property thing is we have vault mode. Okay? So things like maybe private data, health stuff, client information, we're gonna go vault mode with that.
16:35And the cool thing is we can actually dynamically get Hermes agent to bring that in. I can say, hey, Hermes, I'm just run up my local, my private send this to the private model.
16:44Okay? Go do this over here if you want to. Very freaking cool.
16:47Then we have connected mode, which effectively is performance mode. Right? Give a little more detail on it.
16:51When would I use private vault mode? Well, client data, finances, health notes. You could boast in proprietary IP.
16:57Interesting. Right? Like, maybe we're building a OpenAI two point o and we can't have someone knowing about it.
17:02That would be a private thing. It's offline. You're on a plane.
17:05You're off grid. Doesn't matter. You can do it.
17:07Let my attitude joke sometimes like the electricity went out. And he had a private model. He could still, like, think about things, which is really cool.
17:13Twenty four seven background agents all day. They can be running for you twenty four seven at absolutely $0, which is code for good news.
17:20And then we've got the cloud. So when we want the best answer, quick one offs from your phone, fresh web infos, private search pipeline, when raw quality beats privacy.
17:27And by the way, don't feel bad at all if you download a model and it just makes your computer too slow. That's cool because you need a little bit of headroom above what you download.
17:36So it's fine to download, delete, find models that you think are good. Have this is supposed to be fun. Like, this isn't supposed be I gotta get this it's a fun thing.
17:43It's it's good. It's like it has a lot of utility, and it's also freaking fun. So have a bit of time with it.
17:48Now the idea here, you're gonna run your business. Now we are in a year's time gonna have a model like Opus 4.8 that's gonna run completely on your computer. We just this is based on where we're at compared to local private.
17:58So learning the schools that you're learning right now in this video is gonna put you so freaking far ahead. It's unfricking believable. And you'll have this in offices.
18:05These private company brains running with client data, everything is kind of, you know, basically boxed off and the cloud is outside. So we had this moment, everyone going to the cloud. Now the cloud is old news.
18:14Now we're going local. Local is the future. It's a big, big, big trend.
18:18Expect to see this blow up, and you've learned the schools exactly how to do that. And the idea is that client data is always gonna stay ours, one private agent for the entire team, and it works in a regulated environment.
18:27In Glider, for example, we are going through SOC two compliance, GDPR compliance, ISO 27,000, all that sort of stuff.
18:34Because compliance is freaking absolutely critical and super duper important. I can tell you now when we're building these things out, having these local models does make a significant difference and it's great and really important for regulated work. The idea being we can own our intelligence.
18:49But it does bring us on to one interesting question. And that's that going private is one thing, but if you don't have an operating system, you're not unlocking the full capabilities of this incredible technology. So the next thing I'm gonna do is set that up by watching this video right here.
The Hook

The bait, then the rug-pull.

The promise lands in the first four seconds: run the world's top AI agent on your own machine, free forever, with nothing leaving the room. What follows is a no-fluff installation walkthrough that delivers exactly that and then builds out the privacy philosophy around why owning your intelligence beats renting it.

Frameworks

Named ideas worth stealing.

01:05model

The Phone Moment

Jensen Huang analogy: phones were invented for calls but now we do everything except calls. Computers are at the same inflection point.

Steal forOpening frame for any local AI is inevitable argument
05:10model

Ollama Unlocks Your Models

  1. Qwen
  2. DeepSeek
  3. Gemma
  4. Mistral
  5. LLaMA

Cloud = locked behind a metered gate. Ollama = the brass key that unlocks every open-source model for free local use.

Steal forAny explainer on open-source AI tools
16:27model

Toggle Your Privacy

  1. Vault Mode (local only, airgapped)
  2. Connected Mode (cloud, frontier quality)

Not either/or: dynamically route tasks to local or cloud based on sensitivity vs quality needed.

Steal forPrivacy posture framework for AI tool stacks
15:20list

The Honest Scorecard

  1. Free forever ($0 per token)
  2. Total privacy (data never leaves machine)
  3. Speed limited to your hardware
  4. Frontier models still win hardest jobs

Transparent four-quadrant breakdown of what you gain and give up going local.

Steal forAny comparison where you want to build trust by naming the downside
17:42concept

Your Business, Its Own Brain

A private local AI agent for an entire team: client data never leaves the building, one shared agent scales headcount, compliance-ready for regulated industries.

Steal forB2B pitch for local AI in professional services
CTA Breakdown

How they asked for the click.

VERBAL ASK
18:59next-video
So the next thing I'm gonna do is set that up by watching this video right here.

End-card redirect to the Hermes OS setup video. Low-friction, no product pitch, pure retention play.

MENTIONED ON CAMERA
Storyboard

Visual structure at a glance.

cold open
hookcold open00:00
Hermes + Ollama intro slide
promiseHermes + Ollama intro slide00:52
Local AI is the future
valueLocal AI is the future01:05
The Phone Moment diagram
valueThe Phone Moment diagram01:46
The cheat code is ownership
valueThe cheat code is ownership04:05
Ollama unlocks your models
valueOllama unlocks your models05:10
Ollama download page
valueOllama download page08:57
How close is local, really?
valueHow close is local, really?13:51
The honest scorecard
valueThe honest scorecard15:20
Toggle your privacy
valueToggle your privacy16:27
Your business its own brain
valueYour business its own brain17:42
CTA next video
ctaCTA next video18:46
Frame Gallery

Visual moments.

Watch next

More from this channel + related breakdowns.

28:01
Jack Roberts · Tutorial

How I Build $10,000 AI Websites in 17 Mins

A 28-minute walkthrough of the complete AI website pipeline: extract design DNA, brief it into Google AI Studio, refine in Claude Code, then use competitor outlier analysis to wire it for conversion.

May 26th
13:17
Jack Roberts · Tutorial

Google's Gemini 3.5 Just Dropped, and?

Jack Roberts breaks down the triple Google drop ? Flash 3.5, Antigravity 2.0, and the CLI that replaces Gemini CLI ? and shows you exactly where each fits in a Claude-first workflow.

May 20th
Chat about this