Hermes Agent + Ollama = 100% Private OS
A 19-minute walkthrough for running a fully private AI operating system on your laptop, free and offline-capable.
June 5th
A 22-minute guide to running AI models you actually own: local hardware, open-source SaaS clones, and a decision engine that routes every task to the right model.
Running AI models locally isn't just a backup plan — it's the foundation of owning your stack, and the same tools that let you self-host a model let you rebuild any SaaS product for free.
Cloud model access can be revoked at any time, so running a model locally is the only real guarantee of ownership. The playbook has three steps. First, check your hardware's RAM to learn which model size fits. Second, install Ollama and download a model (Qwen 3 for coding, Gemma 4 for general use, DeepSeek for tool calling). Third, wire it into your workflow through a decision engine that sends private or cheap tasks to the local model, bulk tasks to cheap API models such as DeepSeek at roughly 1% of frontier cost, and only the hardest reasoning to frontier models. The open-source ecosystem means SaaS tools, NotebookLM included, can be cloned from GitHub and run locally for free.
Hook: urgency framing + double promise — local models + local software alternatives.

Historical examples of model revocation: GPT-4o killed Feb 2026, Anthropic cut Windsurf, region lockouts. You can be shut down at any time.

Four local benefits: private, works offline, $0 tokens, runs forever. Your data never leaves your machine.

Honest assessment: local is ~6-12 months behind frontier. RTX 5090 runs last year's model at 70-85% quality for $0/token.

The bigger insight: own the entire SaaS stack, not just the model. Micro-SaaS era — if you can imagine it, you can build it.

Screenshot Mac specs → ask Claude to recommend model. RAM table: 8GB → Qwen3 8B, 16GB → Gemma 4, 24GB+ → RTX-grade models. $200/month AI spend = 2-3 year ROI on local rig.
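The RAM table above can be sketched as a small lookup. This is a minimal illustration, not the video's own tooling: the thresholds come from the summary, while the function name `recommend_model` and the label strings are assumptions and are not verified Ollama model tags.

```python
# RAM floors mirror the table in the chapter summary. The labels are
# illustrative strings, not verified Ollama model tags.
RAM_TIERS = [
    (24, "RTX-grade models (24GB+ tier)"),
    (16, "Gemma 4"),
    (8, "Qwen3 8B"),
]


def recommend_model(ram_gb: float) -> str:
    """Return the label of the largest tier whose RAM floor the machine meets."""
    for floor_gb, model in RAM_TIERS:
        if ram_gb >= floor_gb:
            return model
    return "below the 8GB floor in the table; no recommendation"


print(recommend_model(16))
```

Tiers are checked from largest to smallest, so a machine always lands in the highest bracket it qualifies for.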

Download from ollama.com, run terminal command to install. Course pitch mid-chapter. Terminal explained as 'a chat window with your computer.'

Ask Claude to generate the install command, run it in terminal. Demo: GPT OSS 20B running locally in Ollama app, answering questions with no internet.
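Once a model is running in Ollama, it can also be queried from code rather than the app. The sketch below assumes Ollama's default local HTTP endpoint on port 11434 and its `/api/generate` route; the helper names `build_request` and `ask_local` are hypothetical, and the model tag you pass must match whatever you actually downloaded.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a locally served model."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Calling `ask_local("<your-model-tag>", "Why run models locally?")` only works while Ollama is running and the model has been pulled; no internet connection is involved because the request never leaves localhost.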

/model command in Hermes agent → custom models list → select local model. Demonstrates talking to local model through the agentic OS.

Gemma 4 (vision, 16GB max, good all-rounder), Qwen 3 (best agentic coding), GPT OSS (best small reasoner), DeepSeek (best tool calling). Model intelligence dashboard shown.

'ONE BRAIN. EVERY MODEL.' Four routing buckets: local (private), cheap API at 1% cost/95% quality, long context, hard reasoning. Routing intelligence prompt offered in description.
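The four routing buckets can be sketched as a simple decision function. This is an assumption-laden illustration, not the routing prompt offered in the video description: the `Task` fields, the order of the checks, and the `LONG_CONTEXT_THRESHOLD` value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    private: bool = False         # data must not leave the machine
    offline: bool = False         # no network available
    hard_reasoning: bool = False  # needs a frontier model
    context_tokens: int = 0       # size of the input


# Hypothetical cut-off; pick one that matches the models you actually use.
LONG_CONTEXT_THRESHOLD = 100_000


def route(task: Task) -> str:
    """Return one of the four buckets: local, frontier, long_context, cheap_api."""
    if task.private or task.offline:
        return "local"
    if task.hard_reasoning:
        return "frontier"
    if task.context_tokens > LONG_CONTEXT_THRESHOLD:
        return "long_context"
    return "cheap_api"


print(route(Task(private=True)))
```

Privacy is checked first so that a sensitive task never reaches a hosted model, even if it would otherwise qualify for another bucket. Everything that matches no special case falls through to the cheap API tier.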

OpenRouter key connects agent to any model dynamically. Demo: live model rankings, switching mid-conversation, creating a 'deep reasoning agent' skill with a specific model.

Never release significant work without multi-model verification: Claude + Codex + Gemini all review the same output. Codex often catches what Claude misses.
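The multi-model verification rule can be sketched as a loop that hands the same draft to several independent reviewers and blocks release if any of them raises an issue. The reviewers here are plain placeholder functions standing in for real model calls; `cross_review`, `ready_to_release`, and the stub checks are hypothetical names for illustration.

```python
from typing import Callable, Dict, List

# A reviewer takes the draft and returns a list of issues (empty means pass).
Reviewer = Callable[[str], List[str]]


def cross_review(draft: str, reviewers: Dict[str, Reviewer]) -> Dict[str, List[str]]:
    """Run every reviewer on the same draft and collect issues per reviewer."""
    return {name: review(draft) for name, review in reviewers.items()}


def ready_to_release(findings: Dict[str, List[str]]) -> bool:
    """Release only when no reviewer reported an issue."""
    return all(not issues for issues in findings.values())


# Placeholder reviewers; in practice each would call a different model.
def empty_check(draft: str) -> List[str]:
    return [] if draft.strip() else ["draft is empty"]


def todo_check(draft: str) -> List[str]:
    return ["contains TODO"] if "TODO" in draft else []


reviewers = {"reviewer_a": empty_check, "reviewer_b": todo_check}
findings = cross_review("Ship it. TODO: add tests", reviewers)
print(findings, ready_to_release(findings))
```

Keeping findings keyed by reviewer makes it easy to see which model caught something the others missed.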

Community member rebuilt a paid SaaS tool with Claude in hours. Open-source contributor model (59 contributors on Open Notebook) means tools improve faster than paid SaaS.

Clone the open-notebook repo via Claude: 'clone this repo and open it on localhost.' GitHub explained as 'fancy file storage.' Open source = all files freely available.

Open Notebook at localhost:3000. Adds Glaido.com as a source, creates a notebook, switches model to local Ollama model, queries it. All running 100% locally.

Honest close: local isn't the answer for everything. $20/month gets 90% of results. Use local for the right tasks — high volume, private, offline. In a year local = today's frontier. CTA to next video.
The risk isn't which model wins — it's that any model can disappear overnight, and the builders who planned for that already have local infrastructure running.
“You don't just wanna own the model. You want to own the platform.”
“Cheap AI — you can get 95% of the performance quality of the top tier models in the world using the latest DeepSeek v4, for example, for roughly 1% of the price.”
“Your model you downloaded can't be retired. It cannot be revoked. It's not region locked to yours.”
When Claude Fable 5 disappeared without notice, the internet collectively panicked — but the real vulnerability wasn't the model. It was everyone's assumption they had a lease on it. This breakdown follows Jack Roberts's 22-minute response: a step-by-step guide to building an AI stack that no company can revoke.
A four-bucket routing matrix that directs every AI query to the cheapest model capable of handling it, preventing over-spend on frontier tokens.
The minimum viable path to running a local model for any beginner.
Spending over $200/month on AI subscriptions means a local rig (RTX 5090 or Mac Studio) pays for itself in 2-3 years and runs at ~$3/month thereafter. Frame hardware as a capital investment.
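The payback claim can be checked with simple arithmetic: divide the hardware price by the monthly saving. The $200/month spend and ~$3/month running cost come from the summary; the hardware prices in the loop are hypothetical, since the source gives no figure for the rig.

```python
def payback_months(
    hardware_cost: float,
    cloud_monthly: float = 200.0,  # subscription spend quoted in the summary
    local_monthly: float = 3.0,    # running cost quoted in the summary
) -> float:
    """Months until the hardware cost is recovered by the monthly saving."""
    savings = cloud_monthly - local_monthly
    if savings <= 0:
        raise ValueError("local running cost must be below the cloud spend")
    return hardware_cost / savings


# Hypothetical rig prices, chosen only to show the range.
for cost in (4800, 6000, 7200):
    print(f"${cost}: {payback_months(cost):.1f} months")
```

Under these assumptions, a 2-3 year payback corresponds to a rig costing somewhere between roughly $4,700 and $7,100.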
“Now running locally is great, but if you don't have an agentic operating system, you're leaving too much value on the table, which is why the next thing that we're going to do is learn how to build one of those together.”
Soft bridge CTA — teases the next video rather than a product pitch. The Hermes agent and Claude Code course are pitched mid-video around the 6-minute mark.