100 hours of Hermes Agent lessons in 46 minutes
A 47-minute walkthrough of all seven levels of Hermes Agent — from bare VPS to full MCP back end.
May 6thA two-host deep dive on why self-hosting open-source AI is a freedom fight, not just a cost play — covering hardware tiers, model benchmarks, geopolitical risk, and the case for owning your inference stack.
Running frontier-capable AI at home crossed from hobbyist experiment to economically rational decision in 2025, and the window to spread ownership before governments restrict access may be shorter than two generations of model releases.
Self-hosting AI crossed a viability threshold: a $50,000 rig of RTX Pro 6000s now runs GLM 5.2 at 60–80 tokens per second — frontier-capable inference that cost $100,000 to match a year ago. The hosts argue ownership is not just a cost play but a political one: governments are likely to restrict access to the next one or two model generations, and spreading open-weight models now, like seeding torrents, is the only hedge. Hardware advice is practical: start with LM Studio on any machine, rent cloud GPUs before buying, and think in increments — two RTX 3090s for under $2,000 run all Qwen and Gemma models; eight RTX Pro 6000s for $80,000 run everything at frontier speed.
Sign in and you get 23 free chat messages on us — ask for the hook, quote a framework, find the exact transcript moment, generate a markdown action plan. Bring your own key when you want unlimited.
Create a free account →
0xSero demos his home inference rig running GLM 5.2 with custom compression, showing concurrent sessions and token throughput.

Discussion of GLM 5.2 capabilities — agent work, Docker, reverse engineering — and why Chinese models face a distribution and perception gap despite technical strength.

Oxylabs sponsor read, then both hosts share their experience with Fable 5 before the ban — described as 'an actual contributor, not a tool'.

The guest predicts the model will return lobotomized; discusses whether the ban was justified on cybersecurity grounds or a strategic narrative pivot.

Both hosts argue that within two model generations, governments will control access to the most capable models via sanctions-style lists. Bitcoin / crypto analogy introduced.

Discussion of why open-source believers still go to Anthropic — access to the most intelligent systems, Moloch dynamics, and the Claude constitution culture.

0xSero breaks down why Anthropic's subscription model is unprofitable per consumer but profitable at enterprise, and how self-hosting inverts that math at 2B+ tokens/month.

Autonomous vehicles as the current case study for technology deployment being slowed deliberately. Societal effects on male employment and entry-level jobs.

Practical breakdown: $2K (2x RTX 3090 → Qwen 35B), $9K (2x DGX Spark → Step 3.7 Flash), $20K (4x DGX Spark), $50K (6x RTX Pro 6000 → GLM 5.2), $100K (8x RTX Pro 6000 → everything).

Technical explanation of why mixture-of-experts models have cheaper decode memory than headline parameter count suggests; experimental NVIDIA+Mac hybrid prefill/decode setups.

Screen share of multiple concurrent local agents — file reorganization, GPU research, Bloodborne-style game generation — all running on the home rig via Droid.

Practical case for downloading weights now as a hedge; uncensored Hermes 70B example (peyote cactus care); French government dataset takedown precedent.

0xSero's origin story (first tech revolution he could participate in at 24), San Francisco energy vs. cost, unitree robot demo with local Gemma inference, closing thoughts.
Self-hosting frontier-capable AI crossed from expensive hobby to rational infrastructure investment in 2025, and the window to act before access gets politically restricted may be short.
“I'm using 374,000,000 tokens a month locally.”
“If the government controls the future of intelligence once we have AGI and beyond — it's over. It cannot be centralized.”
“A company spending $300,000 a year on Anthropic billing — it's not beyond the realm of possibility to purchase hardware and reduce costs long-term while having private inference.”
“I can give inference to maybe 24 people with the cards I have.”
“I was basically way too early to any large technological revolution — and this is the first one I can actually take part in.”
See every word as it's spoken — crank it to 2× and still catch all of it. The same dual-channel trick behind Amazon's Kindle + Audible.
The video opens with a live demo: GLM 5.2, the largest practical open-source model, running at home on custom-compressed weights — 374 million tokens a month, locally, for the price of a mid-tier GPU cluster. The question the rest of the conversation answers is whether the math and the politics make that worth it for everyone else.
00:00
01:17
02:52
03:47
04:51
06:35
07:20
08:41
10:18
11:23
12:34
13:46
14:58
16:10
17:22
18:34
19:46
20:58
22:10
23:22
24:33
26:19
26:38
28:09
29:21
30:33
31:45
32:57
34:09
35:21
36:32
37:44
38:56
40:08
41:20
42:32
43:44
44:56
46:08
47:20
48:31
49:43
50:55
52:07
53:19
54:57
55:43
56:55
57:46
59:19
59:57
61:42
62:54
64:06
65:18
66:30
67:42
68:54
70:06
71:18
72:20
73:08
74:53
76:08
77:16
78:29
79:41
80:27
81:31
83:17
84:28
85:40
86:52
88:04
89:16
90:56
91:40
92:42
94:04
95:16A 47-minute walkthrough of all seven levels of Hermes Agent — from bare VPS to full MCP back end.
May 6thA senior developer's real AI-agent setup, and the argument that the harness — not the model — is where the leverage lives.
June 18thA 22-minute live walkthrough of wiring Hermes Agent to Apify MCP connectors and Supabase to automate lead scraping, scoring, and outreach.
June 15thA 32-minute live workflow session on why agentic harnesses are the right home for Fable 5, not the Claude app or raw API.
June 10thHow MiniMax M3 sparse-attention architecture makes always-on autonomous agents 10–100x cheaper than running Opus or GPT-5.
June 8thA 26-minute step-by-step tutorial on the agentic loop command that runs until your goal is actually done.
May 16th