100 hours of Hermes Agent lessons in 46 minutes
A 47-minute walkthrough of all seven levels of Hermes Agent — from bare VPS to full MCP back end.
May 6thA 12-minute setup guide for running Claude Code multi-agent Dynamic Workflows on free local models with no Anthropic account required.
Claude Desktop gateway mode lets you substitute any OpenAI-compatible local model for the paid Anthropic API, making Claude Code multi-agent workflows accessible at zero inference cost.
Claude Desktop ships a gateway mode that routes inference to any OpenAI-compatible endpoint instead of the Anthropic API. By pointing it at LM Studio local server and loading a Gemma model aliased as claude-opus-4.8, you satisfy Claude model-discovery without an Anthropic account. The Dynamic Workflows feature lets a single /deep-research command spawn up to 16 concurrent agents with a 1,000-agent-per-run ceiling, all on your local machine. The main tradeoff is prefill latency: Claude Code first message includes roughly 30,000 tokens of system context, and local models are slower at ingesting that than at generating responses.
Sign in and you get 23 free chat messages on us — ask for the hook, quote a framework, find the exact transcript moment, generate a markdown action plan. Bring your own key when you want unlimited.
Create a free account →
Introduces Claude Dynamic Workflows, names the two backend options (LM Studio local, OpenRouter cloud), and frames the cost angle.

Walks through official setup guide. Step 1: download Claude Desktop. Step 2: do not sign in with an Anthropic account.

In Claude Desktop settings: set connection type to gateway, credential kind to static API key.

Download, model search interface, GPU compatibility green-tick indicator, three setup steps.

Enable developer mode in LM Studio, open Developer tab, copy local server URL, paste into Claude gateway base URL field.

Claude rejects models not named Sonnet/Opus/Haiku. Fix: rename the model API identifier to claude-opus-4.8 in LM Studio load settings.

Local models lack built-in web search. Toggle disables native tools so Claude looks for MCP connections.

If already signed in to Claude, sign out first to see the gateway login screen. Continue without account.

Copy NPX install snippet from BraveSearch docs, use Claude to merge it into existing claude_desktop_config.json.

16 concurrent agents, 1,000 total per run. /deep-research is a bundled slash command. Live business-plan demo.

Agents running in real time on Gemma 26B. Explains why first message is slow (30K tokens). Suggests leaving the computer for 1-2 hours.
Claude Desktop gateway mode is an official feature that lets you substitute any local model for the Anthropic API, and one naming convention is the only non-obvious requirement.
“Instead of us using the paid API from Anthropic or even needing an Anthropic account, I'm gonna show you how to do this by using local AI models that are running completely on your computer.”
“The gateway returned no usable models -- Claude is only looking for things that have Sona or Opus or Haiku.”
“A dynamic workflow is a JavaScript that lets you basically deploy hundreds of sub agents.”
“From the very first message, we're sending like 30,000 tokens. It's a lot. But then from here, you can literally just leave your computer.”
See every word as it's spoken — crank it to 2× and still catch all of it. The same dual-channel trick behind Amazon's Kindle + Audible.
A thousand agents, zero API bill. The Claude Desktop app ships a gateway mode that routes all inference to a local LM Studio server -- and the only non-obvious step is a model-renaming trick that takes thirty seconds.
Two phases of local LLM inference. Prefill determines time to first token; decode determines generation speed. Local hardware bottlenecks at prefill for large contexts.
Anthropic term for running Claude Desktop against a third-party inference provider, enabling any OpenAI-compatible endpoint as a drop-in replacement.
“If you enjoyed it, I would appreciate if you could like the video, drop a comment, or subscribe to my channel.”
Standard subscribe ask after content wraps. Also seeds a follow-up video on OpenRouter integration.
00:00
00:13
00:23
00:32
00:41
00:50
00:57
01:09
01:18
01:27
01:36
01:45
01:54
02:04
02:13
02:22
02:31
02:40
02:49
02:56
03:08
03:17
03:26
03:35
03:45
03:54
04:03
04:12
04:21
04:31
04:40
04:49
04:58
05:07
05:16
05:26
05:35
05:44
05:53
06:02
06:12
06:21
06:30
06:39
06:48
06:58
07:07
07:16
07:29
07:34
07:43
07:53
08:02
08:11
08:20
08:29
08:39
08:48
08:57
09:04
09:15
09:25
09:30
09:43
09:52
10:01
10:10
10:20
10:29
10:38
10:47
10:56
11:06
11:15
11:23
11:32
11:42
11:52
12:03
12:10A 47-minute walkthrough of all seven levels of Hermes Agent — from bare VPS to full MCP back end.
May 6thA 17-minute walkthrough of Max Hermes: the cloud-hosted Hermes agent that costs 95% less than Opus 4.7 and writes its own skill playbooks after every task.
June 2ndA 29-minute walkthrough of the Four Cs framework for running your entire business through Claude Code.
May 29thA 16-minute breakdown of why AI browsers lost before they launched and how Codex and Claude Code absorbed the browser entirely.
May 28thHow to wire a top-10 ranked free reasoning model into an open-source persistent agent harness and what you can actually do with it.
May 25thHow one Claude Code operator, trained with 24 skills, handles AI video generation, editing, and multi-platform distribution without touching a single platform UI.
May 19th