How MiniMax M3 sparse-attention architecture makes always-on autonomous agents 10–100x cheaper than running Opus or GPT-5.
MiniMax M3 is the first AI model to break the price-to-capability curve, delivering frontier benchmark performance at one-twentieth the compute cost by skipping irrelevant tokens during attention.
MiniMax M3 uses Sparse Attention (MSA) to process only the relevant fraction of a 1M-token context window, cutting per-token compute to 1/20th of a standard transformer's. At $0.60/$2.40 per million input/output tokens -- currently 50% off -- it is 10-20x cheaper than Opus 4.8 while matching or exceeding it on BrowseComp, SVG Bench, and SWE Bench Pro. The $20/month Token Plan gives 1.7B tokens, equivalent to roughly $1,300 of Opus API credits. The video demonstrates three agents running in parallel for under 10 cents total, and an open-weights release is promised for around June 10, 2026.
100x cheaper claim stated immediately

Benchmark comparisons vs Opus/GPT-5.5/Gemini; price-to-capability line framing; 1M context; 24-hour 2,000-tool-call sessions

Sparse Attention explained with diagram; Opus 4.8 vs M3 pricing table; 50% discount

$20/month = 1.7B tokens = ~$1,326 Opus API equivalent; sponsor CTA

curl installer from GitHub; provider selection; subscription key entry; model = minimax-m3; launching TUI

Prompt: 20 tech/AI events in Polish cities over 90 days; agent starts tool calls; usage at 1%

OpenCode connected to same subscription key; Doodle Jump prompt; two agents running simultaneously

Third OpenCode for SVG animation; Hermes + 2x OpenCode; total spend: 1 cent; token usage under 1%

Hermes delivers 20-event report; compared vs Perplexity Computer; verdict: merge both outputs

Rifle assembly/fire/fade SVG at $0.0025; model overthinking workaround noted

1,100-line Doodle Jump clone with graphics and audio at $0.40; platform spacing bug; screenshot sent to agent; 9 cents total

M3 open weights ~June 10; MSA diagram; multimodal: video, image, code, text

All three demos summarized; 12% affiliate discount; plan selection guidance
The per-token cost of a model only matters when you understand how agent workloads actually distribute tokens -- and that understanding changes which model you should use.
“For the first time ever, an AI model broke the price-to-capability line, where the more you pay, the more you get.”
“If you have the Claude $20 plan, you are hitting limits constantly, every single day. With this, good luck. You would have to be a serious AI engineer to hit limits.”
“The fact that these are comparable deep research reports, where one of them is 10 times cheaper and gives you probably 40 to 100 times more deep researches per month.”
“At context length 1 million, M3 per-token compute is just 1/20th of that of a standard transformer.”
The claim lands before a single slide appears: run the hottest open-source agent stack for a hundredth of the usual cost. What follows is a live proof of concept -- three autonomous agents running in parallel on a MacBook, racking up a total bill measurable in pennies.
Two-stage attention: index branch scores token relevance, sparse branch attends only selected blocks. Net result: full 1M context at 1/20th compute of a dense transformer.
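The two-stage idea above can be illustrated with a toy, single-query sketch. This is not MiniMax's implementation: the function name `block_sparse_attention`, the use of a mean key vector as the block-level relevance score, and the `block_size` and `top_k` parameters are all assumptions made for illustration. It only shows the general shape of "score blocks cheaply, then attend within the selected blocks".

```python
import math


def block_sparse_attention(query, keys, values, block_size, top_k):
    """Toy two-stage attention for one query (illustrative assumption only)."""
    n = len(keys)
    dim = len(query)

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Split token positions into contiguous blocks.
    blocks = [list(range(s, min(s + block_size, n))) for s in range(0, n, block_size)]

    # Index branch (assumed form): score each block by its mean key vector.
    block_scores = []
    for idx in blocks:
        mean_key = [sum(keys[i][d] for i in idx) / len(idx) for d in range(dim)]
        block_scores.append(dot(query, mean_key))

    # Keep only the top_k highest-scoring blocks.
    ranked = sorted(range(len(blocks)), key=lambda b: block_scores[b], reverse=True)
    selected = sorted(i for b in ranked[:top_k] for i in blocks[b])

    # Sparse branch: ordinary softmax attention, restricted to selected tokens.
    logits = [dot(query, keys[i]) / math.sqrt(dim) for i in selected]
    peak = max(logits)
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    out_dim = len(values[0])
    output = [
        sum(w * values[i][d] for w, i in zip(weights, selected)) / total
        for d in range(out_dim)
    ]
    return output, selected


# Eight tokens in four blocks of two; only block 1 (tokens 2 and 3) aligns with the query.
demo_keys = [[0, 1], [0, 1], [1, 0], [1, 0], [-1, 0], [-1, 0], [0, -1], [0, -1]]
demo_values = [[float(i)] for i in range(8)]
demo_output, demo_selected = block_sparse_attention(
    [1.0, 0.0], demo_keys, demo_values, block_size=2, top_k=1
)
```

In the demo, the softmax runs over two tokens instead of eight. The compute saving in this sketch comes from that restriction: the index stage touches one summary vector per block, and the expensive stage touches only the chosen blocks.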
Historical framing that every model sits on a cost-capability boundary. M3 positioned as first model above it.
Converting subscription tokens to equivalent API credits using typical agent input/output ratios.
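The conversion can be sketched as a small calculation. The helper names `api_equivalent_usd` and `plan_price_per_million` are hypothetical, and the 90/10 input/output split is an assumption standing in for a "typical agent" ratio. The Opus rates used in the video's comparison are not given in this summary, so the example plugs in the $0.60/$2.40 per-million rates quoted for M3; substitute any other provider's rates through the same parameters.

```python
def api_equivalent_usd(total_tokens, input_share, input_price_per_m, output_price_per_m):
    """Dollar cost of total_tokens at per-million API rates, given an input/output split."""
    input_tokens = total_tokens * input_share
    output_tokens = total_tokens - input_tokens
    cost = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return cost / 1_000_000


def plan_price_per_million(plan_price_usd, plan_tokens):
    """Effective subscription price per million tokens."""
    return plan_price_usd / (plan_tokens / 1_000_000)


PLAN_TOKENS = 1_700_000_000   # Token Plan allowance quoted in the summary
ASSUMED_INPUT_SHARE = 0.9     # assumption: 90% input, 10% output

equivalent = api_equivalent_usd(PLAN_TOKENS, ASSUMED_INPUT_SHARE, 0.60, 2.40)
effective = plan_price_per_million(20.0, PLAN_TOKENS)
print(f"API-rate equivalent: ${equivalent:,.2f}")
print(f"Plan price per million tokens: ${effective:.4f}")
```

Under these assumptions the blended rate is 0.9 × $0.60 + 0.1 × $2.40 = $0.78 per million tokens, so 1.7B tokens comes to about $1,326, while the $20 plan works out to a little over one cent per million. The result is sensitive to the split: agent workloads that re-read large contexts skew toward input tokens, which pulls the blended rate toward the cheaper input price.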
“Click the first link below the video to get 12% off any of these paid plans.”
Soft sell repeated three times at t=154, t=690, t=1140. Affiliate discount code in Token Plan URL. Secondary CTA for builders call-in form.