Modern Creator
David Ondrej · YouTube

Hermes Agent is crazy… 180,000+ github stars

How MiniMax M3 sparse-attention architecture makes always-on autonomous agents 10–100x cheaper than running Opus or GPT-5.

Posted
yesterday
Duration
Format
Tutorial
hype
Views
17.2K
690 likes
Big Idea

The argument in one line.

MiniMax M3 is the first AI model to break the price-to-capability curve, delivering frontier benchmark performance at one-twentieth the compute cost by skipping irrelevant tokens during attention.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…
  • You run Hermes Agent, OpenCode, or any BYOK autonomous agent and keep hitting monthly API cost limits.
  • You want to know whether a low-cost model can actually match Opus 4.8 or GPT-5.5 on agent-specific benchmarks before switching.
  • You are building always-on background agents and need a model cheap enough to run 24/7.
  • You want a setup guide for using MiniMax M3 Token Plan subscription key inside an OpenAI-compatible agent.
SKIP IF…
  • You only use chat interfaces occasionally and have no agent workflows to optimize.
  • You need enterprise SLAs, data residency guarantees, or US-hosted inference.
  • You are happy with your current API spend and are not running long multi-tool sessions.
TL;DR

The full version, fast.

MiniMax M3 uses Sparse Attention (MSA) to process only the relevant fraction of a 1M-token context window, cutting per-token compute to 1/20th of a standard transformer. At $0.60/$2.40 per million input/output tokens -- currently 50% off -- it is 10-20x cheaper than Opus 4.8 while matching or exceeding it on BrowseComp, SVG Bench, and SWE Bench Pro. The $20/month Token Plan gives 1.7B tokens, equivalent to roughly $1,300 of Opus API credits. The video demonstrates three agents running in parallel spending under 10 cents total, with open-weights release promised around June 10, 2026.

Free for members

Chat with this breakdown — free.

Sign in and you get 23 free chat messages on us — ask for the hook, quote a framework, find the exact transcript moment, generate a markdown action plan. Bring your own key when you want unlimited.

Create a free account →
Chapters

Where the time goes.

00:0000:02

01 · Hook

100x cheaper claim stated immediately

00:0201:21

02 · Why MiniMax M3

Benchmark comparisons vs Opus/GPT-5.5/Gemini; price-to-capability line framing; 1M context; 24-hour 2,000-tool-call sessions

01:2102:36

03 · MSA architecture + pricing

Sparse Attention explained with diagram; Opus 4.8 vs M3 pricing table; 50% discount

02:3604:05

04 · Token Plan value math

$20/month = 1.7B tokens = ~$1,326 Opus API equivalent; sponsor CTA

04:0506:01

05 · Install Hermes Agent

curl installer from GitHub; provider selection; subscription key entry; model = minimax-m3; launching TUI

06:0107:13

06 · Hermes deep research demo

Prompt: 20 tech/AI events in Polish cities over 90 days; agent starts tool calls; usage at 1%

07:1308:42

07 · OpenCode + second agent

OpenCode connected to same subscription key; Doodle Jump prompt; two agents running simultaneously

08:4210:04

08 · Three agents simultaneously

Third OpenCode for SVG animation; Hermes + 2x OpenCode; total spend: 1 cent; token usage under 1%

10:0412:21

09 · Deep research result + comparison

Hermes delivers 20-event report; compared vs Perplexity Computer; verdict: merge both outputs

12:2115:25

10 · SVG animation + cost reveal

Rifle assembly/fire/fade SVG at $0.0025; model overthinking workaround noted

15:2517:02

11 · 2D game demo + live debug

1,100-line Doodle Jump clone with graphics and audio at $0.40; platform spacing bug; screenshot sent to agent; 9 cents total

17:0218:07

12 · Open weights + architecture detail

M3 open weights ~June 10; MSA diagram; multimodal: video, image, code, text

18:0720:17

13 · Recap + CTA

All three demos summarized; 12% affiliate discount; plan selection guidance

Atomic Insights

Lines worth screenshotting.

  • MiniMax M3 is the first model to sit above the price-to-capability line, giving more benchmark performance per dollar than any model before it.
  • MSA (MiniMax Sparse Attention) first indexes which tokens matter, then attends only those blocks -- achieving 1/20th the compute of a normal transformer at 1M context.
  • The $20/month MiniMax Token Plan gives 1.7 billion tokens, roughly equivalent to $1,300 in Opus 4.8 API credits -- a 50-65x value multiplier.
  • Running three simultaneous agents (Hermes deep research + two OpenCode instances) for 20 minutes cost under 10 cents on the Token Plan.
  • MiniMax M3 can run a single agentic session for over 24 hours straight, executing up to 2,000 tool calls with no human intervention.
  • On BrowseComp -- the benchmark most relevant to web-scraping agents like Hermes -- M3 outperforms Opus 4.8 and roughly matches GPT-5.5.
  • The Token Plan subscription key is OpenAI-API-compatible: paste it into any agent the same way you would an Anthropic or OpenAI key.
  • MiniMax M3 deep research output matched Perplexity Computer ($200/month) on major conference coverage while costing an estimated 40-100x fewer credits per run.
  • The model tends to over-plan before writing code; telling it to stop thinking and write the file directly is the documented workaround.
  • MiniMax committed to releasing M3 open weights on Hugging Face and GitHub within 10 days of launch, which would make it the most capable open-weights model available.
Takeaway

Cheaper models beat expensive ones when the task is long and tool-heavy.

WHAT TO LEARN

The per-token cost of a model only matters when you understand how agent workloads actually distribute tokens -- and that understanding changes which model you should use.

  • Agent workloads are input-heavy: deep research and long-context coding sessions generate roughly 90% input tokens and 10% output, which means a model with a low input price has a disproportionate cost advantage over one with a balanced price.
  • A subscription token pool and a pay-per-request API key are not interchangeable -- the flat monthly quota can represent 50-65x more API value for workloads that run continuously or in parallel.
  • Running multiple agents simultaneously against the same token budget is cheaper than running them sequentially, because the subscription quota resets on a rolling window rather than billing per session.
  • Benchmarks designed for static code generation tasks underpredict how a model performs in agentic loops; BrowseComp and tool-call success rate are more predictive for Hermes-style workloads.
  • Model over-planning -- reasoning extensively before writing output -- burns tokens without proportional quality gain on simple tasks; explicit instructions to skip planning and write directly reduce cost and latency.
Glossary

Terms worth knowing.

MSA (MiniMax Sparse Attention)
A two-stage attention mechanism that identifies which token blocks are relevant (index branch), then runs attention only on those selected blocks (sparse branch), reducing compute to roughly 1/20th of a standard transformer at 1M context length.
Token Plan
MiniMax subscription pricing tier: a flat monthly fee for a large pool of shared tokens with an OpenAI-compatible API key, as opposed to pay-per-request API pricing.
Hermes Agent
An open-source autonomous AI agent by Nous Research (180,000+ GitHub stars) that runs in a terminal TUI, supports long tool-heavy loops, and accepts any OpenAI-compatible model as its backend.
OpenCode
An open-source terminal-based coding agent similar to Claude Code that connects to any OpenAI-compatible provider via a /connect command.
BrowseComp
A benchmark measuring a model ability to research and retrieve information through web browsing, particularly relevant for agents that use search tools.
Price-to-capability line
The conceptual boundary where historically higher cost equals higher capability; M3 is presented as the first model to break above this line.
Resources

Things they pointed at.

Quotables

Lines you could clip.

00:15
For the first time ever, an AI model broke the price-to-capability line, where the more you pay, the more you get.
declarative breakthrough claim, no setup neededTikTok hook↗ Tweet quote
04:30
If you have the Claude $20 plan, you are hitting limits constantly, every single day. With this, good luck. You would have to be a serious AI engineer to hit limits.
direct competitor comparison, relatable pain for Claude Code usersIG reel cold open↗ Tweet quote
14:00
The fact that these are comparable deep research reports, where one of them is 10 times cheaper and gives you probably 40 to 100 times more deep researches per month.
concrete value quantification with real-world comparisonnewsletter pull-quote↗ Tweet quote
17:45
At context length 1 million, M3 per-token compute is just 1/20th of that of a standard transformer.
single precise technical claim, no context neededTikTok hook↗ Tweet quote
The Script

Word for word.

Read-along

Don't just watch it. Burn it in.

See every word as it's spoken — crank it to 2× and still catch all of it. The same dual-channel trick behind Amazon's Kindle + Audible.

metaphoranalogy
00:00You can actually run Hermes agent for 100 times cheaper. Here is how.
00:04So a major AI breakthrough just happened. For the first time ever, an AI model broke the price to capability line, where the more you pay, the more you get. And everything so far has been either on this line or below.
00:16But MiniMax m three is a new AI model that for the first time ever is above this line. But most people, for some reason, still only use cloth or GPT, which is costing them hundreds of dollars per month every month, while minimax is nearly free.
00:31Oh, and it's also really, really good. On SWE bench pro, it outperforms not only Gemini 3.9 pro, but also GPT 5.5. On SVG bench, it outperforms Opus GPT 5.5 and Gemini.
00:44And on BrowseComp, which is very important for Hermes agent, it is better than Opus and roughly similar as GPT 5.5. And this is just one of many reasons why Minimax m three is perfect for Hermes agent. It's also cheap enough to run twenty four seven, which makes always on agents finally possible.
01:02Also, given the large context window of Minimax m three, it can do up to 1,000,000 tokens. It can hold your entire project in memory, not forgetting anything.
01:10It's also built for long tool heavy loops such as the slash goal feature inside of Hermes. In fact, Minimax m three can run for over twenty four hours straight doing up to 2,000 different tool calls with no human intervention.
01:24That's incredible. Now you might be thinking, but, David, how is all of this even possible? How come Minimax m three can outperform models 10 to 50 times more expensive?
01:33Well, the answer is the architecture. It has a novel architecture called MSA, which stands for minimax sparse attention.
01:40And I know it sounds complex. It sounds difficult, but, actually, it's very simple and very easy to understand. And understanding it will give you an unfair advantage over everyone else.
01:49So here is MSA clearly explained. Normal transformers read every single token in context every time. That's why long context workflows cost a lot of money.
01:57MSA, on the other hand, checks which parts of the context matter and then only reads those parts skipping everything else. So the result is a full 1,000,000 context window at one twentieth of the compute. And, of course, the pricing of MiniMax is just insane.
02:12For comparison, OPUS 4.8 costs $5 for a million input tokens and $25 for a million output tokens. MiniMax m three costs $0.60 and $2.40.
02:22And on top of that, now it's 50% off, which makes it 10 to 20 times cheaper than Opus. This is how you can run Hermes agent twenty four seven for just a couple of dollars. Real quick.
02:32Are you building something impressive? I'm gonna be hopping on a call personally with people who are building impressive stuff with AI.
02:39So if that's you, make sure to fill this out. Again, it's the second link below the video. It's completely free to apply.
02:44Now to put the insane cost effectiveness of Minimax into perspective, when you get the $20 a month plan, you get 1,700,000,000 tokens per month. Not million, billion.
02:55And at the current pricing, okay, not the discounted 50% off pricing. The full pricing, this plan, 1,700,000,000 tokens, gives you a blended average of 1,326 per month of API usage.
03:08Okay? Which if we compare that to Opus 4.8 tokens, would be $11,900 of monthly API credits if you get the $20 plan for a minmax.
03:20I mean, for the max plan, which costs $50 a month, you literally get over 5,000,000,000 tokens per month, which I don't think you can get anywhere else. And since many of you are interested in running Hermes agent as cheaply and efficiently as possible, I reached out to the Minimax team to sponsor this video and agreed.
03:37So if you want 12% off any of these paid plans, click the first link below the video and make sure to apply. Huge thank you to the Minimax team for sponsoring this video. With that being said, let's get to building.
03:47So first, go to Google while it still exists and type in Hermes Agent GitHub to go to the official GitHub for Hermes Agent. So, obviously, Hermes is fully open source, so it's free to download. Scroll down here until you reach the quick install section, which gives you this one command.
04:01Click on that. Boom. Open any terminal on your computer.
04:05I'm just using the default Mac OS one, but you can use c mux, t mux, whatever you want. Paste it in. Hit enter, and this will begin installing Hermes Agent on your computer.
04:13If you don't have it installed, it will take, like, two minutes to install it. And then we need to select the Minimax token plan. I have the $20 a month, but I do recommend getting the $50 a month if you hit the limits.
04:23But it's very hard to hit limits on 1,700,000,000. Yeah. The the value you get for this is really incredible.
04:29Anthropic cannot compete with this at all. Like, if you have the cloth $20 mark subscription, you're hitting limits constantly, every single day, multiple times. With this, good luck.
04:37You'd have to be a serious AI engineer to hit limits. Right? When the install finishes, you'll see the option to select the plans.
04:43So all the way, scroll down here and you can see minimax. Here we have three options, minimax o off, minimax default, and minimax China.
04:50So here, what you wanna do is select the first one, minimax, and we need to get the API key. So, again, when you click the first link below the video, you'll be brought to the minimax pricing plan. Here, you can go to the top right, click on console after you create an account, which is super easy.
05:03You could just create one with Google. Takes, like, ten seconds. So in here, inside of the Minimax platform, go to the left and click on plan details, and here you have your subscription key.
05:13This allows you to use your subscription to power Hermes agent and many other AI agents such as OpenCode, which I'm gonna show you later in the video, which actually, you can use Minimax m three to build anything, build any type of software, whether it's a AI startup, whether it's internal software, whether it's just a cool demo for you, for your friends, for your family.
05:30You can use it inside of open code, which is also free and open source with the subscription key from here. And, again, I'm gonna show you that later in the video. But first, copy it, switch back to the terminal, and paste that in.
05:40K. Here, it's asking for the base URL. You can just hit enter.
05:43And for the model, obviously, select Minimax m three. Okay. Here, you can keep current messaging platforms.
05:48We can skip that for now, and we can click done. And just like that, we've set up Hermes agent locally. So I can do clear and type in Hermes, and this should launch the Hermes agent t u I, terminal user interface.
06:01Here we go. We have minimax m three here. We can do hay, and this is powered with my token plan.
06:06Right? So I'm using the $20 a month plus plan from minimax to power Hermes agent. And again, this gives me 1,700,000,000 tokens per month.
06:16So you can literally use Hermes agent $24.07, and you'll probably not run out. And if you do run out, you upgrade to the 50 plan, which gives you 5,100,000,000 tokens and then you will not run out.
06:26So what I'm gonna do is I'm gonna c d into a folder and I'm gonna launch Hermes agent in here and I'm gonna tell it use the following repository to save any and all files.
06:37Do not create or modify files outside of this project. K. Send delimiter, and I'm gonna send it the prompt.
06:43So this is a deep research prompt, and, basically, I'm gonna compare it how good it would perform against something like Perplexity Computer. So I'm gonna send the same prompt into Perplexity Computer, which, of course, costs $200 a month.
06:56Uh, that's a lot more expensive, and the usage is way less. Okay? So we can easily run out of these $200 a month, these credits, inside of Perplexity computer.
07:06But for Minimax, you can actually check your plan plan usage, and you can see that so far we are at 1% of our weekly quota. Okay?
07:14So you can do many, many deep researches with this, and you're not gonna hit your limit. So this Hermes agent will be running and doing this deep research about events in Katowice, Krakow, and Warsaw over the next ninety days. So different AI tech business events, and I want to put everything into a single m d file.
07:32So, yeah, it's gonna do this deep researches, and I'm gonna let it run. In the meantime, let me show you how to build anything with Minimax m three by using it inside of OpenCode. So if you don't know, OpenCode, it's like a open source alternative to CloudCode, and, again, it's completely free.
07:45So you can just go to opencode.ai and copy this one liner installer command. And while this is running, I'm gonna do a new terminal, and I'm gonna paste that in to install it locally on my MacBook.
07:58As you can see, it's pretty fast. Boom. There it is.
08:01So I'm gonna clear and type in open code. And, of course, we wanna use Minimax inside of OpenCode. So type in slash connect to connect a new provider.
08:10I'm gonna have in minimax. There it is. Minimax.io.
08:14And we need to put in the same API key. So let's go back to plan details here and copy the subscription key. Boom.
08:20Paste that in and hit enter. Select Minimax m three, and here here we are. Inside of open code, literally, how fast was that?
08:27Like, twenty, thirty seconds? And we are chatting with Minimax m three inside of another agent. Right?
08:33So not only do we have Hermes agent here doing a deep research using the subscription, here we have OpenCode also using the Minimax token plan subscription, but here we can optimize it for coding.
08:45So I'm gonna give it a prompt. So the first thing I wanna test is a simple two d game, Doodle Jump platformer game as a single contained HTML file. So let's see how good it is at that very use case.
08:57And inside of OpenCode, can see the usage at the bottom. So so far, we spent 1¢, 1p.
09:03We're gonna see how expensive this is, but I already know that this is not even gonna be comparable to Opus or g b d 5.5 because of how insanely efficient Minimax m three is. And let me highlight this once more because I think a lot of you don't realize the significance of this. So far, every single model has been on this line or below, which means you have to pay more to get more capability, but the Minimax m three broke the line.
09:25It's the first model that gives you way more for the same amount, completely crushing Sonnet 4.6, by the way, which is not only over 10 times more expensive, but also worse at long horizon coding tasks, which is the type of stuff we're currently testing. Right? And, again, I'm gonna be jumping on a call with a few of you who are building the most impressive things with AI completely for free, so make sure to fill this out.
09:46It's the second link below the video. Right. Let's check on Hermes, how it's doing.
09:50Still debrief searching. So far, used 13% of the context window, 60,000 tokens. Inside of open code, we are at still one penny.
09:58That's insane. Let's check the plan details here. Let me reload plan usage.
10:02Okay. So we're at 9% of our five hour limit and still 1% of our weekly limit.
10:08Incredible. I wanna see how good this game will be. And, actually, one more thing I wanna test while these two are still running is how good Minimax m three is at creating SVG files, right, and animations with SVG because it's crushing on the SVG bench eval.
10:21So what I'm gonna do is I'm gonna open a new terminal, command n, type in open code. We can launch another open code instance while the previous one is running, and we can tell it build the following SVG animation. And I'm gonna paste in a description of what I want to build.
10:37A single self contained SVG file utilizing animate, animate transfer, blah blah blah. It's a pretty long prompt describing exactly what I want. So what this should show is a looping animation of different rifle parts assembling, shooting a shot, and then fading out.
10:52So it's not just a simple image, static image. It should be a SVG animation. And we can see that there is already thousands of tokens of mini max.
10:59Let's check the usage. This is the main thing, guys, because I'm telling you, you can still less than 1% of the weekly limit, incredible. Let's look at the Hermes deep research.
11:07It's still researching. It's being very, very thorough, running for twelve minutes.
11:12I told you, it can run for over twenty four hours doing nearly 2,000 tool calls without any human interruption. And, again, if you want 12% off any of the token plans for Minimax, make sure to click the first link below the video and get any of these plans for yourself.
11:26You're gonna get 12% off of this already incredible price. Okay. So it seems like Hermes agent is finalizing the deep research with 20 different events happening over the coming weeks and months.
11:38And the huge benefit is that, obviously, Hermes agent is open source running locally. Minimax is going to be open source. So far, they open source previous models.
11:46And the cheapest plan is 10 times cheaper than the cheapest Perplexity computer plan, which obviously is $200 a month. And, again, I'm running out of this usage pretty often.
11:57Like, even for $200 a month, this doesn't last at all. With this, you'll probably get, like, 50 x more deep researches inside of Hermes than with Perplexity Computer. And, again, this is cloud hosted, so who knows where these files are stored?
12:10You're have to trust Perplexity with all of this and all their data. K. There it is.
12:14So here is the formatting. Coverage by city, Katowice three, Krakow seven, Warsaw 10. I mean, yeah, Warsaw is the biggest city, so that makes sense.
12:22Top five local recommendations. Cybersecurity expo and forum, Katowice, June 1516.
12:29Interesting. And here we have all the sources at the bottom. Very clean report, not overly bloated.
12:34That's also important. It's not like hundreds of pages of just AI slop. Let's look at what Perplexity Computer gave us.
12:41A lot of these events are the same. I mean, you cannot really invent, you know, other events like that. Okay.
12:47So the events it recommended are different. Interesting.
12:50Actually, what we can do, I'm gonna kill this Hermes agent. Clear Hermes.
12:58Let's say list out all folders in this repo and list out all files in slash reports folder. This is a separate RMS a separate session, so it will have no bias towards which is better.
13:13And we can have Minimax analyze these two reports and compare them. Okay. Now analyze both of these markdown files, read them in full, and give me a clear and objective comparison.
13:24Be very concise. Right. The SVG should be finishing up now.
13:27So let's look at it. Comparison format. This is pros per event selection.
13:32This is structure table. Yeah? Event overlap.
13:35Okay? Unique to Perplexity and Computer. So some events are unique.
13:39Some of them overlap. Perplexity was stronger on small free meetups. Events is stronger on big conferences.
13:46The top five picks were different, and let's get the verdict. Events MD is the stronger calendar. Perplexity is better niche meetups.
13:54Neither is complete merging them. So that's crazy. The fact that these are comparable deep research reports, where one of them is 10 times cheaper and gives you probably like, my best estimate is, like, 40 to 100 times more deep researches per month compared to what you would get on the $200 per month plan here, it's really insane the efficiency of the Minimax token plan.
14:18We're at 3% of our weekly limit, and we're running multiple things here still. This SVG is gonna be crazy. I don't know why it's cooking for so long, but this better be good.
14:28And look at the spend. 9 pennies, 9¢. And let me remind you that this is not some cheap seven b model.
14:35In fact, artificial analysis puts it as one of the top models in terms of intelligence, literally neck to neck with Gemini 3.1 from Google, which is way bigger model, way more expensive to run. And on the intelligence index, it literally is ranked as one of the best models in the world.
14:51So this is not just some AI model, a small model that's cheap to run. This is literally one of the best models out there that's also the most cost efficient model to run. Okay.
15:01So we have this SVG file finally. Took a while because it kept overthinking, so I told it to write the file.
15:07Let's see. Okay. Great.
15:08It's written. So I say, now open this SVG for me in brave browser as a new tab.
15:16You can literally do anything if you just speak English. Like, you need to verbalize your thoughts and describe what you want the agent to do and it can just do it. Alright.
15:24So there it is. Rifle is being assembled. Okay.
15:27That's actually pretty good. Right? Chambered.
15:31Shot the gun. Now it should disappear. This is impressive, guys.
15:36And, again, 0.25¢ SVG animation that can be used in video games, in graphics, in videos, video editing, landing page on your site, and the shell is running out.
15:47Guys, this is this is good. This is good for 0.25. Pretty insane.
15:54Let's see the other instance. This is building the two d game.
15:59This is a bigger project admittedly, but I also had to tell it to stop overthinking because this model is very perfectionist. Right? I mean, there's a reason why it's performing really well on all these benchmarks.
16:08This model likes to think a lot. It likes to reason a lot. And sometimes that's good when you want a really detailed report, a really impressive refactor.
16:15But if you want something simple and quick, you need to kinda tell it, like, stop planning, stop thinking, do not overthink, just write the file now. So, hopefully, we can test the game soon.
16:25So the five hour limit, we're at 25%. And by the this resets in ten minutes. Right?
16:29So we cannot even get to how do you even use this out? Like, I'm using three agents, two open codes, one Hermes agent.
16:36I'm even struggling to hit the limit. This is pretty impressive. Okay?
16:39You literally need to be running four to eight agent in parallel twenty four seven to even, like, hit these limits. Anyways, we already know it can do detailed deep research on the level of Perplexity Computer, but even better, while doing that for a fraction of the cost.
16:53We already know it's great at SVGs. In fact, it outranks other frontier models at SVG bench. So let's see how good it is at two d games.
17:03And, yeah, this is the beautiful multi modality of Minimax m three. It's great at video, images, coding, text, all of the things you would expect a Frontier AI model to do in 2026.
17:14And by the way, the Minimax team has committed to open weights. So this model will be open weights on Hugging Face and GitHub within ten days of its release, so that's gonna be around June 10, which means it will probably make Minimax m three the most powerful open weights model in the world the moment they open weights.
17:31It's not open source, but it's gonna be open weights. And in fact, we have the HTML file down, so I'm gonna say open that in brave browser as a new tab, and let's see how the game performs.
17:43It's 1,100 lines of code. And here we are.
17:48Hopper. Okay? The graphics are nice.
17:51Okay. How do we play it? Resume.
17:54We need to jump. Okay. Look at the sounds.
17:58I need to pause my music. Okay. The sounds are pretty solid.
18:02Is there some boost? No. What are the what are the keyboard controls?
18:11Answer in short. KAD okay.
18:14So the issue is some of these look at this. Graphics are nice.
18:19The sounds are nice. There's no double jump. Okay?
18:23Make sure the platforms are closer to each other. That is the main issue. Most of these levels have the platforms too far apart.
18:32Okay. This one should be playable. Look at this.
18:36Let's jump to the left. Okay. Too far.
18:39Yeah. It needs to be either a double jump mechanic or the platforms need to be a bit closer, but not a bad game.
18:46Certainly not about graphics. And, again, open weight model, all available for, like, what?
18:520.4? Pretty crazy. Alright.
18:56Let's let's reload this actually. The gaps should be smaller. Alright.
18:59Here we go. Let's see how it goes. What is this?
19:02Some boost. Oh, it crashed.
19:07I'm gonna send it a screenshot, and, obviously, we can fix it and keep improving it forever. This type of platform crashes the game.
19:17It's gonna analyze that image and, you know, fix the bug. You get the point. So you can build any type of software with it, whether it's a two d game, whether it's SVG graphics, whether it's deep research, full stack web app, internal software.
19:30This model is inherently multi modal, and it's competent on all of the Frontier and Enchanted capabilities. You can basically do anything you needed to do for a fraction of the cost that other more other models are.
19:41So, again, most people are stuck to just using Cheshire Beer and Cloth. If you sit down and if you actually watch this video again and implement minimax, you're gonna be able to do everything these models can do for, like, 10 to 20 times less.
19:53So if you care about costs and if you wanna run Hermes agent in the most efficient way possible, set up the Minimax token plan connected like I did in the video. And, again, it's the first link below the video. Just choose the plan that works for you.
20:05If your budget is limited, go with the $20 a month. If you want 5,000,000,000 tokens per month, go with the max plan. But either way, click the first link below the video to get 12% off.
20:13And with that being said, thank you guys for watching, and have a wonderful rest of the
The Hook

The bait, then the rug-pull.

The claim lands before a single slide appears: run the hottest open-source agent stack for a hundredth of the usual cost. What follows is a live proof of concept -- three autonomous agents running in parallel on a MacBook, racking up a total bill measurable in pennies.

Frameworks

Named ideas worth stealing.

01:38model

MSA (MiniMax Sparse Attention)

Two-stage attention: index branch scores token relevance, sparse branch attends only selected blocks. Net result: full 1M context at 1/20th compute of a dense transformer.

Steal forexplaining why cheap frontier models are now possible without architectural compromise
00:15concept

Price-to-capability line

Historical framing that every model sits on a cost-capability boundary. M3 positioned as first model above it.

Steal forany cost-vs-quality positioning argument in AI product content
03:02list

Token Plan value math

  1. $20/month = 1.7B tokens
  2. 90/10 input-output split = ~$1,326 Opus equivalent
  3. 50-65x value multiplier
  4. $50/month = 5.1B tokens

Converting subscription tokens to equivalent API credits using typical agent input/output ratios.

Steal forcomparing subscription vs API pricing for any AI tool
CTA Breakdown

How they asked for the click.

VERBAL ASK
19:00product
Click the first link below the video to get 12% off any of these paid plans.

Soft sell repeated three times at t=154, t=690, t=1140. Affiliate discount code in Token Plan URL. Secondary CTA for builders call-in form.

Storyboard

Visual structure at a glance.

hook claim
hookhook claim00:00
benchmark slides
valuebenchmark slides00:15
MSA diagram
valueMSA diagram01:38
token plan math
valuetoken plan math03:02
GitHub install
valueGitHub install04:05
Hermes TUI live
valueHermes TUI live06:01
research result
valueresearch result10:04
SVG animation
valueSVG animation12:21
2D game demo
value2D game demo15:25
final CTA
ctafinal CTA19:00
Frame Gallery

Visual moments.

Watch next

More from this channel + related breakdowns.

25:39
David Ondrej · Tutorial

Build Anything with Tmux, Here's How

A 25-minute walkthrough of running long-lived AI coding agents on a VPS by wrapping every session in tmux — so closing a laptop, killing an SSH connection, or losing power never interrupts a job that's supposed to run for 24 hours.

May 25th
Chat about this