GLM-5.2 is MIT-licensed, 1M context, trained without American silicon — and it trails Opus 4.8 by just 1% on the hardest coding benchmark.
Posted: yesterday · Format: Talking Head, educational · Views: 19.5K · Likes: 606
Big Idea
The argument in one line.
Open-weight Chinese models are closing the gap on closed frontier AI faster than incumbents can maintain their lead, with GLM-5.2 matching Opus 4.8 within 1% on long-horizon coding while running under MIT license with no geographic restrictions.
Who This Is For
Read if. Skip if.
READ IF YOU ARE…
You track open-source AI releases and want a thorough benchmark breakdown of GLM-5.2 against Opus 4.8 and GPT-5.5.
You are evaluating Command Code CLI as a cheaper harness for open models at $1/month versus premium alternatives.
You care about AI infrastructure trained outside the US chip stack for cost, sovereignty, or geopolitical reasons.
You want to see a live one-shot agentic demo — a playable multi-level platformer game built in a single prompt.
SKIP IF…
You need production-grade reliability today — GLM-5.2 still trails Opus 4.8 on most benchmarks and the gap matters for real engineering.
You have no interest in CLI-based coding agents; the second half is primarily a Command Code feature tour.
TL;DR
The full version, fast.
GLM-5.2 is Z.ai's new open-weight model — MIT-licensed, 1M context, trained entirely without NVIDIA or American silicon. On FrontierSWE it trails Opus 4.8 by just 1% while beating GPT-5.5 outright, and on PostTrainBench it ranks second only to Opus 4.8. The 1M context is made cost-efficient via IndexShare, which identifies the ~1,000 most decision-critical tokens from 800,000 so the transformer skips attending to everything. The video's second half is a sponsored demo where GLM-5.2 builds a full Mario-style platformer game in one prompt through Command Code CLI, and demonstrates /rewind checkpointing, /compact context compression, and session sharing — total session cost: under $1.
GLM-5.2 announced on X by Z.ai. Key claims: frontier intelligence, open weights, MIT license, 1M context, same price as GLM-5.1.
00:58 – 02:08
02 · Pricing + no US silicon
Pricing table ($1.40/M input, $4.40/M output). First model with solid 1M context. Trained without NVIDIA/American chips.
02:08 – 04:11
03 · Benchmark analysis
FrontierSWE: GLM-5.2 at 74.4% vs Opus 4.8 at 75.1%, beats GPT-5.5. PostTrainBench: 2nd only to Opus 4.8. SWE-Marathon trails Opus 4.8 by 13%.
04:11 – 07:06
04 · IndexShare — technical deep dive
Why 1M context is expensive. IndexShare picks ~1K tokens from 800K. Excalidraw diagram. MTP improvements and RL for long-horizon tasks.
07:06 – 08:30
05 · Full benchmark table
GLM-5.2 vs Qwen3.7-Max, MiniMax M3, DeepSeek-V4-Pro, Opus 4.8, GPT-5.5, Gemini 3.1 Pro across reasoning and coding benchmarks.
08:30 – 10:55
06 · Getting started + sponsor intro
How to use GLM-5.2 via Z.ai Coding Plan or Command Code CLI. Go plan: $1/month covers GLM-5.2, DeepSeek, Qwen, MiniMax, Kimi.
10:55 – 14:30
07 · Live demo — platformer game
Single prompt: build a platformer game, you are live on YouTube. GLM-5.2 creates 8-task plan, builds Pixel Quest with 3 levels, Web Audio API sounds, parallax background.
14:30 – 18:36
08 · Command Code features tour
/init creates AGENTS.md. /share generates public session link. cmd resume <id>. /rewind checkpoint restore of conversation + code.
Sponsor outro for Command Code. Comment bait tweet shown at end.
Atomic Insights
Lines worth screenshotting.
GLM-5.2 trails Opus 4.8 by just 1% on FrontierSWE — the highest-stakes long-horizon coding benchmark — while beating GPT-5.5 outright.
It was trained without NVIDIA chips or any American silicon, which matters for both cost structure and geopolitical resilience.
IndexShare identifies the ~1,000 most decision-critical tokens from an 800,000-token context so the transformer skips attending to the rest — 2.9x fewer FLOPs at 1M length.
At $1.40/M input and $4.40/M output, GLM-5.2 significantly undercuts Opus 4.8 and GPT-5.5 at near-comparable benchmark performance.
Z.ai wrote 'no regional limits, technical access without borders' in the release post, which the host reads as a deliberate jab at frontier labs like Anthropic and OpenAI that, in his words, call their models dangerous and then get banned by their own governments.
Command Code's /rewind gives git-like session checkpoints without initializing git — it restores both conversation history and code files together.
Command Code's /compact condenses context when approaching the token limit — the right time is past 500-600K tokens, not at 100K.
The entire platformer game demo — 3 levels, physics, Web Audio API sounds, parallax background — cost under $1 in GLM-5.2 tokens on Command Code's Go plan.
The harness contributes significantly to agent quality: planning, building, and self-verification behavior comes partly from Command Code's scaffolding, not raw model capability alone.
The gap between open-weight and closed frontier models is closing every few months; GLM-5.2 is the highest-ranked open-source model across all three long-horizon benchmarks at release.
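The per-token prices quoted above ($1.40/M input, $4.40/M output) can be turned into a rough session cost. This is a minimal sketch: the function name and the 90K/10K token split are hypothetical, and it ignores any caching discounts or flat-rate plans such as the $1/month Go plan.

```python
# Prices quoted in the video, in USD per million tokens.
GLM_52_INPUT_PER_M = 1.40
GLM_52_OUTPUT_PER_M = 4.40


def session_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float = GLM_52_INPUT_PER_M,
                 output_per_m: float = GLM_52_OUTPUT_PER_M) -> float:
    """Return the USD cost of a session billed per million tokens."""
    input_cost = (input_tokens / 1_000_000) * input_per_m
    output_cost = (output_tokens / 1_000_000) * output_per_m
    return input_cost + output_cost


# Hypothetical split for a session of about 100K tokens.
cost = session_cost(input_tokens=90_000, output_tokens=10_000)
print(f"${cost:.3f}")  # $0.170
```

Under these assumptions a session of roughly 100K tokens lands well under $1, which is in line with the figure reported for the demo.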
Takeaway
Open-weight AI is a geopolitical story now.
WHAT TO LEARN
GLM-5.2 benchmarks within 1% of the best closed model on long-horizon coding, costs a fraction of the price, and was built entirely outside the US chip stack — all three facts matter independently.
A 1M context window is only useful if the model stays reliable across it — GLM-5.2 addresses this by training specifically on long coding-agent trajectories, not just increasing the token limit.
IndexShare is the key architectural insight: selecting the ~1,000 most decision-critical tokens from 800,000 lets the transformer skip expensive full attention — a meaningful efficiency gain that keeps 1M context economical.
Benchmark saturation changes what gaps mean: the difference between 74.4% and 75.1% on FrontierSWE is nearly nothing when both scores are approaching a ceiling, and the trend line matters more than today's margin.
The harness running the model contributes significantly to agent output quality — GLM-5.2's game demo works partly because Command Code's scaffolding forces planning, building, and self-verification regardless of which model is plugged in.
Open-weight models trained outside US infrastructure represent a structural shift: anyone can download GLM-5.2, deploy it on their own hardware, and run it without API dependency or geographic restriction.
Context management is now a skill: using /compact before hitting the token limit (not after), knowing when 600K tokens is the right trigger, and using /rewind instead of re-prompting are habits that change the economics of long agentic sessions.
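The compaction habit described above amounts to a simple threshold rule. A minimal sketch, assuming a hypothetical helper that is not part of Command Code: it only decides when running /compact would be worthwhile, using the lower end of the 500-600K range from the video as the trigger.

```python
CONTEXT_WINDOW = 1_000_000   # GLM-5.2's advertised context length
COMPACT_TRIGGER = 500_000    # lower end of the 500-600K range suggested in the video


def should_compact(tokens_used: int, trigger: int = COMPACT_TRIGGER) -> bool:
    """Return True once the session has used enough context to justify compacting."""
    return tokens_used >= trigger


for used in (100_000, 500_000, 600_000):
    action = "compact" if should_compact(used) else "keep going"
    remaining = CONTEXT_WINDOW - used
    print(f"{used:>9,} tokens used, {remaining:>9,} left: {action}")
```

At 100K tokens the rule says to keep going, which matches the host's remark that compacting that early is overkill.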
Glossary
Terms worth knowing.
IndexShare
An architectural technique in GLM-5.2 that places a lightweight indexer at the first 4 transformer layers to identify the ~1,000 most decision-critical tokens in a long context, reducing per-token FLOPs 2.9x at 1M context length with a modest quality trade-off.
FrontierSWE
A long-horizon coding benchmark measuring whether an AI agent can complete open-ended technical projects spanning hours to tens of hours, covering systems optimization, large-scale code construction, and applied ML research.
PostTrainBench
A benchmark where each agent is given an H100 GPU and evaluated by how much it can improve models through post-training; GLM-5.2 ranks second only to Opus 4.8.
SWE-Marathon
An ultra-long-horizon software engineering benchmark covering tasks like building compilers, optimizing kernels, and developing production-grade services over extended compute windows.
MTP (Multi-Token Prediction)
A speculative decoding technique where the model predicts multiple future tokens simultaneously; GLM-5.2 applies IndexShare on the MTP layer to increase acceptance length by up to 20%.
Open weights
A model release where trained weight parameters are publicly downloadable, allowing anyone to run the model on their own hardware without API dependency — GLM-5.2 is MIT-licensed open weights.
“I feel China and these labs are the only players who are doing meaningful work at scale.”
Provocative opinion from a credible commentator; will generate replies → TikTok hook
09:06
“I am pretty sure this is targeted towards frontier AI labs like Anthropic, like OpenAI which keeps on saying that their models are dangerous and then they get banned by their own governments.”
Names Anthropic and OpenAI directly, connects to a real tension; high engagement → IG reel cold open
07:36
“The gap is shrinking fast. We could expect GLM 5.3, 5.4 or GLM six to be sort of like an Opus style model which is completely open weights.”
Forward-looking prediction with a specific trajectory; shareable claim → newsletter pull-quote
The Script
Word for word.
00:00There is a new big open source model that is released today that is GLM 5.2 saying it is doing frontier intelligence and its open weights. And it's a big thing when you get to the benchmarks part.
00:12But if you look at this model, the announcement from Z.ai, there are some interesting things to cover in this model. Specifically, they talk a lot about coding agentic tasks. Right, so they have made significant improvements in coding agentic tasks. This is a 1,000,000 context window model just like we have GPT 5.5 and Opus.
00:30It's fully open sourced. It's MIT licensed so you can just take the model, deploy it on your own hardware and run it. And there is no price change compared to GLM 5.1.
00:40And if you want to check the actual prices, GLM 5.2 is $1.40 on input and $4.40 on output, which is significantly lower if you compare it with Opus or even GPT 5.5 models.
00:52Now there are a couple of interesting things about GLM 5.2 and the GLM 5 category of models in general. First is that this is the first model that's introducing a 1,000,000 context window. And second of all, I don't know if you know this, but these models are not trained on NVIDIA stack.
01:08So there is no chips or American silicon that is used to train these models for inference, obviously. Um, but you can technically train a model like this without American companies, which is a big thing. So their blog post mentions that it's a solid 1,000,000 context that stably sustains long horizon work.
01:27So they immediately attack the claim that a 1,000,000 context is easy to claim but much harder to keep reliable under real engineering pressure. To this end we substantially expanded 1,000,000 context training for coding agent scenarios covering large scale implementation, automated research, performance optimizations and complex debugging.
01:49Which is exactly what we want, right? Because generally what happens especially with Google's models I have seen this is that if you try to push their context window a lot, results start to degrade very often, right? And that needle in a haystack problem starts to happen which is basically like the model quality is not that good.
02:09Generally speaking you also don't want to push your model to a million context window. Like no need to push it to the limit because again like it becomes expensive, it becomes slightly slower.
02:19It's also something that it's not giving you a lot of advantage until and unless your problem itself involves a million tokens. Generally what happens is that you're just lazy to start a new chat or copy some context over or you're lazy to compress your context which can be fixed I'll show you how in today's video.
02:35But look at this model, look at the FrontierSWE scores: this is just behind Opus 4.8, right? This is almost like a 1% difference, and this just exceeded GPT 5.5 as well, this just beat it. And if you look at some of these other benchmarks also, they are statistically almost similar.
02:54Right? I won't say like 34.3 and 37.2 would have a lot of difference in terms of performance.
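One way to sanity-check the "statistically almost similar" remark is a back-of-envelope noise estimate. A minimal sketch, assuming each benchmark score is a pass rate over independent tasks: the task count of 200 is hypothetical (the video does not give one), and the function names are illustrative. It treats a gap as noise when it is smaller than about two standard errors of the difference.

```python
import math


def standard_error(score_pct: float, n_tasks: int) -> float:
    """Binomial standard error of a pass rate, as a fraction."""
    p = score_pct / 100.0
    return math.sqrt(p * (1.0 - p) / n_tasks)


def within_noise(score_a: float, score_b: float, n_tasks: int) -> bool:
    """True if the gap is smaller than about two standard errors of the difference."""
    se_diff = math.hypot(standard_error(score_a, n_tasks),
                         standard_error(score_b, n_tasks))
    gap = abs(score_a - score_b) / 100.0
    return gap < 2.0 * se_diff


N_TASKS = 200  # hypothetical; the video does not say how many tasks each benchmark has

print(within_noise(74.4, 75.1, N_TASKS))  # FrontierSWE scores from the video: True
print(within_noise(34.3, 37.2, N_TASKS))  # the 34.3 vs 37.2 pair: True
print(within_noise(50.0, 70.0, N_TASKS))  # a made-up 20-point gap, for contrast: False
```

With a few hundred tasks, gaps of one to three points sit inside the noise band, while a 20-point gap does not. A different task count would move the threshold.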
03:00Similarly, if you look at some of these other benchmarks that at least GLM is giving us, you're gonna see more or less they are on par with Opus 4.8 numbers. Right?
03:10Obviously, there is a little bit of difference here and there, but 81 and 85 is not a huge difference, especially by the time you are about to saturate the benchmark.
03:20Right? If this is approaching 90 or approaching 95 or 100, this difference means even less on smaller amount of intervals, smaller amount of changes.
03:30Similarly you will see a bunch of these improvements over here like going from 18 to 46.2 in Deep SWE which is again like a cognition benchmark. It's like a good enough jump.
03:40Although nowhere close to like 70 where OpenAI is right now. But you will see that the performance increase has been consistently stable across this release.
03:50Right? So it's definitely much better than GLM 5.1.
03:54But this is actually in the pursuit of making this better than 5.1. It is approaching Opus level intelligence in a lot of these benchmarks and even bypassing like GPT 5.5 in some of these benchmarks.
04:07Now they sort of talk a little bit about the architectural changes that they have also done in the inference of the model, where they talk about this IndexShare for, you know, reducing the computational cost of the indexer in DSA. So a simple way to understand what this indexer is: basically, think of this thing.
04:26Like let's say if you have, you know, almost like 800,000 tokens.
04:31Right? Could be like a book, be like a big essay or whatever, like a big code base and so on. If you want your AI to work on predicting the next token which is what these large language models are, right?
04:41Like what comes next as a token? What is that response? Basically, your transformer needs to have attention on, or at least look at, every single token, right, in an ideal world, which is going to be very, very computationally expensive.
04:56So what they do is instead of, um, you know, looking at every single token inside these 800,000 tokens, they just figure out that okay maybe we just need to look at only 1,000 tokens.
05:08You know, not all 800,000 tokens. So what are these 1,000 tokens which are like the decision makers of what the next token is supposed to be? This is decided, and this is told, by an indexer.
05:21Right? So they are basically saying in the blog post that we have optimized some things on this indexer and applied a few techniques which results in lower costs on the trade off obviously of slightly lower quality as well.
05:34But that trade off is acceptable so that the cost remains low. Now a better way to also understand this is that if I write a sentence like if you are a programmer, you should try GLM 5.2.
05:49Right? Let's say if I have the sentence, then obviously the whole sentence is not super important.
05:55Like, all the words are not important for you to understand it. Right? Uh, or, you know, if I just leave it, for example, like this.
06:02Like, now I have to predict what's the next token. Technically speaking, the only important word over here is probably programmer just to get an idea and then the GLM model.
06:13Right? Everything else is sort of like not really that important.
06:17So if I give you this sentence, you can probably say it based on your knowledge cut off that this should be 5.2 or 5.1 or whatever. So that is what indexer is doing that it's picking up this programmer and the GLM word and just not picking up other words which don't carry a lot of meaning.
06:35So anyway this is a little bit of technical breakdown. You can go through this if you want. They are just telling you some techniques and things that they have done for improving the performance.
06:44And this RL for long horizon is basically what we keep talking about on this channel sometimes where they are using reinforcement learning environments to improve the model performance in post training once they have created those models.
06:57So they talk a little bit about this as well like how they are using this reward hacking and anti hack systems to make the model better.
07:06But if you scroll down till the end you will see that they actually did a complete comparison with 5.1, Qwen models, MiniMax, DeepSeek and even Claude. Now you will see that at most benchmarks GLM is not exactly winning.
07:22For example, it's winning at this AIME 2026. But still, Claude mostly wins in a lot of interesting benchmarks.
07:30Right? The interesting thing over here is that Opus 4.8, even though it's winning right now the gap is shrinking fast. Right?
07:37So we could expect GLM 5.3, 5.4 or GLM six or whatever they release. This would be sort of like an Opus style model which is completely open weights.
07:49Which is a big thing. Just think about it that you can probably buy a piece of hardware, probably not economically right now but maybe in future that can run a model intelligent enough like how Opus 4.8 is today.
08:03Which is insane to think about because that just promises AI intelligence for every single person on planet. Which is a big thing and which is what technically OpenAI always keeps on saying their mission is. But I mean in terms of like opening the AI, I feel China and these labs are the only players who are doing meaningful work at scale, right?
08:27So these models, all of these companies. Z.ai, I think, is a public company, right? But if you look at other companies like MiniMax or Kimi as well.
08:35Kimi recently released Kimi K2.7, their coding model, right? Which was what, just three, four days ago, I think, four, five days ago.
08:44And now we have another model GLM 5.2. Interestingly I saw that there is one comment that they probably left deliberately in the blog post where they mentioned that it's pure open. An MIT Open Source License, no regional limits, technical access without borders.
08:59I am pretty sure this is targeted towards frontier AI labs like Anthropic, like OpenAI which keeps on saying that their models are dangerous and then they get banned by their own governments.
09:11So this looks like a very targeted sentence to include in a release blog, but just for fun. Alright.
09:19So I've been playing with GLM 5.2 a little bit in one of my repositories over here, one of my other projects that I'm doing. I'm sure like I've shared and teased this a little in some previous videos as well, but this is a small product that I'm building.
09:34But what I have seen so far with GLM is that it's good enough. It's good enough for keeping it concise, for building or at least suggesting architectural decisions as well.
09:44On the back end part it gave me like a couple of good suggestions on features that can be added and how the business can grow. But in this video I want to use a separate project because I want to check its capabilities. And in order to use GLM 5.2 I'm gonna use Command Code as a harness, which ships with GLM 5.2 already and includes a bunch more open source and open weight models, including things like MiniMax M3, which are available for free for a limited time.
10:11Getting started with Command Code is super simple. All you have to do is create a free account and follow these instructions over here: install it globally, run cmd login and start the session.
10:22And these models that I'm talking about, including GLM 5.2, are covered in a $1 per month plan that Command Code includes, which is their Go plan, which gives an absurdly high amount of AI usage for you on these open models, whether that's DeepSeek, Qwen, MiniMax and even the newer ones that are coming up.
10:43But I know for a fact that GLM 5.2 is also covered in this release. They just haven't updated the pricing pages yet. So you can go ahead and use GLM on this plan as well.
10:53So let's keep the model to GLM 5.2. And let me just ask: hey, what's going on? And it just answers me: not much, just hanging out in your GLM directory, what can I help you with?
11:03So I'm gonna ask it, can you create a small platformer game?
11:10You are live on YouTube so other people will judge you based on the quality of the game.
11:21Awesome. So this is the same test that I did with Kimi K2.7 as well. So this will sort of give us an idea on like how GLM is ranking right now.
11:31So you can see, based on the task, it has automatically created an eight-task list right now. And now it will get to work.
11:39Now while GLM is working on this task list, I also want to quickly show you some features of Command Code CLI which make it actually a really good way of working with these open models like GLM, like DeepSeek, MiniMax, any model that you're using. I'm gonna start this in a different tab and I'm gonna just write /init.
11:57Because if you look at over here, we don't have anything in the folder yet. And even the above one, the CLI over here is working on the project directly. Right?
12:05So we're just gonna do an /init over here. And it'll just create an AGENTS.md file. Now you can edit this file with memory or open it in your editor.
12:15So what does this file include? It's typically a small file that just has README.md, but we don't have README.md
12:21right now, not until the AI creates it. And a few hints on what needs to be done. Right?
12:28So this AGENTS.md is just something that all agents are supposed to read, except for Claude Code, because Claude Code has CLAUDE.md
12:35as a file. And at any point that you want to update this file, what you can do is just write memory and just update the project memory, so it will open it directly inside the editor of your choice. Now it looks like GLM 5.2 has basically done the task and now it's just verifying.
12:52And again, like I've told you how the systems work, this is mostly because of the harness itself. So the way Command Code is built, it is built in such a way that it asks the AI model to not only just build what it is building but then also verify the implementation, make sure there are no bugs and anything. So it's like partly the intelligence that's coming from the model, but partly also the AI harness, which is Command Code in this case, that kicks in that effect.
13:19Alright. So if you look at the directories inside this, you see that we have some photos over here and then we have a platformer.html as well.
13:27So let's see, if I open this platformer.html, you can see that we also have audio in this file, basically, and it seems like it ended up sort of creating a faster version of Mario.
13:46Right? I mean, obviously, the characters are not there and the controls are a little too fast for me to use.
13:55But this is actually a really nice game. Right? And this is just it just created it in one go, one shot.
14:01And this is also the physics part at least is working fairly fine except for the fact that the game itself is like too too fast. So the gravity, for example, is a little too strong and, you know, the sounds are perfect.
14:15Like, this this sound reminds me of Mario Forever, which is exactly the same, the coin collection sound at least. And you can see like, you know, the levels are increasing.
14:24It has I don't know like how many levels it has created. But these sort of character Okay.
14:32So my game is now over. I'll have to start again. But these characters and the bounding boxes and, you know, the physics of, you know, collisions is working perfectly fine.
14:41Wonder if I can jump on this to also kill it. Yeah, I can. So a lot of interesting things in a small game which it ended up creating super fast.
14:50And as you can see, now it has also given me notes on this game's feel, physics, content and so on. Now again, imagine that all of this it did with one single prompt that I gave it. Imagine if you're working with this model for like thirty minutes or an hour.
15:06You could create very interesting projects, games, things, whatever you want to build, super easily. And the next fun thing about Command Code is that, let's say I want to do some sort of pair vibe coding with somebody, what I can just do is share this session right then and there by writing the /share command, and you can see that it immediately gave me this snippet over here.
15:29And if I open this over here you can see that my whole conversation is now available as a public link which you can then share with the people that you want to follow your exact same whatever prompting that you did.
15:43It was mostly like the AI itself. I have not done anything except for a single line. But you get the idea.
15:48You can copy this conversation, you can just paste it back inside your Command Code session and you can just resume it from there as well. And of course, if you yourself are using it on your own, what you can also do is just write cmd, which is Command Code,
16:03write resume, and then just resume the conversation that we were going through. Right?
16:08So you can just cut it anytime. You can just close the tab as well whenever you want. Start it again and have a resume flag which will just resume your conversation instantly.
16:17Now let's say I ask it, can you make the speed and jump a little bit more realistic?
16:27Right now, it feels too fast. Interestingly, it has told me that it synthesized the sound effects via the Web Audio API which requires no asset files.
16:37Right? Which is actually good because if you look at this, there are basically no I don't know, it actually also removed the screenshots. Right?
16:45So it was probably just using them to see if the game was working properly or not, which is again like a pretty interesting trait from an AI model. So it has made some edits over here for, you know, just fixing the physics of the game. And I can't shake the feeling that this sound, this coin collection sound, is, I'm pretty sure, coming from the training data of all the gameplays or, you know, I don't know, whatever open source code is available for platformer games like Mario, because this sounds very, very similar to how the coin collection sounds in Mario, in some of the variations of the game at least.
17:25Now it has again, once again sort of done the work. It's just doing its verification now with the agent browser itself. So we can probably just shift back to the game, give it a refresh, and hit space to play again.
17:39Right. So, I mean, personally speaking, I don't feel like, you know, if it has made a lot of difference in the in the, you know, the fastness of the game.
17:53In fact, as a matter of fact, I think, or at least I feel, that the previous version was slightly better. And if that is indeed the case, what I can do right now is I can just go ahead and write /rewind inside Command Code, and you will see that it gives me a list of, you know, all the messages that I'm sending, and I can have a checkpoint where I restore it.
18:18Now this is good because you don't need git for it because I have, like you have seen, like I have not initialized git in this repository. But I can just select this one and I'll just restore the conversation as well as code.
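The checkpoint behaviour described here, restoring conversation and code together without git, can be sketched as below. This is a toy illustration under stated assumptions, not Command Code's actual implementation: the Session class, its methods, and the GRAVITY values are all hypothetical, and it handles text files only for brevity.

```python
import tempfile
from pathlib import Path


class Session:
    """Toy session that snapshots chat history and project files together."""

    def __init__(self, workdir: Path):
        self.workdir = workdir
        self.messages: list[str] = []
        self.checkpoints: list[tuple[list[str], dict[str, str]]] = []

    def _snapshot_files(self) -> dict[str, str]:
        # Text files only, for brevity.
        return {str(p.relative_to(self.workdir)): p.read_text()
                for p in sorted(self.workdir.rglob("*")) if p.is_file()}

    def send(self, message: str) -> int:
        """Record a checkpoint before the message is handled; return its index."""
        self.checkpoints.append((list(self.messages), self._snapshot_files()))
        self.messages.append(message)
        return len(self.checkpoints) - 1

    def rewind(self, checkpoint: int) -> None:
        """Restore both the conversation and the files to the chosen checkpoint."""
        messages, files = self.checkpoints[checkpoint]
        for p in self.workdir.rglob("*"):
            if p.is_file():
                p.unlink()
        for name, text in files.items():
            target = self.workdir / name
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(text)
        self.messages = list(messages)
        del self.checkpoints[checkpoint:]


with tempfile.TemporaryDirectory() as tmp:
    session = Session(Path(tmp))
    session.send("create a platformer game")
    (Path(tmp) / "platformer.html").write_text("GRAVITY = 0.8")  # first build
    cp = session.send("make the speed and jump more realistic")
    (Path(tmp) / "platformer.html").write_text("GRAVITY = 0.4")  # unwanted edit
    session.rewind(cp)
    restored = (Path(tmp) / "platformer.html").read_text()

print(restored)          # GRAVITY = 0.8
print(session.messages)  # ['create a platformer game']
```

Snapshotting before each message is what lets a rewind drop both the follow-up prompt and the file edits it caused in one step.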
18:30Right? So I'll just remove this because it's not something that is working properly. Maybe give it a refresh space.
18:38And, yeah, it more or less feels like the same as before. Right?
18:46So I'm not really sure what the model actually updated in the first place. But anyway, I'm sure you can do a bunch more interesting things, like create more levels, you know, have, you know, a cheat code system for bypassing or jumping levels.
18:59Maybe you're gonna have like sort of dragon levels in which like how Mario used to have. So yeah, all in all, it's a super solid release in the open space, open source world. And this is something that should push the frontier models to figure out the optimizations in their stack, what can they do better to stay relevant with open weight models now because the way open weight models actually follow the closed source models, it's crazy.
19:27The progress that we are seeing is crazy. And GLM 5.2 is specifically built for long horizon tasks. Right?
19:33They mention it in the title itself that it's so important that it's for long horizon tasks which means that this would be exceptionally good. It's supposed to be exceptionally good in coding specific tasks because a lot of them are indeed long horizon building features, debugging things, what's going wrong, figuring out like a needle in a haystack sort of situation where you don't exactly know what you're looking for.
19:55That requires a lot of patience and a lot of run time, which this model has. So do check out GLM 5.2 using Command Code. You can see that for this application that we just built, it just took less than 100,000 tokens.
20:08It has over 900,000 tokens context remaining. And generally, what I would suggest to you is that the moment you reach like this mark, like 500 or even 600,000 tokens, what you can do is you can just run /compact inside Command Code.
20:22And it'll just compact your conversation history, make it a little smaller so that it still retains all the important facts and figures, but it's not taking as much history as you want. Now this is obviously like a sort of like a overkill at a 100,000 window.
20:37But this would be useful once you approach like 300,000 to 400,000 tokens. So it saved me about 14,000 tokens right now, which is not a lot, but again, my context was not super large, so that's expected.
20:47And if you look at my spending on GLM 5.2, it's less than $1 for building this simple application. And we also did sort of like, we just reverted back a change after creating that.
20:59Right? So that also counts for the tokens over here. So, yeah, that's pretty much it for this video.
21:04Do check out Command Code CLI for trying out GLM 5.2 and other interesting models at a steal cost of $1 per month on the Go plan that they have. Once again, thanks to Command Code for sponsoring this part of the video. I'm gonna see you in the next video very soon.
21:19If you're still watching, make sure you leave a comment, "I watched till the end", below to tell me that you were still here, and let me know what you think about the video.
The Hook
The bait, then the rug-pull.
A Chinese open-weight model just landed that benchmarks within 1% of the best closed frontier model on long-horizon coding — and the title is not the most provocative thing about it.
Frameworks
Named ideas worth stealing.
04:11model
IndexShare
Lightweight indexer at the first 4 transformer layers that identifies ~1,000 most decision-critical tokens from 800K context, so remaining layers attend only to those — reducing per-token FLOPs 2.9x at 1M context length.
Steal for: Explaining why some long-context models are more cost-efficient than others
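The select-then-attend idea behind the indexer can be illustrated with a small sketch. It is not Z.ai's implementation: the function names, the dimensions, and the random vectors are assumptions, the sizes are scaled down from 800K tokens and ~1,000 selections, and it does not model the sharing and optimization steps that give IndexShare its name, which the video does not detail.

```python
import heapq
import math
import random


def select_top_k(index_query, index_keys, k):
    """Cheap indexer: score every token, keep the k highest-scoring positions."""
    scores = [sum(q * x for q, x in zip(index_query, key)) for key in index_keys]
    top = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
    return sorted(top)


def sparse_attention(query, keys, values, selected):
    """Softmax attention restricted to the selected token positions."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * x for q, x in zip(query, keys[i])) for i in selected]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, selected)) / total
            for d in range(dim)]


random.seed(0)
# Scaled-down stand-ins for an 800K-token context and ~1,000 selected tokens.
n_tokens, k, index_dim, model_dim = 2_000, 10, 4, 16
index_keys = [[random.gauss(0, 1) for _ in range(index_dim)] for _ in range(n_tokens)]
keys = [[random.gauss(0, 1) for _ in range(model_dim)] for _ in range(n_tokens)]
values = [[random.gauss(0, 1) for _ in range(model_dim)] for _ in range(n_tokens)]
index_query = [random.gauss(0, 1) for _ in range(index_dim)]
query = [random.gauss(0, 1) for _ in range(model_dim)]

selected = select_top_k(index_query, index_keys, k)
output = sparse_attention(query, keys, values, selected)
print(len(selected), len(output))  # 10 16
```

The saving comes from the asymmetry: the indexer scores every token with small vectors, while the full-width attention runs only over the k selected positions.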