Mehul Mohan · YouTube

China's New AI Beat Claude AND OpenAI! (WHAT)

GLM-5.2 is MIT-licensed, 1M context, trained without American silicon — and it trails Opus 4.8 by just 1% on the hardest coding benchmark.

Posted

June 17th

1 month ago

Duration

21:32

Format

Talking Head

educational

Views

19.5K

606 likes

Big Idea

The argument in one line.

Open-weight Chinese models are closing the gap on closed frontier AI faster than incumbents can maintain their lead, with GLM-5.2 matching Opus 4.8 within 1% on long-horizon coding while running under MIT license with no geographic restrictions.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…

You track open-source AI releases and want a thorough benchmark breakdown of GLM-5.2 against Opus 4.8 and GPT-5.5.
You are evaluating Command Code CLI as a cheaper harness for open models at $1/month versus premium alternatives.
You care about AI infrastructure trained outside the US chip stack for cost, sovereignty, or geopolitical reasons.
You want to see a live one-shot agentic demo — a playable multi-level platformer game built in a single prompt.

SKIP IF…

You need production-grade reliability today — GLM-5.2 still trails Opus 4.8 on most benchmarks and the gap matters for real engineering.
You have no interest in CLI-based coding agents; the second half is primarily a Command Code feature tour.

TL;DR

The full version, fast.

GLM-5.2 is Z.ai's new open-weight model — MIT-licensed, 1M context, trained entirely without NVIDIA or American silicon. On FrontierSWE it trails Opus 4.8 by just 1% while beating GPT-5.5 outright, and on PostTrainBench it ranks second only to Opus 4.8. The 1M context is made cost-efficient via IndexShare, which identifies the ~1,000 most decision-critical tokens from 800,000 so the transformer skips attending to everything. The video's second half is a sponsored demo where GLM-5.2 builds a full Mario-style platformer game in one prompt through Command Code CLI, and demonstrates /rewind checkpointing, /compact context compression, and session sharing — total session cost: under $1.

Chapters

Where the time goes.

00:00 – 00:58

01 · Hook — X announcement

GLM-5.2 announced on X by Z.ai. Key claims: frontier intelligence, open weights, MIT license, 1M context, same price as GLM-5.1.

00:58 – 02:08

02 · Pricing + no US silicon

Pricing table ($1.40/M input, $4.40/M output). First model with solid 1M context. Trained without NVIDIA/American chips.

02:08 – 04:11

03 · Benchmark analysis

FrontierSWE: GLM-5.2 at 74.4% vs Opus 4.8 at 75.1%, beats GPT-5.5. PostTrainBench: 2nd only to Opus 4.8. SWE-Marathon trails Opus 4.8 by 13%.

04:11 – 07:06

04 · IndexShare — technical deep dive

Why 1M context is expensive. IndexShare picks ~1K tokens from 800K. Excalidraw diagram. MTP improvements and RL for long-horizon tasks.

07:06 – 08:30

05 · Full benchmark table

GLM-5.2 vs Qwen3.7-Max, MiniMax M3, DeepSeek-V4-Pro, Opus 4.8, GPT-5.5, Gemini 3.1 Pro across reasoning and coding benchmarks.

08:30 – 10:55

06 · Getting started + sponsor intro

How to use GLM-5.2 via Z.ai Coding Plan or Command Code CLI. Go plan: $1/month covers GLM-5.2, DeepSeek, Qwen, MiniMax, Kimi.

10:55 – 14:30

07 · Live demo — platformer game

Single prompt: build a platformer game, you are live on YouTube. GLM-5.2 creates an 8-task plan, then builds Pixel Quest with 3 levels, Web Audio API sounds, and a parallax background.

14:30 – 18:36

08 · Command Code features tour

/init creates AGENTS.md. /share generates public session link. cmd resume <id>. /rewind checkpoint restore of conversation + code.

18:36 – 21:10

09 · Context management + spending

/compact saves ~14K tokens. Token dashboard: 86.8K used, 913.2K remaining. Total GLM-5.2 spend: $0.76.

21:10 – 21:32

10 · CTA + comment bait

Sponsor outro for Command Code. Comment bait tweet shown at end.

Atomic Insights

Lines worth screenshotting.

GLM-5.2 trails Opus 4.8 by just 1% on FrontierSWE — the highest-stakes long-horizon coding benchmark — while beating GPT-5.5 outright.
It was trained without NVIDIA chips or any American silicon, which matters for both cost structure and geopolitical resilience.
IndexShare identifies the ~1,000 most decision-critical tokens from an 800,000-token context so the transformer skips attending to the rest — 2.9x fewer FLOPs at 1M length.
At $1.40/M input and $4.40/M output, GLM-5.2 significantly undercuts Opus 4.8 and GPT-5.5 at near-comparable benchmark performance.
Z.ai wrote 'no regional limits, technical access without borders' in the release post; the host reads it as a deliberate jab at Anthropic and OpenAI, labs he says call their own models dangerous and then get banned by their own governments.
Command Code's /rewind gives git-like session checkpoints without initializing git — it restores both conversation history and code files together.
Command Code's /compact condenses context when approaching the token limit — the right time is past 500-600K tokens, not at 100K.
The entire platformer game demo — 3 levels, physics, Web Audio API sounds, parallax background — cost under $1 in GLM-5.2 tokens on Command Code's Go plan.
The harness contributes significantly to agent quality: planning, building, and self-verification behavior comes partly from Command Code's scaffolding, not raw model capability alone.
The gap between open-weight and closed frontier models is closing every few months; GLM-5.2 is the highest-ranked open-source model across all three long-horizon benchmarks at release.
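The pricing point above reduces to simple arithmetic. A minimal sketch, assuming the quoted rates of $1.40 per million input tokens and $4.40 per million output tokens; the function name and the token counts in the example are hypothetical and are not figures from the video.

```python
from decimal import Decimal

# Rates quoted in the breakdown (USD per one million tokens).
INPUT_RATE = Decimal("1.40")
OUTPUT_RATE = Decimal("4.40")
PER_MILLION = Decimal(1_000_000)


def session_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the USD cost of a session at the quoted GLM-5.2 rates."""
    input_cost = Decimal(input_tokens) * INPUT_RATE / PER_MILLION
    output_cost = Decimal(output_tokens) * OUTPUT_RATE / PER_MILLION
    return input_cost + output_cost


# Hypothetical session: 400K input tokens and 50K output tokens.
example_cost = session_cost(400_000, 50_000)
print(example_cost)
```

Decimal is used instead of floats so that small per-token amounts add up without rounding noise.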

Takeaway

Open-weight AI is a geopolitical story now.

WHAT TO LEARN

GLM-5.2 benchmarks within 1% of the best closed model on long-horizon coding, costs a fraction of the price, and was built entirely outside the US chip stack — all three facts matter independently.

A 1M context window is only useful if the model stays reliable across it — GLM-5.2 addresses this by training specifically on long coding-agent trajectories, not just increasing the token limit.
IndexShare is the key architectural insight: selecting the ~1,000 most decision-critical tokens from 800,000 lets the transformer skip expensive full attention — a meaningful efficiency gain that keeps 1M context economical.
Benchmark saturation changes what gaps mean: the difference between 74.4% and 75.1% on FrontierSWE is nearly nothing when both scores are approaching a ceiling, and the trend line matters more than today's margin.
The harness running the model contributes significantly to agent output quality — GLM-5.2's game demo works partly because Command Code's scaffolding forces planning, building, and self-verification regardless of which model is plugged in.
Open-weight models trained outside US infrastructure represent a structural shift: anyone can download GLM-5.2, deploy it on their own hardware, and run it without API dependency or geographic restriction.
Context management is now a skill: using /compact before hitting the token limit (not after), knowing when 600K tokens is the right trigger, and using /rewind instead of re-prompting are habits that change the economics of long agentic sessions.
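The context-management habit in the last point can be written down as a simple budget check. This is a rough sketch, assuming a 1M-token window and the lower end of the 500-600K trigger suggested in the video; the helper names are hypothetical and this is not how Command Code decides anything internally.

```python
CONTEXT_LIMIT = 1_000_000      # 1M-token window claimed for GLM-5.2
COMPACT_THRESHOLD = 500_000    # lower end of the 500-600K range from the video


def remaining_tokens(used: int, limit: int = CONTEXT_LIMIT) -> int:
    """Tokens still available in the context window."""
    return limit - used


def should_compact(used: int, threshold: int = COMPACT_THRESHOLD) -> bool:
    """True once usage reaches the point where compacting is worth it."""
    return used >= threshold


# Figures from the demo dashboard: 86.8K used.
demo_used = 86_800
print(remaining_tokens(demo_used), should_compact(demo_used))
```

With the demo's 86.8K tokens the check says compacting is premature, which matches the host's remark that running /compact at roughly 100K is overkill.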

Glossary

Terms worth knowing.

IndexShare: An architectural technique in GLM-5.2 that places a lightweight indexer at the first 4 transformer layers to identify the ~1,000 most decision-critical tokens in a long context, reducing per-token FLOPs 2.9x at 1M context length with a modest quality trade-off.
FrontierSWE: A long-horizon coding benchmark measuring whether an AI agent can complete open-ended technical projects spanning hours to tens of hours, covering systems optimization, large-scale code construction, and applied ML research.
PostTrainBench: A benchmark where each agent is given an H100 GPU and evaluated by how much it can improve models through post-training; GLM-5.2 ranks second only to Opus 4.8.
SWE-Marathon: An ultra-long-horizon software engineering benchmark covering tasks like building compilers, optimizing kernels, and developing production-grade services over extended compute windows.
MTP (Multi-Token Prediction): A speculative decoding technique where the model predicts multiple future tokens simultaneously; GLM-5.2 applies IndexShare on the MTP layer to increase acceptance length by up to 20%.
Open weights: A model release where trained weight parameters are publicly downloadable, allowing anyone to run the model on their own hardware without API dependency — GLM-5.2 is MIT-licensed open weights.
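"Acceptance length" in the MTP entry is easier to picture with a toy example. The sketch below assumes plain greedy verification, where drafted tokens are kept until the first one that disagrees with the verifying model; the function name and token IDs are made up for illustration, and the source does not confirm this is exactly how GLM-5.2 verifies drafts.

```python
def accepted_length(draft_tokens: list[int], verified_tokens: list[int]) -> int:
    """Count drafted tokens accepted before the first mismatch."""
    accepted = 0
    for drafted, verified in zip(draft_tokens, verified_tokens):
        if drafted != verified:
            break  # everything after the first mismatch is discarded
        accepted += 1
    return accepted


# Hypothetical token IDs: the third drafted token disagrees with the verifier.
example = accepted_length([5, 7, 9, 2], [5, 7, 1, 2])
print(example)
```

A longer average acceptance length means more tokens are produced per verification pass, which is why the glossary treats an increase as a speed-up.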

Resources

Things they pointed at.

00:00 · link · GLM-5.2 blog post

00:00 · link · GLM-5.2 HuggingFace weights

08:30 · product · Command Code CLI

03:55 · product · Kimi K2.7

Quotables

Lines you could clip.

08:20

“I feel China and these labs are the only players who are doing meaningful work at scale.”

Provocative opinion from a credible commentator; will generate replies → TikTok hook

09:06

“I am pretty sure this is targeted towards frontier AI labs like Anthropic, like OpenAI which keeps on saying that their models are dangerous and then they get banned by their own governments.”

Names Anthropic and OpenAI directly, connects to a real tension; high engagement → IG reel cold open

07:36

“The gap is shrinking fast. We could expect GLM 5.3, 5.4 or GLM six to be sort of like an Opus style model which is completely open weights.”

Forward-looking prediction with a specific trajectory; shareable claim → newsletter pull-quote

The Script

Word for word.

There is a new big open source model that is released today that is GLM 5.2 saying it is doing frontier intelligence and its open weights. And it's a big thing when you get to the benchmarks part.

But if you look at this model, the announcement from Z.ai, there are some interesting things to cover. In this model specifically, they talk a lot about coding agentic tasks. Right, so they have made significant improvements in coding agentic tasks. This is a 1,000,000 context window model just like we have GPT 5.5 and Opus.

It's fully open sourced. It's MIT licensed so you can just take the model, deploy it on your own hardware and run it. And there is no price change compared to GLM 5.1.

And if you want to check the actual prices, GLM 5.2 is $1.4 on input and $4.4 on output, which is significantly lower if you compare it with Opus or even GPT 5.5 models.

Now there are a couple of interesting things about GLM 5.2 and GLM five category of models in general. First is that this is the first model that's introducing a 1,000,000 context window. And second of all, I don't know if you know this, but these models are not trained on NVIDIA stack.

So there are no chips or American silicon that is used to train these models for inference, obviously. Um, but you can technically train a model like this without American companies, which is a big thing. So their blog post mentions that it's a solid 1,000,000 context that stably sustains long horizon work.

So they immediately attack the claim that a 1,000,000 context is easy to claim but much harder to keep reliable under real engineering pressure. To this end we substantially expanded 1,000,000 context training for coding agent scenarios covering large scale implementation, automated research, performance optimizations and complex debugging.

Which is exactly what we want, right? Because generally what happens especially with Google's models I have seen this is that if you try to push their context window a lot, results start to degrade very often, right? And that needle in a haystack problem starts to happen which is basically like the model quality is not that good.

Generally speaking you also don't want to push your model to a million context window. Like no need to push it to the limit because again like it becomes expensive, it becomes slightly slower.

It's also something that it's not giving you a lot of advantage until and unless your problem itself involves a million tokens. Generally what happens is that you're just lazy to start a new chat or copy some context over or you're lazy to compress your context which can be fixed I'll show you how in today's video.

But look at this model look at the Frontier SWE scores and this is just behind Opus 4.8 right this is just almost like 1% difference and this just exceeded GPT 5.5 as well this just beat it. And if you look at some of these other benchmarks also, they are statistically almost similar.

Right? I won't say like thirty four point three and thirty seven point two would have a lot of difference in terms of performance.

Similarly, if you look at some of these other benchmarks that at least GLM is giving us, you're gonna see more or less they are on par with Opus 4.8 numbers. Right?

Obviously, there is a little bit of difference here and there, but eighty one and eighty five is not a huge difference especially by the time you are about to saturate the benchmark.

Right? If this is approaching 90 or approaching 95 or 100, this difference means even less on smaller amount of intervals, smaller amount of changes.

Similarly you will see a bunch of these improvements over here like going from 18 to 46.2 in Deep SWE which is again like a cognition benchmark. It's like a good enough jump.

Although nowhere close to like 70 where OpenAI is right now. But you will see that the performance increase has been consistently stable across this release.

Right? So it's definitely much better than GLM 5.1.

But this is actually in the pursuit of making this better than 5.1. It is approaching Opus level intelligence in a lot of these benchmarks and even bypassing like GPT 5.5 in some of these benchmarks.

Now they sort of talk a little bit about the architectural changes that they have also done in the inference of the model, where they talk about this IndexShare for, you know, reducing the computational cost of the indexer in DSA. So a simple way to understand what this indexer is, is basically think of this thing.

Like let's say if you have, you know, almost like 800,000 tokens.

Right? Could be like a book, be like a big essay or whatever, like a big code base and so on. If you want your AI to work on predicting the next token which is what these large language models are, right?

Like what comes next as a token? What is that response? Basically, your transformer needs to have attention or at least look at every single token, right, in an ideal world, which is going to be very very computationally expensive.

So what they do is instead of, um, you know, looking at every single token inside these 800,000 tokens, they just figure out that okay maybe we just need to look at only 1,000 tokens.

You know not all 800,000 tokens. So what are these 1,000 tokens which are like the decision makers of what the next token is supposed to be? This is designed and this is told by an indexer.

Right? So they are basically saying in the blog post that we have optimized some things on this indexer and applied a few techniques which results in lower costs on the trade off obviously of slightly lower quality as well.

But that trade off is acceptable so that the cost remains low. Now a better way to also understand this is that if I write a sentence like if you are a programmer, you should try GLM 5.2.

Right? Let's say if I have the sentence, then obviously the whole sentence is not super important.

Like, all the words are not important for you to understand it. Right? Uh, or, you know, if I just leave it, for example, like this.

Like, now I have to predict what's the next token. Technically speaking, the only important word over here is probably programmer just to get an idea and then the GLM model.

Right? Everything else is sort of like not really that important.

So if I give you this sentence, you can probably say it based on your knowledge cut off that this should be 5.2 or 5.1 or whatever. So that is what indexer is doing that it's picking up this programmer and the GLM word and just not picking up other words which don't carry a lot of meaning.

So anyway this is a little bit of technical breakdown. You can go through this if you want. They are just telling you some techniques and things that they have done for improving the performance.

And this RL for long horizon is basically what we keep talking about on this channel sometimes where they are using reinforcement learning environments to improve the model performance in post training once they have created those models.

So they talk a little bit about this as well like how they are using this reward hacking and anti hack systems to make the model better.

But if you scroll down till the end you will see that they actually did a complete comparison with 5.1, Qwen models, MiniMax, DeepSeek and even Claude. Now you will see that at most benchmarks GLM is not exactly winning.

For example, it's winning at this AIME 2026. But still, Claude mostly wins in a lot of interesting benchmarks.

Right? The interesting thing over here is that Opus 4.8, even though it's winning right now the gap is shrinking fast. Right?

So we could expect GLM 5.3, 5.4 or GLM six or whatever they release. This would be sort of like an Opus style model which is completely open weights.

Which is a big thing. Just think about it that you can probably buy a piece of hardware, probably not economically right now but maybe in future that can run a model intelligent enough like how Opus 4.8 is today.

Which is insane to think about because that just promises AI intelligence for every single person on planet. Which is a big thing and which is what technically OpenAI always keeps on saying their mission is. But I mean in terms of like opening the AI, I feel China and these labs are the only players who are doing meaningful work at scale, right?

So these models, all of these ZAI company ZAI I think is a public company right. But if you look at other companies like Minimax or Kimi as well.

Kimi recently released Kimi K 2.7 their coding model right. Which was what just three four days ago I think four five days ago.

And now we have another model GLM 5.2. Interestingly I saw that there is one comment that they probably left deliberately in the blog post where they mentioned that it's pure open. An MIT Open Source License, no regional limits, technical access without borders.

I am pretty sure this is targeted towards frontier AI labs like Anthropic, like OpenAI which keeps on saying that their models are dangerous and then they get banned by their own governments.

So this looks like a very targeted sentence to include in a release blog, but just for fun. Alright.

So I've been playing with GLM 5.2 a little bit in one of my repositories over here, one of my other projects that I'm doing. I'm sure like I've shared and teased this a little in some previous videos as well, but this is a small product that I'm building.

But what I have seen so far with GLM is that it's good enough. It's good enough for keeping it concise, for building or at least suggesting architectural decisions as well.

On the back end part it gave me like a couple of good suggestions on features that can be added and how can the business grow. But in this video I want to use a separate project because I want to check its capabilities. And in order to use GLM 5.2 I'm gonna use command code as a harness which ships with GLM 5.2 already and includes a bunch of more open source and open weight models including things like MiniMax M3 which are available for free for a limited time.

Getting started with command code is super simple. All you have to do is create a free account and follow these instructions over here, install it globally, run CMD login and start the session.

And these models that I'm talking about including GLM 5.2 are covered in a $1 per month plan that Command Code includes which is their Go plan which gives an absurdly high amount of AI usage for you on these open models whether that's DeepSeek, Qwen, MiniMax and even the newer ones that are coming up.

But I know for a fact that GLM 5.2 is also covered in this release. They just haven't updated the pricing pages yet. So you can go ahead and use GLM on this plan as well.

So let's keep the model to GLM 5.2 And let me just ask hey, what's going on? And just answer me not much, just hanging out in your GLM directory, what can I help you with?

So I'm gonna ask it, can you create a small platformer game?

You are live on YouTube so other people will judge you based on the quality of the game.

Awesome. So this is the same test that I did with Kimi k 2.7 as well. So this will sort of give us an idea on like how GLM is ranking right now.

So you can see, based on the task, it has automatically created an eight-task list right now. And now it will get to work.

Now while GLM is working on this task list, I also want to quickly show you some features of command code CLI which make it actually a really good way of working with these open models like GLM, like DeepSeek, MiniMax, any model that you're using. I'm gonna start this in a different tab and I'm gonna just write init.

Because if you look at over here, we don't have anything in the folder yet. And even the above one, the CLI over here is working on the project directly. Right?

So we're just gonna do an init over here. And it'll just create an AGENTS.md file. Now you can edit this file with memory or open it in your editor.

So what does this file include? It's typically a small file that just has README.md (which we don't have right now, until this AI creates it) and a few hints on what needs to be done. Right?

So this AGENTS.md is just something that all agents are supposed to read except for Claude Code, because Claude Code has CLAUDE.md as a file. And at any point that you want to update this file, what you can do is just write memory and just update the project memory, so it will open it directly inside your choice of editor. Now it looks like GLM 5.2 has basically done the task and now it's just verifying.

And again like I've told you how the systems work, this is mostly because of the harness itself. So the way command code is built, it is built in such a way that it asks the AI model to not only just build what it is building but then also verify the implementation, make sure there are no bugs and anything. So it's like partly the intelligence that's coming from the model but partly also the AI harness, which is command code in this case, that you are using that kicks in that, uh, effect.

Alright. So if you look at the directories inside this, you see that we have some photos over here and then we have a platformer.html as well.

So let's see if I open this platformer.html. You can see that we also have audio in this file basically, and it seems like it ended up sort of creating a faster version of Mario.

Right? I mean, obviously, the characters are not there and the controls are a little too fast for me to use.

But this is actually a really nice game. Right? And this is just it just created it in one go, one shot.

And this is also the physics part at least is working fairly fine except for the fact that the game itself is like too too fast. So the gravity, for example, is a little too strong and, you know, the sounds are perfect.

Like, this this sound reminds me of Mario Forever, which is exactly the same, the coin collection sound at least. And you can see like, you know, the levels are increasing.

It has I don't know like how many levels it has created. But these sort of character Okay.

So my game is now over. I'll have to start again. But these characters and the bounding boxes and, you know, the physics of, you know, collisions is working perfectly fine.

Wonder if I can jump on this to also kill it. Yeah, I can. So a lot of interesting things in a small game which it ended up creating super fast.

And as you can see now it has also given me instructions on this game field physics, content and so on. Now again like imagine that this all of this it did it with one single prompt that I gave it. Imagine if you're working with this model for like thirty minutes for an hour.

You could create very interesting projects, games, things, whatever you want to build super easily. And the next fun thing about command code is that let's say if I want to do some sort of pair vibe coding with somebody, what I can just do is just share this session right then and there by writing the share command, and you can see that it immediately gave me this snippet over here.

And if I open this over here you can see that my whole conversation is now available as a public link which you can then share with the people that you want to follow your exact same whatever prompting that you did.

It was mostly like the AI itself. I have not done anything except for a single line. But you get the idea.

You can copy this conversation, you can just paste it back inside your command code session and you can just resume it from there as well. And of course like if you yourself are using it on your own, what you can also do is just write CMD which is command code.

Write Resume and then just resume the conversation that we were going through. Right?

So you can just cut it anytime. You can just close the tab as well whenever you want. Start it again and have a Resume flag which will just resume your conversation instantly.

Now let's say I ask it, can you make the speed and jump a little bit more realistic?

Right now, it feels too fast. Interestingly, it has told me that it synthesized the sound effects via the Web Audio API which requires no asset files.

Right? Which is actually good because if you look at this, there are basically no I don't know, it actually also removed the screenshots. Right?

So it was probably just using them to see if the game was working properly or not, which is again like a pretty interesting trait from an AI model. So it has made some edits over here for, you know, just fixing the physics of the game. And I can't shake the feeling that this sound, this coin collection sound, is pretty surely coming from the training data of all the gameplays or, you know, I don't know, whatever open source code is available for platformer games like Mario, because this sounds very very similar to how the coin collection sounds in Mario, in some of the variations of the game at least.

Now it has again, once again sort of done the work. It's just doing its verification now with the agent browser itself. So we can probably just shift back to the game, give it a refresh, and hit space to play again.

Right. So, I mean, personally speaking, I don't feel like, you know, if it has made a lot of difference in the in the, you know, the fastness of the game.

In fact, as a matter of fact, I think or at least I feel that the previous version was slightly better. If not And if that is indeed the case, what I can do right now is I can just go ahead and write rewind inside command code and you will see that it gives me a list of, you know, all the messages that I'm sending and I can have a checkpoint where I restore it.

Now this is good because you don't need git for it because I have, like you have seen, like I have not initialized git in this repository. But I can just select this one and I'll just restore the conversation as well as code.

Right? So I'll just remove this because it's not something that is working properly. Maybe give it a refresh space.

And, yeah, it more or less feels like the same as before. Right?

So I'm not pretty sure like what the model actually updated in the first place. But anyway, I'm sure like you can do a bunch of more interesting things like create more levels, you know, have, you know, a cheat code system for bypassing or jumping levels.

Maybe you're gonna have like sort of dragon levels in which like how Mario used to have. So yeah, all in all, it's a super solid release in the open space, open source world. And this is something that should push the frontier models to figure out the optimizations in their stack, what can they do better to stay relevant with open weight models now because the way open weight models actually follow the closed source models, it's crazy.

The progress that we are seeing is crazy. And GLM 5.2 specifically built for long horizon tasks. Right?

They mention it in the title itself that it's so important that it's for long horizon tasks which means that this would be exceptionally good. It's supposed to be exceptionally good in coding specific tasks because a lot of them are indeed long horizon building features, debugging things, what's going wrong, figuring out like a needle in a haystack sort of situation where you don't exactly know what you're looking for.

That requires a lot of patience and a lot of run time which this model has. So do check out GLM 5.2 using command code. You can see like for this application that we just built, it just took less than 100,000 tokens.

It has over 900,000 tokens context remaining. And generally, what I would suggest to you is that the moment you reach like this mark, like 500 or even 600,000 tokens, what you can do is you can just run compact inside command code.

And it'll just compact your conversation history, make it a little smaller so that it still retains all the important facts and figures, but it's not taking as much history as you want. Now this is obviously like a sort of like a overkill at a 100,000 window.

But this would be useful once you approach like 300,000 or 400,000 tokens. So it saved me about 14,000 tokens right now which is not a lot but again like my context was not super large so that's expected.

And if you look at my spending on GLM 5.2 it's less than $1 for building this simple application. And we also did sort of like we just reverted back a change after creating that.

Right? So that also counts for the tokens over here. So, yeah, that's pretty much it for this video.

Do check out command code CLI for trying out GLM 5.2 and other interesting models at a steal cost of $1 per month on the Go plan that they have. Once again, thanks to CommandCode for sponsoring this part of the video. I'm gonna see you in the next video very soon.

If you're still watching, make sure you leave a comment. I watched till the end below to tell me that you were still here and let me know what do you think about the video.

The Hook

The bait, then the rug-pull.

A Chinese open-weight model just landed that benchmarks within 1% of the best closed frontier model on long-horizon coding — and the title is not the most provocative thing about it.

Frameworks

Named ideas worth stealing.

04:11 · model

IndexShare

Lightweight indexer at the first 4 transformer layers that identifies ~1,000 most decision-critical tokens from 800K context, so remaining layers attend only to those — reducing per-token FLOPs 2.9x at 1M context length.

Steal for: Explaining why some long-context models are more cost-efficient than others
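The selection idea behind IndexShare can be sketched in a few lines. This is a toy illustration under stated assumptions, not Z.ai's implementation: the indexer scores are given as plain numbers, `top_k_indices` and `sparse_attention` are hypothetical names, and six tokens with k=2 stand in for 800K tokens with k around 1,000.

```python
import math


def top_k_indices(scores: list[float], k: int) -> list[int]:
    """Positions of the k highest indexer scores, returned in original order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(ranked)


def sparse_attention(query, keys, values, selected):
    """Softmax attention computed only over the selected token positions."""
    scale = math.sqrt(len(query))
    logits = [sum(q * k for q, k in zip(query, keys[i])) / scale for i in selected]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    output = [0.0] * len(values[0])
    for weight, i in zip(weights, selected):
        for j, v in enumerate(values[i]):
            output[j] += (weight / total) * v
    return output


# Toy context of 6 tokens; the (assumed) indexer rates tokens 1 and 3 highest.
indexer_scores = [0.1, 0.9, 0.05, 0.8, 0.2, 0.02]
keys = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
values = [[float(i)] for i in range(6)]
query = [1.0, 0.0]

# Computed once, then reusable by every later layer instead of re-scoring.
selected = top_k_indices(indexer_scores, k=2)
result = sparse_attention(query, keys, values, selected)
print(selected, result)
```

The saving comes from the loop in `sparse_attention` touching only the selected positions, so the per-token work scales with k rather than with the full context length.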

02:08 · list

Long-Horizon Benchmark Trifecta

FrontierSWE — hours-to-tens-of-hours open-ended engineering
PostTrainBench — H100-evaluated model improvement tasks
SWE-Marathon — ultra-long-horizon: compilers, kernels, production services

Three benchmarks Z.ai uses to prove GLM-5.2 sustains quality under extended engineering pressure, not just token-count claims.

Steal for: Framing any capability claim with a three-level evidence structure

08:30 · list

Command Code Agent Workflow

/init — creates AGENTS.md project memory
/memory — opens AGENTS.md in editor
/share — public session URL for pair coding
cmd resume <id> — resume any prior session
/rewind — checkpoint restore (conversation + code)
/compact — compress context, free headroom

Six commands that make Command Code useful as a persistent coding agent harness across open models.

Steal for: Any CLI-based agent tutorial or product demo

CTA Breakdown

How they asked for the click.

VERBAL ASK

20:51 · product

“Do check out command code CLI for trying out GLM 5.2 and other interesting models at a steal cost of $1 per month on the Go plan that they have.”

Warm close after live demo that showed real token spend ($0.76) — makes the $1/month price point feel credible rather than abstract.

FROM THE DESCRIPTION

PRIMARY CTA

Where the creator wants you to go next.

Download CommandCode and get access to GLM 5.2, MiniMax M3, DeepSeek, Qwen and a lot more for just $1

Storyboard

Visual structure at a glance.

00:00 · hook · open
00:58 · value · pricing
02:08 · value · benchmarks
04:11 · value · IndexShare
08:30 · cta · sponsor
10:55 · value · demo game
21:10 · cta · CTA
