Modern Creator
AI Edge · YouTube

I Tried GLM 5.2 Inside Claude (it's f*cking incredible)

A 13-minute breakdown of the Chinese open-source model that nearly matches Opus 4.8 intelligence at one-fifth the price, and the four-step setup to wire it into Claude Code.

Posted: 3 days ago
Duration: 13:11
Format: Tutorial, educational
Views: 2.9K
Likes: 110
Big Idea

The argument in one line.

GLM 5.2 ranks fourth on the global AI intelligence index at roughly one-fifth the cost of Claude Opus 4.8, making it a viable replacement for the cheap repetitive 80% of development work while the smarter model handles the reasoning that actually matters.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…
  • A developer who uses Claude Code daily and wants to cut AI spend without switching their primary workflow.
  • Someone who wants a concrete 10-80-10 stack strategy for mixing cheap and smart models on a single project.
  • A heavy API user running into Claude rate limits or cost ceilings.
  • Anyone curious whether open-source Chinese models have genuinely caught up with frontier closed models.
SKIP IF…
  • You are not a developer — the setup steps assume Claude Code familiarity.
  • You want deep technical coverage of model architecture or training data.
  • You are looking for local compute guidance — the host explicitly deprioritizes this path for most users.
TL;DR

The full version, fast.

GLM 5.2 from ZAI ranks fourth globally on the Artificial Analysis Intelligence Index, behind only Claude Fable, GPT-5.5, and Claude Opus 4.8, while costing $5.80 per million tokens versus roughly $30 for Claude, about 81% cheaper. The host recommends plugging it into Claude Code with a Z.ai API key in four steps and applying the 10-80-10 rule: use the smartest model for the first 10% (planning), GLM for the middle 80% (grunt work), then return to the smart model for the final 10% (code review and debugging). The bottom line is to stay model-agnostic, with a portable memory system so you can switch freely as benchmarks shift.
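The percentage and the "five x" figure can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the two per-million-token prices quoted in the video; the function name `savings` is made up for illustration.

```python
def savings(cheap_price: float, expensive_price: float) -> tuple[float, float]:
    """Return (percent saved, price ratio) for two per-million-token prices."""
    percent_saved = (1 - cheap_price / expensive_price) * 100
    ratio = expensive_price / cheap_price
    return percent_saved, ratio


glm_price = 5.80      # USD per million tokens, as quoted in the video
claude_price = 30.00  # USD per million tokens, as quoted in the video

percent_saved, ratio = savings(glm_price, claude_price)
print(f"{percent_saved:.1f}% cheaper, {ratio:.1f}x price ratio")
# prints "80.7% cheaper, 5.2x price ratio"
```

The host's "81%" and "five x" are roundings of these two numbers.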

Chapters

Where the time goes.

00:00–01:07

01 · Cold open — benchmark claim and roadmap

Stakes the claim, drops the ranking, previews the full video structure

01:07–02:06

02 · Intelligence index ranking

Shows artificialanalysis.ai chart placing GLM 5.2 just behind Opus 4.8 and GPT-5.5

02:06–03:15

03 · Cost breakdown

GLM 5.2 at $0.52 vs $1.80 (Opus 4.8) vs $2.75 (Fable); ZAI stock; competitive pressure on frontier labs

03:15–04:40

04 · Capability demos

Website demo built with GLM 5.2; Ecco the Dolphin game prompt across all frontier models

04:40–05:54

05 · Local vs online decision

240GB RAM requirement reality check; why the API path makes more sense for most people

05:54–06:51

06 · Setup tutorial

Install Claude Code, sign up for Z.ai, paste the API key into Claude Code, restart: four steps total

06:51–08:25

07 · Model agnosticism argument

Pushback on local-model rhetoric; staying nimble; portable memory system teaser

08:25–09:15

08 · GLM vs Claude cost table

Subscription tier comparison plus API per-token rate (~81% cheaper)

09:15–11:25

09 · Bottom line framework

10-80-10 rule named; GLM for grunt work; Claude for hard 20%; sandwich metaphor

11:25–13:11

10 · Don't marry one model — final verdict

Portable memory system, model-agnostic stance, final GLM 5.2 recommendation

Atomic Insights

Lines worth screenshotting.

  • GLM 5.2 costs $5.80 per million tokens versus Claude at $30 — about 81% cheaper at API rates for nearly equivalent intelligence.
  • The 10-80-10 rule: plan with the smartest model, grind the middle 80% with the cheapest model, debug with the smartest model again.
  • Running GLM 5.2 locally requires 240GB RAM minimum — effectively impossible for most people without tens of thousands in hardware.
  • The model-agnostic developer wins: portable memory and skills files mean you lose nothing when you switch models.
  • Open-source models at near-frontier quality put competitive pressure on frontier labs, making arbitrary price increases harder to justify.
  • Claude Code can serve as a harness for other models through a compatible API endpoint: swap the endpoint and key without changing your workflow.
  • ZAI stock climbed past $2,300 after the GLM 5.2 release as markets priced in open-source eating the frontier's lunch.
  • Sovereignty is a real argument for local compute, but impractical for most — use the cloud in practice while owning your data layer.
  • GLM 5.2 still lags Claude on debugging and strategic reasoning — those are the 20% of tasks worth protecting with the smarter model.
  • Being model-agnostic is the only rational position when top rankings change every few months.
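The "portable memory and skills files" idea above can be sketched as a plain JSON file that is rendered into text any model can read. The host does not show his actual setup, so this is an illustration under assumptions: the file layout, the function names, and the sample facts and skills are all hypothetical.

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory


def save_memory(path: Path, memory: dict) -> None:
    """Write memory to a plain JSON file that is not tied to any model vendor."""
    path.write_text(json.dumps(memory, indent=2), encoding="utf-8")


def load_memory(path: Path) -> dict:
    """Read the memory file back into a dictionary."""
    return json.loads(path.read_text(encoding="utf-8"))


def render_context(memory: dict) -> str:
    """Render memory as plain text that could be sent to any model as context."""
    lines = ["Known facts about the user:"]
    lines += [f"- {fact}" for fact in memory["facts"]]
    lines.append("Reusable skills:")
    lines += [f"- {name}: {steps}" for name, steps in memory["skills"].items()]
    return "\n".join(lines)


# Hypothetical sample content, for illustration only.
memory = {
    "facts": ["Prefers TypeScript for front-end work"],
    "skills": {"release-notes": "Summarize merged PRs, grouped by area"},
}

with TemporaryDirectory() as tmp:
    path = Path(tmp) / "memory.json"
    save_memory(path, memory)
    restored = load_memory(path)

context = render_context(restored)
print(context)
```

Because the stored format is model-neutral, switching models only changes where `context` is sent, not what is stored.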
Takeaway

The cheap model does 80% of the work.

WHAT TO LEARN

When a model that scores near the top of global benchmarks costs one-fifth as much as the current leader, the rational move is not to pick sides but to build a stack that uses each model where it actually earns its price.

  • Benchmark rankings now shift fast enough that loyalty to a single AI model is a cost and capability trap — treating models as interchangeable tools is the durable stance.
  • The 10-80-10 rule applies any time the middle 80% of a task is repetitive: plan with your sharpest model, execute cheaply, review sharply again at the end.
  • Local compute sovereignty is a real value, but the infrastructure cost — 240GB or more of RAM — makes it inaccessible for most people without a significant hardware investment.
  • Portability in your memory and context files matters more than which model you are currently on — if switching costs nothing, you stay free to follow the best tool.
  • Cost per intelligence point is now a trackable metric rather than a vague feeling, and objective benchmarks make it possible to make this decision rationally rather than by hype.
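The local-compute point can be made concrete with a rough fit check. This sketch uses the 240 GB figure quoted in the video; the 25% headroom reserved for the operating system and other processes is an assumption, and `fits_locally` is a hypothetical helper.

```python
MODEL_RAM_GB = 240  # requirement quoted in the video


def fits_locally(machine_ram_gb: float, model_ram_gb: float, headroom: float = 0.25) -> bool:
    """True if the model fits while leaving `headroom` (a fraction) of RAM free.

    The default 25% headroom is an assumed allowance, not a measured value.
    """
    usable_gb = machine_ram_gb * (1 - headroom)
    return model_ram_gb <= usable_gb


for ram_gb in (64, 256, 512):
    print(f"{ram_gb} GB machine: fits = {fits_locally(ram_gb, MODEL_RAM_GB)}")
```

With that headroom, a 256 GB machine leaves 192 GB usable and fails the check, which matches the host's remark that the 256 GB configuration technically holds the model but would not run it well.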
Glossary

Terms worth knowing.

GLM 5.2
A large language model released by ZAI (Zhipu AI), a Chinese AI lab, that ranks fourth on the Artificial Analysis Intelligence Index at a cost roughly five times lower than Claude Opus 4.8.
Artificial Analysis Intelligence Index
A third-party benchmark at artificialanalysis.ai that scores AI models on composite intelligence, updated continuously as new models launch.
10-80-10 rule
A model-mixing strategy: use your smartest AI model for the first 10% of planning, switch to a cheaper model for the repetitive 80% of work, then return to the smart model for final debugging and review.
Model agnostic
A development posture where memory, skills, and context files are stored in a portable format so you can switch between AI models without losing accumulated knowledge or workflow continuity.
Harness
A coding assistant like Claude Code that serves as the user interface and workflow layer but can be pointed at different underlying AI models via an API key swap.
ZAI / Z.ai
The AI lab that developed GLM 5.2. They offer API access at $5.80 per million tokens with subscription tiers starting at $12 per month for light coding use.
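The "API key swap" described under Harness is not shown in detail in the video, so the following is only a sketch under assumptions. To my knowledge, Claude Code reads the `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` environment variables; the endpoint URL and key below are placeholders, `harness_env` is a hypothetical helper, and the provider's own documentation should be checked for the real endpoint.

```python
import os
from typing import Optional


def harness_env(base_url: str, api_key: str, base_env: Optional[dict] = None) -> dict:
    """Return a copy of the environment with endpoint overrides for the harness."""
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_BASE_URL"] = base_url    # where the harness sends requests
    env["ANTHROPIC_AUTH_TOKEN"] = api_key   # key issued by the model provider
    return env


# Placeholder values: substitute the endpoint and key from your provider.
env = harness_env(
    "https://example.invalid/anthropic",
    "YOUR_API_KEY",
    base_env={"PATH": "/usr/bin"},
)
print(sorted(env))

# To launch the harness with these overrides, something like:
#   import subprocess
#   subprocess.run(["claude"], env=env)
```

Building the environment as a copy keeps the override scoped to one launch, so the default model stays one command away.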
Quotables

Lines you could clip.

06:55
You don't get the full benefit of running it yourself for the true open source effect, but that is just not that realistic for the average person right now.
Honest counter to local-model hype; concise and standalone · IG reel cold open
10:20
GLM comes in at $5.80 per million tokens, and Claude comes in at around $30 — around 81% cheaper or roughly a five x cheaper, which is absolutely insane.
Concrete numbers land the cost shock in one sentence · TikTok hook
11:35
I'm not loyal to a model. I'm not marrying a model. I'm not marrying a company in AI.
Punchy three-part repetition, standalone philosophy statement · TikTok hook
The Script

Word for word.

00:00 GLM 5.2 just launched, and it's genuinely incredible. It's the first open source model to ever rank this high, and it's going toe to toe with the biggest closed models in the game, like Claude Opus 4.8 and GPT 5.5.
00:15 And the crazy part is it's around five times cheaper than Claude. The founder of Z.ai said that they're just months away from Mythos-level models. So the Fable model that got banned by the US government, they are almost there in terms of Chinese open source models, which is absolutely crazy because these models can't be controlled by the government, and they run at an extremely low cost.
00:36 So we're entering a completely new era of AI. Like, just imagine being able to run Mythos-level intelligence locally on your computer. It's insane.
00:45 I run a 30 person AI media company and I use these tools every day. And I knew as soon as I started using GLM 5.2, I had to make this video.
00:53 So today, I'm gonna run you through everything. What it is, what it does, how much it costs, if you should use it, and how you can actually set this up yourself. If you like AI content like this, make sure to subscribe to the channel, and let's get straight in here looking at the Artificial Analysis Intelligence Index.
01:08 This is pretty crazy to see. An open source model ranks so highly. It's only a fraction behind Claude Opus 4.8 Max and GPT 5.5 High, and it now ranks higher than Gemini, Google's model.
01:19 It ranks higher than Grok, xAI's model. So we're now seeing open source models going toe to toe with the best models in the game. And it happened at a good time in my opinion because, obviously, the Fable 5 ban was very controversial.
01:31 This now puts a lot of pressure not only on the US government, but also the frontier AI labs because they know that open source is catching up. The crazy part isn't just the raw intelligence, which we know makes 5.2 one of the most capable models in the AI space.
01:46 It's the cost for that intelligence. Because if you look at the cost of GLM 5.2, we could see it comes out at 52¢ per token.
01:53 And we know, looking before, that it's the fourth smartest AI model in the world according to the AI intelligence index. However, Claude Fable 5, the smartest model in the world, which hopefully someday we get again soon, this is $2.75. And Claude Opus 4.8 Max, which is available right now, comes in at $1.80.
02:11 So this is around five times cheaper than Fable, four times cheaper than Opus 4.8, and it's almost as smart. Like, you gotta comprehend how much of a breakthrough this is for open source models.
02:21 And the crazy part is, responding to a tweet around full Mythos models being available by the end of the year, so in the next six months from open source labs, Elon Musk estimated it would probably happen later, in Q1 next year. But the founder, Dia Tang of Z.ai, who created GLM 5.2, which we're speaking about today, he said it won't take that long.
02:40 And I'm hearing estimates that it could only take them three to four months to have a Mythos-class model. So I'm talking about an intelligence this high on the scale that is completely open source coming in at a fraction of the cost.
02:52 This puts a lot of pressure on the top labs, and this is great for us as the consumer because the game is becoming more competitive. And more importantly, this could actually push cost down across the board and prevent the frontier labs from raising prices maybe to the levels that they want to or need to to justify the data center spend.
03:09 And we've seen as a result, ZAI's stock has absolutely exploded to over $2,300. ZeroHedge here saying that open models are about to eat the frontier's lunch. And just take a quick look at what it can do.
03:21 Like, some of the design stuff is genuinely impressive. This is a website that was created using 5.2, and this is very similar to what Opus 4.8 using Claude Design is actually able to output.
03:32 And someone actually did an experiment where they gave the exact same prompt to create an Ecco the Dolphin style video game across Fable 5, GPT 5.5, Gemini, and GLM 5.2. And if you actually look at the results, it does a great job. Like, this is Gemini here.
03:45 Probably the worst of all of them. Very, very, very basic. This is GPT 5.5 Pro here.
03:51 Definitely better. This is Claude Fable 5. I will say that in my opinion, I do think this is the best.
03:56 It looks the most realistic. It looks the most interactive. Clearly, you know, it's the best model, so it makes sense.
04:02 But what really impressed me is GLM 5.2. I think it did a great job. Like, you can see the shading here, the graphics.
04:08 I think it did a genuinely fantastic job for just how cheap the model is. So we're now seeing outputs very similar to Fable 5 and GPT 5.5 at a fraction of the cost. And this is something that, you know, if you're into AI, you can use today, and I'll show you how in today's video to get great outputs at a fraction of the cost.
04:25 And it's something I'm now integrating across my AI workflows. But firstly, you have to make a decision. How are you going to run this model if you wanna test it out?
04:32 And there are a lot of people online pushing the narrative that you should use local compute for this. The reality is this is a massive model. It requires around 240 gigs of RAM, which means you're gonna need the 256 configuration on a Mac Studio.
04:45 And even then, it's not gonna run well. It's gonna take up almost all your RAM. So realistically, you need the 512 gig variation, which is sold out and costs tens of thousands of dollars on the secondary market.
04:54 So the reality is you're probably going to need multiple computers, a custom PC using an NVIDIA chip or a stack of computers to even have a chance at having the necessary RAM to run this model. So to be honest, I'm not even gonna spend that much time talking about that path today. Do I think you should run local compute?
05:09 Absolutely. I run local compute, Kimi and some of the cheaper smaller models, and that's for grunt work. But you're not gonna be able to run this on local compute unless you're very, very rich.
05:17 If you are one of the viewers of this video that happens to have a 512 gigabyte Mac Studio just laying around, by all means, run it locally. It's better because the AI lab can't get access to your data that way and your chats stay private. You truly own your compute.
05:31 And I do think in the future, we wanna own our compute and we wanna own these models. However, for the average person, it's not realistic. So I'm going to say the best way to set it up is using it online.
05:41 So you can actually use Claude Code as a harness to interact with GLM 5.2, which is what I'm personally doing. So it's quite simple.
05:49 Install Claude or Claude Code if you haven't already, then you wanna sign up for Z.ai. If you do look at their coding plan on their website, it's around $12 for Lite, $50 for Pro, $112 for Max, and you do get a yearly discount.
06:04 And I'll run through the comparison in cost in just a minute with Claude. Once you've done that, you wanna paste your key into Claude. They're fully compatible with Claude or other harnesses.
06:12 That is going to then spin up GLM 5.2 via the API on your computer, then you wanna restart Claude Code, and then you're done. You can also do this on Codex.
06:20 Once again, this is compatible with a bunch of coding tools. If you use Claude Code, you can actually just ask it to set it up for you. Or if you use OpenClaw or Hermes Agent, you can just ask it to set it up for you once you have subscribed, and it's really gonna be that simple.
06:33 You can set it up, and now you have access to one of the smartest models and one of the cheapest models in the world that is completely open source. Okay? You don't get the full benefit of running it yourself for the true open source effect, but that is just not that realistic for the average person right now.
06:48 And, you know, even myself, do I have the financial capability to go out and purchase 512 gigs of RAM? Yes.
06:54 Technically. And if you add up all of my Mac minis, I technically have that compute on my Mac Studios, but I'm just not gonna be doing it because it's not efficient. And I wanna push back on this local model rhetoric a little bit because I think it's important, but I also don't think you need to use local models because the frontier models are great.
07:10 The frontier labs are great. And unless you're doing heavy coding and constantly running into your rate limits, you're probably gonna be able to get away with a mix of subscriptions. The sovereign argument is completely different.
07:20 That is more of a personal belief that you need to hold if you wanna run local compute, and I do share that belief to an extent. But in terms of practicality, using the cloud is going to be better, and you're gonna maintain nimbleness.
07:30 You're gonna be more flexible to switch models. And at the end of the day, I think it's very, very important to be model agnostic. Why?
07:36 Because new models are coming out all the time and they're completely changing the game. So you've gotta remain model agnostic. You have to have a memory setup that actually enables configurability so you can hop from model to model depending on what is best for your particular use case at any given moment in time.
07:50 And that's something that I'm gonna be covering in future videos because I think it's so important to have a proper data storage and memory system that is portable from model to model so you can easily switch. So make sure to subscribe. I've got some amazing content coming that's, you know, really, really, I think, gonna help you guys.
08:04 So now let's walk through the cost versus Claude in practicality based on this subscription, and then I'm gonna walk through my summary of GLM 5.2 and, you know, how I'm realistically gonna be using it. So if you look at the three use cases, light use, daily development, heavy use, it is going to be noticeably cheaper than Claude.
08:20 So you'll save roughly 25% for light use, 50% for daily development, and 44% for heavy use. And here I'm specifically referencing the tiers and the value for money, not the cost per token, which is where you can get almost five x cheaper results. So if you do use the pay per use API, GLM comes in at $5.80 per million tokens, and Claude comes in at around $30, around 81% cheaper or roughly five x cheaper, which is absolutely insane.
08:46 And that is where, you know, you're gonna get your biggest savings. So if you do end up plugging it in and you use the API, so let's say you go over your basic plan, which obviously gets you a discount, and then you start running over. So if you're a heavy coder or you're doing lots of scraping or, you know, just lots of coding in general, then you're going to significantly save using the GLM 5.2 API versus Claude.
09:05 So it's no doubt cheaper. So keeping in mind that it's cheaper, how am I actually using it day to day? Well, firstly, let's discuss whether it's a Claude killer or not because, you know, I've heard that term going around.
09:15 It is if you're doing basic tasks. It is if you're purely looking at a cost per intelligence metric basis, but it isn't in terms of the metric of just what is more intelligent.
09:27 Because Claude 4.8 technically is more intelligent. So if you wanna use the smartest model, you're using 4.8 or, just a rung below that, 5.5. And of course, once we have access to Fable, you're gonna be using Fable.
09:37 That is the smartest model in the world. But if you care about cost optimization, and I think we should all care about cost optimization, then it is going to be a Claude killer for some of you.
09:45 So I recommend, number two, how it's entering my stack, that you use it for the grunt work, the big repetitive jobs, the first drafts of code. Remember, in some of my recent videos, I talked about the 10-80-10 rule, doing the first 10% with a really smart model. So the smartest model right now is Opus.
10:01 If it's Fable, then it's Fable. You wanna do your planning with the 10%. Then you wanna do the grunt work with the cheaper model.
10:06 In this case, GLM is an absolutely fantastic option. And then for the last 10%, the code review and the debugging, you wanna pivot back to the smartest model available.
10:17 And that way, you can kind of sandwich your development and get the maximum value for money. And that is my strategy, and that's how this is gonna be entering my stack for now. And the great thing is because I'm using Claude Code as a harness as I showed you, I don't need to, you know, switch from harness to harness.
10:30 I can just use Visual Studio Code with, you know, Claude and just switch between the two, or I can just use the Claude application or use Claude in my terminal and I get access to this model. So the Claude loops, which I'm gonna be doing a video on, and all of my Claude goals and all of my skills, it's all just automatically routing to the preferred model, which is, you know, amazing.
10:48 You don't even really need to think about it once you install it as long as you set up your parameters right and set up your instructions right on Claude. Where I'm genuinely keeping Claude is in, as I said, like, the sandwich bots. So, like, the bread of the sandwich.
11:00 So the real thinking, the bugs, the debugging, the things that, even though GLM 5.2 is really impressive, and I've been very impressed with my experimentation so far, I don't think it's as smart as Claude. I can already tell, especially when it comes to debugging, and it's almost there, like, for scraping and data management and data analysis, and it's certainly a lot cheaper, but it's not quite, like, the fastest strategic model.
11:21 And that's where Claude comes in. Anything really important, that hard 20%, you still wanna use Claude. And thirdly, the bottom line is that you don't wanna marry any model.
11:29 I think I've made that clear on this channel by now. Like, Claude is great. I'm gonna use Claude.
11:33 I'll be a diehard Claude fan as long as it's the smartest model. The second it's not the smartest model or the, you know, best value model for my needs, I'm gonna switch.
11:40 I'm not loyal to a model. I'm not marrying a model. I'm not marrying a company in AI.
11:44 I have a portable memory system. I use Supabase for this. I'm gonna do a video on it.
11:48 I also have a local memory system on my computer with context that automatically updates every time I have an important chat. I have skills on Claude which automatically update whenever I am doing a certain task, a repetitive task, so it gets smarter over time. And those skills are saved as local memory files which sync to the cloud, which I own.
12:05 So if I switch models from Claude to GPT or GPT to GLM, it understands, it has context about me, and I don't lose out. I just switch to whatever model is the best. By the way, artificialanalysis.ai, it's gonna tell you what model is the best at any time with an objective intelligence index.
12:20 But of course, on the channel, I'm gonna break down objectively what model is good for what because, you know, that is also very, very important. Something might be very smart, but just not practical for daily use. So is it a Claude killer?
12:30 Not by the definition of making Claude useless for sure, but it definitely could be, especially for the grunt work. At the end of the day, you don't wanna marry one model, but I definitely recommend giving GLM 5.2 a try. If you do have a crazy computer lying around for some reason, then by all means run it.
12:46 It's gonna be free to run it that way. However, most people don't. And if you don't, I wouldn't go out and, you know, rush out and buy a bunch of hardware.
12:53 I would at least first, if you're a serious developer, use it online through the API, which, you know, I ran you through today. It's relatively simple to set up. So hopefully, you enjoyed this video.
13:03 Make sure to subscribe to the channel for more AI content like this to keep you at the cutting edge of AI. See you in the next one. Have a lovely rest of your day.
13:10 Peace out.
The Hook

The bait, then the rug-pull.

The title drops a profanity-laced superlative and the hook delivers immediately — a benchmark chart placing a Chinese open-source model within a rounding error of Claude Opus 4.8 and GPT-5.5. What follows is thirteen minutes of cost math and a four-step setup guide that makes the intelligence gap feel even smaller.

Frameworks

Named ideas worth stealing.

10:10 · model

10-80-10 Rule

  1. First 10% — plan with the smartest model
  2. Middle 80% — execute with the cheapest capable model
  3. Last 10% — debug and review with the smartest model

Cost-optimization heuristic for mixing AI models: use expensive intelligence only where it actually changes the output.
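The rule above can be sketched as a tiny router plus a blended-price estimate. The two prices are the per-million-token figures quoted in the video; the assumption that token usage splits in the same 10/80/10 proportion as the work is a simplification, and the function names are hypothetical.

```python
SMART_PRICE = 30.00  # USD per million tokens (Claude, as quoted in the video)
CHEAP_PRICE = 5.80   # USD per million tokens (GLM 5.2, as quoted in the video)


def pick_model(progress: float) -> str:
    """Route by project progress in [0, 1]: first and last 10% use the smart model."""
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be between 0 and 1")
    return "smart" if progress < 0.10 or progress >= 0.90 else "cheap"


def blended_price(smart_share: float = 0.20) -> float:
    """Average price per million tokens if `smart_share` of tokens use the smart model."""
    return smart_share * SMART_PRICE + (1 - smart_share) * CHEAP_PRICE


price = blended_price()
saving = (1 - price / SMART_PRICE) * 100
print(f"blended: ${price:.2f} per million tokens, {saving:.1f}% below all-smart")
# prints "blended: $10.64 per million tokens, 64.5% below all-smart"
```

Under these assumptions the sandwich keeps most of the saving while reserving the expensive model for planning and review.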

Steal for: Any workflow where Claude API cost is significant, such as batch processing, scraping, or first-draft generation
CTA Breakdown

How they asked for the click.

VERBAL ASK
12:50 · subscribe
Make sure to subscribe to the channel for more AI content like this to keep you at the cutting edge of AI.

Soft verbal CTA at end; subscribe pitch also dropped at the 1-minute mark inside the hook. No hard product CTA — the channel is the offer.

Storyboard

Visual structure at a glance.

hook · open · 00:00
proof · benchmark · 01:07
value · cost shock · 02:06
value · local vs API · 04:40
value · cost table · 08:25
value · framework · 09:15
cta · bottom line · 11:25
cta · subscribe · 12:50