Modern Creator
Greg Isenberg · YouTube

GLM 5.2: Set Up Local AI with Cursor/Codex etc

A 22-minute tactical breakdown of how to plug an open-source local model into your existing AI coding harness — and why the token math makes it worth doing now.

Posted
2 days ago
Duration
Format
Interview
educational
Views
11.5K
376 likes
Big Idea

The argument in one line.

Running a single frontier model for every task is a governance failure; a fusion approach that sequences planning, execution, and review across models by price and capability delivers near-frontier output at 20-30% of the cost.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…
  • You are a solo builder or small team already using Cursor, Codex, or Claude Code and watching token costs climb.
  • You want a step-by-step guide for adding an open-source model to your AI harness without rebuilding your workflow.
  • You are an engineering or ops lead making decisions about which models to give different teams access to.
  • You have been curious about local models but assumed the setup was too painful or the hardware requirements too steep.
SKIP IF…
  • You need vision or multimodal capabilities in your local model — GLM 5.2 does not support images natively.
  • You are looking for a deep technical or architecture-level breakdown of how the model works rather than how to use it.
TL;DR

The full version, fast.

GLM 5.2 from ZAI ships with a 1M-token context window and lands within about 7 points of Opus 4.8 on Terminal Bench 2.1 while costing roughly one-fifth as much per token — 44 cents versus $2.38 for a comparable task. The core argument is that local models are not replacements but sequenceable components: a frontier model handles vision-heavy planning steps GLM cannot do, then GLM executes the bulk of the build at scale. Setup takes a single API key from zed.ai pasted into Cursor's OpenAI field with the endpoint overridden, or an OpenRouter profile in Codex. The longer warning is about AI token subsidies behaving like Uber's early ride pricing — built to hook you before the prices normalize.

Free for members

Chat with this breakdown — free.

Sign in and you get 23 free chat messages on us — ask for the hook, quote a framework, find the exact transcript moment, generate a markdown action plan. Bring your own key when you want unlimited.

Create a free account →
Voices

Who's talking.

00:00hostGreg Isenberg
01:09guestAmir
Chapters

Where the time goes.

00:0002:08

01 · Intro and episode promise

Greg frames the episode: a tactical guide to local AI and GLM 5.2 in 20 minutes or less.

02:0904:00

02 · GLM 5.2 and ZAI context

Amir introduces ZAI's GLM 5.2 release and why it marks an inflection point for local models.

04:0105:21

03 · Specs: 1M context and Terminal Bench 2.1

1M context window, 81 on Terminal Bench 2.1, about 4 points behind Opus 4.8; strong long-horizon task performance.

05:2206:41

04 · Making sense of benchmark scores

Both hosts admit benchmarks feel abstract; Amir's heuristic is to just build with the model and judge the output.

06:4208:09

05 · Setup in Cursor via ZAI API key

Step-by-step: get a ZAI API key from zed.ai, paste into Cursor's OpenAI field, override the endpoint, add GLM 5.2 as a custom model.

08:1010:17

06 · Setup via OpenRouter and Codex

Alternative path: OpenRouter key into Codex profile with model name and context window; switch from the CLI.

10:1811:41

07 · Local model upside: buy compute once

Amir frames the economic case for local hardware: one-time cost, unlimited runs.

11:4213:35

08 · Token cost math: 44 cents vs $2.38

Real numbers: 50k input + 85k output tokens for near-Opus quality = 44c on GLM 5.2 vs $2.38 on Opus 4.8. Five-times difference at scale.

13:3616:48

09 · Future-proofing and the Uber subsidy analogy

Greg frames AI token subsidies as Uber's early cheap rides. Amir argues the upfront hardware bet pays off when GLM 5.3/5.5 arrive.

16:4919:22

10 · Model chaining and the vision workaround

Fusion approach in practice: Opus 4.8 reads screenshots and describes the layout, GLM 5.2 executes the changes.

19:2321:58

11 · Token maxing vs routing to the right model

Amir reframes the mindset: token minimization plus output maximization. OpenRouter is the easiest on-ramp — $20 in credits, no hardware needed.

21:5922:45

12 · Closing

Wrap-up and links to Amir's socials.

Atomic Insights

Lines worth screenshotting.

  • GLM 5.2 scores 81 on Terminal Bench 2.1, sitting about 4 points behind Opus 4.8 while costing roughly one-fifth as much per token.
  • A task that costs $2.38 on Opus 4.8 runs for 44 cents on GLM 5.2 via OpenRouter — a 5X gap that becomes the dominant cost driver at team scale.
  • A fusion approach sequences models by strength: frontier model reads and plans, local model executes the build, review model polishes.
  • GLM 5.2 has no native vision support — the workaround is routing screenshots through a vision-capable model to get a layout description, then handing that to GLM 5.2.
  • You can start today with $20 in OpenRouter credits and no local hardware required.
  • AI token subsidies mirror Uber's early ride pricing — they are designed to build dependency before prices normalize.
  • Using a frontier model to format a single email is a real governance failure happening inside companies right now.
  • Model-agnostic harnesses like Cursor and Codex capture the most value as open-source models close the performance gap, because users swap models without rebuilding workflows.
  • The ROI of local hardware is not speed or privacy — it is unlimited compute once the upfront cost is paid.
  • The smarter framing is token minimization plus output maximization, not token maximization.
Takeaway

Route tasks by cost, not just by capability.

WHAT TO LEARN

The default of using one powerful model for everything is both expensive and unnecessary — model chaining unlocks near-frontier quality at a fraction of the cost once you know which tasks require which tier.

  • A model that scores 62% where the best scores 69% will handle most execution tasks correctly — the 7-point gap matters far less than the 5X price difference at volume.
  • The vision workaround reveals a general principle: use a capable model to translate what a limited model cannot perceive, then hand the structured description to the cheaper model to act on.
  • OpenRouter lets you test the fusion approach for $20 in credits with no local hardware — the barrier to experimenting is lower than the hype suggests.
  • The Uber subsidy frame is the most actionable warning: AI pricing is subsidized today to build workflow dependency; teams that localize now gain cost insulation before prices normalize.
  • Model governance is the unsexy but high-leverage work — the biggest overspend is not large tasks but routine ones sent to frontier models by default.
Glossary

Terms worth knowing.

GLM 5.2
An open-source large language model released by ZAI (Zhipu AI) with a 1M-token context window, designed for long-horizon coding and reasoning tasks.
Terminal Bench 2.1
A benchmark suite that evaluates language models on software engineering and long-horizon task completion; scores are reported as a percentage.
Fusion approach
A model-chaining strategy where multiple AI models are sequenced across a single workflow — one model plans, another executes, another reviews — to optimize for both cost and output quality.
OpenRouter
A cloud platform that provides unified API access to dozens of AI models, including open-source ones, on a credit-based pay-as-you-go system.
Long-horizon task
A coding or reasoning task that requires maintaining context and executing multiple sequential steps over an extended session.
Token governance
Organizational practices and tooling that control which AI models different teams are allowed to call, preventing expensive frontier models from being used for low-complexity tasks.
Resources

Things they pointed at.

02:33productGLM 5.2 by ZAI
10:43toolCursor
12:23toolCodex CLI
04:01toolTerminal Bench 2.1
22:22productHumblytics
Quotables

Lines you could clip.

21:36
You shouldn't be token maxing. You should be token minimizing as much as possible and output maxing instead.
Punchy reframe that inverts the common assumption about AI productivityTikTok hook↗ Tweet quote
12:03
It will cost us 44 cents. Whereas with Opus 4.8, it costs you $2.38.
Concrete numbers make the cost argument visceral — no setup neededIG reel cold open↗ Tweet quote
14:03
Think about Uber. When Uber first came out, they actually subsidized rides and they got you hooked onto the app. And then over time, they started increasing prices.
The Uber analogy is the most memorable frame in the episodenewsletter pull-quote↗ Tweet quote
Topic Map

Where the conversation goes.

00:0006:41steadyGLM 5.2 overview and benchmarks
06:4210:17denseSetup walkthrough
10:1816:48denseToken cost math and economics
16:4921:58denseModel chaining and governance
21:5922:45sparseClosing
The Script

Word for word.

Read-along

Don't just watch it. Burn it in.

See every word as it's spoken — crank it to 2× and still catch all of it. The same dual-channel trick behind Amazon's Kindle + Audible.

metaphoranalogy
00:00Okay. You've probably heard of GLM 5.2 that's going viral everywhere on Twitter.
00:05Yes. It's this new open source local AI model that people are saying is the chat GPT moment for local AI. But no one's actually gone and shown you how to use it and how do you actually set it up.
00:18So I figured I'd bring on my friend, Amir. He tells us exactly how you should think about running, uh, GLM 5.2, how you should think about running local models, how that integrates to something called OpenRouter, how you can use it with your codex or cursor or Claude code.
00:35In this episode, in twenty minutes or less, you're gonna get everything you need to know about local AI models, why GLM 5.2 is crushing benchmarks, and how you can set it up today so you you can go and build your startup and build your business and be more productive and be more efficient.
00:54Enjoy the episode, and I'll see you at the end of it. Hit that like, comment, and subscribe for more of this sort of stuff in your feed. Enjoy.
01:09Welcome to the show, Amir. By the end of this episode, what are we gonna learn?
01:14We're gonna talk about essentially learn about how local models are kinda keeping up now with the pace of these closed models as well and how you can kinda use compounding models or fusion models as OpenRouter calls it to be able to do sequencing between a more extensive thinking model and a more execute execution based model.
01:33We'll show you how GLM 5.2 actually compares and stacks up against other models and how you can effectively use it and get set up with it as well.
01:41Cool. I did an episode on local models. It was a hit.
01:44People wanted more tactical, how do I actually, you know, how should I think about local models, how do I implement local models, how I can actually make this a part of my daily workflow.
01:56So I brought on Amir. Welcome to the show, and let's I give you two welcomes by the way. That's how excited I am to have you share with
02:04everyone everything. So let's get into it. I'm super excited to be here.
02:08Let's jump right into it. So let's talk about what happened this week. Zetta AI came out with Gelone 5.2 and I think this was a big inflection point because we've typically seen with local models, it's either, you know, storage extensive that you can't essentially run it or install on your computer or you need a better, like, GPU RAM performance to be able to actually run it locally as well.
02:30Now, JL 5.2 is also resource extensive, but what we're seeing is with open source models like open source providers like OpenRouter or or Llama being able to help you run these models in the cloud and effectively being able to essentially pay slightly less for input and output tokens compared to the more closed models.
02:52Now what I want people to take away from this session is one, how to actually get set up with it. We're not gonna go through the detailed setup process, but I'm gonna just cover how you can do it in cursor using open router or in codecs, and then effectively talk about how GLM 5.2 stacks up against the other models, and then how you can effectively use it as well-to-do model training.
03:10And then we'll do model training, and then we'll do just a kind of a maybe a quick walk through of how I'm currently using it. You know, I wanna be very honest, like, these local models still have a lot of work to do in terms of having tool capabilities to be able to, you know, have the, you know, the modalities to be able to see images and conceptualize on what they're looking at.
03:32And I'm gonna tell you how you can effectively circumvent some of that where you can use other models to explain what the image is back to GLM and then have GLM work on it.
03:43And then also just have a very live test on how this stacks up against other models. Know, benchmarks are great. Personally, I'm not an expert in it.
03:50I don't know what any of these benchmarks actually mean. The way I do is off ops. Like, let's build it out and see how this actually looks and how it stacks up against other models.
03:58Sound good? Yes, sir. Let's do it.
04:01Okay. So GLON 5.2 came out, and, it has a 1,000,000 context window, and it scores 81 points on the terminal bench 2.1.
04:10It's just about four points behind OPUS 4.8, and it does quite well on the long horizon task evaluation. So this is essentially projects you have that you want to run long sequence tasks on and, you know, I I think it's in account like the the thinking parameters and how it can think through and plan through some of the the task it has at hand.
04:33So you can see that across all these different kind of benchmark reporting reports. GLL GLM 5.2 actually does quite well.
04:42So in this case, it's 62.1% compared to OPUS 69.2. Um, what's special about GLM is it's open source.
04:50You can run it locally on the machine if your device can support it, or you can run it in the cloud through the open, um, model providers. It's a big leap from 5.1. Uh, I personally didn't test 5.1.
05:01I got straight into 5.2, but from what I'm seeing based off of, like, Twitter and conversations with people, it's performing quite well, on the front end side of just execution based tasks.
05:12I haven't really tested more on the back end resource extensive tasks, but just based on perception and what we're seeing in the reporting, it's stacking quite well.
05:22So when you say stacking well, are we talking like because, like, I look at benchmarks and honestly, it goes through, you know, one ear or one eye and it goes at the other eye. I'm like I I glaze over it because I'm like, what does this really mean?
05:34Like, are we talking is it, like, 4.8? Is it, like, 5.5? Yeah.
05:39What do we how should I think about this? And yeah.
05:44Give it give it to me straight. Honestly, man, I'm gonna say, man. I I don't get it.
05:47You know what I mean? Like, I'm I'm not smart enough to understand how these benchmarks actually stack. So I'll be honest, like, for me, it's like, let's just let's just build it, use it, and see how we feel about, you know, it how it performs compared to the other models.
06:00And for me, I wanna get the best out of it. So if I feel like GLM 5.2 is strong in one part, but weak on the other, then I'm thinking about how do I actually use other tools or other models to essentially now I think, like, almost like a fusion approach now.
06:14I think OpenRouter, is one of the, like, the model providers, coined this where it's like you're able to do, like, sequencing between two different models to get the best output.
06:25So I'm totally game on if I can run a local model on my machine to do certain tasks but then call, you know, Opus or Codex to do something else and have them work together, by all means.
06:36I wanna be the most token and cost efficient and performance as well. Yeah. So on the setup side, I personally started using this through Cursor using OpenRouter's API.
06:51So how it works essentially is you gotta go to ZAI, which is the GLM provider. They created the GLM 5.2 model.
06:58You get an API key from them, and then you take that key and you go into your cursor settings, paste it into the OpenAI field, and then from there, you override the OpenAI endpoint with this API endpoint right here.
07:13And essentially from there, you go back to models, add a custom model, JLM 5.2, and you're able to then actually call GLM 5.2 directly. So in essence, instead of OpenAI key, you put your GLM API key from zed.ai, then you override the API endpoint for when you call OpenAI track completion with this one right here, and then you go back into custom models and add the custom model protocol.
07:39You can also alternatively do this using OpenRouter. So if you want if you're using codecs, you can go to OpenRouter, get your open router key, and then, um, go into the provider, get the endpoint, and then go into Codex, create a profile, and say, hey.
07:53I want you to install this model, um, open source model. Codex does actually support open source models, so you're able to provide the details of what the model is, the context window, and then essentially, when you're running Codex to the CLI, you can switch to JLN 5.2.
08:11Easy enough. Yeah. Easy enough, you know, and then maybe we can have a page or something to show later on where they can kind of follow these instructions.
08:18There's a lot of guides on Twitter and online. You can follow those, but essentially, in my opinion, the best way to get started with this is just go to open router and cursor and get that set up through and through.
08:30So let's talk about, okay, the model. We've talked about the benchmarks, how it performs.
08:34Really, if we wanna just at least takes you know, put some weight to the benchmarks, I'd say if they're scoring at 62 and Opus 4.8 is at 69, you know, that probably means something, you know, for for for the normal people, we probably won't really know until we actually play around with it.
08:50But I went in and was just looking at, for example, this website we have, it's a small app, and I built this in, I think, Opus 4.8. I was just testing it around, and I started refining the design using Geolog 5.2.
09:05So I was like, hey, redesign the hero section for me or refine it. There was this like section right here with all these images. I was like, why don't we just do a little like carousel style?
09:15So it, you know, it's fascinating in a way because I don't I personally don't think open the the local models had this kind of capability to be able to get get it so refined and accurate previously.
09:30And I I I tested the models like that we had, and I find that GLM 5.2 is a lot more refined and it's able to follow the instructions on like what you wanna do. So in a couple of prompts, I was like, hey, you know, let's do a carousel here, let's make sure we, you know, are able to show the the images, and then from there, I want you to build out like a bento grid style of all the features that we have.
09:52Now Mhmm. This is all in one prompt, obviously. You know, you you can see it's a little bit of vibe coated here.
09:57It has a little side like badge, uh, the labels, you know, and and you can tell. But at the end of the day, for a local model for, like, you know, a local model, if you're running this on a computer and not burning in tokens, it's it's doing quite well.
10:11And I think
10:13I can see how it stacks, like, internally reporting. Yeah. I mean, I think so.
10:18Like okay. What is the what is the the main benefit of using a local model versus something in the cloud is you don't burn tokens.
10:29You you know, you essentially the way to think about it is you're buying a machine.
10:34Mhmm. And
10:36correct me if I'm wrong, but you're buying a machine. We and we should talk about, like, some of the machines that people could potentially buy. But you're buying a machine.
10:45It's a, you know, one size. It's a one it's a cost. It could be 2,000, 5,000, 10,000.
10:51And then you can just run tasks. Right?
10:54So you're building a startup, you just want maybe you want conversion rate optimization. So maybe you say every day I'm gonna feed you customer feedback, and every single day I want you to work on the front end. And you just do that.
11:05So my question to you is, for local models specifically, for people who actually want to be building companies, shouldn't they be basically running it all the time on certain tasks, and how should people be thinking about it?
11:21Yeah. So I think that's a tough that's a little bit tough to answer, and I'll say why because, like, this model specifically is really resource intense intensive. So a lot of, I think, existing consumer computers may not be able to run this from what I've seen.
11:36Yes. I've been I've been running it on the cloud through OpenRouter directly.
11:42And what's the cost with that?
11:44Yeah. So with so I actually was trying to map out the token cost of model training. So if we had about 50,000 input tokens and 85,000 output tokens to get almost close to an OPUS 4.8 level of output, it will cost us 44¢.
12:03Whereas with OPUS 4.8, it costs you $2.38. So the you know, there's a big almost like a a big difference on the almost like five x, you know, price difference between Yeah.
12:17Which doesn't sound like a lot when you're like, oh, $2 here, 44¢ here. But when you're actually using these things and you're running it all the time and you have pretty, you know, big tasks that you're going after and you don't wanna be constrained by token costs, so it's a big deal.
12:33Five x is a big deal.
12:35Yeah. Yeah. And this is based on kind of the averages on, like, the coding benchmarks and what what Cursor charges you to the API pool.
12:42But I wanna I wanna I wanna be also future proofing ourselves. If we got this far with GLM 5.2, I wonder what next six months looks like.
12:50Right? So would it be even worth like, I think it's also worth thinking about making the upfront investment in your machine right now to be able to potentially download and run these local models so that when we get to g l g l 5.3 or 5.5, we're essentially made the upfront investment in the in the in the compute and equipment to now save a lot more in the long run with other feature models that are gonna come out that are gonna be much more extensive.
13:16Because I think this you know, we're seeing the the the AI subsidy on tokens. Right? Like, we're getting a lot more output out of Cloud, out of Codex, and I wonder, you know, I I personally seen it, I'm sure you have too, where it's like now we're hitting our usage a lot faster than we before.
13:30Especially when Fable came out, I ran I remember, like, I ran it in, like, in the first day I hit my limit, you know? Totally.
13:36So you're what you're saying is basically, like, if, you know, if you look at I mean, if you look at the history of VC backed startups, think about Uber. When Uber first came out, they actually subsidized rides and they got you hooked onto onto the onto the app.
13:52And then over time, they started increasing prices, increasing prices. What you're saying is with you know, in the AI age with a lot of these LLMs, they're gonna get you hooked into the workflows.
14:04You're gonna you're gonna build on top of it, and over time, you know, those subsidies are gonna go away as they go public and things like that. So what you're saying is maybe it's a good idea to actually invest in running this thing locally because, you know, the price of memory isn't getting cheaper.
14:25Mhmm. And the price of tokens aren't getting cheaper. Mhmm.
14:29So building it now, securing it while you can might be a good idea.
14:34Yeah. And to tie this all together, I'd say two things. One, harnesses that are agnostic on models.
14:41So like for example, Cursor, where you're able to run multiple models across the same sequence of tasks are gonna actually potentially benefit from this. Right?
14:49So, you know, I wouldn't be surprised if one, you know, Cursor decides to directly support GLM 5.2 as a model provider and lets you kind of tap into that cost saving if you couple it with like Composer 2.5.
15:04So this is where it goes into model training, right, where I was, for example, looking at earlier in this task here, I wanted to refine the hero section.
15:13So what I did is I actually used Opus 4.8 to to first import screenshots because Genlion 5.2 doesn't support vision capabilities.
15:24So what I did is I actually used Opus 4.8 to import screenshots and explain back to me what it sees. Right?
15:32I was like, tell me what you see specifically on the front end design for the hero section and lay it out. And then I switch to GLM 5.2 to study that layout and then actually act on making those changes.
15:44So it's kind of a way to, like, circumvent the fact that you have limitations to what GLM five point can do 5.2 can do on, like, image capabilities, but you're able to kinda now train the expensive model to think through the plan and then get the same level of frontier, like, frontier level, like, quality, but at a at a much affordable price point.
16:08I mean, that makes sense. So your your recommendation is basically, you know, there's it's almost like free trade versus protectionism.
16:17You know? The world you win you you know, not to be this isn't this isn't political. This is just an economic theory.
16:24Right? Which is, like, you know, when when when people are trading with amongst each other Mhmm. You know, maybe it's, you know, in Canada where you are, you know, you might want to trade with Florida because, you know, we got good oranges here.
16:39And we might want to trade you know, we don't get we can't make maple syrup here, so we'll get your maple syrup. Right? We'll get the milk maple syrup.
16:47Yeah. Exactly. Yeah.
16:47So make the best use of it. Yeah. Make the best use of it.
16:50So you're saying is, you know, using cursor and and basically, you know, you don't have to use cursor. Right?
16:58You can use whatever you'd like. Codec, pod code, yeah. Use one of those to basically say like, okay, for certain tasks, I'm gonna be using local models.
17:06For certain tasks, I'm gonna be using the best in class cloud models. And then together, ultimately, you're getting, you know, great results in terms of the output, but you're also not spending through the wazoo.
17:17And if you're if you're a token maxi or like you and me are, like, in the sense of, like, we we're we're always pushing into a limit around anything we're building to get the most out of AI because, you know, we don't wanna hire a 100 people, 500 people, and stuff like that, it's helpful to do that.
17:36And exactly, anecdotally, on two parts. Right?
17:39One, internally within our company. You know, I think Satya at Microsoft, you know, mentioned how, like, human capital plus token usage is now a big factor into what they're doing.
17:50Right? A lot of companies are now moving away from, you know, having direct access to the Cloud Code API to run the tokens because of how expensive it's become. Right?
17:59So they're canceling subscriptions. So we're seeing this firsthand with a lot of companies now are saying, okay. Cool.
18:04This first year was great. You know, we had the mandate, you know, AI adoption, token maxing. You know, that's how we're gonna measure success, and that's how we're gonna become AI native.
18:12Now they're like, wait a minute. Okay. Cool.
18:13We've done we've done this, but we're spending way too much money on tokens. How can we now be more effective? Right?
18:19And I'm seeing this firsthand too where it's like, especially now. Right?
18:24In a way, you can have some sort of direct ROI between the tokens you're spending within the engineering team because you're like, okay.
18:31Cool. We're saving a lot of time. There's an output.
18:34You know, engineers are expensive. We get that. But now you're providing the same level of harnesses and models to the non engineering teams that, you know, are one shotting, like, hey.
18:44Help me format this email, and they're using Opus 4.8 high thinking. They're like, maybe that's probably not the right model, and that's a governance issue. Right?
18:51That's a big Yeah. And I'm having these conversations with companies right now where they're saying, hey. Can you help us figure out, like, how to build governance and proper education on how to actually use the right models?
18:59And this is where I think model training is a big factor. Right? By the way, you know, John at marketing, maybe you shouldn't use Opens 4.8 to run this, like to just format this email for you and just helping them understand that.
19:11And I think, you know, I wouldn't be surprised if in a year from now, companies start you know, we've been thinking about it as well. We're like, hey. Why don't we just get our own machines and start running some local models because it's a lot more effective, especially how much money we're spending on tokens.
19:23What's the like, just to play devil's advocate, why wouldn't I just use OpenRouter and call it a day?
19:30Please do. Yeah. Yeah.
19:32Absolutely. I think they should. Because, you know, when I'm on on X, I see a lot of people being like, buy a Mac Studio or buy, you know, these expensive devices.
19:41You know, for for people listening, do they need to buy a local piece of hardware, like, if the price even goes up two x, or should they just use OpenRouter and Cursor or or or Cloudco, you know, whatever harness they they want?
19:58If I just have this right machine, I'll get to this, you know, result. That's not how it works. You know?
20:03No. You don't need a Mac Mini. You don't need this equipment.
20:05You can get started today. You know? And what I love about OpenRouter and all these other tools is, again, they're so agnostic.
20:11They make it easy for you to be able to access this in the cloud. They run the models locally. And it's credit based.
20:17Low $20, get it going, and easy to set up. Um, I highly, highly recommend if you're starting to dabble with this and token usage is a thing for you, get one of these agent hardnesses set up now that there are a lot of them are model agnostic.
20:30Run some tokens in OpenRouter, get these open models in there, and start start vibing. Just get some like, I love to experiment to see how far I can take this.
20:39What if I plan with Opus, review with GLM five point execute with 5.2, and then review with co Composer 2.5 or Codex 5.5.
20:48There's a lot of ways, and I think we can be really effective, and I think that's what the smart people are gonna be doing in the near future.
20:55Some people are saying, I don't care how much tokens cost because I think there's so much opportunity in building startups and and optimizing and AI arbitrage that I don't even care if it cost me whatever.
21:08What do you say to those people who are just basically ignoring this whole open source local AI movement?
21:15I used to be the exact same person, you know. In our first episode, you're like, yeah, how much does this all cost? I'm like, I don't know, I'm just vibe spending.
21:22And I think Yeah. My that mentality has changed now. I can see that my users are being hit faster, and my cost is going up, and now that our team is expanding internally as well.
21:32So I think as a solo person, it's a lot easier to build a case or rebuttal around why you should just token max as much as possible, which itself is kind of a like an it's ironic. You shouldn't be token maxing.
21:44You should be token minimizing as much as possible and output maxing instead. So my answer to that is if it works for you and you can directly have an ROI that you can show that, hey. I spent $200 and got a thousand out, great.
21:55Otherwise, sooner or later, the subsidy is gonna run out.
22:00Alright. Well, I think that's that's the episode, you know, unless there's anything else you wanna add before we bounce. Yeah.
22:06I mean, for people that are trying out, dabble with it, play around, have some, you know, see see what it can do at least in the front end for you and start working back in tests, and, yeah, I hope they they got they learned something from this.
22:17I'll include links for where to follow Amir. He's always one of my first calls whenever I'm trying out new stuff, and so I'm happy that you were able to jump on. I appreciate you.
22:26We appreciate you. Give give him a follow, like and comment this video, let us know what you think.
22:32We'll be in the in the comment section. Just just, you know, out there trying to help and learn and and Yeah.
22:40And thanks a lot, Amir. I'll catch you Thanks on the next
22:43for having me.
The Hook

The bait, then the rug-pull.

Every few months a new open-source model goes viral claiming to match the frontier. Most do not survive contact with real work. GLM 5.2 is different enough to warrant a tactical look — and this episode does what the hype threads do not: it shows you exactly how to plug it into the tools you already use and gives you the token math to decide whether it belongs in your workflow.

CTA Breakdown

How they asked for the click.

MENTIONED ON CAMERA
02:33productGLM 5.2 by ZAI
10:43toolCursor
22:22productHumblytics
FROM THE DESCRIPTION
PRIMARY CTAWhere the creator wants you to go next.
AFFILIATECommission earned if you click.
Frame Gallery

Visual moments.

Watch next

More from this channel + related breakdowns.

Chat about this