Claude Fable 5 is BANNED. What to do?
A 25-minute field guide to local AI models, written the weekend a government letter erased the world's most powerful model overnight.
June 13thA 22-minute tactical breakdown of how to plug an open-source local model into your existing AI coding harness — and why the token math makes it worth doing now.
Running a single frontier model for every task is a governance failure; a fusion approach that sequences planning, execution, and review across models by price and capability delivers near-frontier output at 20-30% of the cost.
GLM 5.2 from ZAI ships with a 1M-token context window and lands within about 7 points of Opus 4.8 on Terminal Bench 2.1 while costing roughly one-fifth as much per token — 44 cents versus $2.38 for a comparable task. The core argument is that local models are not replacements but sequenceable components: a frontier model handles vision-heavy planning steps GLM cannot do, then GLM executes the bulk of the build at scale. Setup takes a single API key from zed.ai pasted into Cursor's OpenAI field with the endpoint overridden, or an OpenRouter profile in Codex. The longer warning is about AI token subsidies behaving like Uber's early ride pricing — built to hook you before the prices normalize.
Sign in and you get 23 free chat messages on us — ask for the hook, quote a framework, find the exact transcript moment, generate a markdown action plan. Bring your own key when you want unlimited.
Create a free account →
Greg frames the episode: a tactical guide to local AI and GLM 5.2 in 20 minutes or less.

Amir introduces ZAI's GLM 5.2 release and why it marks an inflection point for local models.

1M context window, 81 on Terminal Bench 2.1, about 4 points behind Opus 4.8; strong long-horizon task performance.

Both hosts admit benchmarks feel abstract; Amir's heuristic is to just build with the model and judge the output.

Step-by-step: get a ZAI API key from zed.ai, paste into Cursor's OpenAI field, override the endpoint, add GLM 5.2 as a custom model.

Alternative path: OpenRouter key into Codex profile with model name and context window; switch from the CLI.

Amir frames the economic case for local hardware: one-time cost, unlimited runs.

Real numbers: 50k input + 85k output tokens for near-Opus quality = 44c on GLM 5.2 vs $2.38 on Opus 4.8. Five-times difference at scale.

Greg frames AI token subsidies as Uber's early cheap rides. Amir argues the upfront hardware bet pays off when GLM 5.3/5.5 arrive.

Fusion approach in practice: Opus 4.8 reads screenshots and describes the layout, GLM 5.2 executes the changes.

Amir reframes the mindset: token minimization plus output maximization. OpenRouter is the easiest on-ramp — $20 in credits, no hardware needed.

Wrap-up and links to Amir's socials.
The default of using one powerful model for everything is both expensive and unnecessary — model chaining unlocks near-frontier quality at a fraction of the cost once you know which tasks require which tier.
“You shouldn't be token maxing. You should be token minimizing as much as possible and output maxing instead.”
“It will cost us 44 cents. Whereas with Opus 4.8, it costs you $2.38.”
“Think about Uber. When Uber first came out, they actually subsidized rides and they got you hooked onto the app. And then over time, they started increasing prices.”
See every word as it's spoken — crank it to 2× and still catch all of it. The same dual-channel trick behind Amazon's Kindle + Audible.
Every few months a new open-source model goes viral claiming to match the frontier. Most do not survive contact with real work. GLM 5.2 is different enough to warrant a tactical look — and this episode does what the hype threads do not: it shows you exactly how to plug it into the tools you already use and gives you the token math to decide whether it belongs in your workflow.
00:01
00:20
00:35
00:51
01:19
01:30
01:38
01:56
02:09
02:33
02:49
03:05
03:21
03:37
03:53
04:09
04:25
04:41
04:57
05:13
05:30
05:46
06:02
06:18
06:34
06:51
07:11
07:31
07:50
08:10
08:30
08:51
09:11
09:28
09:48
10:08
10:29
10:43
11:00
11:16
11:33
11:50
12:06
12:22
12:39
12:55
13:11
13:27
13:44
14:02
14:19
14:37
14:54
15:07
15:30
15:47
16:07
16:22
16:40
16:57
17:14
17:31
17:48
18:06
18:23
18:40
18:57
19:14
19:32
19:50
20:08
20:26
20:44
21:02
21:18
21:34
21:50
22:05
22:16
22:34A 25-minute field guide to local AI models, written the weekend a government letter erased the world's most powerful model overnight.
June 13thA 19-year-old founder breaks down the exact framework he used to turn a wrestling app into $200K in revenue — with no coding background.
June 15thA 22-minute honest debrief on agentic loops — what they are, why well-funded builders swear by them, and the one case where they actually work.
June 9thA 57-minute masterclass on the three-layer system that separates companies that merely use AI from organizations that get smarter every day.
June 8thA Digg founder walks through the full pipeline of a personal Techmeme-clone he built alone — from RSS to vector clusters to an editorial gravity engine.
February 2ndAlex Finn walks through every surface of the new Hermes Desktop app and shares the session management insight that turns a $1,000/month bill into almost nothing.
June 6th