A 33-minute first take from a developer who spent $3,000 on inference in 24 hours: benchmarks, real demos, session math, and the hidden safety intervention that silently degrades the model without telling you.
Fable 5 is a qualitative leap in AI coding capability — but Anthropic's undisclosed safety interventions mean you may pay full price for a silently degraded model without any indication it happened.
Fable 5 is the best available coding model. That is not because it scores highest on every benchmark, but because it writes code with more taste, tolerates vaguer instructions, and can be trusted to explore open-ended problems and return with real results. It costs double what Opus does, at $10 per million input tokens and $50 per million output tokens, which is aggressive enough to burn $100 in eight minutes on heavy agentic work. The deeper concern is Anthropic's hidden safety system: for certain topics, such as frontier LLM development, the model is silently degraded via prompt modification and steering vectors with no notification, so you pay full price for a lobotomized response. Despite these concerns, the current window, in which $200/month subscriptions include Fable access, is the best moment to push the model hard and discover where its real ceiling sits.
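The pricing and session math reduce to simple arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, the prices are the per-token rates quoted in the video, the token counts in the usage lines are made-up inputs, and it assumes flat per-token billing with no other billing details. The 25%-per-session figure and the $100-in-8-minutes figure come from the video's chapter notes.

```python
# Hypothetical cost helpers; prices are the per-token rates quoted in the video.
# Assumes flat per-token billing with no other discounts or surcharges.
INPUT_PRICE_PER_M = 10.0   # USD per million input tokens
OUTPUT_PRICE_PER_M = 50.0  # USD per million output tokens


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a workload at the quoted Fable 5 prices."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def burn_rate(dollars: float, minutes: float) -> float:
    """Average spend in dollars per minute."""
    return dollars / minutes


def sessions_per_week(share_per_session: float) -> int:
    """Whole heavy sessions that fit in the weekly limit, given each session's share of it."""
    return int(1 / share_per_session)


if __name__ == "__main__":
    # Made-up token counts that happen to add up to $100 at these prices.
    print(f"${request_cost(5_000_000, 1_000_000):.2f}")  # $100.00
    # $100 in 8 minutes, as reported in the video.
    print(f"${burn_rate(100, 8):.2f}/min")  # $12.50/min
    # ~25% of the weekly limit per session leaves room for four sessions.
    print(sessions_per_week(0.25))  # 4
```

Output tokens dominate the bill: each one costs five times as much as an input token, so agentic loops that generate large amounts of code burn money fastest.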
Establishes the scale of testing and explains why Fable 5 matters even though Mythos 5 is the full release.

Blacksmith CI sponsor read — faster GitHub Actions builds, better logs and monitoring.

Benchmarks: SWE-Bench Pro (80%), Frontier CodeBench Diamond (30% vs. GPT 5.5's 5.7%), SkateBench (79%), and a TerminalBench score penalized by safety filters.

Anomalous reasoning-curve data in Frontier CodeBench raises credibility questions. DeepSWE shows Fable comparable to GPT 5.5. Pricing: $10/M input, $50/M output.

Modernized a 15,000-line codebase in about 5 turns. A follow-up attempt at a full stack swap to TanStack/Convex/Clerk got further than expected but broke core functionality.

Switched to usage-based billing to finish a workflow and spent $100 in 8 minutes, then maxed out a second subscription in 2 hours. One session uses roughly 25% of the weekly limit.

For some topics, Fable transparently routes requests to Opus 4.8. For frontier LLM development, it instead silently degrades the model via prompt modification, steering vectors, and PEFT. Bullshit Bench: 33% refusals. Artificial Analysis recorded an 8% fallback rate versus the stated rate of under 5%.

Fable has better design instincts than prior models. Team demos: a Rust terminal text adventure, t3 code ported to a Rust TUI, a Minecraft clone with procedural assets, and a multiplayer racing game.

Used Fable to analyze its own SkateBench failures. Identified that two problems involving 'caballero / full cab' are nearly unsolvable for most models because of a jargon-compression mismatch. GPT 5.5 found no interesting trends; Fable did.

All Fable 5 traffic now requires 30-day retention regardless of trusted-access status. Simon built a 14MB WASM binding and a full dataset agent entirely with Fable. The model has "big model smell": it feels bigger and more capable.

Makes the case for pushing the model with vague exploratory tasks, using fuzzers and worktrees, and cleaning up stale PRs in bulk. The economics of software development have changed, so get as much usage as you can during the $200/month window.
Fable 5's ceiling is not higher benchmark scores — it is a model that compresses domain knowledge correctly, tolerates vague instructions, and can be trusted to explore rather than just execute.
“It just feels smarter, like it's writing better code. It feels like a better employee or a better person with more years of experience than previous models.”
“Eight minutes. Eight fucking minutes to do a $100.”
“They are intentionally making the model dumber when you try to use it for certain things and they don't tell you when that happens.”
“It's like going from a really cracked junior engineer to a kinda laid back senior one, where it knows enough to make good decisions by itself.”
“I've never felt more like I'm along for the ride it's taking me on, rather than I am the one steering the ship.”
Twenty-four hours. Three thousand dollars in inference. Two $200 accounts burned through their five-hour session limits simultaneously. That's the data behind this video — not a spec sheet review, but a working developer's reckoning with a model that might actually be different.
The session and weekly limits on $200/month plans are more restrictive than they appear. You can exhaust the weekly limit in four full heavy sessions.
Anthropic has two tiers of safety intervention. The transparent one tells you it switched models. The silent one makes the model worse without disclosure.
“Go play with this. Push it to its limits. Let me know if you think I'm insane. Let me know what you're able to build with it.”
Organic, no hard sell. Ends on community engagement — share your builds.