Modern Creator
Matthew Berman · YouTube

MYTHOS MYTHOS MYTHOS

A first-look review of Claude Fable 5 and Mythos 5 from someone with early access: benchmarks, pricing, firsthand quirks, and two live multi-agent demos.

Posted
yesterday
Duration
Format
Review
educational
Views
38.6K
2.1K likes
Big Idea

The argument in one line.

Fable 5 is not just a better model but a qualitatively different kind of AI that demands a new workflow philosophy: start at the lowest effort setting, route ruthlessly by task difficulty, and expect the model to treat every prompt as a massive autonomous exploration rather than a quick answer.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…
  • A developer or technical founder actively using Claude Code or similar agentic coding tools who wants a calibrated read from someone with early access.
  • Anyone evaluating whether to pay 50 dollars per million output tokens for Fable 5 versus routing back to Sonnet or Haiku.
  • A builder curious about Ultracode multi-agent parallel workflows and what running 60-plus sub-agents on a single prompt actually looks like live.
  • Someone who has heard the hype and wants a candid account of the friction: verbose output, clarifying question loops, and slow cold starts.
SKIP IF…
  • You want a formal peer-reviewed capability evaluation; this is one practitioner's firsthand impressions, not a systematic benchmark study.
  • You are not yet using AI in a coding or technical context; the pricing and routing discussion assumes you are already spending meaningfully on tokens.
TL;DR

The full version, fast.

Claude Fable 5 is the publicly available version of the Mythos-class model with guardrails re-applied. On coding benchmarks it leads the field, and in practice it approaches every task like a sprawling autonomous exploration. The friction is real: it is verbose to the point of being hard to read, it wants to ask clarifying questions on everything, and it starts slow before suddenly burning millions of tokens in parallel via Ultracode. The practical guidance: always start at the lowest effort setting, route simpler tasks back to cheaper models, and recognize that the real unlock comes from pairing Fable with Ultracode workflows and loop automation.

Chapters

Where the time goes.

00:00-00:42

01 · Hook and setup

Anthropic released Mythos publicly; host has early access and promises a real take.

00:43-01:56

02 · What Fable 5 is

Fable is the Mythos-class model with guardrails; Mythos is the unrestricted version for security researchers only.

01:57-04:19

03 · Benchmarks

SWE-bench Pro at 80.3 percent, agentic coding (FrontierCode Diamond) at 29.3 percent, GDPval at 1932, computer use at 85 percent, Terminal-Bench at 88 percent. Consistent lead across the board.

04:20-06:47

04 · Firsthand experience

Every task feels like kicking off a massive exploration. Complex tasks are completed without a hiccup. The model seemed almost insulted by how simple even the hardest prompt was.

06:48-09:05

05 · Blog post walkthrough and pricing

10 dollars per million input tokens, 50 per million output, less than half the price of Mythos Preview. Safeguards trigger in under 5 percent of sessions. Model routing is the core skill to develop.

09:06-11:07

06 · Long-horizon autonomy and token efficiency

Stripe compressed months of engineering into days on a 50M-line Ruby codebase. Information density is so high that the output is hard for humans to parse.

11:08-12:15

07 · Information density and AI language tangent

Speculative: future AI models could develop hyper-dense non-alphanumeric language that only other models can read, raising interpretability risks.

12:16-14:59

08 · Effort levels and Ultracode

Start on the lowest effort setting. Ultracode spawns potentially hundreds of sub-agents. Noam Brown: no apparent ceiling on quality versus thinking tokens.

15:00-17:09

09 · Demo showcases

Pokemon FireRed cleared with vision only. Solar system eclipse simulation. Model demos feel less meaningful now because all frontier models can do them.

17:10-19:26

10 · Fear-based marketing and data retention

Host argues the months-long gap between announcement and release was intentional, to accelerate Anthropic's internal research. 30-day data retention for Mythos-class traffic. Distillation attempts fall back to Opus 4.8.

19:27-24:22

11 · Quirks deep dive

Verbose and information-dense output. Clarifying-question loops before any work starts. Cold start: 5 to 8 minutes at about 1,500 tokens, then an explosion to 1.5 million in 30 seconds.

24:23-25:15

12 · Loops and software factories

Fable plus Ultracode plus loops equals software factories. Model overhang is real. Even the labs are not fully utilizing what is already there.

25:16-27:43

13 · Live tests

Rubik's cube: 3D interactive scramble and solve with realistic lighting. Fluid dynamics: 63 parallel agents, interactive browser simulation with adjustable dials.

27:44-28:16

14 · Outro

Verdict: incredible, and what unlocks it is pairing with workflows and loops. CTA to the loops video.
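Chapters 08 and 13 describe the shape of Ultracode: a planning agent splits the job up, then many sub-agents run in parallel. A minimal Python sketch of that fan-out/fan-in pattern follows, assuming hypothetical `plan` and `run_subagent` stand-ins (no real model calls are made). It illustrates the pattern only; it is not Anthropic's implementation or API.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class SubTask:
    path: str
    instruction: str


def plan(files: list[str], goal: str) -> list[SubTask]:
    """Hypothetical planner: one sub-task per file, as in the code-review demo."""
    return [SubTask(path=f, instruction=f"{goal}: {f}") for f in files]


def run_subagent(task: SubTask) -> str:
    """Hypothetical stand-in for a sub-agent; a real one would call a model API."""
    return f"report for {task.path}"


def fan_out(files: list[str], goal: str, max_workers: int = 8) -> dict[str, str]:
    """Plan, run all sub-agents concurrently, then collect their reports."""
    tasks = plan(files, goal)
    # Sub-agents run concurrently; map() returns results in task order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_subagent, tasks))
    # Fan-in: key each report by file path so a planner could merge them.
    return {t.path: r for t, r in zip(tasks, results)}


if __name__ == "__main__":
    reports = fan_out(["app.py", "db.py", "api.py"], "Review for security")
    for path, report in reports.items():
        print(path, "->", report)
```

In a real setup, `max_workers` is where you would cap concurrency against rate limits and the token budget; a one-sub-agent-per-file split like this is why a whole-codebase review can fan out to 100-plus agents.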

Atomic Insights

Lines worth screenshotting.

  • Fable 5 is the Mythos-class model with guardrails applied; the unrestricted Mythos version is only available to vetted security researchers.
  • SWE-bench Pro shows Fable 5 at 80.3 percent, Claude Opus 4.8 at 69 percent, and GPT-5.5 at 58 percent, a gap of roughly 11 points at each tier.
  • Fable 5 costs 10 dollars per million input tokens and 50 dollars per million output tokens, less than half the price of Claude Mythos Preview.
  • Stripe migrated a 50-million-line Ruby codebase in one day with Fable 5, a task that would have taken a full team over two months by hand.
  • The model is so token-hungry that starting at the lowest effort setting is not optional; it over-engineers even trivial prompts at medium effort.
  • A single Ultracode prompt spun up 63 parallel sub-agents for a fluid dynamics task, burning 20 to 30 thousand tokens per agent in under two minutes.
  • The cold start is disorienting: 5 to 8 minutes at roughly 1500 tokens, then a jump to 1.5 million tokens in 30 seconds.
  • A single prompt triggers 3 to 5 clarifying questions, a summary confirmation, a spec review, and an approach confirmation before work begins.
  • Information density in Fable output is so high the host had to slow his reading pace and repeatedly ask for simpler explanations.
  • Noam Brown at OpenAI found no apparent ceiling on the quality-versus-thinking-tokens curve; throwing more compute at a problem keeps improving results.
  • The model overhang is real: even Anthropic and OpenAI are likely not fully utilizing what these models can already do.
  • Earlier Claude models needed a complex helper harness to play Pokemon FireRed; Fable 5 completed it with vision alone and no maps or game-state aids.
  • If distillation attempts on Fable 5 are detected, Anthropic's classifiers fall back to serving Claude Opus 4.8 instead.
  • Loops plus Ultracode workflows plus Fable together represent software factories where the model burns tokens autonomously toward a goal without human checkpoints.
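The pricing and the 63-agent fluid dynamics run above can be combined into a rough per-goal budget. This sketch uses the quoted $10 and $50 per million tokens and assumes, because the source does not split them, that all 20 to 30 thousand tokens per agent are billed as output; real runs also pay for the input context each sub-agent reads, so treat the result as a ballpark.

```python
# Fable 5 list prices quoted in the video, in USD per million tokens.
INPUT_PRICE_PER_M = 10.0
OUTPUT_PRICE_PER_M = 50.0


def run_cost(agents: int, input_tokens_per_agent: int, output_tokens_per_agent: int) -> float:
    """Estimated USD cost of one parallel fan-out run."""
    total_in = agents * input_tokens_per_agent
    total_out = agents * output_tokens_per_agent
    return (total_in * INPUT_PRICE_PER_M + total_out * OUTPUT_PRICE_PER_M) / 1_000_000


if __name__ == "__main__":
    # Assumption: the 20k-30k tokens per agent are all output, with no input counted.
    low = run_cost(63, 0, 20_000)
    high = run_cost(63, 0, 30_000)
    print(f"${low:.2f} to ${high:.2f} per goal")  # → $63.00 to $94.50 per goal
```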
Takeaway

How to actually get value from a frontier model.

WHAT TO LEARN

Fable 5 is so capable that using it wrong produces expensive, slow, over-engineered results instead of the software factory it can become.

  • Always start at the lowest effort or thinking setting and dial up only when the output is genuinely insufficient; at medium effort the model treats a two-line task like a full engineering sprint.
  • Route by task difficulty, not by default; reserve frontier models for problems where the cost of a cheaper model falling short is higher than the compute bill, and most prompts in a real workflow do not qualify.
  • The clarifying-question behavior is a signal that your prompt lacks a clear success criterion; writing tighter task specs with explicit scope eliminates most of the back-and-forth.
  • Information density in model output is not the same as quality; verbose technically dense explanations can obscure whether the model understood the task or is over-elaborating a near-miss.
  • Multi-agent parallelism changes the cost model entirely; 60-plus agents burning 20 to 30 thousand tokens each in parallel is a per-goal cost that needs to be budgeted as infrastructure, not a per-prompt expense.
  • The cold start period of 5 to 8 minutes at low token counts is real planning time; interrupting it resets the context and wastes the investment already made.
  • Workflow design and loop abstraction are now more valuable skills than prompt engineering alone; the gap between what these models can do and what most practitioners extract from them is large and growing.
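The first two takeaways (route by difficulty, start at the lowest effort) combine naturally into one decision procedure. Below is a minimal sketch under stated assumptions: the model labels, difficulty thresholds, effort names, `call_model`, and `is_good_enough` are all hypothetical placeholders, not a real SDK.

```python
from typing import Callable

# Hypothetical labels; map them to whatever model IDs and effort settings you use.
EFFORT_LEVELS = ["low", "medium", "high"]


def pick_model(difficulty: float) -> str:
    """Route by task difficulty (0.0 = trivial, 1.0 = hardest); thresholds are illustrative."""
    if difficulty < 0.3:
        return "haiku"
    if difficulty < 0.8:
        return "sonnet"
    return "fable"


def solve(
    task: str,
    difficulty: float,
    call_model: Callable[[str, str, str], str],
    is_good_enough: Callable[[str], bool],
) -> tuple[str, str, str]:
    """Start at the lowest effort and escalate only when the output fails the check."""
    model = pick_model(difficulty)
    output = ""
    for effort in EFFORT_LEVELS:
        output = call_model(model, effort, task)
        if is_good_enough(output):
            return model, effort, output
    # Every level failed the check: return the highest-effort attempt.
    return model, EFFORT_LEVELS[-1], output


if __name__ == "__main__":
    def fake_call(model: str, effort: str, task: str) -> str:
        return f"{model}/{effort}: {task}"

    # Accept only medium-effort output, to show one escalation step.
    print(solve("rename a variable", 0.1, fake_call, lambda out: "medium" in out))
    # → ('haiku', 'medium', 'haiku/medium: rename a variable')
```

This version escalates effort within one tier; escalating to a stronger tier when a check fails is an equally reasonable design.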
Glossary

Terms worth knowing.

Fable 5
The publicly available Mythos-class model from Anthropic with safety guardrails applied. Same base capability as Mythos but restricted from certain security-research behaviors.
Mythos 5
The unrestricted version of the Mythos-class model, available only to vetted security researchers and organizations through a trusted access program.
Ultracode
A Claude Code workflow mode that launches a planning agent, which then delegates work to potentially hundreds of sub-agents running in parallel.
Loops
An abstraction layer above agentic engineering where a model runs continuously toward a defined goal until a completion condition is met, without waiting for human approval between steps.
Model routing
The practice of directing different tasks to different model tiers to manage cost while preserving quality where it matters most.
Model overhang
The gap between what current AI models are technically capable of and what practitioners are actually extracting from them; the models are ahead of the workflows built to use them.
Information density
The amount of meaningful content packed into each token or word of model output. High density improves compute efficiency but makes output harder for humans to parse.
SWE-bench Pro
A coding benchmark testing whether models can resolve real GitHub issues in large open-source repositories, used as a proxy for production-level software engineering capability.
FrontierCode
Cognition's evaluation that tests whether models can pass difficult coding tasks while meeting standards of high-quality production codebases, including accuracy-versus-cost tradeoffs.
Distillation
Training a smaller or competing model on the outputs of a frontier model to transfer its capabilities without direct access to the original training data or weights.
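The Loops entry describes a model running toward a goal until a completion condition holds, with no human approval between steps. A minimal Python sketch of that control flow follows, with a hypothetical `step` function standing in for a model turn. The `max_steps` cap is an added assumption: a budget guard for a loop that might otherwise burn tokens indefinitely.

```python
from typing import Callable


def run_loop(
    step: Callable[[dict], dict],
    is_done: Callable[[dict], bool],
    state: dict,
    max_steps: int = 50,
) -> tuple[dict, int]:
    """Apply step() until is_done(state) holds or max_steps is reached."""
    steps = 0
    while not is_done(state) and steps < max_steps:
        state = step(state)  # no human checkpoint between steps
        steps += 1
    return state, steps


if __name__ == "__main__":
    # Toy goal: each step "fixes" one failing test until none remain.
    final, steps = run_loop(
        step=lambda s: {"failing": s["failing"] - 1},
        is_done=lambda s: s["failing"] == 0,
        state={"failing": 3},
    )
    print(final, steps)  # → {'failing': 0} 3
```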
Resources

Things they pointed at.

06:23 · product · here.now
14:10 · tool · FrontierCode by Cognition
14:14 · link · Noam Brown blog post on thinking-token scaling
09:31 · link · Stripe codebase migration case study
Quotables

Lines you could clip.

21:04
I felt dumb reading Fable telling me what it just did. I felt dumb. It was not a good feeling.
Visceral and specific: humanizes the verbosity problem in a way that resonates immediately. · TikTok hook
24:40
When you have the ability for the model which is already incredibly autonomous, and then you also parallelize it with workflows, and then you wrap it in a loop - I just cannot imagine how powerful that is.
Clean thesis statement for the software factory concept; builds anticipation. · IG reel cold open
25:00
Only a fraction of everybody doing agentic engineering is even scratching the surface of what is possible. I think not even Anthropic and OpenAI are fully utilizing what is there.
Contrarian claim from a credible early tester, with strong pull-quote energy. · Newsletter pull-quote
05:16
You give it a small task, and it no longer felt small once you hit enter.
Short and punchy: captures the feel of the model better than any benchmark number. · TikTok hook
The Script

Word for word.

00:00 Anthropic finally released Mythos. This is the model that Anthropic said was too dangerous to release publicly.
00:07 And guess what? They released it publicly. And I'm one of just a few people who has had early access to this model, and I'm gonna tell you something.
00:16 Before I even get into the video, the model is something else. It is absolutely a new class of artificial intelligence.
00:24 And if you enjoy seeing me break down the latest model releases, be sure to like this video and subscribe to the channel. It really does help. And this video is brought to you by here.now.
00:34 More on them later. Introducing Claude Fable five. This is Fable, the new family of models.
00:40 So remember, you have Haiku, Sonnet, Opus. Now we have Fable.
00:45 That is the brand new generation of model trained in the class of Mythos model. So a Mythos-class model that we've made safe for general use. Its capabilities exceed those of any model we've ever made generally available.
00:59 Let's look at the benchmarks. Everybody wants to know about the benchmarks. That's what we're gonna be talking about first.
01:04 Okay. So here we go. We have Claude Mythos five and Fable five, and they've basically grouped them into one category.
01:12 Remember, the only difference between Mythos and Fable is that Fable has guardrails. Mythos has those guardrails removed.
01:19 Mythos is given to the security community to help harden software, to help find bugs, and Fable, on the other hand, is kind of with guardrails, won't do those types of things, but it's good for everything else.
01:30 So when we talk about benchmarks, I know benchmarks can be misleading in a lot of ways. I know we look at benchmarks, and it seems like every model release just increases the benchmark, increases the number, number goes up no matter what.
01:43 But even when we use it, the vibes of the model don't reflect what we're seeing in the benchmark. And that's definitely the case, maybe with the exception of DeepSwee, which is a benchmark that came out a few weeks ago. Unfortunately, I do not see that benchmark here.
01:58 So we're just gonna have to go on kind of the more traditional benchmarks. Agentic coding, SWE-bench Pro, 80% as compared to Opus 4.8 at 69%, GPT-5.5 at 58%.
02:11 Now here's the interesting thing. Everybody that I talk to says GPT-5.5 is more capable than Opus 4.8.
02:18 However, on SWE-bench Pro, we're seeing a 10-point differential, and we're also seeing another 10-point differential between Claude Opus 4.8 and Mythos.
02:28 Agentic coding, FrontierCode Diamond, 29.3%. Look at this. Opus 4.8, half of that.
02:35 GPT-5.5, 5.7. It doesn't feel like these benchmarks are accurate to what I was feeling using the actual models, between Opus 4.8 and GPT-5.5. GDPval.
02:46 This is a benchmark created by OpenAI. GPT-5.5, 1769, versus 1890 with Opus 4.8, now 1932. GDPval tests real-world knowledge work.
02:58 For spatial reasoning, GPT-5.5 did really well. Claude Opus underperformed, but now we have a new frontier model, 38.6 with Fable.
03:09 Tool use got a nice few-percentage-point bump here. Computer use, 85% as compared to 78 and 83.
03:18 To be honest, GPT-5.5, within the context of the Codex app, feels like the best computer use model, the best browser use model that I've used.
03:28 Legal agent benchmark, 2% versus 10% versus now 13%. Humanity's Last Exam, the most scarily named benchmark on the planet.
03:39 We now have two new frontier models, two new first-place models, with a slight decrease with tools from the preview to what we're seeing today. Here's Terminal-Bench, a very, very important benchmark if you're doing any type of agentic coding.
03:54 We have 83.4 versus 88%. Okay.
03:57 Very, very good. Alright. Let's look at the blog post.
04:00 Of course, you know, take everything they say in the blog post with a grain of salt. Obviously, they're gonna talk it up. Obviously, they're gonna say it's the best model in the world.
04:07 I can tell you from firsthand experience, it is a phenomenal model. It is more different than any other model that I've ever tested. A lot of the time, it's better.
04:16 Sometimes, it's just weird and has these quirks to it. It feels very different.
04:21 It feels like a new training run. And remember, this is a 10-trillion-parameter model, the first one of its kind.
04:28 Alright. Fable five's capabilities exceed those of any model we've ever made generally available.
04:34 It is state of the art on nearly all tested benchmarks of AI capability. Now I cannot wait till it gets tested on Deep Suite. Showing exceptional performance in software engineering, knowledge work, vision, scientific research, and many other areas. The longer and more complex the task, the larger Fable five's lead over our other models.
04:52 Now I wanna pause here. This is what I saw most of all. No matter what task I gave it, it felt like I was kicking off a massive exploration.
05:03 Fable wanted to look at my entire codebase, consider every single possible angle of every line of code that I ever wrote.
05:13 Maybe it even checked projects that I haven't touched in years. It really just felt like I was kicking off this entire exploration.
05:21 It was really, it's such a weird feeling, because you give it a small task, and it no longer felt small once you hit enter. And it was very capable.
05:29 Now when I did give it tasks that I thought were extremely complex, that would require long time horizons to complete, it was like, no problem. It didn't hiccup.
05:40 It just, it was like, okay. I got it. And in fact, even with the most complex task that I gave it, it was almost, like, insulted by how simple it was.
05:49 That's really what it felt like. And this video is brought to you by here.now. They helped me publish all of the tests I was doing during the livestream. Check it out.
05:58 Here.now is one of my favorite products to tell you about because, one, it is awesome and I actually use it, and, two, it has actually inspired the way that I think about the future of the Internet. So if you haven't heard about it before, here.now is the easiest way to give publishing ability to your agent.
06:16 Whether you use Claude Code or Codex or OpenClaw or Hermes, all you have to do is tell your agent to go to here.now and install the skill.
06:25 Or you just come to this page right here, click the copy setup button, paste it into your agent, and it just knows how to do it. Then at that point, your agent can publish anything to the web on your behalf. And they also recently launched private storage, so you don't need to always just publish everything publicly.
06:43 You can have your agent store pretty much anything on here.now. And then even more recently, they launched custom URLs. So rather than only having a here.now URL, you can use your custom domain with here.now and publish directly to it.
06:57 And the best part, it's completely free right now. So go check it out. I'm gonna link all of it down below, but here.now.
07:03 It's super easy. So now back to the video. So to release the model safely and quickly, we've tuned these safeguards conservatively.
07:11 They'll sometimes catch harmless requests, though they trigger on average in less than 5% of sessions. Now during my testing, I did not experience this once.
07:20 I did get a heads up that the model might have many more false positives than what I was used to. I didn't have that once.
07:28 Now, you know, I wasn't really explicitly telling it to look over the security of my application and try to come up with ways to improve it, but I didn't experience that once. Alright.
07:39 Let's keep going. Let's talk about the pricing. And if you are surprised by the pricing, don't be.
07:45 I actually thought it was gonna be more expensive. So Fable five and Mythos five are being offered at $10 per million input tokens and $50 per million output tokens. This is incredibly expensive.
07:55 But here's the thing. You don't actually need Fable for the vast majority of use cases. If you've been watching the channel, you know we've been talking about model routing a lot.
08:04 You know we've been talking about efficiency, the kind of multi-model world that we're almost definitely going to be experiencing in the coming years.
08:13 This is a perfect example. You have the absolute frontier with Fable, and you give it your most difficult problems, and you pay that $50 per million output tokens, and you say thank you, sir. And then for everything else, you don't need it.
08:25 You can go back to Sonnet. You can go back to Haiku. I really encourage you to think about what task you're assigning to which model.
08:33 The more you know about that, the more prepared you're gonna be for the coming years. We're already seeing companies balk at these crazy bills that they're receiving from Anthropic and OpenAI.
08:45 So if you know how to route the task properly, you're gonna be in a good position. It is less than half the price of Claude Mythos Preview. Very interesting.
08:52 Fable five and Mythos five can work autonomously for longer than any previous Claude models. Below, we discuss how these skills apply to software engineering and cover the model's improved capabilities in knowledge work, vision, memory, and life sciences research. This is what I noticed most of all.
09:08 The model was so capable of doing tasks for long periods of time. And as I mentioned, there really wasn't a task that I gave it where it turned around and just gave me a quick answer, or turned around in, like, two minutes and gave me something.
09:22 It never did that. It was, you know, five-plus minutes minimum. During early testing, Stripe reported that Fable five compressed months of engineering into days.
09:31 In a 50,000,000-line Ruby codebase, shout out Ruby, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand.
09:42 And this is really it. Right? Like, if this is the type of task you're looking to complete, $50 per million output tokens is a bargain, because otherwise, you're going to be paying a ton of engineers Stripe salaries to work on it for two months.
09:58 And now that same team can oversee its progress and work on other things in parallel. Here's something else that I'm gonna talk about.
10:06 Fable five is also more token efficient than past Claude models. This is very interesting.
10:14 Now let me share my experience. The information density coming out of Fable was unlike anything I've ever seen.
10:21 And so it might be more token efficient from an algorithmic standpoint, but just simply from the output explanations of what it was accomplishing for me, the information density was hard to read at times, if I'm being honest.
10:35 Like, it was using extremely complex words, extremely complex descriptions. It was very verbose, but not only verbose, it was information dense.
10:45 And so when you have information density, it is effectively increasing the intelligence of the model.
10:51 Because then with those fewer words, you can do more. Thus, you could throw more compute at the inference. It can run for the same amount of time and get more out of that same compute time than it would for Opus 4.8.
11:04 So that is really interesting, and it actually leads me to think about something else. This is kinda, I think, way in the future, and maybe we won't allow it at all, but information density seems like something we're not really talking about all that much.
11:17 How much information can you get out of every word conveyed by the model? Now imagine there was a way to increase information density even further.
11:25 Now for me reading the output of Fable, I actually found it very difficult. I had to slow down my reading pace. I had to really think about every word that it was telling me, but another model doesn't have to do that.
11:38 And so there's this argument that Fable or future AI models might actually develop their own language, a hyper-information-dense language that only they can read. Now, obviously, there are some big problems with that, some big risks.
11:53 If we can't read it, then we have no idea what they're talking about. We have no idea what they're planning, what their intentions are, and so that becomes extremely risky.
12:02 But there is this argument that we would actually have a much higher efficiency and be able to get so much more out of the models if they simply communicated in maybe a non-alphanumeric language, symbols maybe.
12:16 I don't know. It's so interesting to think about though, and I only started thinking about this because of the information density out of the models. So on Cognition's FrontierCode evaluation, which tests whether models can pass difficult coding tasks while meeting the standards of high-quality production codebases, Fable five scores highest among frontier models even at medium effort.
12:35 I tried two different effort levels. Most of all, I used the extra effort level, and it was just overkill.
12:43 It was slow, and it felt like it was way too powerful. It felt like I could dial down the effort on the model to the lowest possible setting, and it might even still be too high of an effort for what I needed. It's crazy to think about.
12:57 Then Ultracode. So Ultracode, if you're not familiar, is their new workflows feature.
13:04 It basically kicks off a planning agent, which delegates out to potentially hundreds of sub-agents in parallel. And I saw this live, and it is crazy to watch live.
13:15 You see it start to think, maybe a few minutes of just planning. And then it would literally, I gave it a task: review my entire codebase, give me a report.
13:24 And it would almost spin up an individual sub-agent for every single file I had in my entire codebase. And I would see all 100-plus of them running in parallel and, obviously, watch my token budget just explode.
13:39 But fascinating to watch, and it feels like Fable is really good at utilizing the workflows feature, parallel delegation of agents. And so I encourage you, as you're using this model, as you're using Fable, start on the lowest possible thinking effort setting.
13:56 And most likely, it's gonna be sufficient for your use cases. Here's FrontierCode, mean cost per task, log scale.
14:04 This is the score. Here we go. Yeah.
14:07 So one thing is interesting. I think it was, who was it yesterday who had that great blog post?
14:14 Brown from OpenAI, maybe? So Noam Brown, who's a researcher at OpenAI, talked about this yesterday. He wrote this fantastic blog post about how there really doesn't seem to be a limit to the thinking-tokens-to-quality relationship.
14:27 Meaning, you can continue to throw tokens at a problem, and it will continue to improve the output. There doesn't really seem to be a limit, or at least we have not found it yet. And so what you're seeing here is that even when you kinda come up on 100,000,000 tokens, the models are still improving.
14:45 There just doesn't, I mean, just continue to throw tokens at these problems, and it seems like the compute demand might actually be higher than we even anticipated, which was already high to begin with.
14:57 Mythos and Fable really emphasize this point, where there doesn't seem to be a limit in the number of tokens you can throw at a problem.
15:04 Fable in particular seems incredibly willing and eager to use all of those tokens. So really, like, dial down the effort and only dial it up when you need it, for sure.
15:15 Here's a video. Claude Fable five beats Pokemon FireRed only using vision. Wow.
15:20 Full speed on this one. No maps, no navigation aids, or extra game state information. Earlier Claude models needed a complex helper harness to play Pokemon.
15:30 Claude Fable five completed the game with vision alone. Very, very cool. Here is a simulation of the solar system and predicting solar eclipses.
15:39 This doesn't seem all that crazy. We've seen stuff like this. Here's the problem with showing off demos of the model.
15:49 All the models today, Opus 4.8, GPT-5.5, they're very capable of building exactly this.
15:56 That's actually one of the reasons I stopped doing model tests in my videos, because I couldn't even come up with tests that were difficult enough for them, which is kinda wild to think about. There was no test that I could come up with where I was like, wow.
16:10 This is a standout model because it was able to achieve this thing. The last time that happened, and really one of the last model tests I ever ran, was for Gemini 2.5 Pro, where it was able to simulate a Rubik's cube and actually scramble it and solve it correctly. That was really the last time I was blown away by something a model was able to do that previous versions were not.
16:30 Now it's all about the vibe of the model. You have to actually get in and really start using it and see where those kind of edges are, and I haven't found any with Fable five yet.
16:41 Mythos five: our internal protein design experts accelerated aspects of the drug design process by around 10 times.
16:49 Very interesting. One other thing I wanna talk about that's super interesting is their release and fear-based marketing. Right?
16:56 So it was, like, a few months ago that we heard about Mythos for the first time. And then we just got reports that Anthropic has been testing Mythos since January. And even though we knew about it publicly when Mythos was announced, nobody had it.
17:11 And it wasn't until now, 06/09/2026, that we actually had access to the model. I think this was very intentional.
17:18I think Anthropic wanted to keep the model to build their next model. That is the reason.
17:24They want to accelerate their own development. They want to accelerate their own research to the point where they feel comfortable that they have a sufficient lead, and then they can release the model to everybody else.
17:35And they have a history of this. If you remember, they cut off XAI's access to the Cloud family of models because they didn't want their competitors building using their own models, using their own technology.
17:49So they have a long history of this. So something to something to keep in mind. So when they're talking about the safety of the model, we've previously identified large scale attempts, and they link the article in which they talk about it, and we covered it in a video on this channel, to extract Claude's capabilities to train competing models in authoritarian countries.
18:08Distillation of Fable five's abilities could indirectly lead to the proliferation of near frontier AI capabilities. We already have near frontier. Now that I've seen Fable, I think open source is more than six months behind.
18:21I think it's probably closer to a year behind. And these could be released without the appropriate safeguards. Requests that are flagged by our classifiers as being part of such desolation attempts will fall back to Opus 4.8.
18:33Woah. They're like, oh, you wanna distill? Go ahead.
18:37You can have our old model. OPUS 4.8. That's so funny.
18:40A new data retention policy, changing the way that we handle business customer data for Fable five Mythos five and future models with similar or higher capability levels. We will require thirty day retention for all traffic on Mythos class models on both first and third party services.
18:57We won't use this data to train new cloud models or for any non safety related purpose. We've instituted new privacy protections, including logging all human access to the data and ensuring a solution after thirty days in almost all cases. And why do they do that?
19:12No. They are not using it to train. Although, if you're a direct customer, not a business customer, maybe they are.
19:19But this data will help us defend against complex and novel attacks, including new jailbreaks, cliney, I'm sure you're gonna love this, and attacks that operate across many requests as well as help us identify and reduce false positives. Fable five is available everywhere today.
19:35So if you wanna try it, you can try it right now. Pricing for both models: $10 per million input tokens, $50 per million output tokens. Developers can use Fable 5 in the Claude API.
19:44We expect demand for Fable 5 to be very high. Now here's the thing. As I was using it, it did feel very slow, and I wonder if that's a function of the size of the model, how many tokens it was eager to use, and how thorough it wanted to be.
19:59These are all part of what makes it feel slow. Now I wanna go over a little review I wrote on Twitter. I talked about the model being really good.
20:11Let's just get that out of the way. It is incredibly good. One of my favorite prompts to give a new model I'm testing is "review the entire codebase," and it reviews it for security.
20:19It reviews it for documentation, logical gaps, edge cases, UX, UI, workflows, test coverage, everything.
20:29And it found things that no other model saw. I will say this: it actually said my code was in a really good place already. So it didn't find a ton of things, but it did find things, and all of that code was built with either Opus 4.8 or GPT 5.5.
20:47So keep that in mind. But you all know the model is good, or at least I've said it enough times. Let's talk about the quirks.
20:55It is incredibly verbose. I already talked about this.
20:59Explanations get super deep and super technical very quickly. I had to update my CLAUDE.md file multiple times to try to encourage it to simplify its explanations to me.
21:12More often than not, I would have to tell it, "Please simplify the explanation." I felt dumb reading Fable telling me what it just did.
21:23I felt dumb. It was not a good feeling. Right?
21:26This is the first time where I was like, God, can you just explain it like I'm five? And I've never really had to say that to another model.
21:34And I talked about the information density already. I talked about the argument that, because information density increases a model's efficiency so well, maybe AI agents will develop their own hyperdense language.
21:50Here's another really odd thing. It wanted to ask clarifying questions so very badly.
21:58It was so annoying. Yeah, I'm okay with a couple of questions for complex tasks, but no matter how easy the task, it was asking me clarifying questions. So here was the flow.
22:09A single prompt would turn into, let's say, three to five clarifying questions, sometimes more. Then it would summarize my answers and ask me to confirm the summary. Here's what you said.
22:21Is that what you just said? Okay. Great.
22:24Then it would say, I'm gonna write a spec. Is that okay with you? Yes.
22:29Just write the spec. Okay. Here's the spec.
22:32Is the spec correct? Please read it. Yes.
22:36Yes. The spec is correct. Keep going.
22:38Go. Okay. Then it would ask me to confirm its agentic approach.
22:44Should I spin off a bunch of agents in parallel, or should I do it sequentially? I don't know. You decide.
22:50Why are you asking me these things? Just go build. And I got so frustrated.
22:54And then finally, after all of those questions and confirmations, finally, it would go build it for me. And when it finally did, it was great.
23:02But that was very frustrating to watch. And then, yeah, finally, it just felt very slow.
23:08I've already talked about this. It was slow to start. In Claude Code desktop, there are really two things it shows you.
23:16It gives you a timer on how long the task is taking, and it tells you how many tokens it has used. And what often happened, especially within the first five minutes, is that it would get to, like, 1,500 tokens.
23:28That's fifteen hundred, and the timer would just tick up. And I'd be like, is it doing anything?
23:35I couldn't figure it out. I don't know what it was actually accomplishing during those first, like, five to eight minutes. And this would happen on every single prompt.
23:44I would just sit there scratching my chin, thinking, is it going? I wanted to poke it. And then, in the blink of an eye, especially in workflows mode, it would go from, like, 1,500 to 1,500,000 within thirty seconds.
24:00It was crazy to watch. One last little thing. I promise I'm gonna get to the testing after this.
24:05I wanna talk about loops. I just made a video on loops. If you don't know what a loop is, it is the thing that Peter Steinberger
24:10and Boris Cherny have been talking about: the next abstraction layer above agentic engineering. It's much simpler once you understand the concept.
24:20But when I think about loops, workflows, and Fable together, it really feels like nobody understands what's coming.
24:29I can't stress that enough. Take a model that is already incredibly autonomous, parallelize it with workflows, and then wrap it in a loop in which it will just keep burning tokens trying to build whatever you set it to build until it reaches some goal.
24:48I just can't imagine how powerful that is. The idea of building software factories is very much here, and only a fraction of everybody doing agentic engineering is even scratching the surface of what's possible.
25:02I think not even Anthropic and OpenAI are fully utilizing what is there. The model overhang, the idea that the models are so good we don't even know how to get the most out of them, is so very real.
25:15And it became much more visceral to me after using Fable. And so I ran a few tests with Fable.
25:22Let me show those to you. First, here's the Rubik's cube simulation, which it absolutely crushed. "Build an interactive, playable 3D Rubik's cube that runs in the browser.
25:31I should be able to scramble it, turn the faces, and solve it. Make reasonable choices on the rest. Don't ask me any questions."
25:40That is the most realistic-looking Rubik's cube that I've seen. Okay, I can grab a side and rotate it.
25:50Very cool. Let's do scramble. Yeah.
25:55Absolutely beautiful. So the scramble worked. Now solve.
26:04There it is. That is a stunning success.
26:08I mean, look at the light, the reflection, the shadows. Really nice.
26:13Really, really nice. And next, in maybe one of the most impressive demos I've ever seen, here is a fluid dynamic simulation that ran flawlessly.
26:24"Build an interactive, real-time fluid simulation that runs in the browser. I should be able to push the fluid around with my mouse and watch it react. Choose the method and controls, and add lots of settings and dials to it. Don't ask me any questions."
26:3863 different agents running in parallel.
26:43Look at that. Look at all these tools. They each ran for about two minutes on average, by the look of it.
26:50They burned between twenty and thirty thousand tokens each.
26:56Oh, that's so sick. Here, sim resolution extreme. Wow.
27:03Look at that. So cool. My god.
27:08Dye resolution maximum. Yeah, man. This is by far the best fluid dynamic simulation I've ever created using a model.
27:19Wow. Let's say timescale. Very cool.
27:25Velocity fade. Okay. It fades much more quickly.
27:30Let's say boom. Dye fade. Swirl vertex.
27:36Let's turn it up. Yeah. Look at that.
27:43Splat radius. Let's turn that up. Yeah.
27:45There we go. Alright. This is definitely a winner.
27:49Very nicely done. And once again, thank you to Hear dot now for sponsoring this video. I'm gonna drop all of the tests I ran, published to Hear dot now, in a pinned comment down below.
28:00You can also check out Hear dot now directly in the description. So clearly, Fable is incredible.
28:06But what really unlocks it is using it with workflows and using it with loops. And if you don't know what loops are, I made an entire video right here explaining.
The Hook

The bait, then the rug-pull.

When Anthropic announced a model too dangerous to release publicly, then quietly released it publicly six months later, the gap between those two sentences is the whole story. This is a review from someone who had early access: not a press-release walkthrough, but a practitioner's account of what it actually feels like when the model treats every prompt like a mission.

Frameworks

Named ideas worth stealing.

08:07 · model

Model Routing by Task Difficulty

Match model tier to task difficulty: Fable for frontier-hard problems, Sonnet or Haiku for everything routine. Routing discipline will separate cost-efficient teams from those with exploding AI bills.

Steal for: Any team or solo builder trying to use frontier AI without blowing their budget
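The routing rule is simple enough to sketch. This is a minimal Python illustration, assuming the caller has already judged the task's difficulty; the tier names `haiku`, `sonnet`, and `fable-5` are hypothetical placeholders, not real model IDs. The only prices used are the Fable 5 list prices quoted in the video ($10 per million input tokens, $50 per million output tokens); Sonnet and Haiku prices are not given there, so they are left out.

```python
from enum import Enum


class Difficulty(Enum):
    ROUTINE = 1
    MODERATE = 2
    FRONTIER = 3


# Hypothetical tier names: substitute the model IDs your provider actually exposes.
ROUTES = {
    Difficulty.ROUTINE: "haiku",
    Difficulty.MODERATE: "sonnet",
    Difficulty.FRONTIER: "fable-5",
}


def route(difficulty: Difficulty) -> str:
    """Pick the cheapest tier that matches the task's difficulty."""
    return ROUTES[difficulty]


# Fable 5 list prices as quoted in the video, in USD per million tokens.
FABLE_INPUT_PER_M = 10.0
FABLE_OUTPUT_PER_M = 50.0


def fable_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a Fable 5 call from its token counts."""
    return (input_tokens * FABLE_INPUT_PER_M
            + output_tokens * FABLE_OUTPUT_PER_M) / 1_000_000
```

For a sense of scale: the fluid-sim demo showed 63 agents burning roughly 20,000 to 30,000 tokens each, about 1.26M to 1.89M tokens in total. At these prices the run lands somewhere between about $12.60 (low end, all input) and $94.50 (high end, all output); the video does not show the actual input/output split.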
13:37 · concept

Effort Level Discipline

Always start at the lowest effort or thinking setting and dial up only when results are insufficient. Fable at medium effort over-engineers even simple problems.

Steal for: Claude Code or any extended thinking API usage
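"Start low, dial up only when needed" amounts to an escalation loop. The sketch below is one way to express it, assuming you supply two hypothetical callables: `run(task, level)` to call the model at a given effort setting, and `good_enough(result)` to judge the output. The effort labels are placeholders, not a specific tool's setting names.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical effort labels, ordered cheapest first.
EFFORT_LEVELS: Sequence[str] = ("low", "medium", "high", "max")


def run_with_escalation(
    task: str,
    run: Callable[[str, str], str],
    good_enough: Callable[[str], bool],
    levels: Sequence[str] = EFFORT_LEVELS,
) -> Tuple[Optional[str], Optional[str]]:
    """Try the task at each effort level, cheapest first.

    Returns (level, result) for the first acceptable result,
    or (None, None) if no level produced one.
    """
    for level in levels:
        result = run(task, level)
        if good_enough(result):
            return level, result
    return None, None
```

The design choice is that higher effort is only paid for after a cheaper attempt has demonstrably fallen short, which is the opposite of defaulting to medium and over-engineering simple problems.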
24:40 · model

Software Factory Stack

  1. Frontier model (Fable 5)
  2. Parallel multi-agent delegation (Ultracode)
  3. Loop abstraction for autonomous goal pursuit

Three layers combine into a software factory: the frontier model provides raw intelligence, Ultracode parallelizes work across sub-agents, and loops wrap the whole thing in autonomous goal-directed execution.

Steal for: Any developer building autonomous coding pipelines or long-horizon agentic tasks
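The three layers can be sketched as a loop that fans work out to parallel workers until a goal check passes. This is a minimal illustration under stated assumptions, not how Ultracode or any real loop tool is implemented: `plan`, `agent`, and `goal_met` are hypothetical callables standing in for calls to the frontier model, a thread pool stands in for parallel sub-agents, and `max_iterations` is an added budget guard so the loop cannot burn tokens forever.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def software_factory(
    goal_met: Callable[[List[str]], bool],
    plan: Callable[[List[str]], List[str]],
    agent: Callable[[str], str],
    max_iterations: int = 10,
    max_workers: int = 8,
) -> List[str]:
    """Repeat plan -> parallel execution until the goal check passes or the budget runs out."""
    artifacts: List[str] = []
    # Layer 3: the loop keeps going until the goal is met or the iteration budget is spent.
    for _ in range(max_iterations):
        if goal_met(artifacts):
            break
        # Layer 1: a stand-in for the frontier model deciding the next batch of subtasks.
        subtasks = plan(artifacts)
        # Layer 2: fan the subtasks out to parallel workers; map preserves input order.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            artifacts.extend(pool.map(agent, subtasks))
    return artifacts
```

In a real pipeline `goal_met` would be something like a passing test suite; here it is just a predicate over the accumulated artifacts.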
CTA Breakdown

How they asked for the click.

VERBAL ASK
28:00 · next-video
And if you do not know what loops are, I made an entire video right here explaining.

Clean outro CTA pointing to a companion video on loops, a natural fit given that loops were the intellectual high point of the review.

Storyboard

Visual structure at a glance.

  • 00:00 · hook · open
  • 00:59 · value · benchmarks table
  • 03:58 · value · blog post
  • 06:40 · sponsor · sponsor read
  • 14:10 · value · FrontierCode chart
  • 20:10 · value · quirks review
  • 25:27 · demo · Rubik cube demo
  • 27:16 · demo · fluid sim demo
  • 27:44 · cta · outro and CTA

Watch next

More from this channel + related breakdowns.

44:52
Matthew Berman · Essay

It's starting...

A 45-minute walk through Anthropic's internal data showing AI crossed from coding assistant to primary engineer — and a frank read on what that means for humans.

June 5th
33:44
Matthew Berman · Tutorial

21 INSANE Use Cases For OpenClaw

How one MacBook running Claude Opus 4.6 replaced a CRM, a security firm, a content team, and a personal chef -- with the exact prompts to copy every piece.

February 17th
12:42
Alex Finn · Tutorial

Claude Opus 4.8 actually blew my mind

A 12-minute field report on every change in the new model — benchmarks, pricing, Dynamic Workflows, Ultracode — plus a live one-shot 3D game demo and a concrete recommendations ladder.

May 28th