Jan Marshal · YouTube

I Spent $200 Testing Claude Fable 5 (I'm Not Sure It's Worth It)

A head-to-head build of the same finance app in Fable 5, Opus 4.8, and GPT 5.5 - same prompts, same workflow, $200 in 24 hours.

Posted

June 11th

1 month ago

Duration

28:44

Format

Review

educational

Views

985

37 likes

Big Idea

The argument in one line.

Claude Fable 5 is the best coding model available, but the quality delta over Opus 4.8 is small enough that its 2x+ price and lack of fast mode make it a hard sell for anyone who codes with AI every day.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…

You already use Claude Code or Cursor daily and are deciding whether to upgrade to Fable 5 API credits.
You are benchmarking frontier models for a real shipping workflow, not synthetic evals.
You want a side-by-side UI comparison of what Fable 5 vs Opus 4.8 actually produce from the same prompt.
You are evaluating GPT 5.5 as an alternative to Anthropic models for front-end code generation.

SKIP IF…

You are on a Pro, Max, or Team plan and just want to know if Fable 5 is included - the answer is yes, through June 22.
You need a deep architectural review of code quality - this is a workflow and UI comparison, not static analysis.
You are new to AI coding tools and have no baseline for comparison.

TL;DR

The full version, fast.

Fable 5 produces cleaner code, tighter PRD questions, and better UI than Opus 4.8 - but the margin is narrower than the price suggests. In a direct build-off of the same Finance Hub app, Fable 5 asked 5 precise questions where Opus asked 21, generated DRY-er code, and delivered better UI out of the box. The catch: it is slow (30 minutes per complex module vs. 7 in Opus fast mode), has no fast mode, and will cost 2-3x more per month for daily use. GPT 5.5 finished last on UI quality by a significant margin. For most builders, Opus 4.8 with fast mode is the better daily driver.

Chapters

Where the time goes.

00:00 – 00:47

01 · Cold open

Sets up the premise: $200 spent, same app built three times with Fable 5, GPT 5.5, and Opus 4.8.

00:47 – 04:20

02 · What is Fable 5 and the Mythos class

Explains the Mythos class via the Project Glasswing blog post. The invitation-only preview model was considered too capable for a public release; Fable 5 is the public version with added safeguards, which the creator finds too strict.

04:20 – 07:00

03 · Pricing and the June 22 deadline

Free on Pro/Max/Team through June 22. After that: $10/M input, $50/M output, usage credits required. Creator spent $200 in 24 hours.

07:00 – 07:48

04 · CursorBench: Fable 5 vs the field

Chart walkthrough. Fable 5 is number 1 at high/max reasoning effort. Medium and low effort: use Opus 4.8 instead.

07:48 – 07:56

05 · Sponsor: TestSprite

AI testing agent sponsor segment. Not core content.

07:56 – 12:55

06 · Live app comparison: Finance Hub UI

Side-by-side demo of all three Finance Hub builds. Fable 5 rated best. Opus 4.8 close behind. GPT 5.5 called ugly and unshippable.

12:55 – 16:40

07 · PRD workflow comparison

Fable 5: 5 precise questions. Opus 4.8: 21 questions, some redundant. GPT 5.5: 0 questions without explicit prompting.

16:40 – 20:00

08 · Speed and the fast mode gap

Fable 5 has no fast mode. Complex Fable 5 back-end module: 30 minutes. Same module in Opus fast mode: 7 minutes.

20:00 – 23:20

09 · Code quality deep dive

Fable 5 produces cleaner DRY-er code with typed props. Opus adds redundant console.log useEffects. GPT 5.5 scatters types into 600-line files.

23:20 – 25:50

10 · Safety filter encounter

Security audit request hit a safety filter mid-session and auto-switched to Opus 4.8. Guardrails currently too aggressive; expected to loosen.

25:50 – 28:44

11 · Final verdict and sign-off

Fable 5 is the best model. Not groundbreakingly better. Expensive, slow, temporary on subsidized plans. Creator returning to Opus 4.8 as daily driver.

Atomic Insights

Lines worth screenshotting.

Fable 5 asked 5 precise PRD questions; Opus 4.8 asked 21 - including many that were unnecessary.
A single Fable 5 back-end module took 30 minutes to generate; Opus 4.8 with fast mode did the same in 7.
Fable 5 has no fast mode - you get one speed: slow, expensive, and thorough.
GPT 5.5 generated a 600-line file with types scattered throughout instead of shared modules - a maintainability problem at scale.
The creator spent $200 in Fable 5 API credits in 24 hours and estimates that daily use would cost $2,000-3,000 per month.
Fable 5 is free on Pro/Max/Team plans through June 22 - after that it requires usage credits at $10 per million input tokens.
Opus 4.8 fast mode costs 2x the normal Opus rate, yet it finished the same module roughly 4x faster than Fable 5 (7 minutes vs. 30) - the cost/time ratio beats Fable 5 for most workflows.
The Fable 5 safety filter blocked a security audit mid-session and auto-switched to Opus 4.8 - the guardrails are currently too aggressive.
Fable 5 Finance Hub used Shadcn UI charts; Opus used Recharts - both shipped production-ready UI, just different stacks.
If you show both UIs to a non-technical user, they would say they look similar - the delta is real but not groundbreaking.
Fable 5 is a Mythos-class model - the same underlying capability as the invitation-only Claude Mythos Preview, with safety guardrails added for public release.
GPT 5.5 did not ask any questions during PRD creation until the creator explicitly invoked a grill-the-doc skill - the PRD was weak without it.
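
The pricing and speed lines above can be turned into a quick back-of-the-envelope check. This is a minimal sketch, assuming the post-June-22 rates quoted in the video ($10 per million input tokens, $50 per million output tokens). The token counts in the usage line are made up for illustration and are not figures from the video, and `estimateCostUsd` is a hypothetical helper name.

```typescript
// Rates quoted in the video for Fable 5 after June 22 (USD per million tokens).
const FABLE5_INPUT_USD_PER_M = 10;
const FABLE5_OUTPUT_USD_PER_M = 50;
const TOKENS_PER_MILLION = 1e6;

// Hypothetical helper: cost of one session given its token counts.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / TOKENS_PER_MILLION) * FABLE5_INPUT_USD_PER_M +
    (outputTokens / TOKENS_PER_MILLION) * FABLE5_OUTPUT_USD_PER_M
  );
}

// Illustrative token counts only: 10M input and 2M output tokens.
const sessionCost = estimateCostUsd(10e6, 2e6);
console.log(sessionCost.toFixed(2)); // "200.00" (100 for input + 100 for output)

// Speed gap reported in the video: 30 minutes (Fable 5) vs. 7 minutes (Opus 4.8 fast mode).
const slowdown = 30 / 7;
console.log(slowdown.toFixed(1)); // "4.3"
```

Output tokens are five times as expensive as input tokens at these rates, so a long-reasoning, output-heavy session is where a bill like the creator's $200 day is most likely to come from.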

Takeaway

When the better model is not the right daily driver

MODEL SELECTION

A head-to-head build proves Fable 5 is the best coding model available - and also reveals exactly why most people should not use it every day.

The quality difference between Fable 5 and Opus 4.8 is real but incremental - a non-technical person looking at both UIs would not call it groundbreaking.
How many precise questions a model asks before starting a PRD is a fast, cheap proxy for its reasoning depth on your specific workflow.
Fable 5 has no fast mode, which made it roughly 4x slower than Opus 4.8 in fast mode on the complex module tested (30 minutes vs. 7) - speed is a real cost at daily workflow scale.
At medium or low reasoning effort, Fable 5 does not beat Opus 4.8 on benchmarks - high effort is the only tier where the premium is justified.
Safety guardrails on new frontier models start aggressive and loosen over time - a model that blocks your security audit today may handle it fine in a month.
GPT 5.5 fails on code architecture: it scatters type definitions into large individual files rather than shared modules, creating a maintainability problem invisible in demos.
The cheapest way to evaluate a new model is to build something real - synthetic benchmarks do not catch workflow-level friction like PRD quality or slow generation.
The difference between Fable 5 and Opus 4.8 as a daily driver compounds to roughly $1,500-2,000 per month - a number that changes the calculus for most independent builders.
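
One way to sanity-check the monthly gap is to combine two numbers from the video: the creator's $2,000-3,000 per month estimate for daily Fable 5 use, and the CursorBench cost figures he reads off the chart ($10.80 for Fable 5 high, $4.40 for Opus 4.8 high). The sketch below assumes, and this is an assumption not made in the video, that monthly spend scales with that cost ratio. `monthlyGapUsd` is a hypothetical helper.

```typescript
// Figures quoted in the video.
const FABLE5_MONTHLY_USD: [number, number] = [2000, 3000]; // creator's daily-use estimate
const FABLE5_HIGH_COST = 10.8; // CursorBench cost, Fable 5 high
const OPUS48_HIGH_COST = 4.4; // CursorBench cost, Opus 4.8 high

// Assumption (not from the video): monthly spend scales with the CursorBench cost ratio.
const costRatio = FABLE5_HIGH_COST / OPUS48_HIGH_COST;

// Hypothetical helper: monthly saving if the same workload ran on Opus 4.8 instead.
function monthlyGapUsd(fableMonthlyUsd: number): number {
  const opusMonthlyUsd = fableMonthlyUsd / costRatio;
  return Math.round(fableMonthlyUsd - opusMonthlyUsd);
}

console.log(costRatio.toFixed(2)); // "2.45"
console.log(FABLE5_MONTHLY_USD.map(monthlyGapUsd)); // [1185, 1778]
```

Under this assumption the gap comes out a little below the rough figure quoted above, which suggests that figure rests on different assumptions. Treat both as order-of-magnitude estimates.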

Glossary

Terms worth knowing.

Mythos class model: A category designation Anthropic uses for its most capable frontier models. Fable 5 is the first public Mythos-class release, following the invitation-only Mythos Preview (Project Glasswing).
Fast mode: A Claude Code setting that doubles cost but significantly increases generation speed. Available on Opus-tier models, not Fable 5.
CursorBench: A community benchmark measuring AI model performance on real coding tasks in Cursor IDE, scored across different reasoning effort levels.
PRD (Product Requirements Document): A structured document defining what a software product should do before any code is written. In AI coding workflows, the model often generates and critiques its own PRD.
Grill the doc: A workflow skill where the AI model interrogates a PRD by asking clarifying questions, surfacing ambiguities before implementation begins.
DRY principle: Don't Repeat Yourself - a software design principle that each piece of knowledge should have a single, unambiguous representation. GPT 5.5 violated this by duplicating types and functions.
Shadcn UI: A component library for React providing pre-built, customizable UI components. Used by the Fable 5 build in this comparison.
Recharts: A composable charting library built on React. Used by the Opus 4.8 build instead of Shadcn UI charts.
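
To make the DRY and typed-props points concrete, here is a minimal sketch of what "types live in a shared file" can look like. It is not code from any of the three builds: the names `AccountType`, `ACCOUNT_TYPE_LABELS`, `AccountBadgeProps`, and `renderAccountBadge` are hypothetical, loosely modeled on the account type and label constants the creator points at in the GPT 5.5 file. A plain function stands in for a React component so the example runs on its own.

```typescript
// Hypothetical shared module content: defined once, imported by every feature.
type AccountType = "checking" | "savings" | "investment";

// One source of truth for display labels, keyed by the shared type.
const ACCOUNT_TYPE_LABELS: Record<AccountType, string> = {
  checking: "Checking",
  savings: "Savings",
  investment: "Investment",
};

// Typed props: the component contract is explicit and checked at compile time.
interface AccountBadgeProps {
  type: AccountType;
  balance: number;
}

// Plain function standing in for a React component so the sketch runs without React.
function renderAccountBadge({ type, balance }: AccountBadgeProps): string {
  return `${ACCOUNT_TYPE_LABELS[type]}: $${balance.toFixed(2)}`;
}

console.log(renderAccountBadge({ type: "savings", balance: 1250 })); // "Savings: $1250.00"
```

If the union and the label map were redeclared inside each feature file instead, adding a new account type would mean finding and editing every copy, which is the maintainability problem the video describes.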

Resources

Things they pointed at.

00:52linkProject Glasswing (Anthropic blog)

07:00toolCursorBench

07:48productTestSprite

Quotables

Lines you could clip.

01:29

“If you want to use this model every single day, then be ready to spend about 2 to $3,000 per month.”

Concrete number that reframes the pricing debate instantly → TikTok hook

24:45

“The back end dashboard creation literally took thirty minutes to create. On the other hand, Opus did it in about seven minutes.”

Specific time numbers make an abstract speed complaint real → IG reel cold open

15:25

“Both will get you from a to z. One will do it a bit better. But is it worth it? I don't think so.”

Punchy verdict that stands alone with no context → TikTok hook

25:45

“Fable is not a fast model. It's slow. It reasons a lot. It gets you good results, but it's not really a joy to use.”

Honest nuanced critique from someone who actually shipped with it → newsletter pull-quote

12:10

“Oh my god. What is this? Oh, this looks so ugly. It's hideous. Just look at it.”

Visceral reaction to GPT 5.5 UI - genuinely funny and shareable → TikTok hook

The Script

Word for word.

Anthropic just released Fable five, their new Mythos class model. And apparently, it might be the smartest coding model available right now.

So naturally, I made a very financially responsible decision.

I spent $200 in API credits to build the same finance tracker three times.

Once with Fable five, once with GPT 5.5, and finally, once with Opus 4.8.

Same app idea, same workflow, same prompts, same development process.

And after testing all three, I can say this. Fable five is probably the best coding model I've ever tested. Period.

But I'm not sure if I will actually keep using it going forward. But okay.

What even is Fable five and why is it interesting? Well, as you already know, Fable five is a normal model, an LLM. But what makes it different from Opus 4.8 and GPT 5.5?

Well, essentially, Fable five is a Mythos class model. But what is a Mythos class model? Well, here's the thing.

Anthropic released this blog article about two months ago. Project Glasswing.

Securing critical software for the AI era. Blah blah blah.

Whatever that means. And inside of here, they said, hey. Today we are releasing a new model called Claude Mythos Preview.

It's the best model we have ever created. And inside of here, Anthropic said, hey. The model is so good that we can't release it to the public.

It will just end up with a huge mess and people will do the wrong things. So they released this model just to the biggest companies out on the market, enterprises. But hey, surprise surprise.

Two months later, we now get this model, Fable five, which is a Mythos class model. So now you might ask me, okay, Jan. And what has changed compared to two months ago?

Why is now Fable five a public release? Well, essentially, they now added a lot of safeguards. If you want to just test your application, the coding agent, the model might say, hey.

I'm not sure if this is safe. I will now stop the session, and I can't continue any further. The safeguards right now are not good at all.

They are way too strict, and they don't allow you to just use or work on your application normally. So here, they have a full on section on the new safeguards. Again, currently, they are not very good.

They are not very lenient. Hopefully, this will change in the future, but for now, it is what it is.

But now you might say, okay. Change in the future. This means the model will be available forever.

Right? No. No.

No. Not so fast, my friend. Now this is something that I find super funny because this model won't be available forever.

So from today through June 22, Fable five is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost. But on June 23, they will remove Fable five from these plans, and using it after that will require usage credits.

So if you are a Claude Code user and if you are subscribed to any of these plans, then, yes, you get access right now. But in two weeks, it's time to say goodbye, best model.

Bye bye. Because after that, you will only be able to use the model based on API usage. This means it will require usage credits, and we both know that this is quite expensive because this also means that you won't get any subsidized usage.

Oh, man. This is annoying. So now you might say, okay, Jan.

So if I don't get any subsidized usage in two weeks anymore, is the model at least cheap? No.

Not really. So Fable five is being offered at $10 per million input tokens and $50 per million output tokens, less than half the price of Claude Mythos Preview.

So is this cheap? No. Is it cheaper than the preview model that we never got access to?

Yes. Is it also cheaper than Opus 4.1, I think it was?

Yes. Because when Opus 4.1 got released, it was super expensive. So this model, in general, is not cheap.

It's not super expensive. It will cost you quite a bit if you use it based on API usage, but it's not a huge deal breaker.

Nevertheless, in my testing alone, in twenty four hours, I spent over $200 in API usage and API credits. So that's important to remember.

If you want to use this model every single day, then be ready to spend about 2 to $3,000 per month if you use it every single day. And for most people, 99 percent of people, this is a deal breaker, including me.

I won't pay that at all. So as a next step, you might say, okay, Jan. Mhmm.

Mhmm. This makes all sense, but how does the model perform? Well, let's look at CursorBench.

As you see here, I added a few models. Fable five, Opus 4.8, GPT 5.5, and Composer 2.5.

Now I think we both see right away that Fable five is the best model right here. Now please look at the y axis. The higher the score, the better.

Then look at the x axis. The more it goes to the left, the more expensive everything is. And inside of here, you will see, yes, Fable five is the best, especially if you use the highest reasoning effort max.

But it's also the most expensive. Surprise. Surprise.

Now one thing I have realized after testing Fable five and also looking at multiple benchmarks and tests is that Fable five only makes sense if you at least use high reasoning effort. So as you see here, if you use medium or low reasoning effort, I wouldn't recommend using Fable five. It makes more sense to use Opus 4.8 with either high or extra high reasoning.

The results are super similar, but it's way cheaper. So this is one recommendation that I have and something that I have also learned while testing the model quite thoroughly.

And here you will also see that GPT 5.5 is just no comparison. It's way worse.

Composer 2.5 is also a way worse model. And if we now add, I don't know, where is Gemini, the model I hate the most, then you will see here that Gemini well, let's forget it.

It's not a comparison. It isn't able to compete with Fable five. Fable five is definitely the best model out on the market, and Anthropic did not lie with this blog article where they said, hey.

This is the best model we have ever created. It is the best model, but don't forget, it's the most expensive model. And I wouldn't say that there is a huge jump between Opus 4.8 and Fable five.

It's better, but it's not crazy better. Now before we continue, there is one thing we have to talk about first. Testing.

Because with models like Fable five, code generation is not really the bottleneck anymore. The real question is, can we catch the problems before they hit production? And that's where today's sponsor TestSprite comes in.

TestSprite is an AI testing agent for engineering teams.

You can connect it to your app through the MCP workflow or the web portal. Give it a PRD, a spec, or just the area you want tested. And it builds a focused test plan for you.

But the important part is this. TestSprite does not just read your code and guess. No.

No. No. It actually opens the app, clicks through the real user flow, tests the front end, checks back end behavior, and runs everything in parallel.

In the new TestSprite 3.0 dashboard, you can watch those agents explore your app live, see the tests running, and when something fails, you get the error, trace cause, and suggested fix in one place. You can even replay the agent session and watch exactly where the bug happened.

So when AI generated code breaks something, you're not just staring at a vague failed check, but you can actually see what went wrong. So at this point, the question isn't anymore can the model write code? It can.

The real question is can we trust the shipped code? And in this case, TestSprite is exactly what helps us close the loop. And the best part, TestSprite has a super generous free tier, so you can already get started today.

Check it out using the link below. But now, let's continue with the video. Now you might say, okay, Jan.

So we now looked at the theoretical side of things, and we can definitely say that Fable five is a good model. Again, looking at this chart, we can see that Fable five outperforms every single model out on the market with every reasoning effort besides low.

And looking again at the cost, yes, max reasoning effort is expensive. I wouldn't recommend using it, but Fable five high is a very nice middle ground, which is not super expensive and kinda manageable.

But here's the thing. Benchmarks don't always portray reality. Some models might perform very well in benchmarks.

Gemini, I'm looking at you. But they might perform quite bad in real life with your specific workflow, with your specific ideas.

And that's exactly why I took the time and tested the three best models out on the market. Fable five, Opus 4.8, and GPT 5.5.

I built the same application three times: Finance Hub. Essentially, it's the one-stop solution to all of your finances.

Now let's first of all analyze the very first example, the very first application, and I won't tell you what model it is. And let's for now focus on the UI.

I will do a hard refresh. We have a beautiful animation, this nice little mock up, then we get a few cards.

This all looks good. It's clean, and it does not look tacky or something like that. I can now also sign in, and I will get redirected to the dashboard.

We get a suspense fallback. I get a few cards. This chart, which of course works, then this chart.

Then we have recent activities. This looks fine. The layout is not great, but it's manageable.

Right? Let's go to the transaction. Let's get a general feeling of the UI.

This looks good. Let's look at the spending. Oh, nice.

We have a beautiful chart. Let's go to the assets. What do we get here?

This all is fine. Right? It works.

It looks good. Everything is fast. We have a very uniform design.

We have one specific design language. This does not look bad. This is something that you can ship into production and users won't complain, which is important.

Let's now look at the second example. So this again is Finance Hub. If I do a hard refresh, we don't really get a beautiful animation, which isn't good, but hey, it is what it is.

We get again a very similar mock up, but which is actually interactive, which is something that you don't see every day. We again have a few animations, looks good, and it's very similar to the first example.

Let's check out the dashboard. What do we have here? Now this already looks way better, at least in my opinion.

We, of course, have a theme toggle, and this here has way more info. Net worth cash flow. We can have a chart, then right here, these charts.

And what I see instantly is that this here uses shadcn/ui charts. If I go back to the first example, this does not use shadcn/ui.

This uses Recharts. So this is something important to remember. Let's go to the transactions.

Right here, we get two issues, but that's fine. Let's go to the spending. Uh-huh.

Interesting. Okay. Let's go to the subscriptions.

So this again looks very similar to the first example. Though I would say that this here is a bit more refined. Again, if I do a hard refresh, look at this.

It looks beautiful. And I would also say that this application is a bit faster. So there is probably better code.

And now let's go ahead and check out the third application. Three, two, one. Oh my god.

What is this? Oh, this looks so ugly. It's hideous.

Just look at it. The theme is ugly. Everything is ugly.

We don't even have to really dive into it. Let's quickly open the dashboard just to look at this hideous dashboard. Wow.

I mean, this is something that you should never ship into production. This will make every user unsubscribe instantly. Minus MRR.

No. I'm joking. But I think we both know what model created this application.

It was GPT 5.5 with high reasoning effort. I even used my signature workflow.

I used all of the skills. I used the same workflow as with the other two applications, and still the UI is hideous.

It's not comparable, and this is just something that does not look good. It's as simple as that.

And again, this is GPT 5.5, a leading model, and still we get such a bad result. Let's again go back to the previous example.

So this is the second application I showed you guys, and this is probably my favorite one because it looks very refined and there are a lot of small details which are very important. Like, if we go back to the net worth page, we have the net worth history. But I can also instantly view the assets versus debts.

These are the small details which change a lot, and that's very important. So what model created this application here? Well, surprise surprise, it was Fable five with high reasoning effort.

This here is a very good result, and this is something that you can definitely ship into production. And with that, the third application has been created by Opus 4.8 with high reasoning effort. And let's again look at it.

This here looks good. Right? This is something that you can ship into production.

And the most important thing is that this here looks very similar to Fable five. And that's something that I have realized while testing the two models. The results are always super similar.

The Fable results are better. That's important to remember. But the foundation is always the same.

The responses by the models are very similar. Yes. Fable outperforms the model in every shape and form in terms of code, in terms of UI, in terms of reasoning.

But it's always very similar. And that's something that I like because Opus models in general work very well for my specific workflow. And Fable five, in this case, also works very well with my signature workflow.

It's very similar, and I get similar results, which are even better than what I'm used to. But this still does not answer the question, is Fable five worth it? Especially considering the price.

And I'm not sure because it gives you better results. But is it groundbreaking? Does it give you a huge leap?

Not really. If you would now show these two websites to one person who does not know what models are, what LLMs are, and does not really know what UI and UX is, would the person say that this here is a huge leap and that this is groundbreakingly better than this?

Probably not. This person would say, yeah. Both look kinda similar.

This here looks a bit better. It has maybe better small details, but it does not really matter. And I can give you another example.

Let's look at these two phones. This here is Opus 4.8. This is Fable five.

Like, this here is a better phone. It has a better camera, a better processor, a better everything.

But at the same time, they are kinda the same thing, the same foundation, the same brand, the same operating system. This here will give you a better result, but it isn't groundbreakingly better than this result.

And that's what I want you to kinda remember. Both will get you from a to z. One will do it a bit better.

But is it worth it? I don't think so. And that's my honest opinion.

Some people will disagree with me. That's totally fine. But I don't see the need for Fable five, especially for the price.

Again, let's look at CursorBench. The cost for Fable five high is $10.80. The cost for Opus 4.8 high, let's see where is it, is $4.40.

So Fable five costs more than double. Is it worth it? I don't think so.

But now you might say, Jan, wait wait wait. The UI isn't everything. What about the code quality?

How does the model feel? Is it quick? Is it fast to respond?

Well, let's check it out. As mentioned, for every application, I used the same workflow.

I let the agent create a PRD. I grilled the PRD. I then created the back end implementation plan.

I let the agent create the authentication pages. I used this skill very heavily when letting the agent implement front end UI, so I always used the same workflow. And what I've realized is that the results between Opus 4.8 and Fable five are almost the same.

So this here is my Opus 4.8 session where I asked the LLM, the agent, to create a PRD. And inside of here, I then let the agent also grill me.

Now please look at the questions. The word account is overloaded. The asset is also overloaded.

Then what else? Do subscriptions double count against expenses? What else do we have here? When does the subscription hit a specific month's expenses?

Then also please look at the general structure. We get these bullet points, then my recommended answer, blah blah blah. This here looks fine, I guess.

Right? Now let's go to the Fable example. Inside of here, I first of all asked the agent to create a PRD.

It did so. Nice. As a next step, I again asked it to grill me. And look at the questions.

Checkings account, one concept or two. We again have the same format, and again my recommendation. Then inside of here, do transactions move asset values?

We again have the same layout. And the thing is also, if you compare the questions, they are very similar.

I got the same questions asked by Opus 4.8 with different wording. The one thing that I've realized is that Fable is a little bit more, you could say, refined.

It's a bit smarter. Inside of here, it only asked me five questions, which were super important.

But if I go back to the Opus example, it asked me 21 questions, and a lot of them were not needed.

So Opus does not think as well as Fable, and we saw that from the benchmarks. But is that again a huge game changer? No.

Not really. It's fine. I can answer 21 questions if the result is similar.

And it is. Yes. Fable five in this case was a bit faster because I was able to just get over this whole PRD quicker.

I did not have to answer as many questions, but it's not a huge leap. Another thing is the model speed in general.

When using Opus 4.8, I always use fast mode as you see here. Now fast mode is more expensive, two times more expensive, but you get significantly faster speeds.

Now if I go back to the Fable example, the thing is you don't get access to fast mode. Like, you only have normal mode and that's it.

And the thing is Fable five, with a 1,000,000 token context window and high reasoning effort, is slow as hell. It takes forever.

The back end dashboard creation literally took thirty minutes to create. On the other hand, Opus did it in about seven minutes.

Again, I used fast mode and all of that, but still, I feel like Fable is not a fast model. It's slow. It reasons a lot.

It gets you good results, but it's not really a joy to use. I like fast models, and I get fast mode with Opus 4.8, which I use every single day.

So if you want a snappy model, then Fable five is not for you. Let's also quickly look at GPT 5.5. Inside of here, I also asked the agent to create a PRD and then to grill me.

And here's the thing, I wasn't a fan of the experience. First of all, GPT 5.5 did not ask me any questions when creating the PRD.

Only once I invoked the grill-the-doc skill, it started asking me valid and needed questions. And this also means initially the PRD was very lackluster and very bad.

Only after invoking the grill-the-doc skill, I got a somewhat good result. Still, I did not like how the model felt. It did not work very well with my specific workflow.

It just was kinda I don't know. It just wasn't a joy to use. Another thing you will realize is that GPT 5.5 does not really explain things very well.

Like, here we have question and then the recommended answer. That's it. But if I go back to the Fable example, it first of all explains the question, then it gives me options, the recommended option.

This here is way more detailed. This here gives me options. It explains things, and it works better with my workflow.

As an engineer, I want to know what options I have and what option is maybe recommended. With GPT 5.5, I just get question and a recommended answer.

It does not give me any room to breathe, which I don't like. GPT 5.5, in general, is just a model that I don't love.

It gives okay ish results in the back end. The front end is trash. Let's forget it.

But it's not a good all rounder. If you want an all rounder, then don't use GPT 5.5. You won't get the results that you need.

So my takeaway from this first session was that I instantly felt a difference. Not because Fable did something magical. No.

But because it knew when to stop, when to not ask any further questions. Because if I go back to the GPT 5.5 session, I had to do something funny.

Let me scroll to the bottom. After the twenty sixth question, I had to say, hey.

This is enough questions. Stop. I can't anymore.

And that's something that you will just realize when using Fable five. It is a smart model, and it can just do things, or it can reason about things better, and it feels way more like a senior engineer that is confident in its ability, which I appreciate a lot.

Another thing we have to talk about is the general code quality. And surprise surprise, Fable five in general creates better code than Opus 4.8 and GPT 5.5. Now is this perfect code?

No. I don't want to now do a full-on code review, but the code in general is clean. You can use it.

Nevertheless, I would still recommend you to review all of the code because the agent, the LLM in this case, still likes to duplicate things, and it does not always follow the DRY principle. Don't repeat yourself. But it's definitely better than Opus 4.8 and GPT 5.5.

One reason for that is probably, again, the high reasoning and the general smartness, if that makes sense. The model understands things way better on a way deeper level, and therefore, the code also becomes better.

In this example, we always have metadata set. We have props. The props are always typed.

So this is good. This is good code that you can ship. I would still recommend you to review it because nothing is safe, but it's better than what the competitors do.

Opus 4.8 also generates good code. Is it as good as the code created by Fable? Definitely not.

There are duplicate functions. The agent loves to repeat itself, create weird useEffects that are not needed. Like, for example, look at this useEffect, console.log error.

I don't need that. This here is not needed, but the agent created the code. Opus 4.8 is not as smart.

You feel it right away, but there is not a huge margin. The code in most cases is good enough, but it likes to add weird useEffects, weird comments, weird console logs, and just weird duplicate functions that you have to clean up yourself.

This is something that you have to know. And GPT 5.5, well, this is my least favorite model.

I don't hate OpenAI, by the way. Great company.

But GPT 5.5 performs the worst out of the bunch. As an example, this here is a random file, financial something, and we have this type, account type. Why do we have it inside of here?

I am 100% sure that this type has been defined somewhere else. This should live in some sort of shared file.

It doesn't. Why do we have so many types inside of here? This is not needed.

Again, constant account type labels, constant account type icons, constant blah blah blah, function blah blah blah. We don't need all of these functions inside of this one file.

This file is huge, like 600 lines. These types shouldn't live in this file. And that's something that you will just realize when using GPT 5.5.

It does not provide or it does not generate the cleanest code. You have to be very on top of the agent to get good code. The code works.

That's fine. But it isn't clean. And if you don't do a thorough review, then good luck with your project in a couple of months because it won't be maintainable.

You will have functions like, again, look at this investment asset types. Why does it live inside of here?

Why does this not live in some sort of shared file? Why do we hard code this right here? This is not needed.
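A minimal sketch of the cleanup being asked for here, assuming a hypothetical shared module: the account type and its label map are declared once and reused by every feature file. The names `AccountType`, `ACCOUNT_TYPE_LABELS`, and `accountLabel`, the file path, and the specific account kinds are illustrative and are not taken from the generated project.

```typescript
// Hypothetical shared module, e.g. types/accounts.ts (the path is an assumption).
// In a real project each declaration below would be exported from this module
// and imported by feature files instead of being redefined in each of them.
type AccountType = "checking" | "savings" | "investment";

// Record<AccountType, string> requires a label for every member of the union,
// so adding a new account type without a label fails at compile time.
const ACCOUNT_TYPE_LABELS: Record<AccountType, string> = {
  checking: "Checking",
  savings: "Savings",
  investment: "Investment",
};

// Single lookup helper shared by every screen that shows an account.
function accountLabel(type: AccountType): string {
  return ACCOUNT_TYPE_LABELS[type];
}

console.log(accountLabel("savings"));
```

Keeping the union and its lookup table next to each other means a 600-line feature file only has to import them, which is the maintainability point being made above.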

And that's what I don't like about GPT 5.5. And that's why I pretty much stopped using it.

Another thing I want to talk about quickly is the safeguards and restrictions.

Right now, Fable 5 is kinda annoying to use. If you want to do a security review, then good luck.

In this case, this user wanted to run a security, vulnerability, bug, performance, and privacy audit. And as you see here, Fable 5 hit a safety filter, and the conversation was automatically switched to Claude Opus 4.8.

Now this is something that will probably change in the future, and probably the model will become a bit more lenient. But right now, the filters are, as the user says, crazy. It's not nice.

It's not enjoyable to use. And you have to always think like, hey, how should I, like, maybe instruct the agent so that it does not get the wrong idea of what I want? I don't need any filters.

Blah blah blah. It's not good. But I think it will be fixed in the future, so this is something to look forward to.

But, yeah, that's all I did. These are my thoughts on Fable 5. It's a very good model, and it's the best model out on the market.

It writes better code. It asks better questions. It has fewer bugs.

The UI looks better. It works right away. This here was a one-shot, and this is quite cool to see.

You see how powerful LLMs can be. Nevertheless, it's slow, it is expensive, and it's kinda temporary.

Because if you don't pay for the API usage, then please say bye-bye to the model. I personally will probably move back to Opus 4.8 because it's a great daily driver.

I can get close enough to these results, and it's way cheaper. And that's kinda important because I already pay a lot for Opus 4.8, and Fable would just make the cost double, which would then lead to thousands of dollars per month, which I can't spend on an LLM.

But, yeah, what do you think? Please let me know in the comments. I would love to know what your thoughts are.

What do you think of Fable 5? How does it compare to all of the models that you used previously? What do you think about the pricing?

The security restrictions? The general two-week timeline, which I find annoying and weird? But, hey, it is what it is.

Let me know. I read every comment. Also, please don't forget to like and subscribe.

It would mean a lot to me and my heart. So please do it. And with that out of the way, enjoy your day and see you in the next video.

Over and out. Bye bye.

The Hook

The bait, then the rug-pull.

Two hundred dollars in API credits. Twenty-four hours. One app built three times with three different models. That is what it took to get an honest answer about whether Fable 5 - Anthropic's new Mythos-class model - actually earns its price tag over the Opus 4.8 most builders already rely on.

Frameworks

Named ideas worth stealing.

00:35 · model

Same-app three-model build-off

Same app (Finance Hub)
Same workflow: PRD to grill to backend to auth to frontend
Same prompts
Three models: Fable 5, Opus 4.8, GPT 5.5

Build the identical product end-to-end with each model to control for workflow variables and isolate model quality differences.

Steal for: Any model evaluation - removes confirmation bias by forcing you to ship the same thing three times

13:05 · concept

PRD question count as a reasoning proxy

How many questions a model asks before writing a PRD reveals its reasoning depth. Fable 5: 5 precise. Opus 4.8: 21. GPT 5.5: 0 without prompting.

Steal for: Use question count and precision as a cheap proxy benchmark for model quality before committing to API costs

07:00 · model

Reasoning effort sweet spot matrix

Fable 5 max: best results, most expensive
Fable 5 high: best value within Fable ($10.80 per task)
Opus 4.8 high: 80% of Fable quality at $4.40 per task
Fable 5 medium/low: not recommended over Opus

At medium or low reasoning effort, Fable 5 does not justify its premium over Opus 4.8. Only at high effort does the quality gap become defensible.
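To make the price gap concrete, here is a small sketch of the arithmetic using only the two per-task figures quoted in the matrix. The function names and the `tasksPerDay` and `workDays` inputs are hypothetical and are not figures from the video.

```typescript
// Per-task costs quoted in the matrix (USD, high reasoning effort).
const COST_PER_TASK = {
  fableHigh: 10.8,
  opusHigh: 4.4,
} as const;

// How many times more expensive Fable 5 high is than Opus 4.8 high, per task.
function costRatio(): number {
  return COST_PER_TASK.fableHigh / COST_PER_TASK.opusHigh;
}

// Projected monthly spend. tasksPerDay and workDays are assumed inputs
// supplied by the reader, not numbers reported in the review.
function monthlyCost(costPerTask: number, tasksPerDay: number, workDays: number): number {
  return costPerTask * tasksPerDay * workDays;
}

console.log(costRatio().toFixed(2)); // prints "2.45"
```

A ratio of roughly 2.45 per task lines up with the "2x+" price framing used throughout this breakdown; the monthly figure depends entirely on how many tasks you actually run.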

Steal for: Model routing rules: Opus for routine daily work, Fable 5 only for highest-stakes single sessions
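One way such a routing rule could be written down, as a sketch only: the model and effort labels are illustrative strings rather than real API identifiers, and the `highStakes` flag is a hypothetical field that the developer would set by hand.

```typescript
// Illustrative labels only; these are not real API model identifiers.
type Model = "fable-5" | "opus-4.8";
type Effort = "high";

interface Task {
  description: string;
  highStakes: boolean; // hypothetical flag set manually by the developer
}

interface Route {
  model: Model;
  effort: Effort;
}

// Routine work goes to Opus 4.8; Fable 5 is reserved for high-stakes sessions
// and is never routed at medium or low effort, per the matrix.
function routeTask(task: Task): Route {
  if (task.highStakes) {
    return { model: "fable-5", effort: "high" };
  }
  return { model: "opus-4.8", effort: "high" };
}

console.log(routeTask({ description: "rename a prop", highStakes: false }).model);
```

The rule is deliberately binary; a real setup would probably want a budget cap as well, which this sketch leaves out.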

CTA Breakdown

How they asked for the click.

VERBAL ASK

27:40 · subscribe

“please don't forget to like and subscribe. It would mean a lot to me and my heart.”

Standard subscribe CTA at the very end. No product pitch, no newsletter. Sponsor (TestSprite) handled mid-video.

FROM THE DESCRIPTION

AFFILIATE: Commission earned if you click.

🔥 Try TestSprite for FREE ↗


Storyboard

Visual structure at a glance.

00:00 · hook · open
00:52 · context · Project Glasswing reveal
04:20 · value · pricing window
07:00 · evidence · CursorBench chart
07:56 · demo · live app demos
12:55 · value · PRD comparison
23:20 · caveat · safety filter hit
25:50 · cta · verdict

