Zapier's Automation Bench ran Claude Fable 5.0 against hundreds of realistic business workflows. Here's what the numbers actually mean.
Fable's lead in operations isn't about raw capability. It's that the model treats a failed API call as information and immediately routes around the problem, while the other models tested keep hammering the same dead end.
Zapier's Automation Bench tested Claude Fable 5.0 across five business domains and found that it scores 27% on operations tasks, a 7-point jump over the previous best, Gemini 3.5 Flash at 20%. Three behaviors drive the gap: Fable tries a failing endpoint once and immediately pivots to alternate data sources (GPT-55 hit the same endpoint 22 times; Gemini, 5 times); it qualifies and routes sales leads more precisely than competing models; and it filters relevant signals from noisy, off-topic Slack channels without pulling in unrelated threads. The video closes by confirming that Fable is the first generally available model in Anthropic's Mythos architecture line.
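The fail-once-then-pivot behavior can be illustrated with a small Python sketch contrasting it with a naive retry loop. This is not Zapier's benchmark harness or any model's actual logic; every name below (`EndpointError`, `naive_retry`, `experiment_recover`, `TRANSIENT_STATUSES`, and the toy sources `benefits_api` and `hr_spreadsheet`) is hypothetical. The rule that only statuses like 429 and 5xx earn a retry is a common HTTP convention assumed here, not something the benchmark reports.

```python
from __future__ import annotations

from typing import Callable, Optional

# Assumption: statuses treated as transient and worth one bounded retry.
TRANSIENT_STATUSES = {429, 500, 502, 503, 504}


class EndpointError(Exception):
    """Hypothetical error raised by a toy data source, carrying an HTTP status."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


Source = Callable[[], dict]


def naive_retry(source: Source, max_attempts: int = 22) -> tuple[Optional[dict], int]:
    """Hammer one endpoint until it succeeds or attempts run out.

    The default of 22 only mirrors the count quoted in the video.
    """
    attempts = 0
    for _ in range(max_attempts):
        attempts += 1
        try:
            return source(), attempts
        except EndpointError:
            continue  # same call, same error, no new information gained
    return None, attempts


def experiment_recover(
    sources: list[tuple[str, Source]], transient_retries: int = 1
) -> tuple[Optional[dict], list[str]]:
    """Try each source in order; treat a failure as information and pivot."""
    log: list[str] = []
    for name, source in sources:
        for _ in range(1 + transient_retries):
            try:
                result = source()
                log.append(f"{name}: ok")
                return result, log
            except EndpointError as err:
                log.append(f"{name}: {err}")
                if err.status not in TRANSIENT_STATUSES:
                    break  # permanent error (e.g. 404): move on to the next source
    return None, log


# Hypothetical toy sources for the HR benefits example.
def benefits_api() -> dict:
    raise EndpointError(404)


def hr_spreadsheet() -> dict:
    return {"plan": "PPO", "employees": 42}


if __name__ == "__main__":
    print(naive_retry(benefits_api))  # (None, 22)
    print(experiment_recover([("benefits_api", benefits_api), ("hr_spreadsheet", hr_spreadsheet)]))
```

The design choice in this sketch is to classify the error before deciding what to do: a 404 means the resource isn't there, so repeating the call yields no new information, while a 503 may clear up and gets one bounded retry before the loop pivots.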
Host Wade frames the stakes: not 'is Fable good?' but 'can it actually get work done with multiple tools?'

Zapier's exec introduces the benchmark: 15 years of real workflow data and five business domains, tested on realistic multi-step automation tasks, with lead routing as a sample workflow.

Fable scores 27% on operations-specific tasks, 7 points above the previous best (Gemini 3.5 Flash at 20%). No other domain scores above 20%; HR comes closest at exactly 20%.

The 'experiment, recover' pattern, explained through an HR benefits harmonization example: Fable pivots after a single 404, while GPT-55 hit the same error 22 times and Gemini 5 times.

Fable can extract only the relevant budget alerts from a noisy Slack instance without pulling unrelated threads. Prior models wander and include off-topic context.

Overall, Fable is more precise and more resourceful. The host raises a valid counterpoint: for deterministic tasks, ask whether an LLM is the right tool at all. The video closes with the Mythos confirmation.
The gap between models on real automation tasks comes down to one behavior: what happens when something breaks mid-task.
“The definition of insanity is trying the same thing over and over again, expecting a different result.”
“GPT five five would hit the site, get four zero four, then do it again, do it again, do it again. A total of 22 times would get the error, and just keep trying.”
“I still think there's an open ended question for this type of use case. Do you wanna use a model, or do you wanna use code or deterministic workflows?”
The open question about Claude Fable 5.0 wasn't whether it scored well on evals; it was whether it could execute a real business workflow from start to finish. Zapier's founder brought the data: 600+ test runs, five operational domains, and three specific failure modes that separate the models that finish from the ones that loop.
Experiment, recover: Zapier's framing for agentic error handling, the loop that separates models that finish tasks from models that stall. It is named after the behavior Fable executes better than any tested alternative.
The five operational domains Zapier uses to categorize and score AI model performance on real business automation tasks.