Modern Creator
Matthew Berman · YouTube

It's starting...

A 45-minute walk through Anthropic's internal data showing AI crossed from coding assistant to primary engineer — and a frank read on what that means for humans.

Posted: today
Duration: 44:52
Format: Essay, educational
Views: 17.7K
Likes: 1K
Big Idea

The argument in one line.

Anthropic's own production data confirms that AI has crossed from coding assistant to primary engineer, and the last human advantage — research taste and the ability to have genuinely novel ideas — is the only thing left standing between compounding automation and full recursive self-improvement.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…
  • You write code professionally and want a grounded read on where AI assistance is actually headed, backed by internal Anthropic data rather than speculation.
  • You're building AI products and want to understand the productivity math behind 8x code output with only 4x perceived value gains.
  • You follow the AI safety and alignment conversation and want to stress-test Anthropic's 'slow down' argument with a skeptical lens.
  • You're curious about what 'research taste' means as an economic asset when execution is becoming fully automated.
SKIP IF…
  • You want a high-level intro to AI — this assumes familiarity with Claude, Claude Code, AGI framing, and the current coding-agent landscape.
  • You're looking for hands-on tutorials or workflow demonstrations — this is purely analytical commentary on a research paper.
TL;DR

The full version, fast.

Anthropic published a paper on recursive self-improvement that doubles as an internal progress report: Claude writes 80%+ of their merged code as of May 2026, up from single digits a year earlier; task horizon doubling time has accelerated from seven months to four; and Claude-written code has gone from clearly worse than human to roughly at parity, with better-than-human expected this year. The remaining human edge is research taste — knowing which problems to pursue, which results to trust — not execution. The host's editorial through-line is that Anthropic calling for a global AI slowdown while sitting in first place and using an unreleased internal model (Mythos) to accelerate their own development is structurally self-serving, no matter how accurate their safety framing may be.

Chapters

Where the time goes.

00:00-01:00

01 · Cold open

Hook: AI is literally building itself, Anthropic says slow down, host calls it self-serving.

01:00-03:00

02 · Paper framing

Paper intro — RSI is not inevitable, but trending there. One missing ingredient: novel ideas.

03:00-05:00

03 · Abstraction progression

Diagram: human writes code → chatbots → coding agents → autonomous agents → closing the loop (no human).

05:00-08:00

04 · Sponsor + task horizons

DigitalOcean ad. Then: task horizon doubling time from 7 months to 4. Opus 3 (4 min) → Sonnet 3.7 (90 min) → Opus 4.6 (12 hr).

08:00-11:00

05 · CoreBench + research gap

AI reproducing novel research: 20% (2024) to near 100% (15 months later). But origination — novel ideas — still the missing ingredient.

11:00-15:00

06 · Engineering vs. research

Two tracks: engineering (code, infra, model training) where AI dominates; research (deciding what to do, interpreting results) where humans still lead.

15:00-19:00

07 · 80% of code is Claude-written

As of May 2026, 80%+ of merged Anthropic code is Claude-authored. Before Claude Code launched (Feb 2025): low single digits.

19:00-23:00

08 · Lines of code per engineer

The chart: Q3/Q4 2025 explosion. 8x code output per engineer in Q2 2026. Anthropic caveats this is imperfect — measures quantity not quality.

23:00-27:00

09 · Mythos and the banned competitor

Anthropic cut xAI's access to their models, then used their own unreleased Mythos internally. Host's read: winning the race quietly while saying 'we should slow down'.

27:00-31:00

10 · Productivity paradox

8x code, 4x perceived productivity. Median Anthropic employee: 4x more output with Mythos Preview. The gap implies Claude code is half as valuable per line.

31:00-34:00

11 · Understanding vs. thinking

'You can outsource your thinking but not your understanding.' Claude judges Claude-written code. Humans increasingly disconnected from what they're building.

34:00-37:00

12 · Human role narrowing

Once code quality reaches parity, humans stop writing and only review. But review rate can't match generation rate — human review becomes the next bottleneck.

37:00-40:00

13 · Ideas vs. execution

Edison: 1% inspiration, 99% perspiration. Perspiration is now automated. Does that make inspiration more or less valuable? Research taste is still human territory.

40:00-43:00

14 · Three futures

Future 1: trend stalls (unlikely). Future 2: compounding automation, humans set direction. Future 3: full RSI — compute is the only bottleneck, capital wins forever.

43:00-44:52

15 · Anthropic's slow-down argument

Critique: calling for a global slowdown while leading the race, using an unreleased internal model, and having banned competitors is structurally self-serving — even if the safety reasoning is sound.

Atomic Insights

Lines worth screenshotting.

  • AI task horizon at Anthropic is doubling every four months, down from seven months — the acceleration is itself accelerating.
  • 80% of Anthropic's merged code is now Claude-authored, up from low single digits before Claude Code launched in February 2025.
  • 8x code output with only 4x perceived productivity gain means Claude-written code produces roughly half the value per line of human-written code.
  • Claude-written code quality was clearly below human in late 2025, is at parity today, and is expected to exceed human quality within the year.
  • The only remaining human comparative advantage in AI development is research taste: choosing which problems matter, which results to trust, which direction to abandon.
  • Novel ideas are definitionally not possible from a model trained on existing data — the missing ingredient for full recursive self-improvement is not execution capability, it's genuine novelty.
  • Once human and Claude code quality reach parity, humans stop writing code and shift to reviewing it — but humans cannot review as fast as Claude generates, making review the next bottleneck.
  • You can outsource your thinking to AI, but you cannot outsource your understanding — and as humans become more abstracted from the systems they build, comprehension failure becomes the alignment risk.
  • Anthropic cut off xAI staff from using its models, then used its own unreleased Mythos model to accelerate internal development; calling for a slowdown from that position is a structurally privileged argument.
  • In a full recursive self-improvement scenario, the only bottleneck is compute, which means capital becomes the only moat — whoever holds compute at the moment RSI triggers stays permanently ahead.
  • A 130-person Anthropic poll found median 4x productivity gains with Mythos Preview — half what the 8x code output number would imply, suggesting much of the additional code is being discarded or requires rework.
  • AI systems already succeed at reproducing novel research papers at near 100%, up from 20% two years ago — the gap between replication and origination is the last meaningful frontier.
  • A global AI slowdown would require every well-resourced lab in every country to agree and be independently verifiable — which is harder than nuclear arms control because training runs are far easier to conceal than missile silos.
  • Anthropic argues that even if recursive self-improvement never happens, today's AI capabilities are already so underutilized that major world changes will occur from capability overhang alone.
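The 8x versus 4x arithmetic in the list above can be made explicit. This is a minimal sketch, assuming that the self-reported 4x productivity figure is a fair proxy for value delivered and that lines of code measure output volume. Both are assumptions, and Anthropic itself caveats lines of code as an imperfect measure. The function name is illustrative.

```python
def value_per_line_ratio(output_multiplier: float, value_multiplier: float) -> float:
    """Return value delivered per line, relative to the pre-AI baseline.

    output_multiplier: how many times more code is produced (8x in the summary).
    value_multiplier: how many times more value is perceived (4x in the summary).
    """
    if output_multiplier <= 0:
        raise ValueError("output_multiplier must be positive")
    return value_multiplier / output_multiplier


# Figures quoted in the summary; treating "perceived productivity" as value
# is an assumption made for illustration only.
ratio = value_per_line_ratio(output_multiplier=8.0, value_multiplier=4.0)
print(f"Value per line vs. baseline: {ratio:.2f}x")  # prints 0.50x
```

Note that this compares blended output per line against the earlier baseline. It does not isolate Claude-written lines from human-written ones, so "half the value per line" is a rough reading rather than a measured result.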
Takeaway

The human advantage is narrowing faster than most planned for.

WHAT TO LEARN

Anthropic's own internal data draws a clear line from 'humans write code' to 'Claude writes 80% of code' — and the remaining human edge, research taste and judgment, is already being measured and shrinking.

  • AI task horizon at Anthropic is doubling every four months, down from seven months — the acceleration is itself accelerating, not just the capability.
  • 80% of Anthropic's merged code is now Claude-authored, up from low single digits before Claude Code launched in February 2025: more than an order-of-magnitude shift in roughly fifteen months.
  • 8x code output with only 4x perceived productivity gain means Claude-written code produces roughly half the value per line of human-written code — more output isn't the same as more value.
  • The remaining human comparative advantage is research taste: choosing which problems matter, which results to trust, and which directions to abandon — not execution.
  • Novel ideas are definitionally not something a derivative model can originate — the gap between replication (now near 100%) and origination is the last structural moat.
  • Once AI and human code quality reach parity, humans stop writing and shift to reviewing — but review speed cannot match generation speed, making human review the next bottleneck.
  • Becoming disconnected from the systems you're building is itself an alignment risk: you can outsource thinking, but you cannot outsource understanding.
  • In a full recursive self-improvement scenario, compute becomes the only bottleneck — which means capital becomes the only moat, and whoever holds compute at that moment stays permanently ahead.
  • Even if AI capabilities stopped improving today, capability overhang — the gap between what models can do and what workflows have caught up to use — would still produce major economic disruption.
  • Calling for a global AI slowdown from an undisputed first-place position, while using an unreleased internal model, is structurally self-serving — even if the safety reasoning behind it is correct.
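The review-bottleneck point in the list above can be illustrated with a toy backlog model. This is a sketch under stated assumptions, not anything taken from the paper or the video: the function name and the rates (8 changes generated and 3 reviewed per day) are hypothetical, chosen only to show that any steady gap between generation and review accumulates as unreviewed work.

```python
def review_backlog(days: int, generated_per_day: float, reviewed_per_day: float) -> list[float]:
    """Return the number of unreviewed changes at the end of each day.

    The backlog never drops below zero: reviewers cannot review
    changes that have not been generated yet.
    """
    backlog = 0.0
    history = []
    for _ in range(days):
        backlog = max(0.0, backlog + generated_per_day - reviewed_per_day)
        history.append(backlog)
    return history


# Hypothetical rates for illustration only.
print(review_backlog(days=5, generated_per_day=8, reviewed_per_day=3))
# prints [5.0, 10.0, 15.0, 20.0, 25.0]
```

When review keeps pace with generation the backlog stays at zero. The bottleneck appears only once generation outruns review, which is the situation the takeaway describes.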
Glossary

Terms worth knowing.

Recursive self-improvement (RSI)
A hypothetical stage where an AI system becomes capable of designing and training its own successor models, removing humans from the loop entirely. The only remaining bottleneck would be available compute.
CoreBench
A benchmark that tests whether an AI can read a published research paper and successfully reproduce its experimental results from scratch — measuring ability to execute on described novel methods, not to originate them.
Mythos
An internal Anthropic model, more capable than publicly released models, used internally to accelerate their own development but never released to outside developers.
Task horizon
A measure of how long a skilled human would take to complete the same task an AI can reliably finish autonomously — used to compare capability across model generations over time.
Capability overhang
The gap between what AI models are technically capable of and what the surrounding systems, businesses, and workflows have caught up to using — the idea that even frozen model capability would still produce major economic disruption.
Research taste
The judgment required to choose which experiments are worth running, which directions to abandon, and which results to trust — distinct from the ability to execute experiments, which AI increasingly handles.
Permanent underclass
The concept that at the moment of full recursive self-improvement, whoever controls compute (and therefore AI capability) locks in a permanent advantage — those without capital at that moment have no mechanism to close the gap.
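The task horizon entry above lends itself to a quick extrapolation. This is a minimal sketch, assuming a constant four-month doubling time and the 12-hour starting point cited for Opus 4.6. The function name is illustrative, and the constant-rate assumption is exactly what a stalled trend would break.

```python
def projected_horizon_hours(current_hours: float, months_ahead: float,
                            doubling_months: float = 4.0) -> float:
    """Extrapolate a task horizon, assuming it doubles every `doubling_months`."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return current_hours * 2 ** (months_ahead / doubling_months)


# Start from the 12-hour horizon quoted for Opus 4.6 (an input taken from
# the summary, not a measurement made here).
for months in (0, 4, 8, 12, 24):
    hours = projected_horizon_hours(12.0, months)
    print(f"+{months:>2} months: {hours:,.0f} hours")
```

Under these assumptions the horizon reaches 96 hours of human work after twelve months, which is consistent with the claim that multi-day tasks could come into range within the year. Swapping in the older seven-month doubling time shows how much the faster rate changes the projection.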
Resources

Things they pointed at.

00:30 · link · Anthropic 'When AI Builds Itself' paper
08:00 · tool · CoreBench
09:00 · link · auto-research project (Andrej Karpathy)
09:20 · link · Meta harness paper (self-improving harness)
35:00 · channel · Box / Aaron Levie (net-new work framing)
Quotables

Lines you could clip.

26:07
You can outsource your thinking, but you cannot outsource your understanding.
Complete thought, no setup needed, immediately quotable; from a tweet Berman cites · IG reel cold open
27:00
As of May 2026, more than 80% of the code we merged into Anthropic's codebase was authored by Claude.
Specific, verifiable, stunning data point from Anthropic themselves · TikTok hook
36:00
Work and life ran on a gift economy of small favors between humans. 'Can you help me get this script running?' — each one created a little debt, a little mutual awareness. Claude has eaten the favors.
Poetic, specific, hits on something people feel but haven't named · newsletter pull-quote
37:20
Ideas are the important part. Interesting to think about.
Pithy reversal of the 'ideas are cheap' Silicon Valley cliche · TikTok hook
42:14
If you're an Olympian and you have 10 other competitors racing the 800-meter dash, and you're in first place halfway through, and you say, 'Hey, guys, why don't we all slow down?' — you're always going to be in first place.
Clean analogy, complete thought, no context needed to land · IG reel cold open
The Script

Word for word.

00:00AI is now literally building itself. And according to Anthropic, that means two things. One, society is not prepared.
00:09And two, we should actually slow down development. That is incredibly self serving by Anthropic, but let me explain it all.
00:17And if you like me breaking down things like this, please subscribe to the channel, like the video. It really does help us grow, and let's get on to the paper. When AI builds itself, our progress toward recursive self improvement and its implications.
00:34So it starts out just straight to the point. We are delegating a growing share of AI development to AI systems. Given enough compute, that trend points to an AI system capable of fully autonomously designing and developing its own successor.
00:52Now interestingly enough, they say recursive self improvement is not inevitable. It definitely seems inevitable at this point, but there is one missing ingredient that we don't have today.
01:03And so stick around because I'm gonna explain what that is later in this video. And, of course, they have to mention safety because they're Anthropic.
01:11Full recursive self improvement also might increase the risk of humans losing control over AI systems. That's putting it lightly. I definitely agree.
01:20This is a major alignment problem that needs to be solved. Alright. Now what they're gonna show in these little graphics that they put together, which, by the way, beautiful, just top tier graphics on the Anthropic team, they are going to show the progression of humans being abstracted further and further away from the actual development of artificial intelligence, from the actual code being written, the research being done, everything, and and it's wild to watch.
01:48So first, what we're gonna see, building the first Claude. And look. This is just a few years ago.
01:54This is not that long ago. In the early days, work at Anthropic looked like work at any other tech company. You know, you're an engineer.
02:02You write code, and that code gets shipped to the end user. That's what it was like.
02:07So there it is. K? So you have the human developer writing directly into a computer, and the computer is building Claude.
02:18Great. Looks like good old fashioned development work. But things changed, and they changed quickly.
02:24In the subsequent couple years, we moved to chatbots.
02:29K? This is ChatGPT. This was the ChatGPT moment.
02:33This is when humans started talking to a computer.
02:37The computer was really just a chatbot, and that was building Claude. Okay. Then that is the point at which we really started seeing something different happen.
02:47The 2025 to 2026 era of coding agents no longer was the human writing code directly.
02:56We had the person talking to a computer, which was really a chatbot, which delegated out to an agent, which wrote code to build Claude.
03:07Now check this out. What we see here, and if you look at kind of the density of the pixels in the Claude logo, it's very, very big.
03:15Then they start to get more dense, more pixels, and it's exactly telling as to what's going on.
03:21As the human becomes more and more abstracted away from the actual problem, it's able to we are able to actually produce a lot more code, a lot more research, and that's what we're seeing here.
03:34So then we had autonomous agents. The human spoke to a computer, typed in into a computer.
03:40That computer delegated out to a chatbot, delegated to an agent. That agent delegated to sub agents, to workers. Now all of a sudden, we had major parallelization happening.
03:52That means a human typing one little thing, a little prompt, could result in a tremendous amount of code.
04:00Now lines of code is not the measure. That is not the measure of quality. It is one signal to look at.
04:07We're gonna get to that in a little bit. K? And then look at this.
04:10Once again, the Claude logo becomes even more dense with pixels. Now here is where we close the loop.
04:17In the future, agents could become capable enough to build and train models themselves. If this happens, future versions of Claude could be continuously improved by Claude itself.
04:27And there is only one bottleneck at that point. Can you guess what it is? Because if you look at this little graphic right here, there is something missing, the human.
04:38In every previous iteration of this graphic and of the evolution of how humans work with artificial intelligence, there's a human involved. And then finally, we close the loop, when we develop recursive self improvement, that is the point at which humans are no longer involved.
04:56And as you can see here, the only bottleneck is compute. How much compute can you throw at the problem? How much parallelization can you have?
05:05And so I talked about how writing the code is actually not the hard part anymore. It's everything around that. And the sponsor of today's video handles everything around that if you're building an AI product.
05:17Here's a little bit about DigitalOcean. Building and scaling AI in production is harder than it should be. If you watch this channel, you know that better than anybody.
05:25You're either going to a hyperscaler that is overly complex, or you're building on raw GPUs which have no software and you're doing everything yourself. Meanwhile, specialized inference wrappers are merely software layers renting GPUs from neocloud middlemen. But there is a better way.
05:43DigitalOcean's Agentic Inference Cloud is purpose built for production AI. I've actually been using DigitalOcean for a long time in my previous businesses as well, so super excited to tell you about them today.
05:56With DigitalOcean's Agentic Inference Cloud, you get the best of both worlds. You get reliability and simplicity, and this is inference optimized infrastructure.
06:06If you're an AI developer, you get everything you need to deploy AI models, scale to millions of requests, and operate reliably in production without the complexity or cost of a hyperscaler and without having to write all the software yourself like going to a bare metal GPU. AI companies running at high volume are already experiencing gains in throughput, latency, and cost efficiency.
06:29So if you're building scaling AI in production, check out DigitalOcean's Agentic Inference Cloud today. Tell them I sent you.
06:36Link down below. Alright. So next, what we're gonna talk about is the progression.
06:40What they are actually seeing inside Anthropic as they're building these tools, they're developing Claude Code. They're developing Claude AI, they're developing the actual model of Claude. What are they seeing?
06:51They actually shared a lot of really interesting information. Let's look. The length of tasks that they can reliably complete on their own, that is AI agents, has been doubling roughly every four months up from an earlier trend of doubling every seven months.
07:07That is the definition of accelerating. The acceleration is accelerating.
07:13That is kind of the strongest signal of recursive self improvement that I've ever seen. Now they are very clear in this article that recursive self improvement is not here yet, but we are seeing hints of it.
07:25And I've been talking about this for a while. You know, whether you're talking about the auto research project from Karpathy, which we're gonna talk about later in this video, whether you're talking about Meta harness, which was an incredible paper that I covered, which actually allowed the model and the harness to improve itself.
07:42Right? These are all self improving signals that we're seeing. Okay.
07:46So let's talk about the progression. March 2024, which is, you know, a little over a year from when ChatGPT was first launched. March 2024, Claude Opus 3 could complete software tasks that take humans about four minutes to complete.
08:01That doesn't describe how long it takes the model, the AI model, to actually complete that task. But, basically, what it's trying to do is set a standard of a long horizon task and how much it could complete.
08:12And what it's saying, what they are saying, is that if if a human had a task that took them four minutes, let's say, you know, writing a function, now Opus 3 could do it, or back then, Opus 3 could do it.
08:28And four minutes is not that long. But, again, just like a year previous, all we had was tab complete. So, like, a model being able to complete something that takes a human four minutes was actually a big deal, but we've obviously come quite far since then.
08:42A year later, Sonnet 3.7, which is interesting that they didn't mention Opus again, but Sonnet, their workhorse model, their slightly less capable but much cheaper model, managed tasks that took about an hour and a half.
08:57So it is now able to complete a much longer horizon task that humans took about ninety minutes to complete. Then one year later, which is pretty recent, Opus 4.6 managed twelve hour tasks.
09:12And let me tell you something. I've been finally coding and developing new projects, and, boy, I'll give some of the like Codex or Claude Code, I'll give them a task, and they just go off.
09:24I've had tasks run for forty hours successfully. Let's keep going down the progression. If this trend holds, tasks that take a skilled person days could come into range this year, 2026, this year.
09:37In 2027, AI systems could be capable of tasks that take a person weeks. I think we are closer to that than most people realize.
09:46I really do. Every single time we hit some milestone, the next milestone comes faster than even the absolute experts predicted, and that is what we continue to see.
09:56Now they start to talk about something called CoreBench, which I've reviewed on this channel. But CoreBench tests the ability of a model to reproduce existing research.
10:05Why is that important? Okay. So if a researcher, a human researcher, does all this work and writes a research paper, a white paper, and describe a method in which to reproduce the results, the ability of a model to be able to read the white paper and successfully reproduce the results is an incredible skill to have.
10:27Now it is missing one thing, and they talk about this later. I'm gonna give a hint at what it is now, which is the actual task of discovering that novel research in the first place, the thing that the human did to write the white paper.
10:41That is something that the models are not really doing today. That is the missing ingredient. But if the model can read and understand novel research that likely is not in its training set, reproduce the results, that is a very valuable signal as to what's coming.
11:02So AI systems went from succeeding at reproducing the results roughly 20% of the time in 2024, two years ago, to saturating the benchmarks fifteen months later. AI models can now successfully reproduce AI research paper, novel ideas successfully nearly a 100% of the time.
11:21Crazy. So then they get to what's actually happening within Anthropic. And there's, again, two pieces now, and I've already kind of described the difference.
11:29There's the engineering piece, which is how much code, how much of the total code, and how much new code is Claude actually writing. That's the engineering bit.
11:39Writing code, standing up the infrastructure, and overseeing model training. That is likely what most of us, you, me, are using these models for. Then there's research, and this is the part where models are kind of only scratching the surface of capability right now.
11:54Deciding what experiments to run, interpreting what comes back, and figuring out which ideas to try next. This is very important to keep in mind. This is the difference between having a new novel idea and knowing how to verify it, having taste in research, taste in a project that you're building versus actually going to execute that project, going to actually build it.
12:19We are at the point in AI right now in which the actual development is now the easy part. The ideas is becoming the hard part. I'm going to touch more on that in a moment.
12:30We've now gotten to the point where Claude can be handed an underspecified problem and figure out how to solve it. We are still handing the problem to Claude. The problem is still the realm of humans to determine, but even a very unspecified problem can be determined and developed by Claude.
12:48And I know firsthand that that is the case because I basically just say, like, my prompts are so underspecified. It's crazy. Let me know if yours are too.
12:58Like, I literally just take a screenshot of an error, put it in Claude Code, put it in Codex, and just say fix it. K? Like, these are highly underspecified prompts, and they do just fine.
13:09Large performance gaps persist when it comes to Claude exercising judgment in choosing goals in both engineering and research, and I found this as well. If you've built any kind of project with any of the modern AI models, you've probably noticed that you give it a big problem, but you still have to tell it the problem when it goes out and builds it.
13:28But actually knowing what to build next, it's not really doing that today. So, like, this is the type of prompts that I give. The export button isn't working.
13:36Please fix it. Now this is, you know, pretty specified because you're saying this is the problem. The export button isn't working.
13:43Although it's short, like, please fix it. You're not telling it exactly what's wrong. But then the kind of the next step, the way you can also do it is investigate why the network slows down under heavy load.
13:53Here's the problem, and it's it's just a bigger problem. But the thing that these models can't do quite yet is what should the team build next quarter? And I found that.
14:02I said, okay. What do we build next? And it kinda gives me some decent answers, but it's not great.
14:07And I I think that's probably a reflection of the fact that truly novel ideas are not coming out of these models yet. And that is the missing ingredient that I talked about earlier. Truly novel ideas, by definition, are not possible when everything coming out of the model is a derivative of the data that was put into the model.
14:30Right? It's such an important concept to note. Now will we get to a point in which they can come up with novel ideas?
14:37With current architecture, possibly. Maybe it does require some kind of new architecture.
14:42Maybe LLMs just won't get there. We'll see.
14:45Alright. So here's an interesting thing. We probably already knew this.
14:49And I think at least if you're coding regularly, you're probably already feeling this yourself. But listen to this. As of May 2026, more than 80% of the code we merged into Anthropic's codebase was authored by Claude.
15:04Now this is super interesting. I just spoke to a CEO of a tech company, 300 people, and he tends to be a little bit more pessimistic about AI and its ability.
15:17And I told him, and I was talking to him about it, I was like, hey, the vast majority of code being written by Anthropic is written by Claude. And he looked at me and he goes, yeah. That's why it's so buggy.
15:28And so, like, he he might have a point. But nonetheless, Claude is shipping the vast majority of lines of code for Anthropic, and they are moving faster than any other company on the planet, bar none.
15:43So before Claude Code launched in research preview in February 2025, this number was in low single digits.
15:49So low single digits up to 80% in basically a year. That is crazy. That is crazy to think about.
15:57That shift also shows up in the amount of output per engineer. This is where you guys, especially if you're traditional engineers, traditional developers might start to cringe hard. Let's talk about it.
16:08Code contributed per person by quarter. Lines of code.
16:14The ultimate annoyance, the ultimate metric that really pisses engineers off. This is like the old school way where, like, nineties, early two thousands, you had nontechnical managers managing developers, and the nontechnical managers would measure many engineers based on how many lines of code written.
16:35And it is such a bad measure because it doesn't matter how many lines of code you write. It's about how high quality is a code, how readable is the code, how efficient is it.
16:47Right? Does it actually get the job done? In fact, there's an argument that less lines of code that can achieve the same outcome, achieve the same output, is actually much more valuable.
16:59And so, again, they do say this. They do Anthropic does caveat. Okay?
17:04So caveat by Anthropic lines of code is an imperfect measure. That is an understatement to say the least as it measures quantity over quality, but it is one signal. Okay?
17:14And it's interesting to see this acceleration. Look right around Q3, Q4 of last year. That is when their lines of code per engineer absolutely exploded, and that is when the vibe shift in agentic engineering happened.
17:30And if you've been following AI for at least six months, nine months, you felt it. You felt something happening in October, November, December of last year.
17:42Something changed. And there were really two things that changed. Opus 4.5 came out and GPT five came out.
17:50Both of these models were better, like significantly better at coding than all previous models. It was a step change.
17:59It really was. And so we saw it. From Q4 to Q1, we had a massive increase, two and a half times the amount of code to 5.8.
18:06Now what's interesting is they're using Mythos internally, and I think about why they might be doing that. I think about this is like there's so I have so many thoughts going on right now. I'm gonna try to break it all down for you.
18:16They have been using Mythos since Q1. They're using this frontier model, and they did not release it publicly.
18:24Now this is the most Anthropic-coded thing I've ever heard because, first of all, there was let me see if I can find it. Anthropic cut access to xAI.
18:34Shout out Kylie who broke this story, I believe.
18:40But this was 01/09/2026. xAI staff had been using Anthropic's models internally through Cursor until Anthropic cut off the startup's access this week. They said, no.
18:52No. You can't use our models to code your models, which is insane to me. That is insane.
18:57That is massive platform risk. It was reported Anthropic basically banned their competitors from using their own models, and at the same time, they were starting to use Mythos internally. Now what happened again?
19:08Rather than doing the same thing, rather than releasing Mythos and then having to turn it off at xAI, turn it off at whatever other competitors there were out there, instead of doing that, they simply didn't release it.
19:21They're using it internally. They're accelerating their code development.
19:26They're accelerating their AI research internally. They're basically trying to, and I know everybody loves this, but win the race. They're trying to do this, and they did it without kinda looking bad.
19:38And it's so self-serving. Mythos is so scary, we can't release it.
19:44But we're gonna be using it, by the way. Don't worry. We gotcha.
19:47And we're gonna be using it safely. We're gonna be using it to accelerate our own AI models, but, also, that lets us get further ahead.
19:56We didn't intend that. Don't worry about that. Don't don't think about that too much.
20:00It's crazy. And so I really think that's what's happening. And so they got the fear based marketing going.
20:06They got to accelerate their own development, all while not sharing any of that with the broader market.
20:12Cool. Thanks, Anthropic. Here is important, and I agree with this.
20:15Eight times the number of lines of code per engineer per day in the second quarter of 2026 is almost certainly an overstatement of the true productivity gains. That statement is doing a lot of work.
20:27And I'm not sure everybody realized it because, in fact, when my team and I were going over this, I don't think I realized it. And I think it was Jonah who pointed it out.
20:38He was like, that actually means that the code being written by Claude is worse.
20:45It's worse than human written code. And maybe that's true. There's actually a strong likelihood that's true.
20:51There's also an argument that they simply don't know what to do with all of that code. And this is something that I talked about in a previous video, I think actually my last video, where it's like, you can write as many lines of code. You can develop as many features as you want.
21:05And and if all of this stuff happens at an accelerating rate, the bottleneck becomes everything else involved in releasing a new feature, The marketing, the sales, the adoption, the documentation.
21:18In any machine, in any system, when you bottleneck one part of it, you expose the next biggest bottleneck, and that's what we're seeing here.
21:27That's probably also in conjunction with Claude not writing as good code as a human can write, but it's probably also that they're not able to actually deploy that code as successfully. In March 2026, in a poll of 130 employees from across Anthropic research teams, the median respondent estimated that they produced around four times as much output with Mythos Preview as they would have without access to any AI model.
21:54That is a significant improvement, four times your productivity. But remember, they're producing eight times as much code.
22:01So that code is half as valuable, line for line, as purely human-written code. Now overall, they're still much more productive than they would be otherwise.
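The "half as valuable" figure is just the ratio of the two multipliers quoted above. A minimal sketch of that arithmetic follows, assuming the 8x lines-of-code figure and the 4x self-reported output figure can be compared directly (they come from different measurements, so this is a rough reading, not a result from the paper). The function name `value_per_line` is made up for illustration.

```python
def value_per_line(code_multiplier: float, output_multiplier: float) -> float:
    """Relative value of each line of code versus the no-AI baseline.

    1.0 means a line is worth as much as before; 0.5 means half as much.
    Assumes both multipliers describe the same people over the same period.
    """
    if code_multiplier <= 0:
        raise ValueError("code_multiplier must be positive")
    return output_multiplier / code_multiplier


# Figures quoted in the video: 8x lines of code, about 4x perceived output.
ratio = value_per_line(code_multiplier=8.0, output_multiplier=4.0)
print(f"value per line vs. baseline: {ratio:.2f}")  # prints 0.50
```

A ratio below 1.0 does not by itself say the code is worse. It is equally consistent with the deployment bottleneck described above, where more code ships than the rest of the organization can turn into value.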
22:10And that actually just shows me that with a little bit of tweaking and finding other areas to remove bottlenecks from, they're going to be so much more productive. Even if the model capability stops today, they still have room not only to deploy the model, increase adoption, improve the scaffolding, the harness around the model, but actually unblock other areas of the business to deploy the code effectively.
22:41So this code, from Mythos Preview, is 50% less productive than human code.
22:48Very interesting. A significant fraction of Anthropic technical staff is accomplishing their core work multiple times faster than they could without AI assistance. And actually, we didn't even talk about this with my team, but what does this tell me?
23:00If all of a sudden the development team, the technical staff is able to produce so much more code and develop features at such a higher clip, then they need more people to market it. They need more salespeople to go sell it.
23:14They need more customer support people to service their customers. So that's kind of an argument against the idea that AI is going to cause this huge jobpocalypse. Right?
23:25There is a strong argument that as productivity increases, so does the need for humans. And we're gonna continue on that thread throughout the rest of this paper.
23:35You're going to see multiple examples in which this is just shown with data. Alright.
23:40Let's keep going. Here's another important part on that thread. We also see evidence that people at Anthropic are using Claude to do work that simply wouldn't have happened otherwise.
23:50It's not just that they're able to automate things that they were already doing. It's that they are doing new work, net new work. That is very exciting.
24:02That's the frontier. Unfortunately, there are some companies that are thinking, let me automate this thing so that I don't need as many people.
24:10They're going to lose. Those companies that have that point of view, that have that posture towards their employees are going to lose. Aaron Levie has talked about this.
24:21Shout out to Box, sponsor of my channel.
24:25They're just a great partner, so shout out to them. He's like, the old way of thinking about it, old being two years ago, is like, okay, what work do we already do that we can automate?
24:35The right way to think about it is what work would we not have done because we don't have these tools? What net new value can we be driving? That's the exciting part to me.
24:45Here's an anonymous Anthropic employee: "It's now been five months since I last wrote any code myself." Now the part that they're not describing is the actual reviewing of code, and they're gonna talk about that later.
24:58Are humans reviewing code? Is it even possible to review AI written code? Because if AI is writing eight times as much code, you need technically eight times as many people to review that code, and I guarantee they're not doing that.
25:12So how do they do it? Well, of course, they use AI. We're gonna get to that in a moment.
25:16Okay. So here's the Claude Code session success rate. Here is where we've seen the biggest improvement.
25:21Open ended problems. This is where we really saw the improvement. This is really where agentic engineering became crazy valuable.
25:29Now here's something super interesting. We talked about how do you actually review AI code. How do you actually do it?
25:36If it's producing so much more code, how do you do it? Session success is determined by a Claude judge.
25:43This is crazy to think about. As I said earlier in this video, the human in the loop is becoming more and more abstracted away from the core problem.
25:52And so we have AI writing code, and now we have AI reviewing code.
25:58And we're just getting further and further away from understanding the details. That actually reminds me of another tweet, and I think about this tweet daily: you can outsource your thinking, but you cannot outsource your understanding.
26:13So as we become more disconnected from the systems, more disconnected from the details, how do we maintain our understanding?
26:21You can offload your thinking, meaning you can have AI build the systems, determine what research projects to take on, actually go and execute the research projects.
26:32But ultimately, a human needs to understand it. If they don't, that is the recipe for AI misalignment. And if they do, humans become and continue to be the bottleneck in the entire system.
26:46And as long as humans are the bottleneck, recursive self-improvement is rate-limited by human cognition. So I found it very interesting that Claude is the judge.
26:57Very, very interesting. So we talked about that discrepancy between the increase in the number of lines of code and the perceived output increase, the value, the actual productivity as perceived by Anthropic employees.
27:13And we noted eight x as much code, but four x as much productivity, which, of course, leads us to believe the code being written is not as valuable. It's not as good.
27:24They're writing more code to accomplish the same amount of value. And listen to this. There isn't full consensus among staff at Anthropic, but many believe that the Claude-written code was still worse in quality than human-written code at Anthropic in late 2025 and is roughly at parity today.
27:42We expect it to be better within the year. I sure hope so. So that's the number.
27:46As there is eight x as much output in code, there should be eight x as much output in value. Now that number is never going to match because every other part of the system from documentation to graphics, marketing, sales, customer success, customer support, all of that needs to also keep up.
28:06And the vanguard, the tip of the spear, is the code being written. The next thing that they talk about is how Claude and AI models in general are accelerating research.
28:17And we already talked about that research decisions, the novel ideas of what direction, the taste, what direction should we head in, what should we experiment with.
28:28That is still the realm of humans. So they developed a test, a miniature version of an experimental research loop.
28:36The job given is to find speedups by rewriting the code, running it, timing it, and repeating. So they're basically looking for ways to improve the latency of code. In May 2025, Opus 4 averaged a 3x speedup over the starting code.
28:52In April 2026, brace yourselves, Mythos Preview was achieving 52x.
29:01A 52x speedup, up from an average of 3x less than one year prior. Now how does that compare to humans?
29:11Well, they told us. A skilled human researcher would need four to eight hours to reach four x.
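The rewrite, run, time, repeat loop described above can be illustrated with a toy timing harness. This is a hedged sketch only: it is not Anthropic's benchmark, and the functions `baseline`, `candidate`, and `speedup` are hypothetical stand-ins. It checks that a rewritten function returns the same answer as the starting code, then reports how many times faster it runs.

```python
import timeit


def baseline(n: int) -> int:
    """Starting code: sum of squares 0..n-1 with an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def candidate(n: int) -> int:
    """Rewritten code: the closed form for the same sum."""
    return (n - 1) * n * (2 * n - 1) // 6


def speedup(old, new, arg, repeats: int = 5, number: int = 20) -> float:
    """Return how many times faster `new` is than `old` on `arg`."""
    # A faster wrong answer is not a speedup, so verify correctness first.
    if old(arg) != new(arg):
        raise ValueError("candidate changes the result")
    # Take the best of several runs to reduce timing noise.
    t_old = min(timeit.repeat(lambda: old(arg), repeat=repeats, number=number))
    t_new = min(timeit.repeat(lambda: new(arg), repeat=repeats, number=number))
    return t_old / t_new


print(f"speedup: {speedup(baseline, candidate, 10_000):.1f}x")
```

An agent in this kind of loop would generate a new `candidate`, call `speedup`, keep the rewrite if the number improves, and repeat. The measured value depends on the machine, so no expected output is shown.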
29:18Although AI models today aren't coming up with new and novel ideas for areas of research, or at least not successfully, they are accelerating humans.
29:31They are allowing humans to be much more productive in deciding, designing, and executing experiments.
29:38Anthropic does say Claude is getting better at proposing its own experiments. And when you have unlimited compute, when you have the ability to parallelize to incredible numbers, it's kinda like that notion that a million monkeys with typewriters would eventually come up with the best works of writing in human history.
30:01It's kinda like that. If you give them enough compute, if you let them come up with enough experiments, it doesn't really matter if they have good taste or not.
30:09Taste is important when you have limited resources. Now, obviously, companies still have limited resources, but still, if we have unlimited resources, taste does not matter anymore because they will just figure out, they being the AI, figure out everything to test, test it, and just find the best.
30:26But it is getting better. It is getting better at proposing its own experiments. But for now, the taste of deciding which experiments to run is the realm of humans.
30:35Now it took me a little while to understand what this chart meant, so I'm gonna try to break it down. Basically, they took past experiments that tried to, let's say, improve AI, make it faster, make it better, improve the development of it, and they applied AI to them to see: where the human failed, where the human made a poor decision about the direction of an experiment, would AI have gotten it right?
31:05Now Claude 3 Haiku back in March 2024, 22% of the time it would have done better. Fast forward to Claude Mythos Preview, it is now at 64% of the time.
31:16So there is this notion that Claude or AI in general is going to be able to decide directions or decisions later in the experimentation process better than humans can.
31:29So we view this result as an early signal that AI systems are getting better at making the kinds of judgment calls that AI research depends on. Now if you've been using artificial intelligence at all, you probably know this. You probably see it.
31:44The comparative advantage of humans as of right now is still in seeing the bigger picture and thinking beyond the confines of the immediate task.
31:54This is prompting and verifying. So what does this actually mean? What does it mean for humans, especially humans at Anthropic, which they're going to talk about, but humans in general?
32:03Because you can extrapolate what they're learning from how humans and AI are interacting to develop software and develop AI systems to broader knowledge work. So let's see. The evidence suggests that the human role is narrowing at each step in the AI development process.
32:20Once human and AI authored code quality reach parity, humans will stop writing code entirely and shift to only reviewing it. Now here's the important part that I mentioned earlier.
32:30But if they can't review code as quickly as Claude can generate it, human review will become the bottleneck to AI development. This is something that we talked about as a team, on my team. There was this notion for a long time in Silicon Valley that ideas are cheap.
32:45Right? Anybody could have the next billion dollar idea. The hard part is actually going out and building it.
32:52And not just the code, but recruiting, marketing, sales, fundraising, kind of the grind, the years and years of grind a founder needs to go do to be successful.
33:04The idea changes over time. The idea, you know, it's called a pivot.
33:09It's kind of a very normal thing to do in Silicon Valley, a pivot. You have one idea, it didn't really work out how you thought, and thus, you pivot to this other direction.
33:19But execution, which was the hard part, can now be done by artificial intelligence, at least so far the development of code. And so does that make ideas even more valuable or actually valuable?
33:31Because if you have a great idea and you could just push a button and let the AI go and execute it all, sounds like ideas are the important part. Interesting to think about. But they still say research taste and judgment, including choosing which problems matter, which results to trust, and when an approach is a dead end is still the realm of humans.
33:51Alright. So next, they talk about what if they're wrong. What if all the things they're saying about self-improving AI are wrong or it just doesn't happen?
33:57So it is unclear today whether today's training methods and architectures could unlock that capacity. That means kind of end to end recursive self improvement.
34:09And this is kind of the idea versus execution trade off that I mentioned earlier. Edison said that genius is 1% inspiration, that's the idea, and 99% perspiration, that's the execution.
34:20But we see perspiration becoming increasingly automated. What a wild time. Every notion that I had, let's say, quote, unquote, growing up in Silicon Valley is just being completely flipped right in front of my eyes.
34:35It's so wild to think about. So even if we suppose that Claude never achieves good taste, good research taste, a conservative reading of our evidence still implies compounding acceleration. So it just means if humans are forever prompting and verifying, that prompting and verifying is going to have kind of this massive multiplication effect.
34:58Humans spend most of their time on the single-digit fraction of work, direction setting, and I believe that's gonna be the case for a long time. Now in domains that are fully verifiable, that's when it's like, okay.
35:11Infinite compute kind of removes the necessity of taste, but we're never gonna have infinite compute, at least not for a very long time, because we're still bottlenecked by, first of all, just the production of silicon and data centers and, most of all, energy.
35:27You know? Unless we somehow discover unlimited energy, we will never have unlimited compute.
35:33So then they talk about the three possible futures. Number one, the trend stalls, but today's AI capabilities are widely diffused. That is assuming we have this s curve, curves up, gets really exciting, and then all of a sudden just flattens out, and we do not get any more progress.
35:49That is one potential outcome, but they say that is unlikely in their opinion. And something I've been saying for a while, even if model capabilities were frozen at today's level, we would expect major changes to occur in the world.
36:04That is the capability overhang. That means the models are so capable, everything around them has not caught up yet.
36:12And that includes the scaffolding, the harness.
36:16That includes the other areas of business and society that haven't even caught up enough to leverage all of that additional code being written. Second potential future, AI labs continue to see compounding efficiency gains. AI development becomes substantially automated, but humans continue to set research directions and judge results.
36:35K? So this is the outcome in which end to end does not happen. Taste, judgment, that is still critically important for the foreseeable future according to this outcome.
36:46But it does mean that a 100 person company can do the work of a 10,000 or even a 100,000 person organization. That is major productivity gains for the world. Very exciting future.
36:58But, and this is what I talked about, speeding up one part of the process often just shifts the bottleneck elsewhere. Overall pace is capped by the parts that haven't sped up. Okay.
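The claim that overall pace is capped by the parts that haven't sped up is the intuition behind Amdahl's law. A minimal sketch follows, assuming (purely for illustration, not from the paper) that writing code is some fixed share of the total work of shipping a feature. The name `overall_speedup` and the 50% share are hypothetical.

```python
def overall_speedup(accelerated_share: float, factor: float) -> float:
    """Amdahl's law: end-to-end speedup when only part of the work gets faster.

    accelerated_share: fraction of total work that is sped up (0.0 to 1.0).
    factor: how many times faster that fraction becomes.
    """
    if not 0.0 <= accelerated_share <= 1.0:
        raise ValueError("accelerated_share must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    remaining = 1.0 - accelerated_share  # work that runs at the old pace
    return 1.0 / (remaining + accelerated_share / factor)


# Hypothetical: coding is half of the work; everything else is unchanged.
for factor in (2, 8, 100):
    print(f"coding {factor}x faster -> overall {overall_speedup(0.5, factor):.2f}x")
```

Under that assumption, no coding speedup, however large, pushes the overall figure past 2x, because the untouched half of the work sets the ceiling. The only way to raise the ceiling is to speed up the other parts too.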
37:07Now the third outcome, and this is both the most exciting and also the most scary. AI systems themselves become capable of full recursive self improvement and begin building their successors. If technical trends in advancing capabilities continue and AI systems are able to develop the capabilities inherent to transformative human ingenuity, then it is plausible that AI systems could design and refine themselves.
37:35That is recursive self improvement. That is the intelligence explosion, super intelligence, whatever words you want to apply to it, that is what they're talking about here.
37:44In this world, pace of progress in AI development becomes determined entirely by the availability of compute for AI systems. Now, boy, does that sound like something?
37:55Who has heard the term permanent underclass? This is the concept that once we hit recursive self improvement of models or even AGI, wherever you are in the class structure of society, that is where you are forever.
38:13If you've made it above the permanent underclass, you're good. If you're below it, sorry. Now there's a lot to unpack there, and I think this is probably pretty reasonable.
38:24If we get to the point at which AI can do end to end recursive self improvement, humans are no longer needed. The only bottleneck left is compute. And if we say compute, really what we mean is energy.
38:37And thus, how do you acquire compute? Well, you need capital. You need to buy it.
38:42And so at the moment in which recursive self improvement happens, whoever has capital at that moment will just buy up as much compute as they can.
38:53They will buy up as much intelligence as they can. And if you have capital, you're in a good place.
38:59But if you don't, you're not. And that is the whole notion, the whole concept of the permanent underclass. That is a scary future.
39:06It really is. It might even be at the point at which that happens. It's not about capital.
39:11It's whoever had the compute, had the energy production mechanisms to begin with.
39:17Whoever had it at that moment, it's just all of society freezes. Now here's the scary part, and it's very Anthropic-coded.
39:25We do not have good intuitions for what this world would look like because our economy is currently driven by humans and human built tools. How would they know? What does it look like if you have infinite intelligence?
39:38What about if you had embodied intelligence, aka robots, and infinite robots, infinite labor. What happens in that world?
39:46Anthropic does not claim to know. They might be genuine in their admission that they don't know, but that is also very much fear based marketing in my mind.
39:55They talk about embodied intelligence, robots, right here, and they expect that robots might quickly follow recursive intelligence and follow a similar path of increasing returns at decreasing cost.
40:08Now they say something here that I don't necessarily agree with. Achieving recursive self improvement alone does not suggest an immediate change in how industrial production occurs, societies organize, or markets function. More intelligence can't learn what a drug does over decades of use.
40:25Is that true? What about if we could just run simulations, unlimited simulations?
40:31Can't we predict with pretty darn high accuracy what a drug will do over decades of use?
40:38I don't really understand that. Alright. Now here is the part in which, brace yourselves, Anthropic does Anthropic stuff.
40:46If it were possible to effectively slow the development of this technology to give ourselves more time to deal with its immense implications, we think that would likely be a good thing. They are saying, we think we should slow down.
41:00We think we should slow down. Now they're gonna go on to say, we'll only slow down if everybody else slows down, which is completely understandable, logical even, but that's a nice thing to say when you're in the literal lead, when you are winning the race.
41:18If you're an Olympian racing the 800 meters against 10 other competitors, and you're in first place halfway through, and you say, hey, guys. Why don't we all slow down?
41:30Equally slow down. You're always gonna be in first place. Of course, you would wanna do that.
41:35But I guarantee xAI doesn't wanna slow down. I guarantee other countries don't wanna slow down.
41:43But Anthropic, so nice of you to say that you think we should slow down because you're in first place. And they go on to talk about it pretty accurately.
41:53But if a slowdown simply lets the least cautious actors catch up technologically, it could leave everyone less safe. So everybody's gotta slow down to the pace that Anthropic dictates.
42:06And they're right, though. If Anthropic being in the lead slows down, I guarantee the third place, fourth place, fifth place, they're not gonna slow down. Other countries, they're not gonna slow down.
42:16That is not how human nature works. Without a global coordination mechanism, companies and governments will have to make difficult decisions about safety while under competitive and geopolitical pressures. We believe it would be good for the world to have the option to slow or temporarily pause frontier AI development to enable societal structures and alignment research to keep up with the advance of the technology.
42:42Fear based marketing.
42:45I just don't get it. It hurts to read this stuff because it's so self-serving from Anthropic. They can be the good guys by saying, oh, well, we said, you know, we said we should slow down.
42:56We said it. Look. Look.
42:57We can point back to that essay that we wrote about recursive self improvement. We said we should slow down. We should give society a chance, but nobody agreed with us.
43:05You know, we were in the right. We were the good guys. We had all the morals.
43:08You know? No. It was everybody else.
43:10Boy, that's a nice position to be in. I guarantee you they would not have said that if they did not have the absolute frontier of AI right now. But they do say it's not impossible for the entire world to slow down.
43:22It would be extremely difficult, but not impossible. It would require multiple well resourced labs at or near the frontier in multiple countries agreeing to stop under the same conditions. It would also require that each can verify that the others have actually stopped.
43:37How do you even do that? They go on to say it's actually much easier to determine if nuclear proliferation, if nuclear development has stopped or continues, than it is to determine whether AI systems are continuing to be developed or not.
43:52The detectability element of this arms control problem is much more challenging than with other technologies. Training runs are far easier to conceal than missile silos.
44:03None of this is impossible in principle. Here's some more fear for you. If you didn't have enough fear from reading all of this, here's some more.
44:09Those regimes for detecting whether other countries were building new missiles, new nuclear systems, they took decades to build both the infrastructure and the trust.
44:20We don't have that long. AI is developing and evolving faster than any of those other technologies. And, yeah, they're saying a unilateral pause by one lab would be basically useless.
44:31That lab would pause, everybody else would catch up, and excel. So that's it. That's the essay.
44:37Okay. So I made so many points in this video that really showed humans are gonna be in the loop for a while. In fact, we're seeing quite the opposite effect in the job market.
44:47I just made an entire video about this specific topic. Check it out right here.
The Hook

The bait, then the rug-pull.

AI is now literally building itself — and Anthropic published a paper to say so. Matthew Berman walks through that paper line by line, tracking the moment Anthropic's internal data crossed from aspiration to fact: Claude authors more than 80% of the code merged into their codebase as of May 2026, task horizons are doubling every four months, and code quality is at or near human parity. The editorial through-line is one the paper itself doesn't fully reckon with: calling for a global slowdown from a position of undisputed first place is a structurally different argument than it would be from second.

Frameworks

Named ideas worth stealing.

03:00 · model

AI Development Abstraction Layers

  1. Human writes code directly
  2. Human uses chatbot to assist
  3. Human delegates to coding agent
  4. Human prompts autonomous agent swarm
  5. Agents train their successors (recursive self-improvement — not yet reached)

Anthropic's diagram of how humans have become progressively more abstracted from the actual AI development process, with the Claude logo growing denser (more capable) at each stage.

Steal for: any talk on the progression of AI capability or the future of software engineering
11:00 · model

Engineering vs. Research Split

  1. Engineering track: writing code, standing up infrastructure, overseeing model training — AI dominant
  2. Research track: deciding experiments, interpreting results, choosing direction — still human

The two-track framework Anthropic uses to distinguish where AI already dominates from where human judgment still leads.

Steal for: positioning human value in an AI-heavy workflow or org
40:00 · list

Three Futures

  1. Trend stalls: AI capabilities plateau, but capability overhang still reshapes the world
  2. Compounding automation: development substantially automated, humans retain research taste and direction-setting, 100-person company does work of 100,000
  3. Full RSI: AI builds its successors and compute becomes the only bottleneck (the host adds that whoever holds capital at that moment could freeze class structure permanently)

Anthropic's three scenario framework for how AI development could unfold, ranging from stall to intelligence explosion.

Steal for: any presentation or writing on AI futures or scenario planning
CTA Breakdown

How they asked for the click.

VERBAL ASK
44:40 · next-video
I just made an entire video about this specific topic. Check it out right here.

Standard YouTube end-card CTA pointing to a related video about AI and the job market.

Storyboard

Visual structure at a glance.

00:00 · hook · hook
01:00 · promise · paper intro
03:00 · value · abstraction
15:00 · value · 80% stat
19:00 · value · 8x code chart
31:00 · value · parity
34:00 · value · human narrows
42:14 · cta · slow down