Modern Creator
No Priors: AI, Machine Learning, Tech, & Startups · YouTube

Skill Issue: Andrej Karpathy on Code Agents, AutoResearch, and the Loopy Era of AI

A 66-minute conversation on what it feels like when the human becomes the bottleneck.

Posted: 2 months ago
Duration: 66 minutes
Format: Interview, educational
Views: 875.8K
Likes: 16.1K
Big Idea

The argument in one line.

The bottleneck in AI work has shifted from the model to the human, and every workflow — engineering, research, and education — must be rebuilt around removing yourself from the loop.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…
  • You write software professionally and have not yet restructured your workflow around coding agents.
  • You run or fund AI research and are thinking about how to automate the experimentation loop.
  • You are a solo builder who wants to understand how one of the most experienced AI practitioners actually uses these tools day-to-day.
  • You are thinking about what skills to develop or teach as the cost of producing software approaches zero.
SKIP IF…
  • You want a technical deep-dive on benchmarks or architectures — this is a practitioner conversation, not a research paper.
  • You are already running multi-agent orchestration pipelines and want advanced implementation tactics.
TL;DR

The full version, fast.

The unlock around December 2025 was not just better models — it was agents capable enough that the human became the bottleneck. Karpathy describes spending 16 hours a day expressing his will to agents and feeling anxious when subscription capacity goes unused, the way a PhD student feels guilty about idle GPUs. The conversation extends this to AutoResearch (autonomous loops that found overnight improvements two decades of hand-tuning had missed), model speciation (the case for specialized models and why the science of touching weights is not yet there), and education (writing explanations for agents rather than people, since agents can then target individuals better than any teacher).

Voices

Who's talking.

00:34 · guest · Andrej Karpathy
00:34 · host · Sarah Guo
Chapters

Where the time goes.

00:00-02:55

01 · Cold open / AI psychosis

The December 2025 shift: from writing 80% of code manually to essentially zero.

02:55-06:15

02 · What capability limits remain

Parallelizing agents. Skill-issue framing: failures feel like instruction problems, not model gaps.

06:15-11:16

03 · Mastery of coding agents

OpenClaw and persistent autonomous agents. Five things OpenClaw did right.

11:16-15:51

04 · Second-order effects of natural language coding

Dobby the Elf Claw: home automation in a few prompts. Apps should not exist — only APIs plus agents.

15:51-22:45

05 · Why AutoResearch

AutoResearch overnight beat two decades of hand-tuning. ProgramMD as meta-optimization.

22:45-28:25

06 · Relevant skills in the AI era

The jaggedness problem. RL models excel on verifiable tasks, flounder on soft ones.

28:25-32:30

07 · Model speciation

Case for specialized models. Science of touching weights is underdeveloped.

32:30-37:28

08 · Collaboration surfaces for humans and AI

AutoResearch-at-home: untrusted Internet workers, cheap verification. Compute as donation.

37:28-48:25

09 · Jobs market analysis

BLS data. Jevons paradox applied to software. Cautiously optimistic near-term.

48:25-53:51

10 · Open vs. closed source

Gap closed from 18 months to ~6-8 months. Linux analogy. Centralization risk.

53:51-1:00:59

11 · Autonomous robotics

Physical world lags digital by years. Interface layer as next market.

1:00:59-1:05:40

12 · MicroGPT and agentic education

200-line LLM. Write for agents, not people. Human value = bits agents cannot derive.

Atomic Insights

Lines worth screenshotting.

  • The human is now the bottleneck in AI work, not the model — the right response is to architect your workflow to minimize your own involvement.
  • Feeling anxious when you have unused subscription capacity is the new version of a PhD student feeling guilty when their GPUs sit idle.
  • RL-trained models are brilliant on verifiable tasks and stuck in 2022 on everything else — the same bad joke persists because jokes are not in the optimization loop.
  • Models have more jaggedness than humans: simultaneously a world-class systems programmer and a confused 10-year-old in the same session.
  • AutoResearch found hyperparameter improvements overnight in a well-tuned model that two decades of human experimentation had missed — not because the researcher was bad, but because he was the bottleneck.
  • ProgramMD is a markdown file describing a research organization — and an organization described as code can be optimized like code.
  • Apps should not exist: everything should be exposed APIs, with agents as the intelligence layer that calls them.
  • The open/closed source gap in LLMs closed from 18 months to roughly 6-8 months, following the same structural dynamic as Linux vs. Windows.
  • Digital work will be automated first and fastest because flipping bits is a million times cheaper than moving atoms.
  • Your job as a human expert is now the few bits the agent cannot derive — everything else belongs to the agent.
  • Education is being rerouted: write explanations for agents, not people; agents then target individuals better than any human teacher can.
  • Compute is becoming the resource you donate to causes you care about, not money — FLOPs as the new charitable contribution.
Takeaway

The human is the bottleneck now.

WHAT TO LEARN

The constraint in AI work shifted around December 2025 — not the model capability, but the human's ability to delegate, structure, and stay out of the loop.

01 · Cold open / AI psychosis
  • Agents capable enough to handle multi-step software tasks arrived around December 2025 — the ratio flipped from writing most code manually to writing essentially none.
02 · What capability limits remain
  • Agents can now handle macro actions — new feature to agent one, non-interfering feature to agent two — the skill is parallelizing and reviewing, not coding.
  • Most failures feel like skill issues — the capability is probably there, the instructions or memory tools just were not right.
03 · Mastery of coding agents
  • Persistent autonomous agents differ from interactive coding agents — they loop, have their own sandbox, and act while you are not watching.
  • Agent personality matters: a model that calibrates praise to actual quality creates genuine motivation to do better work.
04 · Second-order effects of natural language coding
  • Home automation that used to require six separate apps now runs through a single agent in natural language — the apps should not have existed.
  • The industry has to reconfigure so that the customer of software is an agent acting on a human's behalf, not the human directly.
05 · Why AutoResearch
  • AutoResearch found weight-decay and Adam-beta improvements overnight in a model that had been hand-tuned for two decades — proving that the researcher was the bottleneck.
  • The goal is to arrange everything once, hit go, and let the loop run — fewer human interactions per unit of research progress is the objective.
06 · Relevant skills in the AI era
  • Most things worth automating need clear objective metrics — if you cannot define success numerically, you cannot build an autonomous loop around it.
  • Model jaggedness is a direct consequence of RL training on structured reward signals — soft, aesthetic, and social tasks fall outside the optimization loop.
07 · Model speciation
  • The science of fine-tuning without losing capabilities — touching weights rather than just context windows — is not fully developed, which is the technical barrier to speciation.
08 · Collaboration surfaces for humans and AI
  • Distributed AutoResearch has the same structure as folding@home: expensive to produce a solution, cheap to verify it — which enables untrusted workers to contribute safely.
  • Compute is becoming the resource you donate to causes you care about, not money.
09 · Jobs market analysis
  • Jobs are bundles of tasks — some tasks get faster, not eliminated. The right frame is tool, not replacement.
  • Jevons paradox: cheaper code production likely increases demand for software, not decreases it, the same way cheaper bank branches led to more branches and more tellers.
10 · Open vs. closed source
  • The open/closed source gap closed from 18 months to roughly 6-8 months — open source is on a structural path to handling most consumer use cases.
  • Centralization of intelligence in a few closed labs carries systemic risk; a common open platform that runs behind but is always present is a healthier equilibrium.
11 · Autonomous robotics
  • Physical automation lags digital by years because atoms require capital and time that bits do not.
  • The interface layer — sensors and actuators connecting digital intelligence to the physical world — is the next interesting market after pure software automation.
12 · MicroGPT and agentic education
  • Human expert value is narrowing to the few bits agents cannot derive — MicroGPT is 200 lines because Karpathy spent years distilling it; the agent understands it but could not have arrived at it.
  • Education is being rerouted through agents: write curricula for agents, not people — agents then target each individual at their level with infinite patience.
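The "expensive to produce, cheap to verify" structure from chapter 08 can be sketched in a few lines. This is only an illustration under stated assumptions, not the AutoResearch-at-home protocol: the toy `evaluate` objective, the `Submission` type, and `accept_submissions` are all hypothetical names. The idea shown is that a coordinator ignores the loss an untrusted worker claims and re-runs one cheap evaluation itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """A candidate config sent in by an untrusted worker (hypothetical type)."""
    worker: str
    weight_decay: float
    claimed_loss: float


def evaluate(weight_decay: float) -> float:
    """Toy stand-in for a validation run; its minimum is at weight_decay = 0.1."""
    return (weight_decay - 0.1) ** 2 + 1.0


def accept_submissions(best_loss: float, submissions: list[Submission]) -> tuple[float, list[str]]:
    """Re-evaluate every claim and keep only verified improvements."""
    accepted = []
    for sub in submissions:
        verified = evaluate(sub.weight_decay)  # never trust sub.claimed_loss
        if verified < best_loss:
            best_loss = verified
            accepted.append(sub.worker)
    return best_loss, accepted


baseline = evaluate(0.5)
submissions = [
    Submission("honest", 0.2, evaluate(0.2)),
    Submission("liar", 0.9, 0.0),  # claims a great loss, but the config is bad
]
best, accepted = accept_submissions(baseline, submissions)
print(best, accepted)
```

Finding a good config may take a worker many runs, while checking it costs the coordinator one run per submission. That asymmetry is what would let untrusted contributors participate safely.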
Glossary

Terms worth knowing.

Claw
A persistent autonomous agent that runs independently in the background — not interactive, has its own sandbox, memory, and looping behavior. Distinct from a single-session coding agent.
OpenClaw
An open-source persistent agent framework credited with innovating simultaneously on soul documents, memory systems, personality, and a WhatsApp-based control portal.
AutoResearch
An autonomous loop where agents design experiments, run training runs, evaluate results against objective metrics, and iterate without a human in the loop between cycles.
ProgramMD
A markdown file that describes how a research organization should operate — what roles exist, how work flows, what risks to take. Treated as code that can itself be optimized.
Skill issue
Karpathy's framing for agent failures: the capability is probably there, but the human has not yet figured out how to give the right instructions, memory tools, or task structure.
MicroGPT
Karpathy's minimal LLM implementation: the full training loop (architecture, forward pass, autograd, optimizer) in roughly 200 lines of Python, stripped of all efficiency scaffolding.
Jaggedness
The quality of current AI models that makes them simultaneously expert-level on verifiable tasks and incoherent on soft or aesthetic ones — a property humans have far less of.
Model speciation
The hypothetical future where models diverge into specialized intelligences optimized for specific domains, rather than remaining monolithic generalist systems.
Jevons paradox
The economic phenomenon where making something cheaper increases total demand rather than decreasing it — cited as the reason cheaper software production may increase, not decrease, demand for software engineers.
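The AutoResearch entry above describes a loop of propose, run, evaluate, iterate. Below is a minimal sketch of that shape, assuming a toy objective in place of a real training run. `train_and_eval`, `propose`, and the search ranges are hypothetical, and random perturbation stands in for whatever an agent would actually propose.

```python
import random


def train_and_eval(config: dict) -> float:
    """Toy stand-in for a training run; returns a validation loss (lower is better)."""
    wd, beta2 = config["weight_decay"], config["adam_beta2"]
    return (wd - 0.1) ** 2 + (beta2 - 0.95) ** 2


def propose(config: dict, rng: random.Random) -> dict:
    """Perturb the current best config; an agent would propose edits here."""
    return {
        "weight_decay": max(0.0, config["weight_decay"] + rng.uniform(-0.05, 0.05)),
        "adam_beta2": min(0.999, max(0.5, config["adam_beta2"] + rng.uniform(-0.02, 0.02))),
    }


def auto_research(start: dict, steps: int, seed: int = 0) -> tuple[dict, float]:
    """Run the loop with no human between cycles; keep only measured improvements."""
    rng = random.Random(seed)
    best, best_loss = start, train_and_eval(start)
    for _ in range(steps):
        candidate = propose(best, rng)
        loss = train_and_eval(candidate)
        if loss < best_loss:  # the objective metric is the only judge
            best, best_loss = candidate, loss
    return best, best_loss


start = {"weight_decay": 0.5, "adam_beta2": 0.999}
start_loss = train_and_eval(start)
best, best_loss = auto_research(start, steps=200)
print(best, best_loss)
```

The accept-if-better rule is the simplest possible choice. It only works because the metric is a number, which is the condition chapter 06 names for building an autonomous loop.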
Resources

Things they pointed at.

15:29 · product · Dobby the Elf Claw (home automation project)
29:04 · product · NanoGPT
39:46 · product · Folding@home
59:30 · book · Daemon (novel by Daniel Suarez)
1:02:00 · product · MicroGPT
1:02:00 · product · MicroGrad
Quotables

Lines you could clip.

00:00
Code's not even the right verb anymore. But I have to express my will to my agents for sixteen hours a day.
Perfect hook, zero setup needed · TikTok hook
05:33
I feel nervous when I have subscription left over. That just means I haven't maximized my token throughput.
Relatable anxiety reframed in a completely new register · IG reel cold open
24:34
I simultaneously feel like I'm talking to an extremely brilliant PhD student who's been a systems programmer their entire life, and a 10-year-old.
Best one-sentence description of model jaggedness ever articulated · newsletter pull-quote
36:01
A swarm of agents on the Internet could collaborate to improve LLMs and could potentially even run circles around Frontier Labs.
Bold, specific, standalone claim · TikTok hook
1:03:20
I'm not explaining it to people anymore. I'm explaining it to agents.
Compact education thesis, no context needed · IG reel cold open
1:06:05
The things that agents can't do is your job now.
Crisp closing thesis, actionable, standalone · newsletter pull-quote
Topic Map

Where the conversation goes.

00:00-11:15 · dense · Agent workflow and AI psychosis
11:15-22:45 · dense · AutoResearch and removing humans from the loop
22:45-32:30 · steady · Model limitations, jaggedness, and speciation
32:30-37:28 · steady · Distributed collaboration and AutoResearch-at-home
37:28-48:25 · steady · Jobs, labor markets, and Jevons paradox
48:25-53:51 · steady · Open vs. closed source dynamics
53:51-1:00:59 · steady · Robotics and physical-digital interface
1:00:59-1:05:40 · dense · MicroGPT and agentic education
The Script

Word for word.

00:00Code's not even the right verb anymore. Right? But I have to, um, express my will to my agents for sixteen hours a day.
00:07Manifest. How can I have not just a single session of Claude Code or Codex or some of these agent harnesses? How can I have more of them?
00:13How can I do that appropriately? The agent part is now taken for granted. Now the claw-like entities are taken for granted, and now you can have multiple of them, and now you can have instructions to them, and now you can have optimization of the instructions.
00:23But there I mean, this is why it gets to the psychosis is that this is, like, infinite, and everything is skill issue.
00:34Hi, listeners. Welcome back to No Priors. Today, I'm here with Andrej Karpathy, and we have a wide-ranging conversation for you about code agents, the future of engineering and AI research, how more people can contribute to research, what's happening in robotics, his prediction for how agents can reach out into the real world, and education in this next stage.
00:54Welcome, Andrej. Andrej, thanks for doing this. Yeah.
00:57Thank you for having me. Uh, so it's been a very exciting couple of months in AI. Uh, Yeah.
01:02You could say that. I remember, um, walking into the office at some point, and you were, like, really locked in.
01:09I was asking what you were up to, and you're like, I just I have to code for sixteen hours a day. Or code's not even the right verb anymore. Right?
01:15But I have to express my will to my agents for sixteen hours a day. Manifest because, like, there's been a jump in capability.
01:25What's happening? Tell me about your experience.
01:27Yeah. I kinda feel like I was just in this perpetual I still am often in this state of AI psychosis just like all the time because there was a huge unlock in what you can achieve as a person, as an individual. Right?
01:37Because you were bottlenecked by, you know, your typing speed and so on. But now with these agents, it really I would say in December is when it really just something flipped where I kinda went from eighty twenty of, like, you know, to, like, twenty eighty of writing code by myself versus just delegating to agents.
01:52And I don't even think it's twenty eighty by now. I think it's a lot more than that. I don't think I've typed, like, a line of code probably since December, basically, which is, like, an extremely large change.
02:04I was talking to it, like, for example I was talking about it to, for example, my parents and so on, and I don't think a normal person actually realizes that this happened or how dramatic it was. Literally, if you just find a random software engineer or something like that at their desk and what they're doing, their default workflow of building software is completely different as of basically December.
02:25So I'm just in the state of psychosis of trying to figure out what's possible, trying to push it to the limit. How can I have not just a single session of Claude Code or Codex or some of these agent harnesses?
02:36How can I have more of them? How can I do that appropriately? And then how can I use these claws?
02:41What are these claws? And so there's, like, a lot of new things. I wanna be at the forefront of it, you know, and I'm very antsy that I'm not at the forefront of it.
02:50And I see lots of people on Twitter doing all kinds of things, and they all sound like really good ideas, and I need to be at the forefront or I feel extremely nervous. And so I guess I'm just in the psychosis of, like, what's possible? Like, because it's unexplored fundamentally.
03:00Well, if you're nervous, the rest of us are are nervous. We have a we have a team that we work with at Conviction
03:06that their setup is everybody is like, you know, none of the engineers write code by hand. And they they're all microphoned, they just, like, whisper to their agents all the time.
03:16Mhmm. It's the strangest work setting ever. And I thought they were crazy, and now I, like, I fully accept.
03:21I was like, oh, this was the way. Like, you're just ahead of it. Yes.
03:24What how do you think about your own capacity now to, like, explore or to do projects? Like, what is it limited by?
03:32Yeah. What is it limited by? Just I think everything like, so many things, even if they don't work, I think to a large extent, feel like it's a skill issue.
03:40It's not that the capability is not there. It's that you just haven't found a way Yeah. To string it together of what's available.
03:45Like, I just don't I didn't give good enough instructions in the AGENTS.md file or whatever it may be. I don't have a nice enough memory tool that I put in there or something like that. So it all kinda feels like skill issue when it doesn't work to some extent, but you wanna see how you can parallelize them, etcetera.
03:59And you wanna be Peter Steinberger, basically. So Peter is famous. He has a funny photo where he's in front of a monitor with lots of he uses Codex.
04:07So lots of Codex agents tiling the monitor. And they all take about twenty minutes if you prompt them correctly and you use the high effort. And so they all take about twenty minutes to have multiple, you know, 10 repos checked out.
04:18And so he's just going between them and giving them work. It's just like you can can you can move in much larger macro actions.
04:24It's not just like here's a line of code, here's a new function. It's like here's a new functionality and delegate it to agent one.
04:30Here's a new functionality that's not gonna interfere with the other one. Give it agent two. And then try to review their work as best as you can, depending on how much you care about that code.
04:39Like, what are these macro actions that I can, like, manipulate my software repository by? And another agent is doing some research, another agent is writing code, another one is coming up with a plan for some new implementation.
04:49And so everything just happens in these macro actions over your repository. And just trying to become really good at it and develop a muscle memory for it is extremely yeah, it's very rewarding, number one, because it actually works.
05:03But it's also kind of like the new thing to learn. So that's why, hence the psychosis.
05:07Yeah. I do feel like my instinct is like whenever I am waiting for an agent to complete something, the obvious thing to do is, like, well, I can do more work.
05:15Yeah. Right? Like, if I have access to more tokens, then Yeah.
05:18Like, I should just parallelize tasks. And so that's that's very stressful because if you don't feel very bounded by your ability to spend on tokens Yeah. Then, you know, you are the bottleneck in the system that is max capability.
05:31Yeah. If you're not maximizing your subscription
05:33Yeah. At least. And ideally for multiple agents.
05:36Right. If you run out of the quota on Codex, you should switch to Claude or whatnot. I don't know.
05:40Like, that's what I've been trying to do a little bit. And I feel nervous when I have subscription left over. That just means I haven't maximized my token throughput.
05:46So I actually kind of experienced this when I was a PhD student. You would feel nervous when your GPUs are not running. Mhmm.
05:51Like, you have GPU capability and you're not maximizing the available FLOPs to you. But now it's not about FLOPs. It's about tokens.
05:57So what is your token throughput, and what token throughput do you command? I would actually argue that it's very interesting that we had, you know, at least ten years where in many engineering tasks, people just did they didn't feel compute bound.
06:11Mhmm. Right? And, like, the entire industry feels that now.
06:15They feel like they they they felt resource bound. Mhmm. And now that you have this big capability jump, you're like, oh, actually, it's not, you know, my ability to access the compute anymore.
06:26Like, I'm I'm the binding constraint. Yeah. It's a skill issue.
06:28Yeah. Which is very empowering because,
06:31yeah, because you could be getting better. So that's why that's why I think it's very addictive because there's unlocks when you when you get better. Where do you think it goes?
06:37Like, if you just think about, like, okay. You know,
06:40Andrej's iterating and everybody else is for sixteen hours a day, getting better at using coding agents. Like, what does it look like in a year of, like, you've reached mastery?
06:48Yeah. What does mastery look like, right, at the end of the year or, like, two, three years, five years, ten years, etcetera? Well, I think everyone is basically interested in, like, going up the stack.
06:58So I would say, yeah, it's not about a single session with your agent, multiple agents, how do they collaborate, and teams, and so on. So everyone's trying to figure out what that looks like. And then I would say claw is also kind of an interesting direction because it really When I say a claw, I mean this layer that kind of takes persistence to a whole new level.
07:14Like, it's something that keeps looping. It's not something that you are interactively in the middle of.
07:20It kind of like has its own little sandbox, its own little You know, it kinda like does stuff on your behalf even if you're not looking kind of thing.
07:27And then also has maybe more sophisticated memory systems, etcetera, that are not yet implemented in agents. So OpenClaw has a lot more sophisticated memory, I would say, than what you would get by default, which is just memory compaction when your context runs out.
07:39Right? You think that's the piece that resonated for more users versus, like, perhaps, like, broader tool access? For OpenClaw?
07:46Yeah. There there's, like I think there's at least five things that resonated with users. Here.
07:50Yeah. Good job, Peter. I mean, Peter has done a really amazing job.
07:53I saw him recently, and I talked to him about it, and I he's very humble about it, but I think he innovated simultaneously in five different ways and put it all together. So for example, like, the SOUL.md document.
08:04Like, he actually really crafted a personality that is kind of compelling and interesting. And I feel like a lot of the current agents, they don't get this correctly. I actually think Claude has a pretty good personality.
08:12It feels like a teammate. Mhmm. And it's excited with you, etcetera.
08:16I would say, for example, Codex is a lot more dry Mhmm. Which is kind of interesting because in ChatGPT Codex is, like, a lot more upbeat and highly sycophantic. But I would say Codex, the coding agent, is very dry.
08:26It doesn't it doesn't seem to care about what you're creating. Mhmm. It's kinda like, oh, I implemented it.
08:30It's like, okay, but do you understand what we're building? Mhmm. It's true.
08:34You know, it doesn't it and the other thing I would say is, for example, with Claude, I think they dialed the sycophancy fairly well, where when Claude gives me praise, I do feel like I slightly deserve it. Mhmm. Because sometimes I kinda give it, like, not very well formed thoughts, and I give it an idea that I don't think is fully baked, and it doesn't actually react very strongly.
08:51It's like, oh, yeah, we can implement that. But when it's a really good idea by my own account, it does seem to reward it a bit more. And so I kinda feel like I'm trying to, like, earn its praise, which is really weird.
09:01Mhmm. And so I do think the personality matters a lot, and I think a lot of the other tools maybe don't appreciate it as much. And I think in this aspect also, Peter really cares about this, and so that was correct.
09:10And then the memory system, and then just, you know, he's just having fun with this. And then the the single WhatsApp portal to all of the automation. Yeah.
09:18Is there something that you have done personally
09:21with your claws
09:22beyond software engineering that you think is fun or interesting? Yeah. So in January, I had a claw.
09:27I went through a period of claw psychosis. So I built I have a claw basically that takes care of my home, and I call him Dobby the Elf Claw. And, basically, I used the agents to find all of the smart home subsystems of my home on the local area network, which I was kinda surprised that worked out of the box.
09:45Like, I just told her that I think I have Sonos at home. Like, can you try to find it? And it goes and it did, like, IP scan of all the, basically, computers on the local area network, and it found the Sonos thing, the Sonos system.
09:58And it turned out that there's no password protection or anything like that. Just logged in, and it's like, oh, yeah. You have these Sonos systems installed.
10:03I let me try to reverse engineer how it's working. It does some web searches, and it finds, like, okay. These are the API endpoints.
10:09And then it's like, do you wanna try it? And I'm like, woah. Like, you just did that.
10:12I'm like, yeah. Can you try to play something in the study? And it does, and music comes out.
10:16And I'm like, I can't believe I just That's crazy. That's like three prompts. Yeah.
10:19I can't believe I just typed in, like, can you find my Sonos? And that suddenly it's playing music. Mhmm.
10:23And it did the same for lights. And so, basically, like, it kinda hacked in, figured out the whole thing, created APIs, created dashboard so I could see the command kinda center of, like, all of my lights in the home. And then it was like switching lights on and off, you know, so I can ask it like, Dobby, it's sleepy time.
10:38And when it's sleepy time, that just means all the lights go off, etcetera, and so on. So it controls all of my lights, my HVAC, my shades, the pool, and the spa, and also my security system.
10:48So I have a camera pointed outside of the house. And anytime someone rolls in, I have a Qwen, a Qwen model that looks at the videos.
10:56So first of all, there's change detection. Right. And then based on change detection, it goes to Qwen.
11:00And then it actually tells me it sends me a text to my WhatsApp. It shows an image from the outside, and it says, hey, FedEx truck just pulled up. FedEx truck just pulled up, and you might wanna check it, and you got new mail, or something like that.
11:12And Dobby just texted me this. This is really incredible. So so Dobby is in charge of the house.
11:18I text through with it through WhatsApp, and it's been, like, really fun to have these macro actions that maintain my house. I haven't, like, really pushed it, like, way more beyond that, and I think people are doing a lot more crazy things with it.
11:30But for me, even just a home automation setup, I used to use six apps. Yeah. Completely different apps.
11:34And I don't have to use these apps anymore. Like, Dobby controls everything in natural language. It's amazing.
11:39And so I think, like, I haven't even pushed a paradigm fully, but already that is so helpful and so inspiring, I would say. You think that's indicative of what people want from a user experience perspective with software? Right?
11:50Because I I don't think you know, it's pretty ignored that it takes humans effort to learn new software, like new UI.
11:57Yeah. I think to some extent, that's right.
12:00It's like working backwards from how people think an AI should be. Because what people have in their mind of what an AI is is not actually what an LLM is by in the raw sense. Like, LLM is a token generator.
12:11You know, like more tokens come out. But what they think of is like this persona identity that they can tell stuff, and it remembers it.
12:18You know? And it's just kinda an entity behind the WhatsApp. It's like a lot more understandable.
12:22Mhmm. So I think, to some extent, it's like matching the expectations that humans already have for what an AI should behave. But under the hood, there's like a lot of technical details go into that.
12:30And LLMs are too raw of a primitive to actually
12:34type check as AI, I think, for most people, if that makes sense. Yeah. I think that's like how we understand what the AI is and like the description of it as Dobby or some person.
12:45It obviously resonates with people. I also think that it it the unification that you did across your six different software systems for your home automation speaks to a different question of, like, do people really want all of the software that we have today?
12:59Yeah. Right? Because I I would argue, like, well, you have the hardware Yeah.
13:03But you've now thrown away the software or the the UX layer of Do
13:08you think that's what people want? Yeah. I think there's this, like, there's this sense that these apps that are in the App Store for using these smart home devices, etcetera, these shouldn't even exist kind of in a certain sense.
13:18Like, shouldn't it just be APIs, and shouldn't agents be just using it directly? And wouldn't it like, I can do all kinds of home automation stuff that any individual app will not be able to do.
13:29Right? And then LLM can actually drive the tools and call all the right tools and do do pretty complicated things. And so in a certain sense, it does point to this like, maybe there's like an overproduction of lots of custom bespoke apps that shouldn't exist because agents kinda, like, crumble them up, and everything should be a lot more just like exposed API endpoints, and agents are the glue of the intelligence that actually, like, tool calls all the all the parts.
13:54Another example is, like, my treadmill. There's an app for my treadmill, and I wanted to, like, keep track of how often I do my cardio. But, like, I don't want to, like, log into a web UI and go through a flow and etcetera.
14:04Like, all this should just be, like, make APIs available. And this is kind of, you know, going towards the agentic sort of web or agent first tools and all this kind of stuff.
14:13So I think the industry just has to reconfigure in so many ways that it's like, the customer is not the human anymore, it's like agents who are acting on behalf of humans, and this refactoring will be will probably be substantial in a certain sense. One way that people sometimes push back on this is like, do we expect people to vibe code some of these tools?
14:30Do we expect normal people to do this kind of stuff that I described? Mhmm. But I think to some extent, this is just technology as it exists today.
14:37And right now, there is some vibe coding, and I'm actually watching it, and I'm working with the system. But I kinda feel like this kind of stuff that I just talked about, this should be free, like, in a year or two or three. There's no vibe coding involved.
14:48This is trivial. This is table stakes. This is like any AI, even the open source models, etcetera, can, like, do this.
14:53You should be able to translate from a
14:56less technical human's intent very easily to this outcome. Yeah.
15:00Today, it's vibe coding and it's involved, and not many people are gonna do it. But And you still have to make some design decisions. Right?
15:04We were talking about, like, can take frames, for example. Yeah. Yep.
15:08But I kinda feel like this will just start to the barrier will just come down, and it's just ephemeral software on your behalf, and some kind of, like, Claw is handling all the details for you, but you're not involved. Claw has a Claw has a machine, and it will figure it out.
15:23And it's just presenting you UIs, and you're, like, saying stuff. You know? Mhmm.
15:26Why haven't you,
15:28I guess, like, pushed the boundaries of what you can do personally with Claws? Like, is it, you know, you're focusing on more important projects,
15:36auto research, etcetera, or you're climbing the hill to mastery or something else. Right? Yeah.
15:41I just feel like I'm so distracted by everything, so I spend I spend like a week on the claw stuff, and I I have more to dos almost.
15:49But I will say that It's like gems and tools. We're all just busier, unfortunately. Yeah.
15:53I didn't really take advantage of a lot of email and calendar and all this other stuff, and I didn't give it access because I'm still a little bit suspicious and it's still very new and rough around the edges. So I didn't wanna give it, like, full access to my digital life yet, and part of it is just, like, security, privacy, and just being very cautious in that realm.
16:11And so some of it is, like, held back by that, I would say. Yeah. Maybe that's, like, the dominant dominant feature, but some of it is also just I feel so distracted because I feel like I had a week of claw, and then other stuff is happening.
16:21And What was the I I mean, you've talked about, like, being able to
16:26train or at least optimize a model as a task you wanted to see agents do for a long time? What was the motivation behind auto research?
16:33Auto research, yeah. So I think I
16:36had a tweet earlier where I kind of said something along the lines of to get the most out of the tools that have become available now, you have to remove yourself as bottleneck. You can't be there to prompt the next thing.
16:47You need to take yourself outside. You have to arrange things such that they're completely autonomous. And the more How can you maximize your token throughput and not be in the loop?
16:56This is the goal. And so I kind of mentioned that the the name of the game now is to increase your leverage. I put in just very few tokens just once in a while, and a huge amount of stuff happens on my behalf.
17:06And so AutoResearch, like, I tweeted that, and I think people liked it and whatnot, but they haven't maybe worked through the implications of that.
17:13And for me, auto research is an example of an implication of that, where it's like, I don't wanna be the researcher in the loop looking at results, etcetera. I'm holding the system back. So the question is, how do I refactor all the abstractions so that I'm not? I have to arrange it once and hit go.
17:28The name of the game is how can you get more agents running for longer periods of time without your involvement, doing stuff on your behalf? And auto research is just, yeah, here's an objective, here's a metric, here's your boundaries of what you can and cannot do, and go.
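The setup described here (an objective, a metric, boundaries, then go) can be sketched as a small loop. This is a minimal illustration, not the actual auto research code: `auto_research`, `toy_loss`, `perturb`, and the hyperparameter names and ranges are all hypothetical, and the toy loss stands in for a real training run that reports a validation loss.

```python
import random


def auto_research(evaluate, propose, config, bounds, budget, seed=0):
    """Greedy loop: propose a change, keep it only if the metric improves."""
    rng = random.Random(seed)
    best, best_score = dict(config), evaluate(config)
    for _ in range(budget):
        candidate = propose(best, rng)
        # Boundaries: skip anything outside the allowed ranges.
        if any(not (bounds[k][0] <= v <= bounds[k][1]) for k, v in candidate.items()):
            continue
        score = evaluate(candidate)
        if score < best_score:  # lower is better, e.g. validation loss
            best, best_score = candidate, score
    return best, best_score


def toy_loss(cfg):
    # Hypothetical stand-in for "train the model and measure validation loss".
    return (cfg["lr"] - 0.3) ** 2 + (cfg["weight_decay"] - 0.1) ** 2


def perturb(cfg, rng):
    # Nudge one randomly chosen hyperparameter by a small amount.
    key = rng.choice(sorted(cfg))
    new = dict(cfg)
    new[key] = cfg[key] + rng.uniform(-0.05, 0.05)
    return new


start = {"lr": 0.9, "weight_decay": 0.9}
limits = {"lr": (0.0, 1.0), "weight_decay": (0.0, 1.0)}
best, score = auto_research(toy_loss, perturb, start, limits, budget=2000)
```

The human only supplies `evaluate`, `bounds`, and `budget` once; nothing inside the loop waits for a prompt, which is the property the conversation is pointing at.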
17:42Yeah, You were surprised at its effectiveness.
17:44Yeah. I didn't expect it to work, because so I have the project nanochat. And fundamentally, I think a lot of people are very confused with my obsession for training GPT-2 models and so on.
17:54But for me, training GPT models and so on is just a little harness, a little playground for training LLMs. And fundamentally, what I'm more interested in is, like, this idea of recursive self improvement and to what extent you can actually have LLMs improving LLMs. Because I think all the frontier labs, this is, like, the thing Mhmm.
18:08For obvious reasons. And they're all trying to recursively self improve, roughly speaking. And so for me, this is kinda like a little playpen off that.
18:17And I guess I'd, like, tuned nanochat already quite a bit by hand in a good old fashioned way that I'm used to. Like, I'm a researcher. I've done this for, like, you know, two decades.
18:23I have some amount of like, what is the opposite Yeah.
18:28Earned confidence.
18:29Okay. I have, like, two decades of, like, oh, I've trained this model, like, thousands of times of, like so I've done a bunch of experiments.
18:36I've done hyperparameter tuning. I've done all the things I'm very used to, and I've done for two decades. Yeah.
18:39And I've gotten to a certain point, and I thought it was, like, fairly well tuned. And then I let auto research go for, like, overnight, and it came back with, like, tunings that I didn't see. Mhmm.
18:48And, I did forget, like, the weight decay on the value embeddings, and my Adam betas were not sufficiently tuned, and these things jointly interact. So, like, once you tune one thing, the other things have to potentially change too. You know, I shouldn't be a bottleneck.
19:00I shouldn't be running these hyperparameter optimizations. I shouldn't be looking at the results. There's objective criteria in this case, so you just have to arrange it so that it can just go forever.
19:08So that's a single sort of version of auto research, like a single loop trying to improve. And I was surprised that it found these things that I The repo was already fairly well tuned and still found something.
19:19And that's just a single it's a single loop. Like, these Frontier Labs, they have GPU clusters of tens of thousands of them.
19:26And so it's very easy to imagine how you would basically get a lot of this automation on smaller models. And fundamentally, everything around, like, frontier level intelligence is about extrapolation and scaling laws.
19:37And so you basically do a ton of the exploration on the smaller models, and then you try to
19:42extrapolate out. So you're saying our research efforts are gonna get more efficient? Like, we're gonna have better direction for when we scale as well if we can do this experimentation better.
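The "explore on small models, then extrapolate" idea can be illustrated with a simple power-law fit. This is a sketch under strong assumptions: the model sizes and losses below are synthetic numbers generated from a known curve, not measurements, and `fit_power_law` and `predict` are hypothetical helper names. Real scaling-law fits usually also include an irreducible-loss term, which this version leaves out.

```python
import math


def fit_power_law(sizes, losses):
    """Least-squares fit of log(loss) = log(a) - b * log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(l) for l in losses]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return math.exp(intercept), -slope  # (a, b) in loss = a * size ** -b


def predict(a, b, size):
    return a * size ** (-b)


# Synthetic "small model" results drawn from loss = 5 * N ** -0.1 (made up).
sizes = [1e6, 1e7, 1e8]
losses = [5.0 * n ** -0.1 for n in sizes]

a, b = fit_power_law(sizes, losses)
forecast = predict(a, b, 1e9)  # extrapolate to a model 10x larger than any tried
```

The cheap experiments fix `a` and `b`; the expensive run is only needed to check the forecast.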
19:50Yeah. I would say that, like, the most interesting project and probably what the Frontier Labs are working on is,
19:55you know, you experiment on the smaller models. You try to make it as autonomous as possible. Remove researchers from the loop.
20:01They have way too much what is the what is the opposite? Earned confidence. Yeah.
20:05They don't know. They shouldn't be touching any of this, really. And so you have to rewrite the whole thing, because right now, I mean, certainly they can contribute ideas.
20:12But, okay, they shouldn't actually be enacting those ideas. There's a queue of ideas, and there's maybe an automated scientist that comes up with ideas based on all the arXiv papers and GitHub repos, and it funnels ideas in, or researchers can contribute ideas, but it's a single queue, and there's workers that pull items, and they try them out.
20:29And whatever works just gets put on the feature branch, and maybe some people monitor the feature branch and merge to the main branch sometimes. So just removing humans from all the processes and automating as much as possible and getting high tokens per second throughputs.
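The single-queue arrangement described here (ideas go in, workers pull items and try them, whatever works lands on a feature branch) might look roughly like the following. Everything in it is illustrative: `run_idea_queue` is a hypothetical name, the idea labels and loss values are placeholders rather than real results, and a plain list stands in for the feature branch.

```python
import queue
import threading


def run_idea_queue(ideas, try_idea, baseline, num_workers=4):
    """Workers pull ideas from one shared queue; ideas that beat the baseline are kept."""
    todo = queue.Queue()
    for idea in ideas:
        todo.put(idea)

    feature_branch = []  # stand-in for "whatever works gets put on the feature branch"
    lock = threading.Lock()

    def worker():
        while True:
            try:
                idea = todo.get_nowait()
            except queue.Empty:
                return  # queue drained, worker exits
            score = try_idea(idea)  # in practice: run the experiment
            if score < baseline:  # lower loss than the current baseline
                with lock:
                    feature_branch.append((score, idea))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(feature_branch)


# Placeholder outcomes: idea label -> loss it would produce (made-up numbers).
placeholder_results = {"idea-a": 3.11, "idea-b": 3.40, "idea-c": 3.15, "idea-d": 3.25}
kept = run_idea_queue(list(placeholder_results), placeholder_results.get, baseline=3.20)
```

Humans or an automated scientist only ever add to the queue; a person reviewing `kept` before merging corresponds to the people monitoring the feature branch.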
20:46And it does require rethinking of all the abstractions, and everything has to be reshuffled.
20:52So, yeah, I think it's very exciting. If we take one more recursive step here, when is the model gonna write a better program.md than you?
21:00Yeah. So program.md is not We're on the loop. Yeah.
21:04Exactly. Yeah. So program.md is my crappy attempt at describing how the auto researcher should work.
21:11Like, oh, do this, then do that, and that, and then try these kinds of ideas. And then here's maybe some ideas, like, at architecture, look at optimizer, etcetera. Mhmm.
21:18But I just came up with this in Markdown. Right? Mhmm.
21:21And so yeah, exactly. You want some kind of an auto research loop maybe that looks for you can imagine that different
21:30program.mds would give you different progress.
21:34So basically, every research organization is described by program.md. Yeah. A research organization is a set of markdown files that describe all the roles and how the whole thing connects.
21:44And you can imagine having a better research organization. So maybe they do fewer stand ups in the morning because they're useless. And this is all just code.
21:50Right? And so one organization can have fewer stand ups. One organization can have more.
21:56One organization can be very risk taking. One organization can be less. So you can definitely imagine that you have multiple research orgs, and then they all have code.
22:04And once you have code, then you can imagine tuning the code. So a 100%, there's like the meta layer of it.
22:09Did you see my text about my contest idea? My contest idea was, like, let people write different program.mds.
22:18Right? And so for the same hardware,
22:20where do you get most improvement? Oh, I see. And then you can take all that data and then give it to the model and say, write a better program.md.
22:26Yes. Yes. Yeah.
22:28Exactly. We're gonna get something better. Like, there's no way we don't.
22:30Right? You could 100% look at where the improvements came from and, like, can I change the program.md such that more of these kinds of things would be done? Or, like, things that didn't work This meta optimization.
22:41Yeah. You can 100% imagine doing that. So I think this is a great idea.
22:44But it's like, you know, I think you could sort of go one step at a time where you sort of have one process and then second process, and then the next process, and these are all layers of an onion. Like, LLM sort of part is now taken for granted. The agent part is now taken for granted.
22:57Now the claw like entities are taken for granted, and now you can have multiple of them, and now you can have instructions to them, and now you can have optimization of the instructions. And it's just like, it's a little too much. You know?
23:06But I mean, this is why it gets to the psychosis is that this is, like, infinite, and everything is a skill issue. And that's why I feel like, yeah, that's just coming back to this is why it's so insane. Okay.
23:16Well, if we're we're just trying to, like, diagnose the current moment and
23:20what is a relevant skill right now, what do you, like, what do you think is the implication that this that this is the loop we should be trying to achieve in different areas and that it works? Right?
23:30Like, you know, remove create the metric or create the ability for agents to continue working on it without you.
23:37Yeah. Do we still have performance engineering? Like, what Yeah.
23:41I mean, so there's a few caveats that I would put on top of the LM psychosis. Number one, this is extremely well suited to anything that has objective metrics that are easy to evaluate.
23:49Mhmm. So for example, like writing kernels for more efficient CUDA code for various parts of a model, etcetera, are the perfect fit.
23:56Mhmm. Because you have inefficient code, and then you want efficient code that has the exact same behavior, but it's much faster.
24:02Perfect fit. So a lot of things are perfect fit for auto research, but many things will not be. And so it's just if you can't evaluate it, then you can't auto research it, right?
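The kernel example works because both halves of the objective are mechanical to check: same behavior, less time. A rough sketch of such a check follows, with plain Python functions standing in for CUDA kernels. The names `reference_sum_sq`, `candidate_sum_sq`, `same_behavior`, and `speedup` are hypothetical, and the tolerance is an assumption that a real harness would set per kernel.

```python
import math
import random
import time


def reference_sum_sq(xs):
    # Slow but trusted implementation.
    total = 0.0
    for x in xs:
        total += x * x
    return total


def candidate_sum_sq(xs):
    # Proposed replacement that must match the reference.
    return sum(x * x for x in xs)


def same_behavior(ref, cand, trials=20, n=1000, seed=0):
    """Compare outputs on random inputs, allowing tiny floating-point differences."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        if not math.isclose(ref(xs), cand(xs), rel_tol=1e-9, abs_tol=1e-12):
            return False
    return True


def speedup(ref, cand, xs, repeats=5):
    """Ratio of best-of-N wall-clock times; above 1.0 means the candidate is faster."""
    def best_time(fn):
        times = []
        for _ in range(repeats):
            t0 = time.perf_counter()
            fn(xs)
            times.append(time.perf_counter() - t0)
        return min(times)

    return best_time(ref) / max(best_time(cand), 1e-12)


data = [random.Random(1).uniform(-1.0, 1.0) for _ in range(100_000)]
ok = same_behavior(reference_sum_sq, candidate_sum_sq)
ratio = speedup(reference_sum_sq, candidate_sum_sq, data)
```

An agent can be left to iterate against `same_behavior` and `speedup` indefinitely; a task with no equivalent of these two functions is the "can't evaluate it" case.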
24:12So that's like caveat number one. And then maybe caveat number two I would say is, you know, we're kinda talking about next steps, and we kinda see what the next steps are, but fundamentally, the whole thing still doesn't It's still kinda like bursting at the seams a little bit, and there's cracks, and it doesn't fully work.
24:24And if you kinda try to go too far ahead, the whole thing is actually net not useful, if that makes sense. Because these models still are not You know, they've improved a lot, but they're still rough around the edges is maybe the way I would describe it.
24:38I simultaneously feel like I'm talking to an extremely brilliant PhD student who's been like a systems programmer for their entire life, and a 10 year old. And it's so weird because humans, I feel like they're a lot more coupled, like you have, you know,
24:52everything in world. You wouldn't encounter that combination. This jaggedness
24:55is really strange, and humans have a lot less of that kind of jaggedness, although they definitely have some. But humans have a lot more jaggedness. Sorry, the agents have a lot more jaggedness where sometimes, like, you know, I ask for functionality and it like comes back with something that's just like totally wrong, and then we get into loops that are totally wrong.
25:12And then I'm just I get so frustrated with the agents all the time still. Because you feel the power of it, but you also there's still, like, it does nonsensical things once in a while for me still as well.
25:23I get very annoyed when
25:26I feel like the agent wasted a lot of compute Uh-huh. On something it should have recognized was an obvious problem. Yeah.
25:33I think some of the bigger things is, like, maybe what's underneath it, if I could hypothesize, is fundamentally, these models are trained via reinforcement learning. So they're actually struggling with the exact same thing we just talked about, which is the labs can improve the models in anything that is verifiable, that has rewards.
25:48So did you write the program correctly, and do the unit tests check out? Yes or no. But some of the things where they're struggling is, for example, I think they have a tough time with nuance of maybe what I had in mind or what I intended and when to ask clarifying questions.
26:03Like, what I yeah. It's just anything that feels softer is, like, worse.
26:08And so you're kind of like you're either on rails and you're part of the superintelligence circuits, or you're not on rails and you're outside of the verifiable domains, and suddenly everything kinda just, like, meanders. Like, maybe another way to put it is if you go to if today, if you go to, like, state of the art model, ChatGPT, and you ask it, tell me a joke, do you know what joke you're gonna get?
26:27There's the joke.
26:28The joke? I do feel I I I can't tell you, like, the, you know, standard form of it, but I do feel like ChatGPT has, like, three jokes. Yeah.
26:34Yeah. So the joke that apparently all the LLMs, like, love the most is why do scientists
26:39not trust atoms? Okay. Because they make everything up.
26:43Okay. They make everything up. So this is still Why did that emerge?
26:47So this is the joke you would get three or four years ago, and this is the joke you still get today. Okay. So even though the models have improved tremendously Yep.
26:55And if you give them an agentic task, they will just go for hours and move mountains for you. Mhmm. And then you ask for, like, a joke, and it has a stupid joke, a crappy joke from five years ago.
27:05Mhmm. And it's because it's outside of the it's outside of the RL. Mhmm.
27:08It's outside of the reinforcement learning. It's outside of what's being improved. It's like, and it's part of the jaggedness of, like, shouldn't you expect models as they get better to also have, like, better jokes or more diversity of them?
27:18Or it's just it's not being optimized, and it's stuck.
27:22Do you think that that implies that we are not seeing, like, generalization in the sense of, like, broader intelligence of joke smartness being attached to code smartness?
27:35Yeah. I think there's some decoupling where some things are verifiable and some things are not, and some things are optimized for arbitrarily by the labs depending on what data went in, and some things are not. And
27:45and But I I mean, the the premise, there's a, you know, premise from some research groups that if you are smarter at code generation or in these verifiable
27:55fields, you should be better at everything. Yeah. And, like, the joke situation suggests that that's not happening.
28:01Okay. Yeah. I don't think that's happening.
28:03I think I think maybe we're seeing, like, a little bit of that, but not, like, a satisfying amount. Yeah. Exists in humans.
28:09You can be very, good at that and still tell a really bad joke. Yeah. That's true.
28:14Yeah. But it still means that we're not getting the story is that we're getting a lot of the intelligence and capabilities in all the domains of society for free as we get better and better models.
28:24And it's not exactly fundamentally what's going on, and there's some blind spots, sometimes some things are not being optimized for, and this is all clustered up in these neural net opaque models. So you're either on rails of what it was trained for and everything is like you're going at speed of light or you're not.
28:39And so it's jaggedness. So that's why I think even though the progression is obvious, what should happen, you can't let it fully go there yet because it doesn't fully work, or it's a skill issue and we just haven't, like, figured out how to use it.
28:54So, you know, it's hard to tell. Can I ask kind of a blasphemous question, which is, like, if this jaggedness is persisting
29:01and it's all rolled up in a at least, monolithic interface? Right?
29:06But, you know, single model. Mhmm. Does that make sense, or do do you should should it be unbundled into things that are can be optimized and improved against different
29:15domains of intelligence? Like, unbundling the models into multiple experts in different areas, etcetera? More directly.
29:21Yeah.
29:22Instead of just MoE that we have no exposure to. Because that can be, like, confusing as a user from the outside Uh-huh. Which is like, why is it so good at this but not at this other thing?
29:31Yeah. I think currently, my impression is the labs are trying to have a single sort of, like, monoculture of a model that is arbitrarily
29:38intelligent in all these different domains. Mhmm. And they just stuff into the parameters.
29:42I do think we should expect more speciation in the intelligences.
29:49Like, the animal kingdom is extremely diverse in the brains that exist, and there's lots of different niches of nature, and some animals have overdeveloped visual cortex or other kind of parts. And I think we we should be able to see more speciation.
30:03And you don't need like this oracle that knows everything. You kinda speciate it, and then you put it on a specific task. And we should be seeing some of that because should you be able to have, like, much smaller models that still have the cognitive core.
30:12Like, they're still competent, but then they specialize, and then and then they they can become more efficient in terms of latency or throughput on specific tasks that you really care about.
30:22Like, if you're a mathematician working in Lean, I saw, for example, there's a few releases that really, like, target that as a domain. So there's probably gonna be a few examples like that where the unbundling kind of makes sense. One question I have is whether or not
30:35the capacity constraint on available compute infrastructure Mhmm. Drives more of this because efficiency Yeah.
30:42Actually matters more. Yeah. Yeah.
30:44Like, you you're if you financing aside, no financing is involved in all of this.
30:50If you have access to full compute for anything you do, like, leaving one single model. Right? But if you actually feel pressure where you're like, I can't serve Mhmm.
31:00A model of massive size for every use case Mhmm. Like, do you think that leads to any speciation? Does that question make sense to you?
31:07The question makes sense. And I guess, like, what I'm what I'm what I what I'm struggling with is I don't think we've seen too much speciation just yet. Right?
31:14No. We're seeing a monoculture of models. Yeah.
31:16So And there's, like, clearly pressure for, like, make a good code model, put it back in the main merge again. Yeah. Yeah.
31:22Yeah.
31:25Even though there already is pressure on the models. Mhmm. I guess perhaps I I feel like there's a lot of very short term supply crunch.
31:32Uh-huh. And, like, maybe that causes more speciation now.
31:35Yeah. I think fundamentally, like, the the the the labs are serving a model, and they don't really know what the end user is going to be asking about. So maybe that's like some part of it because they kinda have to multitask over all the possible things that could be asked.
31:47But I think if you're coming to a business and maybe partnering on some specific problems you care about, then maybe you would see that there. Or there would be some very high value applications that are, like, more niche. But I think right now, they're kinda like going after the totality of what's available.
32:02I don't think that the science of manipulating the brains is, like, fully developed yet, partly. What do you mean manipulating? So, like, so fine tuning without losing capabilities, as an example.
32:12And we don't have these primitives for actually working with the intelligences in ways other than just context windows. Context windows kinda just work, and it's very cheap to manipulate, etcetera. And this is how we're getting some of the customization, etcetera.
32:23But I think if it was, I think it's a it's a bit more of a developing science of how you, like, more deeply adjust the models, how you have continual learning maybe, or how you how you fine tune in a certain area, how you get better in a certain area, or, like, how you actually touch the weights, not just the context windows.
32:38And so it's a lot more tricky, I would say, to touch the weights than just the context windows because you're actually fundamentally changing the full model and potentially its intelligence. And so maybe it's just, like, not a fully developed science, if that makes sense, of speciation.
32:52And it also has to be like cheap enough Yeah. For that speciation to be worthwhile Yeah. In these given Yeah.
32:57Context. Can I ask a question about, like, an extension to auto research that you described in terms of open ground?
33:04You say, okay. Well, you know, we have this thing. We need more collaboration surface around it essentially for people to contribute
33:13to research overall. Can you talk about that? Yeah.
33:15So we talked about auto research as a single thread of, like, I'm gonna try stuff in a loop. Mhmm. But fundamentally, the parallelization of this is, like, the interesting component.
33:23And I guess I was trying to, like, play around with a few ideas, but I don't have anything that, like, clicks as simply as, I don't have something that I'm, like, super happy with just yet, but it's something I'm, like, working on the side when I'm not working on my claw. So I think, like, one issue is if you have a bunch of nodes of parallelization available to you, then it's very easy to just have multiple auto researchers talking through a common system or something like that.
33:47What I was more interested in is how you can have an untrusted pool of workers out there on the Internet. So for example, in auto research, you're just trying to find the piece of code that trains a model to a very low validation loss.
34:00If anyone gives you a candidate commit, it's very easy to verify that that commit is correct, is good. Like, they someone could claim from the Internet that this piece of code will optimize much better and give you much better performance.
34:11You could just check. Yeah. Crazy.
34:12But probably a lot of work goes into that checking. But fundamentally, they could lie and etcetera. So you're basically dealing with a similar kind of it's almost actually, like, looks a little bit my my designs that incorporate an untrusted pool of workers actually look a little bit more like a blockchain a little bit, because instead of blocks, you have commits.
34:31And these commits can build on each other, and they contain changes to the code as you're improving it. And the proof of work is basically doing tons of experimentation to find the commits that work. And that's hard.
34:43And then the reward is just being on the leaderboard right now. There's no monetary reward whatsoever. But I don't wanna push the analogy too far, but it fundamentally has this issue where you a huge amount of search goes into it, but it's very cheap to verify that a candidate solution is indeed good because you can just train a single you know, someone had to try 10,000 ideas, but you just have to check that the thing that they produced actually works.
35:03Mhmm. Because the 99,000 of them didn't work. You know?
35:06And so basically, long story short, it's like you have to come up with a system where an untrusted pool of workers can collaborate with a trusted pool of workers that do the verification, and the whole thing is kinda like asynchronous and works and and so on.
35:23And it's it's like safe from a security perspective because if anyone sends you arbitrary code and you're gonna run it, that's very sketchy and dodgy. So but fundamentally, it should be totally possible.
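The blockchain-like design sketched in this stretch (commits that build on each other, expensive search by untrusted workers, one cheap re-run by a trusted verifier) could be modeled as below. This is only an illustration of the idea, not a description of any real system: `Ledger`, `commit_id`, the worker names, and the toy metric are hypothetical. To sidestep the arbitrary-code concern raised here, submissions in this sketch are plain config changes rather than code.

```python
import hashlib
import json


def commit_id(parent, change):
    """Content hash linking a change to the commit it builds on."""
    payload = json.dumps({"parent": parent, "change": change}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:12]


class Ledger:
    def __init__(self, config, evaluate):
        self.evaluate = evaluate  # trusted metric, e.g. validation loss
        self.config = dict(config)
        self.score = evaluate(self.config)
        self.head = "genesis"
        self.leaderboard = []  # the only reward: (worker, commit, verified score)

    def submit(self, worker, parent, change, claimed_score):
        if parent != self.head:
            return "stale"  # built on an old commit; worker must rebase
        candidate = {**self.config, **change}
        # claimed_score is deliberately not trusted: the verifier re-runs once.
        actual = self.evaluate(candidate)
        if actual >= self.score:
            return "rejected"
        self.head = commit_id(parent, change)
        self.config, self.score = candidate, actual
        self.leaderboard.append((worker, self.head, actual))
        return "accepted"


ledger = Ledger({"lr": 0.9}, lambda cfg: (cfg["lr"] - 0.3) ** 2)
first = ledger.submit("alice", "genesis", {"lr": 0.5}, claimed_score=0.04)
liar = ledger.submit("mallory", ledger.head, {"lr": 0.95}, claimed_score=0.0)
late = ledger.submit("bob", "genesis", {"lr": 0.4}, claimed_score=0.01)
```

However many ideas a worker tried before submitting, the verifier pays for exactly one evaluation per submission, which is the asymmetry the analogy rests on.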
35:32So you're familiar with projects like SETI@home and Folding@home. All of these problems have a similar kind of setup. So Folding@home, you're folding a protein, and it's very hard to find a configuration that is low energy.
35:42But if someone finds a configuration that they evaluate to be low energy, that's perfect. You can just use it. You can easily verify it.
35:47So a lot of things have this property that, you know, very expensive to come up with, but very cheap to verify. And so in all those cases, things like Folding@home or SETI@home or auto research at home will be good fits. And so long story short, a swarm of agents on the Internet could collaborate to improve LLMs and could potentially even run circles around Frontier Labs.
36:08Like, who knows? You know? Yeah.
36:11Like, maybe that's even possible. Frontier Labs have a huge amount of trusted compute, but the Earth is much bigger and has a huge amount of untrusted compute.
36:18But if you put systems in check, systems in place that, you know, deal with this, then maybe it is possible that the swarm out there could could come up with better better solutions. And people kind of like contribute cycles to to a thing that they care about.
36:34And so, sorry, so the last thought is lots of companies or whatnot, they could maybe have their own things that they care about. And you, if you have compute capacity, you could contribute to different kind of AutoResearch tracks.
36:44Like, maybe you care about certain you care about cancer or something like that of a certain type. You don't have to just donate money to an institution.
36:51You actually could purchase compute, and then you could join the AutoResearch swarm for that project. You know?
36:57So if everything is rebundled into Auto Researchers, then compute becomes the thing that you're contributing to the pool. Yeah. That's very inspiring, and it's also interesting.
37:05Like, I don't I don't know how far this goes, but it is interesting that at least some audience of people,
37:11you know, here in Silicon Valley or lining up at, you know, retail stores in China have discovered that, like, having access to personal compute is interesting again. Yeah. Right?
37:21So maybe they're really motivated to do that for their claws, and then they can Yeah. Contribute to auto research. It's almost like dollars are the thing everyone cares about, but is the FLOP the thing that actually everyone cares about in the future?
37:31Like, is there gonna be, like, a flipping thing almost of, like, what's the thing that you care about? Like, right now, for example, it's really hard to get compute even if you have money. Yeah.
37:38So, actually, it almost seems like the flop is, like, dominant in a certain sense. Yeah.
37:44So so maybe that's kinda like kinda like that. Like, how much how many flops do you control instead of, like, what wealth do you control? I don't actually think that's true, but it's kind of interesting to think about.
37:53The last thing you released was, like, a little bit of jobs data analysis.
37:57Yeah. Is that right? And might have touched a nerve even though you're just, like, visualizing some public data.
38:03Yeah. What was you know, what were you curious about? Yeah.
38:06I guess I was curious to
38:08I mean, everyone is like real it's everyone is really thinking about the impacts of AI on the job market and what it's gonna look like. So I was just interested to take a look, like, what does the job market look like? Where are the different roles?
38:19And how many people are in different professions? And I was, like, really just interested to, like, look through the individual cases and try to think myself about, like, you know, with these AIs and how they're likely to evolve, like, are these gonna be tools that people are using?
38:32Are these gonna be displacing tools for these professions? And, like, what are the current professions, and how are they gonna change? Are they gonna grow or adjust to a large extent?
38:42Or, like, what could be new professions? So it's really just like a way to fuel my own chain of thought about the industry, I suppose. Mhmm.
38:48And so, yeah, the jobs data basically is just from the Bureau of Labor Statistics. They actually have a percent outlook for each profession about how much it's expected to grow over the next, I think, almost decade.
39:00Yeah. I think it's a decade, but it was made in 2024. Mhmm.
39:02We need a lot of health care workers. Yeah. So so they've already made those projections, and I'm not sure actually a 100% what the methodology was that they put into their projections.
39:11I guess I was interested to color things by like, if people think that what's primarily being developed now is this kinda like more digital AI, that is kind of like almost like these ghosts or spirit entities that can interact in the digital world and manipulate a lot of digital information.
39:27And they currently don't really have a physical embodiment or presence. And the physical stuff is probably gonna go slightly slower because you're manipulating atoms. So flipping flipping bits and and the ability to copy paste digital information is like makes everything a million times faster than accelerating matter.
39:42You know? So so energetically, I just think we're gonna see a huge amount of activity in digital space, huge amount of rewriting, huge amount of activity boiling soup.
39:51And I think we're gonna see something that that in the digital space goes at the speed of light compared to, I think, what's gonna happen in the physical world to some extent, it would be the extrapolation. And so I think, like, there's currently kind of, like, I think, an overhang where there can be, like, a lot of unhobbling, almost potentially, of, like, a lot of digital information processing that used to be done by computers and people.
40:12And now with AI as, like, a third kind of manipulator of digital information, there's gonna be a lot of refactoring in those in those disciplines. But the physical world is actually gonna be, like, I think, behind that by some amount of time.
40:24And so I think what's really fascinating to me is like So that's why I was highlighting the professions that fundamentally manipulate digital information. This is work you could do from your home, etcetera.
40:33Because I feel like those will be like, things will change. And it doesn't mean that there's gonna be less of those jobs or more of those jobs because it that has to do with, like, demand elasticity and many other factors, but things will change in these professions because of these new tools and because of this upgrade to the nervous system of the human superorganism,
40:50if you wanna think about it that way. Given the look you had at the data, do you have either any observations or guidance for people facing the job market or thinking about what to study now or what skills to develop?
41:02I mean, we can all go get like, I'm very thankful that I have to, like, meet people for my job right now. Yeah. I'm getting more physical.
41:09Yeah. Could you do your work from home, though? I could.
41:12I think there are relationship parts of it that are hard, but most of it I could. Yeah. I think it's really hard to tell because, again, like, the job market is extremely diverse, and I think the answers will probably vary.
41:21But to a large extent, like, these tools are extremely new, extremely powerful, and so just being you know, just trying to keep up with it is, like, the first thing. And, yeah, because I think a lot of people kinda, like, dismiss it or Or they're afraid of it.
41:34Or they're afraid of it, etcetera, which is totally understandable, of course. Yeah. I think, like, it's fundamentally an empowering tool at the moment.
41:42And these jobs are bundles of tasks, and some of these tasks can go a lot faster. And so people should think of it as primarily a tool, which is what it is right now. And I think the long term future of that is uncertain.
41:52Yeah. It's kinda really hard to forecast, to be honest. And, like, I'm not professionally, like, doing that, really.
41:56And I think it's a job of, like, economists to do properly.
41:59You are an engineer, though. And, like, one thing I thought was interesting is that, like, the demand for engineering jobs is continuing to increase.
42:08Yeah.
42:09I can't tell if that's, like, a temporary phenomenon. I'm not sure how I feel about it yet. Do you know?
42:13Yeah. That's like the demand elasticity almost. Like, software was scarce.
42:17Right? And so the reason we don't have more demand for software is just there's scarcity and it's too expensive. Too expensive.
42:22Yeah. So if the barrier comes down, then actually you have the Jevons paradox, which is, like, you know, the demand for software actually goes up. It's cheaper, and there's more... More powerful.
42:30Yeah. The classical example of this always is the ATMs and the bank tellers because there was a lot of, like, fear that ATMs and computers, basically, would displace tellers.
42:41But what happened is they made, like, the cost of operation of a bank branch much cheaper, so there were more bank branches, so there were more tellers. It's like the canonical example people cite. But basically, it's just Jevons paradox.
42:52Like, something becomes cheaper, so there's a lot of unlocked demand for it. So I do have a cautiously optimistic view of this in software engineering, where it does seem to me like the demand for software will be extremely large, and it's just become a lot cheaper.
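[Editor's sketch] The demand-elasticity point above can be made concrete with a toy calculation. This is a minimal sketch, assuming a constant-elasticity demand curve; the function names, the scale of 100, and the tenfold price drop are all illustrative assumptions and do not come from the conversation. It shows that when demand is elastic enough (elasticity above 1), a price drop raises total spending, which is the outcome the Jevons paradox describes.

```python
def quantity_demanded(price: float, elasticity: float, scale: float = 100.0) -> float:
    """Assumed constant-elasticity demand curve: Q = scale * price ** (-elasticity)."""
    return scale * price ** (-elasticity)


def total_spend(price: float, elasticity: float, scale: float = 100.0) -> float:
    """Total spending on the good at a given price: price times quantity."""
    return price * quantity_demanded(price, elasticity, scale)


# Compare spending before and after a hypothetical 10x drop in the price of software.
for elasticity in (0.5, 2.0):
    before = total_spend(1.0, elasticity)
    after = total_spend(0.1, elasticity)
    direction = "rises" if after > before else "falls"
    print(f"elasticity={elasticity}: total spend {direction} ({before:.1f} -> {after:.1f})")
```

Under these assumptions, inelastic demand (0.5) means cheaper software shrinks total spending, while elastic demand (2.0) means it grows. Which regime software is actually in is exactly the open question in the discussion.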
43:09And so I do think that for quite some time, it's very hard to forecast.
43:16But it does seem to me like right now, at least locally, there's gonna be more demand for software. Because software is amazing. It's like, you know, digital information processing.
43:23You're not forced to use arbitrary tools that were given to you that are imperfect in various ways. You're not forced to subscribe to what exists. Code is now ephemeral, and it can change and it can be modified.
43:34And so I think there's gonna be a lot of activity in the digital space to rewire everything in a certain sense, and I think it's gonna create a lot of demand for this kind of stuff. I think long term, yeah, obviously, even with other research, OpenAI or Anthropic or these other labs, they're employing, what, like a thousand something researchers.
43:52Right? Mhmm. These researchers are basically like glorified, know?
43:56They're automating themselves away actively, and this is the thing they're all trying to do. Yeah.
44:02I think like, I went around... Some of those researchers also feel the psychosis. Right? Because it's working.
44:08Yeah. Right? And and so they're like, oh, it's over for me too.
44:11I did spend a bunch of time going around OpenAI, and I was like, you guys realize if we're successful, like, we're all out of a job. Like, we're just building automation for Sam or something like that. Like, or the board.
44:21I'm not sure. But, like, they're just building this automation for, yeah, the board or the CEO or something like that, and we're all out of our job and maybe contributing on the sides.
44:31And so, yeah, it's kind of like unnerving from that perspective. Is it okay if I ask you Noam's question?
44:37You know, you could be doing that. Right? Auto researching with a lot of compute scale and a bunch of colleagues at one of the frontier labs.
44:44Like, why not? Well, I was there for a while. Right?
44:46Like Yes. And I did reenter. So to some extent, I agree, and I think that there are many ways to slice this question.
44:51It's a very loaded question, a bit. I will say that I feel very good about, like, what people can contribute and their impact outside of the frontier labs, obviously. Not only in the industry, but also in more ecosystem level roles.
45:04So your role, for example, is more ecosystem level. My role currently is also kinda more on ecosystem level, and I feel very good about impact that people can have in those kinds of roles. I think conversely, there are definitely problems in my mind for basically aligning yourself way too much with the frontier labs too.
45:20So fundamentally, I mean, you have a huge financial incentive with these frontier labs, and by your own admission, the AIs are going to, like, really change humanity and society in very dramatic ways. And here you are, basically, like, building the technology and benefiting from it, like, and being, like, very allied to it through financial means.
45:38Like, this was a conundrum that was at the heart of how OpenAI was started in the beginning. This was the conundrum that we were trying to solve.
45:44And so it's kind of It's still not resolved.
45:50The conundrum is still not fully resolved. So that's number one. You're not a completely free agent, and you can't actually, like, be part of that conversation in a fully autonomous free way if, like, you're inside one of the frontier labs.
46:00Like, there are certain things that you can't say, and conversely, there are certain things that the organization wants you to say. And, you know, they're not gonna twist your arm, but you feel the pressure of, like, what you should be saying.
46:11You know? Right. Because, like, obviously.
46:13Otherwise, it's, like, really awkward conversations, strange side eyes.
46:18Like, what are you doing? You know? So you can't, like, really be an independent agent.
46:22And I feel like a bit more aligned with humanity in a certain sense outside of a frontier lab because I'm not subject to those pressures almost, right, and I can say whatever I want.
46:31Yeah. I would say in the frontier labs, you can have impact there, of course, as well.
46:37But there's many researchers, and maybe you're one of them, maybe your ideas are really good, etcetera. And maybe there's a lot of decision making to do, and you want to be in a position where you are in the room with those conversations when they come up. I do think that currently the stakes are, like, overall fairly low, and so everything is kinda, like, nice.
46:50But ultimately, at the end of the day, like, when the stakes are really high, etcetera, if you're an employee of an organization, I don't actually know how much sway you're going to have on your organization, what it's going to do. Like, fundamentally, at the end of the day, it's you're not, like, really in charge.
47:03You're in a room and you're contributing ideas, but you're not really in charge of that entity that you're part of. So those are some sources of misalignment, I think, to some extent. I will say that in one way, I do agree a lot with that sentiment, that I do feel like the labs, for better or worse, they're opaque, and a lot of work is there.
47:20And they're kind of like at the edge of capability in what's possible, and they're working on what's coming down the line. And I think if you're outside of that frontier lab, your judgment fundamentally will start to drift because you're not part of the, you know, what's coming down the line.
47:34Right. And so I feel like my judgment will inevitably start to drift as well. And I won't actually have an understanding of how these systems actually work under the hood.
47:40That's an opaque system. I won't have a good understanding of how it's going to develop, etcetera. And so I do think that in that sense, I agree, and it's something I'm nervous about.
47:49I think it's worth basically being in touch with what's actually happening and actually being in the frontier lab. And if some of the frontier labs would have me come for, you know, some amount of time and do really good work for them, and then maybe coming out... Guys, he's looking for a job. This is super exciting.
48:02Yeah. Then I think that's maybe a good setup, because I kinda feel like it's kind of, you know, maybe that's like one way Mhmm.
48:10To actually be connected to what's actually happening, but also not feel like you're necessarily fully controlled by... Yeah. By those entities. So I think, honestly, in my mind, like, Noam can probably do extremely good work at OpenAI, but also I think his most impactful work could very well be outside of OpenAI.
48:26Noam, that's a call to be an independent researcher with auto research. Yeah.
48:30There's many things to do on the outside, and I think, ultimately, I think the ideal solution maybe is like, yeah, going back and forth.
48:39Yeah. And I think fundamentally, it can have a really amazing impact in both places. So very complicated.
48:43Don't know. Like, it's a very loaded question a little bit, but I mean, I joined the Frontier lab, and I'm outside. And then maybe in the future, I'll want to join again, and I think
48:53that's kinda like how I look at it. One question related to what visibility the world or the AI ecosystem has into the frontier is, like, how close open source is to the frontier Mhmm.
49:06And how sustainable that is. I I think Yeah. I think it is quite surprising.
49:12The entire sequence of events actually from, like, having a handful of Chinese models Mhmm. And global models, and I think people are gonna continue releasing here in the near term that are closer than much of the industry anticipated from a capability perspective. Yeah.
49:26I don't know if you're surprised by that, but you're a long term contributor to open source. Like, what's your prediction here? Yeah.
49:31So roughly speaking, basically,
49:33yeah. The closed models are ahead, but, like, people are monitoring the number of months that sort of like open source models are behind. And to start with, there's nothing, and then it went to eighteen months.
49:41Yeah. There's been convergence. Right?
49:43So then maybe they're behind by, like, what is the latest? Maybe, like, eight months, six months, eight months kind of thing right now? Yeah.
49:48I'm a huge fan of open source, obviously. So for example, in operating systems, you have, like, closed source, like, you know, Windows and macOS. These are large software projects, kinda like what LLMs are gonna become, and there's Linux.
49:56But Linux is very easy. Like, actually, Linux is an extremely successful project. It runs on the vast majority of computers.
50:02Like, last time I checked, was it like 60% or something, like, run Linux? And that's because there is a need in the industry to have a common open platform that everyone feels sort of safe using. I would say, like, the industry has always felt a demand for that kind of a project to exist.
50:16Mhmm. And I think the same is true now, and that's why businesses actually want there's demand for this kind of a thing to exist.
50:22The big difference is that everything is capital. There's a lot of CapEx that goes into this. So I think that's where things fall apart a little bit and make it a bit harder to compete in some sense.
50:32I do think that the current models are very good. The other thing that I think is really interesting is that for the vast majority of consumer use cases and things like that, even open source models are actually quite good, I would say. And I think if you go forward more years, it does seem to me like a huge amount of simple use cases are gonna be well covered and actually even run locally.
50:52But there's gonna be always some demand for frontier intelligence, and that can actually be an extremely large piece of the pie. But it could be that the need for frontier intelligence is gonna be, like, you know, Nobel Prize kind of work.
51:05Mhmm. Or, like, let's move Linux from C to Rust. There's gonna be, like, bigger projects, you know, like, scoped in that kind of a way, and there's gonna be maybe more and maybe that's where a lot of the frontier closed intelligences are gonna be interacting with, and open source is kinda, like, gonna eat through a lot of the more basic use cases or something like that.
51:24You know, at some point, what is frontier today is gonna be, probably later this year, what's frontier today in terms of what I'm using right now from the closed labs might be open source, and that's gonna be doing a lot of work. So I kind of expect that this dynamic will actually basically continue. Like, we'll have frontier labs that have closed AIs that are kind of like these Oracles, then we'll have open source kind of behind by some amount of months, and I kind of expect that to continue.
51:46And I actually think that's a pretty good setup overall. Because I'm a little bit hesitant of having I don't actually think it's structurally I think there's some systemic risk attached to just having intelligences that are closed, and that's like, that's it.
52:00Mhmm. And I think that that's a you know, centralization has a very poor track record in my view in the past and has You mean, like, in political or economic systems in general? Yes.
52:12Exactly. I think there's like a lot of It's like an Eastern European. Yeah.
52:15A lot of it's pretty bad precedent. So I want there to be a thing that is maybe not at the edge of capability because it's new and unexplored, etcetera, but I want there to be a thing that's behind and that is kind of like a common working space for intelligences that the entire industry has access to. Yeah.
52:28That seems to me like a pretty decent power balance for the industry. Yeah. I also think there's just, like, there are many problems to solve.
52:33Right? Like, if you keep advancing intelligence
52:36from the frontier, we can do new things, and there are a lot of, like, very big problems for humanity. Yeah. Right?
52:41Yeah. And so, like, it seems that that will continue to be a very expensive game. And so I wanna, like, root for labs that are doing that because there are problems we cannot solve without continuing to advance the models in a very expensive way.
52:52Yeah. And yet, as you point out, like, if what we have today as Frontier is open, that's a lot of capability.
53:00Yeah. Right? And and so I I think, you know, the power of that or the democratization of that seems, like, very useful and also healthy.
53:06Yeah. I think, basically, by accident, we're actually, like, in an okay spot. In optimal.
53:10Yeah. Yeah. By accident, we happen to be in a good spot in a certain sense.
53:14Mhmm. Well, and to some degree, the longer this endures, like, this dynamic,
53:19the healthier of a spot, like, the ecosystem might be in. Yeah. Right?
53:24Because you have more and more area under the curve. And I will say that even on the closed side, I almost feel like it's been, like, even further centralizing recently because I think a lot of the front runners are, like, not necessarily, like, the top tier. And so, yeah, like, in that sense, I think it's not super ideal.
53:38I would love there to be more frontier labs because, yeah, I'm, like, by default, very suspicious of, like... I want there to be more people in the room.
53:47I think, like, in machine learning, ensembles always outperform any individual model. And so I want there to be ensembles of people thinking about all the hardest problems, and I want there to be ensembles of people in a room, and for them to be all well informed and to make all those decisions.
54:00You know? So I don't want it to be like a closed doors with two people or three people. I feel like that's, like, not a good future.
54:06I almost wish like there were more labs, is long story short, and I do think that open source has a place to play. I hope it sticks around, and, basically, it's currently slightly behind, and it's actually kinda like a good thing. Okay.
54:18You worked on the precursor to generalized robotics, autonomy,
54:22cars. Right? A lot has happened in the last couple months with robotics companies as well, like acceleration of really impressive generalization of environment, of tasks, like increasing long horizon tasks, lots of money going into the space.
54:38Like, is it gonna happen? Has anything in your view changed recently? So, like, my view is kind of informed by what I saw in self-driving. And I do feel like self-driving is the first robotics application.
54:47So probably what I saw is at the time, like ten years ago, there were a large number of startups. And I kinda feel like most of them basically, like, didn't long term make it.
54:57And what I saw is that, like, a lot of capital expenditure had to go in and a lot of time. And so I think robotics, because it's so difficult and so messy and requires huge amount of capital investment and a lot of, like, conviction, it's like a big problem, and I think atoms are really hard.
55:13So I kinda feel like they will lag behind what's gonna happen in digital space. And in digital space, there's gonna be a huge amount of unhobbling, basically things that weren't super efficient becoming a lot more efficient by, like, a factor of 100 Mhmm.
55:25Because bits are so much easier. And so I think currently, in terms of what's gonna change and where the activity is, I kinda feel like digital space is going to change a huge amount, and then the physical space will lag behind.
55:38And what I find very interesting is this interface in between them as well. Because I think if we do have more agents acting on behalf of humans and more agents talking to each other and doing tasks and participating in this economy of agents, etcetera, you're gonna run out of things that you're gonna do purely in a digital space.
55:56At some point, you have to go to the universe and you have to ask it questions. You have to run an experiment and see what the universe tells you to get back to learn something. And so we currently have a huge amount of digital work because there's an overhang in how much we collectively thought about what already is digital.
56:12So we just didn't have enough thinking cycles among the humans to think about all the information that's already digital and already uploaded. And so we're gonna start running out of stuff that is actually, like, already uploaded.
56:23So you're gonna, at some point, read all the papers and process them and have some ideas about what to try, but, yeah, we're just gonna I don't actually know how much you can, like, get intelligence that's, like, fully closed off and with just information that's filled through it.
56:35You know? And so I think what's gonna happen is, first, there's gonna be a huge amount of unhobbling, and I think there's a huge amount of work there. Then, actually, it's going to move to, like, the interfaces between physical and digital.
56:43And that's, like, sensors of, like, seeing the world and actuators of, like, doing something to the world. Mhmm. So I think a lot of interesting companies will actually come from that interface of, like, can we feed the superintelligence in a certain sense data, and can we actually, like, take data out and manipulate the physical world per its bidding, if you wanna, like, anthropomorphize the whole thing.
57:03Right? And then the physical world, actually, I almost feel like the total addressable market, etcetera, in terms of, like, the amount of work and so on, is massive, possibly even much larger than what can happen in the digital space.
57:13So I actually think it's like a much bigger opportunity as well, but I do feel like it's a huge amount of work, and in my mind, the atoms are just like a million times harder.
57:25So it will lag behind, but it's also, I think, a little bit of a bigger market. So it's kinda like yeah.
57:30I think the opportunities kind of, like, follow that kind of trajectory. So right now, this digital is, like, my main interest, and then interfaces would be, like, after that, and then maybe, like, some of the physical things.
57:41Like, their time will come, and they'll be huge when they do come. Well, it's an interesting framework for it too because certain things, not the things I'm working on right now, but certain things are much easier even in the world of atoms. Mhmm.
57:51Right? Like, if you just think about, like, read and write to the physical world, like, read, sensors, cameras, like, there's a lot of existing hardware.
57:58And you can imagine, like, enriching agent capabilities or capturing a lot of new data if you're just clever about it.
58:05Mhmm. And, like, you don't necessarily have to invest a lot to, like, get something valuable. Yeah.
58:10Yeah.
58:11So, like, examples of this that I saw, for example, are, you know, a friend of mine, Liam, is the CEO of Periodic. I visited them last week. Yeah.
58:19So it's just top of mind. Like, they're trying to do auto research for material science. Mhmm.
58:24And so in that case, it's like the sensors to the intelligence are actually, like, pretty expensive lab equipment. Mhmm. And the same is true in biology.
58:30I think a lot of people are very interested in engineering biology, and, you know, the sensors will be more than just, like, video cameras, if that makes sense. And then the other thing I saw, for example, is companies that are trying to have like, you basically pay people for training data Yeah.
58:41As an example. Yeah. Programmatically.
58:43Yeah. To feed the borg. And so, like, these are all examples of, like, sensors in a certain sense.
58:50So they take many diverse shapes and forms, if that makes sense. Mhmm. Yeah.
58:53So I'm looking forward to the point where I can ask for a task in the physical world Mhmm. And I can put a price on it and just tell the agent, like, you know, you figure out how to do it. Yeah.
59:01Go get the data. I'm actually kinda surprised we don't have enough, like, information markets. Mhmm.
59:05Like, for example, if Polymarket or other betting markets or even stocks, etcetera, if they have so much autonomous activity and rising amount of activity Mhmm. Like, for example, if Iran was just happening now, like, how come there isn't a process where, like, taking a photo or a video from somewhere in Tehran should cost, like, $10?
59:20Like, someone should be able to pay for that. You know? Like and that's an example of, like, feeding the intelligence.
59:25There's not gonna be a human looking at it. It's gonna be, like, agents who are trying to guess the betting games in stock markets and so on. So I kinda feel like the agentic web is still, like, fairly new, that there's no, like, mechanisms for this, but this is an example of what I think might happen.
59:37There's a good book that maybe is inspiring called Daemon. Mhmm. You potentially read it.
59:43In Daemon, the intelligence ends up, like, puppeteering almost a little bit, like humanity in a certain sense. You know?
59:48And so humans are kind of like its actuators, but humans are also like its sensors. And so I think, like, collectively, like, society will kinda, like, reshape in a certain way to serve that kind of a... that will kind of, like, end up happening collectively across the industry where, yeah, there's just a lot more automation and it has certain needs, and kinda humans will be serving those needs of that machine, not necessarily, like, each other.
1:00:12But we were on this very specific point of,
1:00:15like, missing pieces of training data. We needed something like auto research. Right?
1:00:19Like, we need the training cycle or the SFT piece to be far more mechanized.
1:00:26Mhmm. For for what part? In order to make the
1:00:30collection like, in order to take the human out of the loop to ask for a task that has just, like, improved my model quality with new data.
1:00:37Right? Yes. Does that make sense to you?
1:00:41Like, we if you can't have the model do the training runs Mhmm.
1:00:46By itself Mhmm. Then your ability to do this as a, like, closed loop task Yes. By pricing data Yeah.
1:00:54Is more challenged. Yes. Yes.
1:00:56100%. Yeah. But now we do.
1:00:58The thing is for LLM training, it really fits the paradigm. Mhmm.
1:01:03So you'd actually expect Yeah. Clean metric. Yeah.
1:01:05LLM training actually fits the paradigm really well, really easily. Like, all the optimization of all the code, so it runs faster. And then you also have metrics that you can optimize against.
1:01:14I do think that if you had an autonomous loop over those metrics, there's gonna be a lot of, like, Goodharting going on where the system will, like, overfit to those metrics. But then you can use the system to devise more metrics, so you just have really good coverage.
1:01:25So it's kinda hard to tell. But in a certain sense, it's like a pretty good fit.
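[Editor's sketch] The Goodharting concern above can be illustrated with a toy optimization loop. This is a minimal sketch under loud assumptions: `true_quality`, `proxy_metric`, and `optimize` are hypothetical names, the quadratic shapes are invented for illustration, and the "second metric" is simply set equal to the true objective, which a real system would not have access to. It is not how any lab's training loop works.

```python
def true_quality(x: float) -> float:
    """What we actually care about (hypothetical): best at x = 3."""
    return -(x - 3.0) ** 2


def proxy_metric(x: float) -> float:
    """The measurable metric: tracks true quality but also rewards larger x."""
    return true_quality(x) + 4.0 * x


def optimize(use_second_metric: bool, steps: int = 50, step_size: float = 0.5) -> float:
    """Greedy hill climbing on the proxy, optionally guarded by a second metric."""
    x = 0.0
    for _ in range(steps):
        improved = False
        for candidate in (x + step_size, x - step_size):
            better_on_proxy = proxy_metric(candidate) > proxy_metric(x)
            # Toy simplification: the second metric is the true objective itself.
            holds_up = true_quality(candidate) >= true_quality(x)
            if better_on_proxy and (holds_up or not use_second_metric):
                x = candidate
                improved = True
                break
        if not improved:
            break  # no neighbouring candidate is accepted, so the loop has converged
    return x


unguarded = optimize(use_second_metric=False)
guarded = optimize(use_second_metric=True)
print("proxy only:     x =", unguarded, "true quality =", true_quality(unguarded))
print("with 2nd check: x =", guarded, "true quality =", true_quality(guarded))
```

The unguarded loop keeps climbing the proxy past the point where real quality peaks, while the guarded loop stops there. Adding more metrics for coverage, as suggested in the conversation, plays the role of the guard in this sketch.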
1:01:30I wanna talk about a little tiny side project you have before we end. Tell me about the microGPT art.
1:01:36Oh, yeah. Okay. So microGPT.
1:01:39So I have this running obsession of maybe a decade or two of just simplifying and boiling down, basically, LLMs to their bare essence.
1:01:47And I've had a number of projects along these lines, so like nanoGPT and makemore and micrograd, etcetera.
1:01:55So I feel like microGPT is now the state of the art of me trying to just boil it down to just the essence. Because the thing is, like, training neural nets and LLMs specifically is a huge amount of code, but all of that code is actually complexity from efficiency.
1:02:09It's just because you need it to go fast. Mhmm. If you don't need it to go fast and you just care about the algorithm, then that algorithm actually is 200 lines of Python.
1:02:16Very simple to read. And this includes comments and everything. Because you just have, like, your dataset, which is text, and you need your neural network architecture, which is, like, 50 lines.
1:02:25You need to do your forward pass, and then you have to do your backward pass to calculate the gradients. And so an autograd engine to calculate the gradient is, like, 100 lines. And then you need an optimizer, and Adam, for example, which is a very state of the art optimizer.
1:02:37It's like, again, 10 lines, really. And so putting everything together in the training loop is like, yeah, 200 lines. And what was interesting to me, like, normally before, like, maybe a year ago or more, if I had come up with microGPT, I would be tempted to basically explain it to people.
1:02:52Like, I have a video, like, stepping through it or something like that. And I actually tried to make that video a little bit, and I tried to make, like, a little guide to it and so on. Mhmm.
1:03:01But I kinda realized that this is not really adding too much because it's already so simple that it's 200 lines that anyone could ask their agent to explain it in various ways, and the agents... like, I'm not explaining it to people anymore. I'm explaining it to agents.
1:03:14If you can explain it to agents, then agents can be the router, and they can actually target it to the human in their language with infinite patience and just at their capability and so on.
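[Editor's sketch] The pieces listed for microGPT (a forward pass, a backward pass through an autograd engine, and an Adam optimizer in roughly ten lines) can be shown at toy scale. This is not the microGPT code. It is a minimal sketch in the spirit of a scalar autograd engine, with hypothetical names (`Value`, `adam_step`) and a one-parameter regression problem chosen purely for illustration. The hyperparameters are common Adam defaults, apart from a larger learning rate assumed here so the toy converges quickly.

```python
import math


class Value:
    """A scalar that remembers how it was computed, so gradients can flow back."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


def adam_step(params, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias correction; resets gradients afterwards."""
    for i, p in enumerate(params):
        m[i] = beta1 * m[i] + (1 - beta1) * p.grad
        v[i] = beta2 * v[i] + (1 - beta2) * p.grad ** 2
        m_hat = m[i] / (1 - beta1 ** t)
        v_hat = v[i] / (1 - beta2 ** t)
        p.data -= lr * m_hat / (math.sqrt(v_hat) + eps)
        p.grad = 0.0


# Toy training loop: learn w so that w * 2 is close to 6 (so w should approach 3).
w = Value(0.0)
m, v = [0.0], [0.0]
for t in range(1, 301):
    diff = w * 2.0 + (-6.0)   # forward pass
    loss = diff * diff        # squared error
    loss.backward()           # backward pass fills in w.grad
    adam_step([w], m, v, t)   # optimizer update
print("learned w:", round(w.data, 3))
```

A real language model swaps the single weight for a network and the squared error for a next-token loss, but the forward, backward, update rhythm is the same one described above.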
1:03:25Right. If I don't understand this particular function, I can ask the agent to explain it to me like three different ways.
1:03:31Yeah. And I'm not gonna get that from you. Okay.
1:03:33Exactly. Yeah. And so I kinda feel like, you know, what is education?
1:03:35Like, it used to be guides. It used to be lectures. It used to be this thing.
1:03:38But I feel like now more I'm explaining things to agents, and maybe I'm coming up with skills where so basically, skill is just a way to instruct the agent how to teach the thing.
1:03:49So maybe I could have a skill for microGPT of the progression I imagine the agent should take you through if you're interested in understanding the code. And it's just like hints to the model to like, oh, first start off with this and then with that. And so I could just script the curriculum a little bit as a skill. So I don't feel like... yeah.
1:04:06I feel like there's gonna be less of explaining things directly to people, and it's gonna be more of just like, does the agent get it? And if the agent gets it, they'll do the explanation. And we're not fully there yet because I still think I can probably explain things a little bit better than the agents, but I still feel like the models are improving so rapidly that I feel like it's a losing battle to some extent.
1:04:28And so I think education is gonna be kinda like reshuffled by this quite substantially, where it's the end of, like, teaching each other things almost a little bit. Like, if I have a library, for example, of code or something like that, it used to be that you have documentation for other people who are going to use my library, but you shouldn't do that anymore.
1:04:45Like, you should have instead of HTML documents for humans, you have markdown documents for agents. Because if agents get it, then they can just explain all the different parts of it. So it's this redirection through agents, you know?
1:04:56And so I think we're gonna see a lot more of that playing out. Well, we'll see if the great teachers know, like, to develop intuition for how to explain things to the agents differently. Ultimately, so for example, microGPT, like, I tried to get an agent to write microGPT.
1:05:10Mhmm. So I told them, like, try to boil down neural network training to the simplest thing, and they can't do it.
1:05:17Like, microGPT is, like, the end of my obsession. Mhmm.
1:05:22It's the 200 lines. I thought about this for a long time. I've obsessed about this for a long time.
1:05:26This is the solution. Trust me. It can't get simpler.
1:05:30And this is my value add. Everything else, like, the agent gets it. It just can't come up with it, but it totally gets it and understands why it's done in a certain way, etcetera.
1:05:38So my contribution is kinda like these few bits, but everything else, in terms of the education that goes on after that, is not my domain anymore. So maybe, yeah, it's like education kinda changes in those ways where you kinda have to infuse the few bits that you feel strongly about, the curriculum or the better way of explaining it or something like that.
1:05:57The things that agents can't do is your job now. The things that agents can do, they can probably do better than you or, like, very soon. And so you should
1:06:05be strategic about what you're actually spending time on. Oh, we appreciate the few cents. Thank you, Andrej.
1:06:10Okay. Find us on Twitter at no priors pod. Subscribe to our YouTube channel if you wanna see our faces.
1:06:19Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way, you get a new episode every week.
1:06:24And sign up for emails or find transcripts for every episode at no-priors.com.
The Hook

The bait, then the rug-pull.

Andrej Karpathy walked into December 2025 writing 80% of his own code. He walked out writing essentially none. What changed was not the models — it was the realization that he had become the bottleneck, and that everything about how he worked had to be rebuilt around that fact.

CTA Breakdown

How they asked for the click.

MENTIONED ON CAMERA
29:04 product NanoGPT
39:46 product Folding@home
1:02:00 product MicroGPT
1:02:00 product MicroGrad