Modern Creator
Dream Labs AI · YouTube

This "Karpathy System" could 701x your AI Workflows

How to point Karpathy's open-source AutoResearch at your cold emails, ads, and website so AI runs experiments while you sleep.

Posted yesterday · Tutorial (educational) · 3.8K views · 229 likes
Big Idea

The argument in one line.

AutoResearch turns any measurable business asset into a self-improving loop that runs hundreds of experiments overnight -- but only when the asset has an objective score, a feedback loop measured in hours, and something the AI can actually change.

Who This Is For

Read if. Skip if.

READ IF…
  • You run cold email outreach, paid ads, or own a website and want to automate the testing you already know you should be doing.
  • You use Claude Code and want a concrete pattern for agentic loops that go beyond single-session tasks.
  • You have a metric that updates in hours -- open rates, load time, cost-per-click -- and are willing to let AI iterate on it overnight.
  • You heard about AutoResearch and want a business-framed walkthrough rather than a machine-learning deep dive.
SKIP IF…
  • You need qualitative feedback -- "make this feel better" has no objective score, so the loop cannot run.
  • Your key metrics update in weeks or months: SEO rankings, pricing changes, churn rates are all too slow.
  • You are not willing to give an AI agent write access to the asset being tested.
TL;DR

The full version, fast.

Karpathy open-sourced a three-file system originally built to self-improve AI models -- one file locks in your instructions, one is the asset the AI can modify, and a third locks in the scoring rule the AI cannot touch. The AI generates a hypothesis, tests it, scores it, keeps what wins, and loops indefinitely. Applied to business, the same pattern runs on cold email subject lines, website load time, Facebook ad copy, or anything else with a fast, objective, AI-accessible feedback loop. The host demos all three live in Claude Code and provides a master prompt to scaffold the whole system.
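The keep-or-revert loop described above can be sketched as a simple hill climb. This is a minimal illustration under stated assumptions, not Karpathy's code: `autoresearch_loop`, `propose`, and `score` are hypothetical names, `propose` stands in for the AI agent editing the asset, `score` stands in for the locked scoring file, and higher scores are assumed to be better. The real loop runs until a human stops it; `rounds` only caps it for the example.

```python
from typing import Callable, List, Tuple, TypeVar

Asset = TypeVar("Asset")


def autoresearch_loop(
    baseline: Asset,
    propose: Callable[[Asset], Asset],  # hypothetical: the agent returns one edited variant
    score: Callable[[Asset], float],    # hypothetical: the locked scoring rule (higher is better)
    rounds: int,
) -> Tuple[Asset, float, List[Tuple[int, float, bool]]]:
    """Keep a variant only if it beats the current best; otherwise revert and try again."""
    best, best_score = baseline, score(baseline)
    log: List[Tuple[int, float, bool]] = []  # (round, candidate score, kept?)
    for round_no in range(1, rounds + 1):
        candidate = propose(best)           # hypothesis: change the current best asset
        candidate_score = score(candidate)  # test it against the fixed measuring stick
        kept = candidate_score > best_score
        if kept:
            best, best_score = candidate, candidate_score  # the winner becomes the new baseline
        # A losing candidate is simply dropped, which is the "revert" step.
        log.append((round_no, candidate_score, kept))
    return best, best_score, log


# Toy usage: the "asset" is a number and the score rewards getting close to 100.
best, best_score, log = autoresearch_loop(
    baseline=0,
    propose=lambda x: x + 10,
    score=lambda x: -abs(x - 100),
    rounds=20,
)
print(best, best_score, sum(kept for _, _, kept in log))  # 100 0 10
```

Once the asset reaches 100, every further proposal scores worse, so the last ten rounds are all discarded and the best version survives unchanged.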

Chapters

Where the time goes.

00:00–01:00

01 · Hook -- Karpathy open-sourced AutoResearch

Shopify CEO proof point and promise of three live demos

01:00–02:18

02 · The viral tweet and how the loop works

Human writes instructions; AI iterates the asset indefinitely

02:18–04:07

03 · Meat computers and 85,000 stars

Karpathy's paradigm-shift framing; Tobi Lütke's overnight result

04:07–05:32

04 · Garry Tan and the business pivot

The bottleneck is the instructions file, not compute; Chamath content engine use case

05:32–07:36

05 · The three-file system

Instructions locked, asset AI-accessible, scoring locked -- evolutionary biology in software

07:36–08:24

06 · 701x -- 36,500 experiments per year

Eric Siu framing: the next generation of marketing teams runs experiments while they sleep

08:24–11:51

07 · The 3 must-haves

Objective score, fast feedback loop, AI access to change the asset

11:51–14:35

08 · The 3 nice-to-haves and use case table

High volume, cheap to fail, consistent measuring stick; full table of applicable business assets

14:35–17:34

09 · Live demo 1 -- website speed

800ms down to 90.5ms overnight via Claude Code using the master prompt

17:34–19:55

10 · Live demo 2 -- cold email subject lines

24-hour open rate loop; mock results for a weeks-long run

19:55–21:39

11 · Live demo 3 -- Facebook ads on autopilot

$10 per test, cost-per-click scoring, AI generates creative variants continuously

Atomic Insights

Lines worth screenshotting.

  • AutoResearch is not a tool you run directly -- it is a recipe you hand to your AI agent with instructions on what to improve.
  • The scoring file must be locked from the AI or it will optimize the score rather than the asset.
  • Marketing teams that run 30 experiments a year will lose to competitors running 36,500 -- the only difference is automation.
  • A website load time dropped from 800ms to 90ms in one overnight Claude Code session using this three-file approach.
  • The Shopify CEO ran AutoResearch on the Liquid codebase and woke up to a 53 percent speed improvement and 61 percent fewer object allocations.
  • The bottleneck is no longer compute -- it is the quality of your instructions file, the program.md.
  • Fast feedback loops are mandatory: SEO, pricing, and churn metrics are too slow; load time and email open rates are ideal.
  • Cold email subject lines are a textbook AutoResearch target because you can spin up fresh audiences and measure opens in 24 hours.
  • Any asset the AI cannot modify -- a published YouTube video, a live contract -- is off-limits for AutoResearch.
  • The system is evolutionary biology in software: variations that score better survive; the rest are discarded and the loop continues.
  • Cheap-to-fail iterations are one of the nice-to-haves: AI-generated image variants cost cents; hiring designers per experiment does not scale.
  • One master prompt pasted into Claude Code builds the entire three-file scaffold and asks you what asset to optimize.
  • AI-driven Facebook ads can test new creative variants continuously, anywhere from every few minutes to every hour, at $10 per test, a pace no human team can match.
  • Karpathy's LLM had already been hand-tuned for a long time, yet AutoResearch found a further 11 percent improvement in one overnight run of 83 experiments.
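The breakdown doesn't show how Eric Siu arrived at "701x", so the arithmetic below is one reading of the numbers, not his stated method. 36,500 a year is exactly 100 experiments a night (the overnight figure in Garry Tan's tweet) for 365 nights. Divided by 52, i.e. one test a week (an assumed baseline the video never mentions), it gives about 701.9, which matches the headline; divided by the 30-a-year figure quoted in the video, it gives about 1,216.7.

```python
# Back-of-envelope check of the experiment-count claims.
# Assumption: 100 automated experiments every night, all year.
experiments_per_night = 100
nights_per_year = 365
automated_per_year = experiments_per_night * nights_per_year

baseline_quoted = 30  # "most marketing teams run 30 experiments a year"
baseline_weekly = 52  # hypothetical: one experiment per week

print(automated_per_year)                               # 36500
print(round(automated_per_year / baseline_quoted, 1))   # 1216.7
print(round(automated_per_year / baseline_weekly, 1))   # 701.9
```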
Takeaway

Six rules that decide if AI can optimize it

WHAT TO LEARN

AutoResearch is not magic -- it is a structured loop that only works when your asset has an objective score, a fast feedback cycle, and something the AI can actually rewrite.

02 · The viral tweet and how the loop works
  • Any asset you want AI to self-improve needs three things locked down first: a goal the AI cannot rewrite, an objective number to beat, and file-level access so the AI can actually change what it is testing.
03 · Meat computers and 85,000 stars
  • The scoring file must be separated from the asset file and kept off-limits to the AI -- otherwise the system optimizes the score rather than the underlying quality.
07 · The 3 must-haves
  • Feedback loops measured in weeks -- SEO, churn, pricing -- disqualify an asset; loops measured in hours -- load time, email open rate, cost-per-click -- are ideal candidates.
  • The constraint is not compute -- it is the quality of the instruction file you write, which is the only lever a human turns after setup.
08 · The 3 nice-to-haves and use case table
  • High-volume traffic dramatically accelerates results: a site getting 50,000 daily impressions learns far faster than one with 50.
09 · Live demo 1 -- website speed
  • A website going from 800ms to 90ms overnight is achievable with a single Claude Code session -- the real work is writing the scoring rule, not the optimization itself.
11 · Live demo 3 -- Facebook ads on autopilot
  • Applying this to paid ads means committing to an objective metric before the loop starts -- without that constraint, the AI has nothing to optimize against.
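The three must-haves and three nice-to-haves can be turned into a quick pre-flight check before committing an asset to the loop. This is a sketch under stated assumptions: `Candidate` and `fit_check` are hypothetical names, the 24-hour cutoff reflects the host's willingness to accept day-long loops for cold email, and the volume and cost thresholds are arbitrary placeholders to tune.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MAX_FEEDBACK_HOURS = 24.0  # assumption: slower than a day disqualifies the asset
MIN_DAILY_VOLUME = 1_000   # placeholder threshold for "high volume"
MAX_COST_PER_TEST = 50.0   # placeholder threshold for "cheap to fail", in dollars


@dataclass
class Candidate:
    name: str
    objective_score: bool                # must-have 1: one number, no human judgment
    feedback_hours: float                # must-have 2: time from change to measurable result
    ai_can_edit: bool                    # must-have 3: the agent has write access to the asset
    daily_volume: Optional[int] = None   # nice-to-have; None when traffic doesn't matter (load time)
    cost_per_test: float = 0.0           # nice-to-have: cheap to fail
    stable_measure: bool = True          # nice-to-have: e.g. fresh audiences, no list fatigue


def fit_check(c: Candidate) -> Tuple[bool, List[str]]:
    """Return (passes all must-haves, list of problems and warnings)."""
    problems: List[str] = []
    if not c.objective_score:
        problems.append("must-have: no objective score")
    if c.feedback_hours > MAX_FEEDBACK_HOURS:
        problems.append("must-have: feedback loop too slow")
    if not c.ai_can_edit:
        problems.append("must-have: AI cannot change the asset")
    passes = not problems  # only must-have failures have been recorded at this point
    if c.daily_volume is not None and c.daily_volume < MIN_DAILY_VOLUME:
        problems.append("nice-to-have: low volume, expect slow learning")
    if c.cost_per_test > MAX_COST_PER_TEST:
        problems.append("nice-to-have: failures are expensive")
    if not c.stable_measure:
        problems.append("nice-to-have: measuring stick may drift")
    return passes, problems


website = Candidate("website load time", True, 0.01, True)
cold_email = Candidate("cold email subject lines", True, 24.0, True, daily_volume=1_000)
seo = Candidate("SEO rankings", True, 240.0, True)                # reindexed ~10 days later
youtube_intro = Candidate("published YouTube intro", True, 1.0, False)

for c in (website, cold_email, seo, youtube_intro):
    print(c.name, fit_check(c))
```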
Glossary

Terms worth knowing.

AutoResearch
Karpathy's open-source system that runs an AI agent in a loop, generating hypotheses, testing them against a locked scoring file, and keeping only improvements. Available at github.com/karpathy/autoresearch.
Instructions file
The first of three files in the AutoResearch system -- a human-written markdown file that defines the goal and rules for the AI agent; the AI can read it but not modify it.
Scoring mechanism
The third file, also locked from the AI, which defines the single objective number the AI must beat -- for example, load time in milliseconds or email open rate.
Feedback loop
The time between making a change and receiving a measurable result; AutoResearch requires this to be minutes or hours, not days or weeks.
program.md
The instructions file in Karpathy's repo, which Garry Tan singled out as the new bottleneck -- the markdown document you write that tells the AI agent what to optimize and why.
Overfit
When an AI optimizes so specifically for the test conditions that the improvement does not hold in real-world use; the Shopify CEO flagged this as a caveat for his results.
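The scoring mechanism above is described as locked from the AI, but neither the video nor this breakdown says how that is enforced. One simple option, assumed here, is for the harness to fingerprint the scoring file with SHA-256 before the run and halt if the fingerprint ever changes. This detects tampering rather than preventing it; read-only permissions or keeping the file outside the agent's workspace are complementary options. `LockedScoringFile` and `score.py` are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Fingerprint a file's exact bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


class LockedScoringFile:
    """Snapshot the scoring file's hash at setup; refuse to continue if it changes."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.expected = sha256_of(path)  # taken by the human before the loop starts

    def verify(self) -> None:
        if sha256_of(self.path) != self.expected:
            raise RuntimeError(f"{self.path.name} was modified; halting the run")


# Demo: the check passes until the file is edited.
with tempfile.TemporaryDirectory() as tmp:
    scoring = Path(tmp) / "score.py"
    scoring.write_text("METRIC = 'load_time_ms'\n")
    lock = LockedScoringFile(scoring)
    lock.verify()  # unchanged, so no error
    scoring.write_text("METRIC = 'always_perfect'\n")  # simulated tampering
    try:
        lock.verify()
        tampering_detected = False
    except RuntimeError:
        tampering_detected = True

print(tampering_detected)  # True
```

Calling `verify()` before every scoring round means a tampered file stops the run instead of silently rewarding a gamed score.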

Quotables

Lines you could clip.

04:08
One day, frontier AI research used to be done by meat computers -- humans -- in between eating, sleeping, and having fun. That era is now long gone.
Karpathy quote that reframes the entire paradigm shift in two sentences · TikTok hook
07:03
The bottleneck no longer is compute. It is your program.md.
One punchy line that is already a tweet from Garry Tan -- highly quotable · IG reel cold open
07:51
Most marketing teams run 30 experiments a year. The next generation will run 36,500. And they will run the experiments while they sleep.
Stark quantitative contrast with a clear call to action implied · newsletter pull-quote
The Script

Word for word.

00:00 Andrej Karpathy, the godfather of modern AI, has just released a simple system called AutoResearch, which the top AI labs in the world have been spending millions trying to create. Since he released AutoResearch, it's got over 85,000 stars on GitHub, and even Shopify's CEO pointed it at Shopify's code and sped it up by 53%.
00:22 But here's the thing. It doesn't just work for coding. In fact, AutoResearch can be pointed at any part of your business: your emails, your ads, your landing pages, your AI skills, AI agents, or even your organic content.
00:35 And it forces each part of your business to self-improve at the speed of light while you sleep. So in this video, we'll break down Karpathy's simple AutoResearch system and show you exactly how to plug in any part of your business that you want to improve. And we'll do this together by going through three real-world examples that you can start using today.
00:53 All I ask in return is you hit that like button down below, grab your Slovakian flag, and let's jump in. Okay. So the fun started here, with an Andrej Karpathy tweet that got another 11,000,000 views, which for a highly technical tweet once again shows the demand and how impressive this stuff actually is.
01:11 He wrote, "I packaged up the autoresearch project into a new self-contained minimal repo if people would like to play with it over the weekend." He says the human, which is me and you, will iterate on the prompt, giving the agent a set of instructions on what we want to improve, and then the AI agent will iterate on the training code, or whatever asset it is that we want it to improve and give it.
01:34 It's gonna literally work all night trying to improve that asset while we're asleep. Let me show you where Andrej Karpathy started. He says the goal is to engineer your AI agents to make the fastest research progress indefinitely without any of your own human involvement, which is why we can be asleep.
01:50 The image below, this AutoResearch image here, shows his LLM getting smarter and smarter with every iteration, running on a five-minute loop, until 83 experiments later it was 11% faster than when Andrej Karpathy left it.
02:07 And he also says that he'd been working on this same LLM, trying to make it smarter, for a very long time and thought it was done. But his AutoResearch tool found another 11% improvement in its IQ.
02:18 And this paragraph below the image might be one of the most fascinating things to read in terms of a mindset shift on how powerful this system is. Karpathy wrote, "One day, frontier AI research used to be done by meat computers," or humans, "in between eating, sleeping, and having fun."
02:34 "And these meat computers would synchronize with each other once in a while using sound wave interconnect," which is obviously speaking inside a group meeting.
02:43 It's a facetious take on how slow we humans are at these jobs compared to something like this AutoResearch system. He says that era is now long gone. Research is now entirely the domain of autonomous swarms of AI agents, and he says that this repo, this system right here, is the story of how all of this began.
03:01 And so the repo is up on GitHub, which is a crazy thing. Andrej Karpathy just open-sourced it and made it free for anyone to use. It's called autoresearch, and you can see it has 85,000 stars at the time I'm recording this.
03:13 And so hundreds of thousands of people have been running this system, including, like I said in the intro, Tobi Lütke, the billionaire CEO of Shopify. And he said, okay.
03:22 "This thing is totally insane." Before he went to bed, he set it up with these experiments. And when he woke up, he woke up to a +19% score after eight hours and 37 experiments, literally improving his code while he was asleep.
03:37 And Andrej Karpathy replied, "Who knew early singularity could be this fun?", showing more of the work that he's had AutoResearch doing for himself. And four days later, Tobi Lütke was still playing with AutoResearch.
03:48 He says, okay, well, I ran AutoResearch on the Liquid codebase. It's now 53% faster with 61% fewer object allocations.
03:58 He says this is probably somewhat overfit, but there are some absolutely amazing ideas in this, showing the iterations of AutoResearch without any human intervention. And even Garry Tan from Y Combinator says Karpathy just open-sourced AutoResearch.
04:12 One GPU, 100 machine learning experiments overnight while you sleep. You never touch the code. Just write a markdown file, a set of instructions on what you want improved.
04:21 The bottleneck no longer is compute. It is your program.md, which is the instructions you're giving your agent.
04:27 But I personally don't wanna apply this to codebases or AI agents. I wanna apply it to my business and my marketing, which is where we really start to see the brilliance of this AutoResearch system. Andrej Karpathy says, you don't use it directly.
04:40 He's talking about his GitHub and his system. "It's a recipe/idea. Give it to your agent and apply it to whatever you care about," which is where our magic begins.
04:52 Because even Chamath Palihapitiya had a potential use case, which is kind of mind-bending to think about. He said the biggest threat to today's social media apps is an incredible video model.
05:03 So something that can take text and turn it into incredible-looking videos, plus TTS, plus AutoResearch, where you could set up your AI agent with a TikTok or Instagram account and have it constantly producing incredible-quality content, learning from the results of each piece of content in the loop that I'm about to reveal to you, and improving and iterating 100 times a night.
05:27 Honestly, this thing in the hands of your competitors will be extremely scary. So let me show you how this system actually works. It's a three-file system.
05:35 Now Karpathy called these files program, train, and prepare.
05:40 But to me, that's just confusing and unnecessary. We have the instructions file, which is locked from your AI agent and only used by me and you, the humans actually setting the AI agent up for the task. Then we have the file, or the asset, that we want to optimize.
05:54 This is the second file, and this is the file that the AI actually gets access to, because it's trying to optimize its performance. I'll give you a few examples in just a second. It takes that asset, tests a new variation of it, and then compares it against the third file, which is a scoring mechanism.
06:10 Again, the scoring mechanism is a file that is locked from the AI, because we don't want it tampering with that file in order to score higher. We want the scoring mechanism and the set of instructions only accessible to us, the humans, forcing the AI agent to actually do the task and optimize the asset.
06:26 So let's get practical. Let's use a few examples to really understand this. This is the baseline of what Andrej Karpathy first tested AutoResearch on.
06:34 So we essentially want to improve the intelligence of an AI agent. We're gonna use IQ here. He didn't use IQ, but I've used it just to keep it simple so we can understand it first.
06:42 So in the set of instructions, he says, I need you to improve the IQ of this AI agent. He gives AutoResearch the AI file that makes up the current AI agent, and AutoResearch will take that file and create a test variation. It'll change the code and run one test.
07:00 It'll then take that new AI agent and compare its IQ to the original file's. If it is smarter, it'll keep the new file that it tested and replace the old file, because you've improved IQ, and it will loop again.
07:16 It'll take that new, improved AI file that has higher IQ, make another change to it, and test it. This is basically evolutionary biology and natural selection, but in the machine world. Now if its test variation doesn't beat the IQ of the original AI file, it will revert to the original file and try again.
07:33 And this is done in five-minute loops, repeating indefinitely until it reaches a certain goal that you have or until a human comes and stops it. Okay.
07:40 So what parts of our business can we actually apply AutoResearch to, to skyrocket past our competitors? Well, Eric Siu had a very interesting article on Twitter. He says Karpathy's autonomous AI can make you 701 times faster.
07:56 It's the future of business: not coding, but business generally. He says most marketing teams run 30 experiments a year, but the next generation will run 36,500 experiments per year easily.
08:10 And they'll run the experiments while they sleep using the AutoResearch tool. And so technically, this AutoResearch tool could be pointed at any part of your business, but there are some criteria for what it works best for. There are three must-haves and then three nice-to-haves.
08:27 And if you fit these criteria, you can run AutoResearch on that part of your business. We're gonna go through a lot of examples together in just a second.
08:33 Must-have rule number one is it needs to be scored objectively. So if you're like, make this page look better, there's no objective measure. If you said, come up with the best video idea, there's no objective measure.
08:43 Come up with the funniest joke. How do you measure funny? Well, is it the most laughs?
08:48 Now you're starting to get an objective measure, but that would be: make a joke that gets the most laughs, and measure the decibel volume. You need that objective measure in order for AI to score it without a human in the loop. So things like the load speed of a website, excellent.
09:00 Number of impressions a piece of content gets, excellent. Click-through rate on a page, excellent. Then rule number two is you need a fast feedback loop.
09:09 You need the results in minutes, or hours at most, not weeks.
09:14 For example, the load speed of a website. Once again, you can test that in seconds, which means you get more iterations and more improvements, and it's gonna actually work for you. Or email opens: how many people are opening that email in that hour?
09:26 That will pass. However, SEO rankings: you make a change to your website and you're like, let's wait and see Google reindex this ten days later. It's not gonna work for you, because the feedback loop is too long for the AI to actually get enough data to learn.
09:39 Or pricing. What if I lower my pricing? Is that gonna reduce my churn six months from now?
09:43 Really hard for an AI to actually have a feedback loop and iterate on that. Number three, the AI obviously needs access to change it. So if it's an HTML file or an API in a software you use, excellent.
09:55 It's got access to the asset that you need. However, if it's a video that's already been published on YouTube and you're like, oh, change the intro, you can't log in to a past YouTube video and change the intro, because it's already published and AI cannot have access to that. So if you tick the box on those three things, then you wanna have a look at the nice-to-haves, because they will make your AutoResearch even more powerful.
10:15 You want a high volume of feedback. If you have a website and you're getting 50,000 impressions per day and you're changing the ad copy on the website, incredible. You're gonna get a lot of data, a lot more iterations, and therefore a lot more improvement in your website conversions.
10:27 If you're only getting 50 impressions a day, it's gonna be really tough. You're gonna have to wait a lot longer to test it. It also needs to be cheap to fail.
10:34 So if you're looking to do graphic design and you're plugging an image model, say, Nano Banana, into your AI and having it create images and then scoring them based on whatever rating system you're using (we'll get to that in a second), that's great. It's gonna be relatively cheap for you to generate those images.
10:51 If you are having your AI literally go out there and hire graphic designers and then wait to see their work, it's gonna be too expensive. You're gonna have to pay thousands of dollars for these graphic designs. Not gonna work.
11:00 Your iterations need to be fast. They need to be cheap, and they need to have a lot of volume. Number six, you need a consistent measuring stick.
11:08 So we have a file. The scoring system is a file that the AI cannot touch. It cannot manipulate the goals to say, oh, yes.
11:16 We did it. We improved it. Your definition of better, which we set at the start, must stay the definition the whole time, and AI cannot manipulate that.
11:24 But also, the scoring mechanism that you put in that file must be objective and it must be consistent. For example, if you split test an email to fresh audiences, you're gonna have a consistent measuring stick.
11:37 However, if you're emailing the same list over and over with different titles, they're gonna have list fatigue because they've had six emails already, and therefore they'll be way less likely to open the seventh, because they're not gonna open seven emails in a row from you in a period of thirty-five minutes. Let's go through some examples together of things we can point AutoResearch at.
11:56 So, coding efficiency. This is the obvious one. This is the one that Tobi Lütke and Andrej Karpathy have also done.
12:02 How do you make your website faster? The asset to optimize would be the source code of the actual website, and the scoring system would be the time in milliseconds that it takes to load once booted up.
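A scoring file for this case could be little more than a timing function. The sketch below is illustrative only: `load_time_ms` and `load_page` are hypothetical names, `load_page` would in practice drive something like a headless browser against the local site (the video doesn't say how Claude measured load time), and taking the median of several runs to damp noise is an assumption. Lower is better, so a loop that keeps higher scores would use the negative of this value.

```python
import statistics
import time
from typing import Callable


def load_time_ms(load_page: Callable[[], object], runs: int = 5) -> float:
    """Median wall-clock time of `runs` page loads, in milliseconds (lower is better)."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        load_page()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Stand-in for a real page load: sleep for roughly 20 ms.
ms = load_time_ms(lambda: time.sleep(0.02))
print(f"{ms:.1f} ms")
```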
12:15 Cold email outreach. Nick Saraev had an excellent example of this one: he has a cold email outreach company that is using AutoResearch to test the subject lines of the emails and the bodies of the emails, what's actually in the content. The instructions would be: get more replies.
12:30 This is what you're telling AutoResearch to do. We need more replies. That is the metric that we wanna score it on.
12:34 So positive reply rate is the scoring system. The asset optimized is the actual email: the subject, the opener, and the call to action in it. It could just be spinning up tests, sending the first email to 100 people and the second email to another 100 people, because these are cold emails, not an actual list.
12:49 Instagram DM outreach. Your instructions to AutoResearch could be: book more calls, stay human, and don't spam. Of course, these instruction files, which I'll walk you through how to build yourself in just a second, are going to be much longer than this.
13:01 This is just a snippet of an example so we can start to get some ideas on how these would work. What to optimize? You'd optimize the DM script.
13:08 What's the scoring system? How many replies you get and what the booking rate is per template. Website load speed, an easy one.
13:15 Sales page copy, an easy one. Video watch time: which videos hold viewer retention the longest? YouTube titles and thumbnails: you could plug this into your YouTube dashboard and test your thumbnails and titles over time, as long as they're getting enough data.
13:30 What metric would you use? The number of views or the watch-through, uh, or the click-through rate is the one that I would actually use. Sales scripts.
13:36 Now this is starting to get to the longer feedback loops. And Andrej Karpathy says you need five-minute feedback loops. But when we're applying this to business, we can be more and more lenient.
13:44 It's just gonna take more time to get that feedback loop connected, but you could do something like sales scripts and have the AI analyze which of these scripts are giving the best close rate. Sales funnels, app speed, cart checkouts, prompt engineering: which prompts are giving you the best results, your agent's intelligence, or your AI agent's effectiveness.
14:02 And so let's run through some of these examples in the real world together, to show you how you would do it at home for your business. So, really important to understand once again: we have an instructions file.
14:12 We have an asset to optimize. We have a hypothesis that the AI makes and then tests, scoring it with the scoring mechanism to see if the test outperformed the original asset.
14:24 If it does, it keeps it. If it doesn't, it throws it away and goes again. And it repeats indefinitely overnight, which is why it's critical that you make that part of the instructions, so that it literally works nonstop on your behalf.
14:35 And so I've made you a master prompt to walk you through setting up this exact system. All you have to do is copy-paste this prompt into Claude Code, and he will literally set up a three-file system for you and ask you what asset you're optimizing. As you walk through it, he's gonna help you get all the files, the connections, and the APIs plugged in, so you can start your AutoResearch on whatever asset it is you're trying to improve.
14:57 It has pulled Karpathy's rules. It has modeled Karpathy's GitHub. It is the exact prompt that you need to start AutoResearching in your business.
15:05 So we're gonna walk through three live examples together, starting with the easiest and fastest iterative loop: making a website faster.
15:13 So this is a local file I have. I got Claude to mock up a website, but it's not fully optimized. It doesn't load as fast as it possibly can.
15:21 And what a developer would normally do, or what I would normally do as a business owner, is say, I want my website to be quicker. Let me think about how to make it quicker, or hire a developer. And the developer's like, I have a hypothesis.
15:31 What if I change this and test it? Now AI is handling all the hypotheses and all the testing.
15:36 All we have to do is come to Claude Code, paste in the prompt that I'm giving you in this video, and Claude Code will respond: Hi.
15:44 I'm now your AutoResearch engineer. Here's the deal in one breath. We pick one thing in your business, and I turn "is it good?" into a single honest number.
15:52 And then I sit here all night changing it and scoring it, keeping what wins and trashing what loses. Fantastic. That is AutoResearch.
15:59 So I said we want to improve the speed of a website. Claude says, that's a great pick and an honest one. Website speed is a textbook AutoResearch target.
16:09 It's an objective measure, it's fast, and it's reachable. It can have the page, the HTML, and the JPEGs all in one place locally, which I'll show you in just a second.
16:18 But remember, the fit check is for the ideal thing to AutoResearch. We're kinda stretching the limits of AutoResearch by applying it to business, and therefore you may have a slower feedback loop. But even if you have a two-hour or a twenty-four-hour feedback loop, it might be better than having to go do all the testing yourself.
16:38 So we're sort of on the cutting edge here, but at least this example does fit Karpathy's criteria for what AutoResearch is good for. So I linked him to the website, which I just showed you, and he set off on his way. I let Claude pick the scoring, and its score is just the time in milliseconds that the page takes to load.
16:58 And he's not allowed to touch that scoring file, as we've discussed. And basically, he analyzed it and did round after round after round, making the website faster and faster. Now I also asked him to mock up a report showing me exactly how it went, which you can also ask your Claude Code to mock up.
17:15 This is a classic Claude Code-looking file. But you can see it went from an 800-millisecond baseline all the way down to 90.5 milliseconds according to Claude Code, and these are the rounds that actually knocked off the milliseconds, and exactly what it did to knock them off.
17:31 So now we have a pretty incredibly optimized, fast website. Okay. So the next example is cold emails.
17:37 I saw Nick Saraev, who runs a cold email business, uh, give an example of how he was using this. And so basically, say I have 300,000, uh, cold email addresses that I want to reach out to and do cold outreach, and cold outreach is very different from emailing a warm list.
17:52 You're not gonna get your 50% opens. You're gonna get your 1 to 3% opens, but a lot of people make money that way. I do not.
17:58 But if you are in that camp, you can now use AutoResearch to improve something like your email bodies, your email subject lines, or just overall your click-through rate, your reply rate, or how many people are actually buying your product. So I came back into a new Claude Code window and pasted in our prompt.
18:14 He says, consider me your one-person R&D department. Fantastic. He's gonna run AutoResearch.
18:20 I sent him back: I want cold email subject lines tested, or improved, should we say. I want you to measure twenty-four-hour open rates across your sent emails and cross-reference the scoreboard, or the scorecard, for the higher open rate.
18:33 So we're just changing the subject line of the email. We're gonna send the exact same email. And how he's gonna know what wins and loses is he's gonna send a bunch of them, a thousand of them in this example, and look after twenty-four hours at how many of them got opened versus the last subject line, and he's either going to bin that one or replace the old one and make it the best holy-grail email subject line.
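The video compares each new subject line's 24-hour open rate against the current best. At the 1 to 3% open rates quoted here, 1,000 sends yield only about 10 to 30 opens, so small gaps can be noise. The sketch below is one way to handle that under stated assumptions: it draws disjoint batches from a cold list so nobody sees two variants (the fresh-audience rule from earlier in the video), and it adds a one-sided two-proportion z-test, which the video does not use, before promoting a challenger. `fresh_batches` and `beats_champion` are hypothetical names; 1.645 corresponds to a 5% one-sided threshold.

```python
import math
import random
from typing import Iterator, List


def fresh_batches(recipients: List[str], batch_size: int, seed: int = 0) -> Iterator[List[str]]:
    """Yield disjoint batches so no recipient receives two subject-line variants."""
    pool = list(recipients)
    random.Random(seed).shuffle(pool)
    for start in range(0, len(pool) - batch_size + 1, batch_size):
        yield pool[start:start + batch_size]


def beats_champion(opens_new: int, sent_new: int,
                   opens_best: int, sent_best: int,
                   z_crit: float = 1.645) -> bool:
    """One-sided two-proportion z-test: is the challenger's open rate higher?"""
    p_new = opens_new / sent_new
    p_best = opens_best / sent_best
    pooled = (opens_new + opens_best) / (sent_new + sent_best)
    se = math.sqrt(pooled * (1 - pooled) * (1 / sent_new + 1 / sent_best))
    if se == 0:
        return False  # no variation at all (e.g. zero opens on both sides)
    return (p_new - p_best) / se > z_crit


batches = list(fresh_batches([f"lead{i}@example.com" for i in range(3_000)], 1_000))
print(len(batches))                          # 3
print(beats_champion(40, 1_000, 20, 1_000))  # True: 4.0% vs 2.0%
print(beats_champion(22, 1_000, 20, 1_000))  # False: 2.2% vs 2.0% is within noise
```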
18:54Now if you use something like Smart Lead AI, you can plug it into that. You can also use auto research for something like calling.
19:01If you have an auto dialer or if you have AI dialing agents, you can plug and play new scripts, new openers for that. Or if you're doing, uh, DM marketing, you could use something like ManyChat to do your auto research. If you do want more help integrating AI agents or something like auto research into your business, come and join our private community where you can get all the resources and step by step help for people like Bing or even our top AI researcher who is in the group twelve hours a day helping people out.
19:27Now for this example, twenty four hours is a lot longer period of time, and we can't actually wait weeks to show you this one example. So I asked Glor to mock up what it would look like if we ran this for weeks where it's starting to score and give OpenStrength predictors based on what email subject lines that it's actually sending.
19:44And, of course, the only thing missing here is the actual how many people of those thousand people open that email, feed it back in to make sure that that every change it's making is actually making the email be opened more and more. Okay. The last example, I once again pasted in the prompt and then asked it to improve my Facebook ads.
20:01 Now this is where things get really impressive and really scary. This is what Jamath was saying: if you have something that can render images, render video, and write really good copy, it can go out, post a live Facebook ad testing that new copy or new image, see how many clicks it gets for, say, $10, bin the ones that do badly, and test a new one every five, ten, or twenty minutes, or even every hour.
20:25 Even if it takes two hours to spend that $10, it doesn't matter, because you're constantly gathering new data in the background. I saw a guy do this with his Instagram and TikTok short-form content: he had an AI rendering the short-form videos and an AutoResearch loop learning how well they did and feeding that back, basically giving him machine-learning feedback on the next one, and he's doing really well.
20:46 We're making a video on a system like this over the next couple of weeks, so hit that subscribe button if you haven't already. For Facebook ads, I told it to plug straight into my Facebook ad account, make variations of the ads I already have, and test them with $10 each.
21:00 The metric I want it to measure is the cost per click of each ad. Once again, this takes longer to actually show, and we'll be running these tests in our premium community if you want to see more of the actual results, but I'll be posting a lot more here on YouTube as well.
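Scoring each ad variant by cost per click is simple arithmetic: divide spend by clicks, and lower is better. The sketch below is a hypothetical illustration. It assumes each variant's spend and click count have already been pulled from the ad account, and it does not call the Facebook API. The names and numbers are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdVariant:
    """Hypothetical result for one ad variant after its test budget is spent."""
    name: str
    spend_usd: float  # e.g. the $10 test budget
    clicks: int

    @property
    def cost_per_click(self) -> Optional[float]:
        # With zero clicks, CPC is undefined; return None instead of dividing by zero.
        return self.spend_usd / self.clicks if self.clicks else None


def best_variant(variants: list[AdVariant]) -> Optional[AdVariant]:
    """Return the variant with the lowest cost per click (lower is better),
    or None if no variant got a single click."""
    scored = [v for v in variants if v.clicks > 0]
    if not scored:
        return None
    return min(scored, key=lambda v: v.spend_usd / v.clicks)


# Invented numbers: three variants, $10 each.
variants = [
    AdVariant("original", spend_usd=10.0, clicks=8),
    AdVariant("new-headline", spend_usd=10.0, clicks=12),
    AdVariant("new-image", spend_usd=10.0, clicks=0),
]
winner = best_variant(variants)
if winner is not None:
    print(f"keep {winner.name}: ${winner.cost_per_click:.2f} per click")
```

Treating a zero-click variant as unscored, rather than dividing by zero, is a design choice. That variant is simply never picked as the winner.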
21:13 But we had some mock data, so I gave it my Facebook ad data. It started creating variations, each with a predicted cost per click.
21:22 It mocked up a directory and file structure for me: a scoreboard that defines how to judge how well things go, and experiment logs where you can see all the losing files. And this thing is literally going to run while I sleep.
21:34 Let me know in the comments below what you're going to be using AutoResearch for. Thanks for watching. I'll see you in the next video.
The Hook

The bait, then the rug-pull.

Andrej Karpathy open-sourced a system the top AI labs spent millions building -- a three-file loop that lets an AI agent run hundreds of experiments on any asset overnight, keeping every improvement and discarding every miss. This tutorial takes that machine-learning recipe and points it at cold emails, a slow website, and Facebook ads.

Frameworks

Named ideas worth stealing.

05:32 · model

The Three-File AutoResearch System

  1. Instructions file (human-written, locked)
  2. Asset to optimize (AI-accessible, gets modified)
  3. Scoring mechanism (locked, defines the objective number)

A loop architecture where the AI generates a hypothesis, tests it on the asset, scores it against the locked metric, keeps improvements, discards failures, and repeats indefinitely.

Steal for: any repeatable optimization task with a clear metric -- ads, email copy, landing pages, code performance.
08:24 · list
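The loop can be sketched in a few lines of Python. This is not Karpathy's code. The asset here is a hypothetical list of numbers, `score` is an invented objective standing in for the locked scoring file, and a random tweak stands in for the AI agent's hypothesis step. What the sketch does show is the structure: read-only instructions, an asset that changes, a scoring function the loop calls but never edits, and a log that records every experiment, winners and losers alike.

```python
import random
from types import MappingProxyType
from typing import Callable

# 1. Instructions file (human-written, locked): a read-only mapping.
#    Assigning to it raises TypeError, mimicking "the AI cannot edit this".
INSTRUCTIONS = MappingProxyType({"goal": "maximize score", "max_experiments": 200})


def score(asset: list[float]) -> float:
    """3. Scoring mechanism (locked): the loop calls it but never changes it.
    Hypothetical objective: higher is better, best when every value is 3.0."""
    return -sum((x - 3.0) ** 2 for x in asset)


def propose(asset: list[float], rng: random.Random) -> list[float]:
    """Hypothesis step. In AutoResearch an AI agent proposes the change;
    here a random nudge to one value stands in for it (assumption)."""
    candidate = asset.copy()
    i = rng.randrange(len(candidate))
    candidate[i] += rng.uniform(-1.0, 1.0)
    return candidate


def run_loop(
    asset: list[float],
    score_fn: Callable[[list[float]], float],
    n_experiments: int,
    seed: int = 0,
) -> tuple[list[float], float, list[tuple[int, float, bool]]]:
    """Generate, test, score, keep improvements, discard failures, repeat."""
    rng = random.Random(seed)
    best_score = score_fn(asset)
    log: list[tuple[int, float, bool]] = []  # experiment log: winners and losers
    for step in range(n_experiments):
        candidate = propose(asset, rng)
        candidate_score = score_fn(candidate)
        kept = candidate_score > best_score
        if kept:  # keep the improvement; otherwise the candidate is discarded
            asset, best_score = candidate, candidate_score
        log.append((step, candidate_score, kept))
    return asset, best_score, log


# 2. Asset to optimize (AI-accessible): the loop returns an improved copy of it.
start = [0.0, 0.0, 0.0]
final_asset, final_score, log = run_loop(start, score, INSTRUCTIONS["max_experiments"])
kept_count = sum(1 for _, _, kept in log if kept)
print(f"score {score(start):.2f} -> {final_score:.2f}, {kept_count} of {len(log)} experiments kept")
```

This loop stops after `max_experiments` (a hypothetical setting) so that it terminates. The system described above keeps going until someone stops it.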

6 Criteria for AutoResearch Viability

  1. Scored objectively (must-have)
  2. Fast feedback loop -- minutes to hours (must-have)
  3. AI has access to change it (must-have)
  4. High volume of feedback (nice-to-have)
  5. Cheap to fail (nice-to-have)
  6. Consistent measuring stick (nice-to-have)

The filter for deciding whether a given business asset is worth pointing AutoResearch at. Three hard gates plus three amplifiers.

Steal for: scoping any agentic task -- run the checklist before building the loop.
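The checklist can be written as three hard gates plus a count of amplifiers. This is a hypothetical Python sketch with invented field names. The example answers are illustrative, apart from the source's point that SEO rankings update too slowly to pass the fast-feedback gate.

```python
from dataclasses import dataclass


@dataclass
class AssetCheck:
    """Hypothetical answers to the six viability questions for one asset."""
    name: str
    scored_objectively: bool  # must-have
    fast_feedback: bool       # must-have: minutes to hours, not weeks
    ai_can_change_it: bool    # must-have
    high_volume: bool         # nice-to-have
    cheap_to_fail: bool       # nice-to-have
    consistent_measure: bool  # nice-to-have


def assess(check: AssetCheck) -> tuple[bool, int]:
    """Return (viable, amplifiers). Failing any hard gate rules the asset out;
    the nice-to-haves only indicate how strong the loop is likely to be."""
    viable = check.scored_objectively and check.fast_feedback and check.ai_can_change_it
    amplifiers = sum([check.high_volume, check.cheap_to_fail, check.consistent_measure])
    return viable, amplifiers


# Illustrative answers only.
cold_email = AssetCheck("cold email subject lines", True, True, True, True, True, True)
seo = AssetCheck("SEO rankings", True, False, True, False, True, False)
for item in (cold_email, seo):
    viable, amplifiers = assess(item)
    print(f"{item.name}: {'run it' if viable else 'skip'} ({amplifiers}/3 amplifiers)")
```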
CTA Breakdown

How they asked for the click.

VERBAL ASK
21:10 · newsletter
Come and join our private community where you can get all the resources and step-by-step help.

Community plug mid-video during the cold email section; subscribe ask partway through the Facebook ads demo. Both are soft and contextually placed.

FROM THE DESCRIPTION
PRIMARY CTA: Where the creator wants you to go next.
OTHER LINKS: Also linked in the description.
Storyboard

Visual structure at a glance.

  • 00:00 · hook · open
  • 01:00 · promise · tweet
  • 02:18 · value · meat-computers
  • 05:32 · value · 3-file-system
  • 07:36 · value · 701x
  • 08:24 · value · must-haves
  • 14:35 · value · live-demo-1
  • 17:34 · value · live-demo-2
  • 19:55 · value · live-demo-3
  • 21:10 · cta · CTA