Big Idea

The argument in one line.

The next era of outbound marketing is AI running hundreds of campaign experiments simultaneously and learning from reply data weekly, making human creative labor approval-only rather than primary.

Who This Is For

Read if. Skip if.

READ IF…

You run B2B cold email campaigns and want to improve reply rates through systematic experimentation rather than intuition.
You have an existing outbound stack (SmartLead, Instantly, Clay) and want an AI layer to run the testing and iteration cycle on top of it.
You are comfortable setting up a GitHub repo, connecting API keys, and approving campaign batches weekly.
You want to understand how to structure a TAM database so an AI agent can generate and test campaign hypotheses without human prompting.

SKIP IF…

You are doing B2C, social-first, or content-led acquisition. This system is strictly cold email outbound for B2B.
You want fully autonomous sends. User approval gates are hard-coded and non-negotiable in this repo.
You have no existing way to pull company data or build a TAM. The system requires a complete data layer before any experiments run.

TL;DR

The full version, fast.

The system applies the Karpathy AutoResearch loop to cold email: run experiment, score result, keep what works, discard what does not. Claude Code ingests your website, onboarding voice memos, and a pre-built TAM spreadsheet to generate campaign hypotheses, propose sample emails, and load approved contacts into SmartLead. Each week it reads reply data, identifies patterns by audience slice and copy style, and proposes the next round of tests. No email uploads without user approval and MillionVerifier sign-off, keeping a human in the loop at the launch gate while removing humans from the creative and analysis work entirely.
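The keep-or-discard step of that loop can be sketched in a few lines. This is a minimal illustration, not the repo's code: the `Experiment` class, the experiment names, the send and reply counts, and the fixed baseline threshold are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    name: str      # hypothetical label: audience slice + copy style
    sends: int
    replies: int

    @property
    def reply_rate(self) -> float:
        # Replies per send; 0.0 when nothing has been sent yet.
        return self.replies / self.sends if self.sends else 0.0


def keep_or_discard(experiments, baseline_rate):
    """Split experiments into keepers and discards against a baseline reply rate."""
    keep = [e for e in experiments if e.reply_rate > baseline_rate]
    discard = [e for e in experiments if e.reply_rate <= baseline_rate]
    return keep, discard


# Invented numbers, for illustration only.
week = [
    Experiment("founders_short_copy", sends=500, replies=12),
    Experiment("ops_leads_case_study", sends=500, replies=4),
]
keep, discard = keep_or_discard(week, baseline_rate=0.01)
```

In this sketch the keepers would seed the next round of hypotheses, and nothing is sent until the user approves the batch.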

Chapters

Where the time goes.

00:00 – 00:42

01 · Hook + results reveal

Bold claim: this system will destroy lead gen agencies, including the presenter's own. Backed by the headline number: automatic campaigns at roughly 2x the reply rate.

00:42 – 01:17

02 · SmartLead data breakdown

Screen share of SmartLead analytics showing 1.93x, 2.06x, 1.81x lift week-over-week across April 6-17. The fair comparison is replies per send, not raw volume.

01:17 – 02:04

03 · Karpathy AutoResearch origin

Explains how the Karpathy self-improving ML experiment loop inspired the idea. Claude Code discarded the actual repo as unnecessary. Only the concept transferred.

02:04 – 03:21

04 · 8-step system overview

Screen share of the full onboarding flow: Enter Website, Auto Draft, Fill Gaps, Pull TAM, Review List, Create Experiments, Approve and Upload, Learn Every Week.

03:21 – 04:21

05 · Context building

System writes ICP, case study, value prop, and problem statement markdown files. Voice memo walk-and-dump is the fastest way to load context. Without good context nothing else works.

04:21 – 06:38

06 · TAM build

Screen share of a real enriched TAM spreadsheet covering sales motion, pricing tiers, headcount ratios, ad spend, CTA type. Pre-building the full TAM eliminates bad AI game-time decisions.

06:38 – 08:09

07 · Experiment creation and copywriting

The list is the message. The AI proposes campaigns based on TAM signals and context, then presents 3-5 sample contacts with draft emails so the human can judge the message. The user approves messaging before any upload.

08:09 – 08:40

08 · Campaign loading and weekly cadence

Everything loads into SmartLead and Instantly. Every Friday a Codex automation reviews past performance and proposes the next round. The user still approves before anything sends.

08:40 – 09:44

09 · Hard gates

The CTA is locked to prevent drift. Every email must pass MillionVerifier. User approval is hard-coded. Storage is Supabase internally and a CSV file for end users. These are non-negotiable.

09:44 – 11:15

10 · Weekly learning loop and CTA

System reads results, finds patterns by title and industry, suggests next test slice, repeats. Free first campaign offer for B2B businesses doing over 3M in revenue.
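The weekly loop in chapter 10 (read results, find patterns by title and industry, suggest the next test slice) could look roughly like the sketch below. The row layout, the field names, the `min_sends` cutoff, and all numbers are assumptions for illustration; the source does not show how the system actually scores slices.

```python
from collections import defaultdict


def summarize(rows, key):
    """Total sends and replies for each value of `key` (e.g. 'title' or 'industry')."""
    totals = defaultdict(lambda: {"sends": 0, "replies": 0})
    for row in rows:
        totals[row[key]]["sends"] += row["sends"]
        totals[row[key]]["replies"] += row["replies"]
    return dict(totals)


def suggest_next_slice(rows, key, min_sends=200):
    """Return the slice with the best reply rate among slices with enough volume."""
    stats = summarize(rows, key)
    eligible = {k: v for k, v in stats.items() if v["sends"] >= min_sends}
    if not eligible:
        return None  # not enough data to suggest anything yet
    return max(eligible, key=lambda k: eligible[k]["replies"] / eligible[k]["sends"])


# Invented results, for illustration only.
rows = [
    {"title": "VP Sales", "industry": "SaaS", "sends": 300, "replies": 9},
    {"title": "VP Sales", "industry": "Logistics", "sends": 200, "replies": 2},
    {"title": "Founder", "industry": "SaaS", "sends": 250, "replies": 5},
    {"title": "Founder", "industry": "Logistics", "sends": 50, "replies": 3},
]
next_title = suggest_next_slice(rows, "title")
next_industry = suggest_next_slice(rows, "industry")
```

The `min_sends` cutoff is one way to avoid chasing a slice that looks good on a tiny sample, the same failure mode the TAM chapter warns about.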

Atomic Insights

Lines worth screenshotting.

The AI-powered outbound loop is the product. The emails themselves are nearly irrelevant compared to the experiment structure that tests and learns from them.
The Karpathy AutoResearch repo was discarded by Claude Code as unnecessary. Only the concept of autonomous iterative experimentation transferred, not the code.
Letting an AI make game-time decisions about which companies to target produces small, unrepresentative samples. Pre-building the full TAM eliminates that failure mode.
Context quality is the ceiling of system quality. ICP files, case studies, value prop docs, and problem statements must exist before any campaign logic runs.
The list is the message. Which companies you select and the signals used to find them often drive more reply-rate variance than the email copy itself.
Locking the CTA as a hard constraint prevents AI from generating campaigns that accidentally give away the product or drift from the core offer.
MillionVerifier as a hard gate before SmartLead upload is a deliverability protection that cannot be skipped. Unverified emails poison domain reputation.
The weekly review loop compounds. Each round of experiments makes the next round smarter, widening the moat against competitors every seven days without additional human investment.
Human approval at the messaging and launch steps is the trust mechanism that makes autonomous campaign generation safe to run in production. Removing it is how systems fail at scale.
Automatic campaigns achieved 20.71 replies per 1000 sends versus 10.71 for manually managed campaigns, roughly a 1.93x lift, with reported lifts ranging from 1.81x to 2.06x across April 6-17.
A low-cost model (GPT-4o nano via the batch API), not a premium model, handles TAM data processing. Cost efficiency at the data enrichment step matters when processing thousands of companies.
Recording a voice memo while walking and dumping everything you know about your dream customer, then giving that transcript to Claude Code, is the fastest way to build the context layer.
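The reply-rate comparison above is simple arithmetic, and it is worth being explicit about it because the fair comparison is replies per send, not raw volume. In the sketch below the two rates (20.71 and 10.71 per 1000 sends) come from the breakdown; the raw counts of 29 and 15 replies on 1400 sends are hypothetical, chosen only so the rates round to the quoted figures.

```python
def replies_per_thousand(replies: int, sends: int) -> float:
    """Normalize replies to a per-1000-sends rate so campaigns of different size compare fairly."""
    if sends <= 0:
        raise ValueError("sends must be positive")
    return 1000 * replies / sends


def lift(test_rate: float, baseline_rate: float) -> float:
    """Ratio of the test reply rate to the baseline reply rate."""
    return test_rate / baseline_rate


# Rates quoted in the breakdown: automatic vs manually managed campaigns.
headline_lift = lift(20.71, 10.71)

# Hypothetical raw counts that happen to reproduce those rates.
automatic_rate = replies_per_thousand(29, 1400)
manual_rate = replies_per_thousand(15, 1400)

print(f"lift: {headline_lift:.2f}x")
```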

Takeaway

The experiment loop is the product, not the email.

WHAT TO LEARN

The leverage in modern outbound is systematic experimentation at machine pace. The copy is nearly irrelevant compared to the loop that tests and learns from it.

AI-powered outbound works by running experiments against your reply data, not by generating better copy. The loop is the product, not the prose.
Letting an AI make game-time decisions about which companies to target produces small, unrepresentative samples. Pre-building a complete TAM with every useful signal eliminates that failure mode.
Context quality is the ceiling of system quality. ICP files, case studies, value prop docs, and problem statements must be built before any campaign logic runs, or the experiments optimize against the wrong target.
Locking the CTA as a hard constraint prevents AI from generating campaigns that drift off-brand or accidentally give away the product. The constraint is a feature, not a limitation.
User approval gates at the messaging and launch steps are the trust mechanism that makes autonomous campaign generation safe to run in production. Removing them is how systems fail at scale.
The weekly review loop compounds. Each round of experiments makes the next smarter, so every seven days the moat widens against competitors who lack this layer, with no human work beyond the weekly approval.
The list is the message. Which companies you select and what signals you use to find them often drive more reply-rate variance than the email itself, which is why TAM quality is the first investment.
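The hard gates described above (locked CTA, verified emails, explicit user approval) can be expressed as a single pre-upload check. This is a sketch under stated assumptions: the class names, the `LOCKED_CTA` value, and the failure labels are hypothetical, and the `verified` flag is assumed to be set by an earlier verification step (for example MillionVerifier) rather than by any call shown here.

```python
from dataclasses import dataclass, field

LOCKED_CTA = "book_a_call"  # hypothetical locked objective


@dataclass
class Contact:
    email: str
    verified: bool  # assumed to come from an external email verification step


@dataclass
class CampaignBatch:
    cta: str
    contacts: list = field(default_factory=list)
    user_approved: bool = False


def gate_failures(batch: CampaignBatch, locked_cta: str = LOCKED_CTA) -> list:
    """Return the name of every gate the batch fails; an empty list means it may upload."""
    failures = []
    if batch.cta != locked_cta:
        failures.append("cta_mismatch")
    if any(not c.verified for c in batch.contacts):
        failures.append("unverified_email")
    if not batch.user_approved:
        failures.append("missing_user_approval")
    return failures


def can_upload(batch: CampaignBatch) -> bool:
    return not gate_failures(batch)


batch = CampaignBatch(
    cta="book_a_call",
    contacts=[Contact("a@example.com", verified=True)],
)
blocked_reasons = gate_failures(batch)   # approval has not been given yet
batch.user_approved = True
ready = can_upload(batch)
```

Returning the list of failed gates, rather than a bare boolean, makes it easy to tell the user exactly why a batch was held back.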

Glossary

Terms worth knowing.

TAM file: A pre-built spreadsheet of every company you might want to target, enriched with every relevant signal upfront including sales motion, pricing, headcount ratios, ad spend, and CTA type, so the AI never has to fetch data on the fly during experiment runs.
AutoResearch: An open-source repository by Andrej Karpathy that trains a small local ML model by running self-improving experiments every five minutes, measuring what works, and iterating. The name and concept were borrowed here; the actual code was discarded.
ICP: Ideal Customer Profile. A markdown file describing the exact type of company and buyer persona you want to reach, used as primary context for the AI when generating campaign hypotheses.
Hard gate: A non-negotiable checkpoint in the workflow that blocks campaign upload unless its condition is met. This system has three: the CTA must match the locked objective, every email must pass verification, and the user must explicitly approve.
MillionVerifier: An email validation service used as a pre-upload gate to confirm that each contact email is deliverable before it enters a SmartLead campaign.
SmartLead / Instantly: Cold email sending platforms that manage campaign delivery, inbox rotation, and reply tracking. They are the execution layer this system loads approved campaigns into.
Experiment ledger: A running log of every campaign hypothesis tested, its results, and which contacts were already reached, used weekly by the AI to avoid repeating tests and build the next round on what was learned.
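As a rough illustration of the experiment ledger defined above, the sketch below keeps an in-memory log and answers the two questions the weekly review needs: has this hypothesis been tested, and which contacts were already reached? The class and method names are hypothetical, and a real ledger would persist to a database or CSV rather than a Python list.

```python
from dataclasses import dataclass, field


@dataclass
class LedgerEntry:
    hypothesis: str
    sends: int
    replies: int
    contacts: frozenset  # emails reached by this experiment


@dataclass
class ExperimentLedger:
    entries: list = field(default_factory=list)

    def record(self, hypothesis, sends, replies, contacts):
        """Append one finished experiment to the log."""
        self.entries.append(LedgerEntry(hypothesis, sends, replies, frozenset(contacts)))

    def already_tested(self, hypothesis) -> bool:
        return any(e.hypothesis == hypothesis for e in self.entries)

    def already_contacted(self) -> set:
        reached = set()
        for entry in self.entries:
            reached |= entry.contacts
        return reached

    def fresh_contacts(self, candidates) -> list:
        """Filter out anyone a previous experiment already emailed, preserving order."""
        reached = self.already_contacted()
        return [c for c in candidates if c not in reached]


ledger = ExperimentLedger()
ledger.record("saas_founders_pricing_angle", sends=2, replies=1,
              contacts=["a@example.com", "b@example.com"])
remaining = ledger.fresh_contacts(["b@example.com", "c@example.com"])
```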

Resources

Things they pointed at.

01:17 · link · Karpathy AutoResearch

00:42 · tool · SmartLead.ai

08:09 · tool · Instantly

08:09 · tool · Codex

04:21 · tool · Clay

06:38 · tool · RapidAPI

06:38 · tool · Apify

06:38 · tool · Prospio

06:38 · tool · Blitz API

06:38 · tool · html-to-text

08:40 · tool · MillionVerifier

09:44 · tool · Supabase

Quotables