The argument in one line.
The compounding value in AI-driven experimentation is not each individual iteration but the accumulated experiment log — once you can read a metric automatically, Claude Code will improve it indefinitely without human intervention.
Read if. Skip if.
- You already use Claude Code for tasks but have never pointed it at a repeating business metric to optimize.
- You run a community, email list, or landing page and want to test copy variations without manually A/B testing every week.
- You want to automate data collection from a platform that doesn't expose an API (Skool, Kajabi, custom dashboards).
- You're comfortable with IDEs and Python repos but haven't yet built an agentic improvement loop.
- You have no existing traffic or conversions — AutoResearch needs volume to produce signal.
- You're looking for a no-code solution; this tutorial assumes comfort with a code editor and CLI.
- You need a pre-built product, not a framework to implement yourself.
The full version, fast.
Karpathy's AutoResearch repo encodes a single idea: let an AI run controlled experiments on a metric, log every result, and compound the improvements indefinitely. The video builds this loop live — clone the repo, pick one measurable objective (Skool conversion rate), load your business DNA into the AI's context, then solve the hardest part: getting data from platforms that have no API. The solution is Claude CoWork browser automation feeding a Notion page, which Antigravity polls on a schedule. A Pinecone vector database pre-loaded with expert content (Hormozi, community surveys, your own transcripts) makes the generated copy specific rather than generic. The first iteration produces a "pain block" hypothesis with a full testing roadmap.
Chat with this breakdown.
Modern Creator members can chat with any breakdown — ask for the hook, quote a framework, find the exact transcript moment. Unlocks at T2: refer 3 friends + add your own API key.
Create a free account →Where the time goes.

01 · Hook — the WD-40 principle
Loss-frame opener, host intro, WD-40 iteration metaphor as organizing frame for the entire video.

02 · What AutoResearch is
The three-part loop: generate → deploy → harvest. The log as compounding asset. Baseline vs. challenger pattern.

03 · What makes a great candidate
Objective metrics only — pass/fail, number, percentage. Anything with a 'vibe' fails the test.

04 · Repo setup in Antigravity
Clone AutoResearch from GitHub into Cursor; connect Claude via subscription (cheaper than API); repo structure overview.

05 · Step 1 — Choose the metric + context
Pick Skool conversion rate. Load context: community copy pasted in, business.md prompt, survey data. Claude asks clarifying questions.

06 · Business DNA and survey data
business.md template walk-through. Survey responses as voice-of-customer training data. Feed everything to Claude before first iteration.

07 · Step 2 — Measure: baseline data without an API
Skool has no public API. Strategy: screenshot + manual paste for first baseline. Claude asks what data it needs.

08 · Frequency + Pinecone knowledge base
250 events = stable window. Build a vector DB from Hormozi books, YouTube transcripts, community posts to make copy specific.

09 · Step 3 — Automate with Claude CoWork
Install Claude browser extension; give CoWork a task to visit Skool dashboard on a schedule, copy analytics, return data to chat.

10 · Pipeline: CoWork → Notion → Antigravity
CoWork writes scraped data to a Notion page; Antigravity polls that page; full closed loop without manual intervention.

11 · Business guide to choosing where to start
Decision flowchart: measurable? → retrievable? → frequent enough? Great vs. bad candidate matrix. Start where volume is highest.

12 · First iteration output + CTA
Claude generates v0→v1 hypothesis: pain block between headline and offer. Testing roadmap with priorities. CTA to knowledge-base video.
Lines worth screenshotting.
- WD-40 is named WD-40 because 39 formulations failed first — the iteration count is the product, not the accident.
- The log of past experiments is the compounding asset; each new Claude session inherits all prior context and never repeats a failure.
- AutoResearch requires exactly three things: one thing to change, one objective metric, and an automated way to read the result.
- Platforms without APIs are not blockers — Claude CoWork's browser automation can scrape any dashboard your browser can open.
- 250 conversion events is roughly the threshold for a statistically stable test window; below 100 and the signal is too noisy.
- Feeding Claude a vector database of domain expert content (books, transcripts, survey data) replaces generic copy with voice-matched specificity.
- The best AutoResearch candidates have high volume, fast feedback cycles, and numeric or binary outcomes — not 'does it feel good' judgments.
- Claude's clarifying questions at context-loading time are a feature: the specificity of the question predicts the quality of the output.
- You can run the data loop even when your laptop is closed — if CoWork misses a scheduled run, it retries on next open.
- A cookie-based scraper can replace CoWork entirely for certain endpoints, removing the need for a running browser session.
- The hypothesis Claude generated — a 'pain block' between headline and offer — was grounded in survey data, not generic CRO advice.
- Starting with the single biggest constraining metric (not a random one) determines whether the whole system produces real business impact.
Any metric you can read automatically is a metric Claude can improve.
The AutoResearch framework's real unlock is not the AI — it's designing your measurement setup so the loop can run unsupervised.
- The accumulated log of past experiments is the compounding asset, not any single iteration — the AI inherits all prior context and never repeats a failure.
- An objective metric (a number, a pass/fail, a percentage) is a hard prerequisite; anything that requires a human judgment call will stall the loop.
- Platforms without APIs are not blockers — browser automation can scrape any dashboard a browser can open, then relay the data to the agent via a shared document.
- 250 conversion events is a reasonable stability threshold before drawing conclusions; below 100, variance will mask real signal.
- Loading domain-specific context (expert books, customer surveys, your own transcripts) into a vector database moves AI-generated copy from generic to voice-matched.
- Start your first AutoResearch project on the highest-volume, fastest-feedback metric you have — the system compounds fastest where data is most abundant.
- Changing one variable per iteration is the discipline that makes results interpretable; radical redesigns reset the log's accumulated signal.
Terms worth knowing.
- AutoResearch
- A GitHub repo by Andrej Karpathy that structures AI-driven iterative experimentation: generate a variant, deploy it, measure the result, log it, repeat. The log compounds as the primary asset.
- Baseline / Challenger pattern
- A controlled testing structure where the current best-performing version (baseline) is held constant while an AI-generated variant (challenger) is evaluated; the challenger replaces the baseline only if it wins.
- Claude CoWork
- A browser extension and desktop integration that lets Claude autonomously perform actions in a web browser — clicking, scraping, filling forms — on a user-defined schedule, without requiring an API.
- Antigravity
- The presenter's name for the Cursor IDE (an AI-native code editor); used interchangeably throughout the video.
- program.md
- The single markdown file in the AutoResearch repo that serves as the standing instruction set for the agent — defines the metric, the change process, and the loop frequency.
- Pain block
- A short copy block positioned between a headline and an offer that names the reader's specific frustration, used to increase self-identification and reduce bounce before the pitch.
- Vector knowledge base
- A database (here, Pinecone) that stores content as numerical embeddings so an AI can retrieve the most relevant passages by semantic similarity rather than keyword match.
Things they pointed at.
Lines you could clip.
“WD-40 is called WD-40 because WD-39 failed.”
“The log is the asset.”
“The question you ask is more important than the answer sometimes. In fact, most times.”
Word for word.
The bait, then the rug-pull.
Andrej Karpathy named his iteration framework after WD-40 — a product called WD-40 because WD-39 failed. In this tutorial, Jack Roberts picks up that logic and wires it into a live Claude Code pipeline, turning a Skool landing page's conversion rate into the first test subject of a loop that, theoretically, never has to stop.
Named ideas worth stealing.
AutoResearch Loop
- Generate variant
- Deploy
- Harvest data
- Log result
- Next experiment
A closed iteration loop where each run's result is logged and informs the next variant. The log compounds as the primary AI context.
Three Requirements for AutoResearch
- One thing to change
- One objective metric
- A way to read the result automatically
Minimum viable criteria before starting an AutoResearch project. All three must be satisfied; if any is missing, the loop stalls.
Business Guide to Choosing
- Does it have a measurable metric?
- Can you get data via API or scraping?
- Does it run frequently enough for fast feedback?
Decision flowchart for selecting the first (or next) AutoResearch project. High volume + fast feedback + numeric metric = ideal candidate.
How they asked for the click.
“Knowing how to leverage the auto research skill without knowing how to build this database of knowledge from your chosen experts is going to severely limit your skills and capability — which is why the next thing we need to do is build out a knowledge base.”
Soft CTA woven into the final summary, not a hard ask. Points to a prior video on vectorizing expert content. No subscribe pitch, no sponsor.




































































