The Only Claude Skills Tutorial You Need (Add Evals and Memory)
A 19-minute live build showing how to make Claude Code skills that grade their own output, remember past sessions, and get better every time you run them.
June 3rd
Kun Chen quit big tech and now ships more code in a day than most engineers ship in a month, by building three tools that move him almost entirely out of the loop.
The bottleneck in AI-assisted engineering is not the agent — it is the engineer who keeps inserting themselves into coding and review loops they no longer need to touch.
Kun Chen ships 20-40 GitHub PRs a day by treating the Plan-Code-Validate loop as something agents should run almost entirely on their own. His three tools encode this: Lavish turns planning into an interactive HTML artifact the agent writes and the human annotates; Treehouse maintains a pool of pre-configured git worktrees so parallel sessions have zero setup cost; and No Mistakes is a post-coding pipeline that rebases, reviews in a fresh context window, runs end-to-end tests, updates documentation, and opens the PR without the engineer touching code. The review step uses a deliberate fresh context window because same-session self-review is biased toward confirming what was already done.
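To make the post-coding pipeline concrete, here is a minimal Python sketch of stages run in a fixed order. It is not the No Mistakes implementation: the stage names are paraphrased from the summary above, the `run_pipeline` and `demo_stages` helpers are hypothetical, and stopping at the first failed stage is an assumption rather than something the talk confirms.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# A stage is a name plus a callable that returns True on success.
Stage = Tuple[str, Callable[[], bool]]

# Stage order paraphrased from the summary; the names are illustrative only.
STAGE_ORDER = [
    "rebase",
    "fresh_context_review",
    "end_to_end_tests",
    "update_docs",
    "open_pr",
]


@dataclass
class PipelineResult:
    completed: List[str] = field(default_factory=list)
    failed_at: Optional[str] = None


def run_pipeline(stages: List[Stage]) -> PipelineResult:
    """Run stages in order; stop at the first failure (an assumption)."""
    result = PipelineResult()
    for name, step in stages:
        if not step():
            result.failed_at = name
            break
        result.completed.append(name)
    return result


def demo_stages(failing: Optional[str] = None) -> List[Stage]:
    """Build stand-in stages; the one named `failing` reports failure."""
    return [(name, (lambda n=name: n != failing)) for name in STAGE_ORDER]


if __name__ == "__main__":
    print(run_pipeline(demo_stages()))
    print(run_pipeline(demo_stages(failing="end_to_end_tests")))
```

In a real setup each callable would wrap an agent call or a shell command. Placing the review as its own stage, separate from the coding session, mirrors the fresh-context idea described above.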
Highlight reel of key claims: no code review, 20-30 agents running, 20-40 PRs per day.

Kun explains his three-phase loop and how investing more in planning allows agents to run longer in code and validate autonomously.

Live demo on the hi-bit AI tutor app: screenshot-to-agent workflow, why HTML beats markdown for planning, interactive option selection.

How Kun uses Lavish to turn a rough idea into a spec, letting the agent criticize and propose risks before committing to a direction.

Running 5+ sessions, 20-30 sub-agents; Treehouse as worktree pool; when to use sub-agents for context management; ProgramBench evaluation demo.

The nm alias triggers a full pipeline: fresh context review, end-to-end tests, docs update, lint, push, PR with risk classification.

The PR risk assessment is the only thing Kun reads; he merges low-risk PRs without diff review. Discussion of team processes at 10x PR velocity.

Build many throwaway things; run more agents in parallel; adopt AI in every manual step, not just coding. Demo of Claude Code /insights.
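The risk-based merge habit mentioned in the chapter list can be sketched as a small triage function. This is only an illustration: the three risk levels, the `Risk` enum, and the `triage` helper are hypothetical, and sending anything above low risk to a human is an assumption. The summary states only that low-risk PRs are merged without a diff review.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Risk(Enum):
    # Hypothetical levels; the talk only mentions a risk classification.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def triage(prs: Dict[str, Risk]) -> Tuple[List[str], List[str]]:
    """Split PRs into (merge_without_diff_review, needs_human_attention).

    Assumption: only LOW-risk PRs skip human diff review.
    """
    auto_merge = [name for name, risk in prs.items() if risk is Risk.LOW]
    needs_human = [name for name, risk in prs.items() if risk is not Risk.LOW]
    return auto_merge, needs_human


if __name__ == "__main__":
    sample = {
        "fix-typo": Risk.LOW,
        "migrate-db": Risk.HIGH,
        "tweak-copy": Risk.LOW,
    }
    print(triage(sample))
```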
Shipping at high velocity with AI agents is not about better prompts — it is about systematically removing yourself from every step that does not require your judgment.
“If you review every single line of code, you become the bottleneck. So I don't review this first pass code from the agents.”
“Eventually, I got to a point where I find myself never catching anything the agents don't catch.”
“Our workflows and how our teams work were built at a time when we spent most of our time coding. But when you start to write 10 times more PRs, we are not ready for that.”
“I feel liberated.”
“Build every single idea you have. Whenever you have some idea, send the prompts to the agents and see what it does.”
Most engineers running AI coding agents still act like software reviewers — reading every diff, approving every step, personally validating every change. Kun Chen stopped doing that. The ex-Meta L8 principal engineer now ships 20 to 40 GitHub PRs a day, and he does not review a single line of code himself. This is the system he built to make that possible.