Modern Creator
Peter Yang · YouTube

How This Ex-Meta L8 Engineer Ships 40 PRs a Day with AI Agents

Kun Chen quit big tech and now ships more code in a day than most engineers ship in a month — by building three tools that move him almost entirely out of the loop.

Posted: yesterday
Format: Interview, educational
Views: 3.6K
Likes: 120
Big Idea

The argument in one line.

The bottleneck in AI-assisted engineering is not the agent — it is the engineer who keeps inserting themselves into coding and review loops they no longer need to touch.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…
  • You are already using Claude Code or Codex but still spend most of your time prompting, reviewing diffs, or waiting for one agent to finish before starting the next.
  • You want to run 5+ parallel agent sessions but have no system for managing worktrees, tracking session state, or validating changes before merge.
  • You are a solo builder who wants the output velocity of a small engineering team without the coordination overhead.
  • You work in a codebase with meaningful test coverage and want agents to catch their own regressions before you review anything.
SKIP IF…
  • You are not yet comfortable reading and reasoning about code diffs — the tools here assume you can make a judgment call when the agent escalates an ambiguous bug.
  • You are building a throwaway prototype where shipping velocity matters far more than validation rigor.
TL;DR

The full version, fast.

Kun Chen ships 20-40 GitHub PRs a day by treating the Plan-Code-Validate loop as something agents should run almost entirely on their own. His three tools encode this: Lavish turns planning into an interactive HTML artifact the agent writes and the human annotates; Treehouse maintains a pool of pre-configured git worktrees so parallel sessions have zero setup cost; and No Mistakes is a post-coding pipeline that rebases, reviews in a fresh context window, runs end-to-end tests, updates documentation, and opens the PR without the engineer touching code. The review step uses a deliberate fresh context window because same-session self-review is biased toward confirming what was already done.

Voices

Who's talking.

00:44 · host · Peter Yang
00:44 · guest · Kun Chen
Chapters

Where the time goes.

00:00 - 01:04

01 · Cold open: the no-review thesis

Highlight reel of key claims: no code review, 20-30 agents running, 20-40 PRs per day.

01:04 - 06:22

02 · Plan-Code-Validate framework

Kun explains his three-phase loop and how investing more in planning allows agents to run longer in code and validate autonomously.

06:22 - 18:13

03 · Demo: Lavish visual planning

Live demo on the hi-bit AI tutor app: screenshot-to-agent workflow, why HTML beats markdown for planning, interactive option selection.

19:53 - 23:21

04 · Brainstorming a project from scratch

How Kun uses Lavish to turn a rough idea into a spec, letting the agent criticize and propose risks before committing to a direction.

23:21 - 36:24

05 · Parallel sessions + Treehouse + sub-agents

Running 5+ sessions, 20-30 sub-agents; Treehouse as worktree pool; when to use sub-agents for context management; ProgramBench evaluation demo.

36:24 - 45:19

06 · No Mistakes: automated code review pipeline

The nm alias triggers a full pipeline: fresh context review, end-to-end tests, docs update, lint, push, PR with risk classification.

45:19 - 50:18

07 · What you still need to look at before merging

The PR risk assessment is the only thing Kun reads; he merges low-risk PRs without diff review. Discussion of team processes at 10x PR velocity.

50:18 - 56:18

08 · Three pieces of advice for agentic engineering

Build many throwaway things; run more agents in parallel; adopt AI in every manual step, not just coding. Demo of Claude Code /insights.

Atomic Insights

Lines worth screenshotting.

  • If you review every single line of AI-written code, you become the bottleneck, not a quality gate.
  • Planning quality is the multiplier: a detailed spec lets agents run autonomously for hours; a one-line prompt gets you five minutes of work and a re-prompt.
  • Reviewing code changes in a fresh context window catches far more edge cases than asking the same session to self-review, because the same-session agent is biased toward believing its own work is correct.
  • At 20-40 PRs per day, team processes built for 10-15 PRs per month collapse, and PR reviews, QA gates, and merge etiquette all need to be rebuilt from scratch.
  • Sub-agents are most valuable for carving off large exploration or experimentation tasks that would bloat the main session context window.
  • HTML artifacts are richer than markdown for human-agent planning: you can annotate, click buttons, and give structured feedback without switching windows.
  • Running 5+ parallel agent sessions is less about multitasking and more about ensuring you are never idle while an agent is running.
  • The only code review that matters at high velocity is a risk classification: low-risk gets a scan-and-merge; medium and high get a diff review.
  • Claude Code's /insights slash command analyzes your past sessions and generates CLAUDE.md and skill improvement recommendations automatically.
  • The missing ingredient for most people is not a better model but a forcing function to run more agents in parallel and accept that they do not need to see every intermediate step.
  • Most agent harnesses do not proactively spawn sub-agents; you have to prompt them explicitly, especially for large parallel experiment batches.
  • End-to-end testing instructions in AGENTS.md are what turn agent validation from a unit-test rubber stamp into something that actually catches regressions.
Takeaway

Move yourself out of the loop, not just the coding.

WHAT TO LEARN

Shipping at high velocity with AI agents is not about better prompts — it is about systematically removing yourself from every step that does not require your judgment.

01 · Cold open: the no-review thesis
  • The opening claim is empirical, not philosophical: Kun ran parallel human-and-agent reviews until he confirmed he consistently added nothing the agent did not already catch.
02 · Plan-Code-Validate framework
  • Invest disproportionately in the planning phase: a detailed spec lets an agent run autonomously for hours; a one-line prompt gets you five minutes of work and a re-prompt.
  • Parallel sessions are not a power-user trick — they are the baseline. Running one agent at a time means you are the bottleneck every time an agent is working.
03 · Demo: Lavish visual planning
  • HTML artifacts are richer than markdown for planning: interactive buttons, inline annotations, and visual layout proposals replace copy-pasting from a wall of text.
  • Asking the agent to propose multiple options as a visual artifact and clicking to select is faster than typing feedback and reading long terminal output.
04 · Brainstorming a project from scratch
  • When starting a new project, give the agent your initial thinking and explicitly ask it to criticize your plan and surface risks before committing to a spec.
05 · Parallel sessions + Treehouse + sub-agents
  • Sub-agents are most valuable for carving off large exploration tasks that would bloat the main session context window, not for general delegation.
  • Every tool Kun built originated from a workflow friction he noticed himself: when something slows you down, build the fix rather than tolerating the cost.
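The worktree pool idea in this section can be sketched in a few lines. This is a minimal illustration, not Treehouse's actual implementation: the `WorktreePool` class, the `wt-N` naming scheme, and the `git_worktree_creator` helper are hypothetical names chosen for the example. The only real command it relies on is `git worktree add --detach <path>`.

```python
import subprocess
from pathlib import Path
from typing import Callable


def git_worktree_creator(repo: Path, pool_dir: Path) -> Callable[[str], None]:
    """Return a function that creates a detached worktree called `name` under pool_dir."""
    def create(name: str) -> None:
        subprocess.run(
            ["git", "-C", str(repo), "worktree", "add", "--detach", str(pool_dir / name)],
            check=True,
        )
    return create


class WorktreePool:
    """Hands out idle worktrees first and creates a new one only when none is idle."""

    def __init__(self, create: Callable[[str], None]) -> None:
        self._create = create          # hypothetical hook: how a new worktree gets made
        self._idle: list[str] = []
        self._busy: set[str] = set()
        self._count = 0

    def acquire(self) -> str:
        if self._idle:
            # Reuse an existing worktree, so installed dependencies are kept.
            name = self._idle.pop()
        else:
            # Generate the name, so nobody has to invent or remember one.
            self._count += 1
            name = f"wt-{self._count}"
            self._create(name)
        self._busy.add(name)
        return name

    def release(self, name: str) -> None:
        self._busy.remove(name)
        self._idle.append(name)
```

In real use, `WorktreePool(git_worktree_creator(repo, pool_dir))` would back the pool with actual worktrees. A real tool would also need to reset a released worktree to a clean state before handing it out again, which this sketch leaves out.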
06 · No Mistakes: automated code review pipeline
  • Use a fresh context window for code review — the same session that wrote the code is biased toward believing its own work is correct and will miss edge cases.
  • End-to-end testing instructions in a project AGENTS.md file are what separate agents that rubber-stamp their own work from agents that actually validate regressions.
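The fresh-context review described above can be sketched as a pipeline stage that never receives the coding session's history. This is an illustration under stated assumptions, not the No Mistakes implementation: `Change`, `fresh_review_stage`, and `run_pipeline` are hypothetical names, and `reviewer` stands in for whatever agent call performs the review.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Change:
    intent: str                                  # what the change is meant to accomplish
    diff: str                                    # the code change under review
    session_history: tuple[str, ...] = ()        # transcript of the session that wrote it


# A stage takes the change and returns (passed, notes).
Stage = Callable[[Change], tuple[bool, str]]


def fresh_review_stage(reviewer: Callable[[dict[str, str]], tuple[bool, str]]) -> Stage:
    """Wrap a reviewer so it only ever sees the intent and the diff."""
    def stage(change: Change) -> tuple[bool, str]:
        # session_history is deliberately left out of the reviewer's context.
        context = {"intent": change.intent, "diff": change.diff}
        return reviewer(context)
    return stage


def run_pipeline(
    change: Change, stages: list[tuple[str, Stage]]
) -> list[tuple[str, bool, str]]:
    """Run named stages in order and stop at the first failure."""
    results = []
    for name, stage in stages:
        passed, notes = stage(change)
        results.append((name, passed, notes))
        if not passed:
            break
    return results
```

The design point is that the reviewer's input is built from scratch for each change, so nothing the coding session concluded about its own work can leak into the review.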
07 · What you still need to look at before merging
  • Risk-classify every change before spending time on it: low-risk gets a quick scan and merge; medium and high get actual diff review.
  • Team processes built for 10-15 PRs per month break at 10x velocity — PR review gates, QA processes, and merge conventions all assume a human writing speed that no longer holds.
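The risk-based routing above amounts to a small lookup. The sketch below assumes each PR already carries a risk label from an upstream step; the `Risk` enum, the action names, and the PR identifiers are hypothetical and chosen only for illustration.

```python
from enum import Enum


class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def review_action(risk: Risk) -> str:
    """Low risk gets a quick scan before merging; anything higher gets a diff review."""
    return "scan-and-merge" if risk is Risk.LOW else "diff-review"


def triage(prs: dict[str, Risk]) -> dict[str, list[str]]:
    """Group PR identifiers by the review action their risk level calls for."""
    queues: dict[str, list[str]] = {"scan-and-merge": [], "diff-review": []}
    for pr, risk in prs.items():
        queues[review_action(risk)].append(pr)
    return queues
```

For example, `triage({"pr-a": Risk.LOW, "pr-b": Risk.HIGH})` puts `pr-a` in the scan-and-merge queue and `pr-b` in the diff-review queue, so human attention goes only where the label says it is needed.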
08 · Three pieces of advice for agentic engineering
  • Build many throwaway projects — the reps are how you discover where agents underperform and where your workflow assumptions are wrong.
  • Every manual step you take is a prompt you have not written yet: if you find yourself doing something by hand, ask whether an agent can do it.
  • Claude Code's /insights slash command analyzes your past sessions and surfaces what CLAUDE.md instructions and skills would make future sessions more efficient.
Glossary

Terms worth knowing.

L8 engineer
A principal or distinguished engineer at Meta — one of the highest individual-contributor levels, equivalent to a senior director on the management track.
Worktree
A git feature that creates a parallel directory checked out from the same repository, letting you run separate branches simultaneously without stashing or switching contexts.
No Mistakes
Kun Chen's open-source post-coding pipeline that runs Intent analysis, Rebase, Review in a fresh context window, Test, Document, Lint, Push, and PR creation automatically after an agent finishes a coding task.
Lavish
Kun Chen's open-source tool (npx lavish-axi) that has agents write planning proposals as interactive HTML artifacts instead of terminal text walls, enabling click-to-select feedback and inline annotations.
Treehouse
Kun Chen's open-source worktree pool manager: one command drops you into a pre-configured worktree with dependencies already installed, eliminating naming and setup overhead.
OpenCode
An open-source alternative to Claude Code that supports multiple AI models via a unified interface, allowing engineers to swap models mid-project.
ProgramBench
A coding benchmark (successor to SWE-bench) that asks agents to build programs like FFmpeg from scratch and measures whether all requirements and test cases pass.
Context window blowup
When a long agent session accumulates so much conversation history that it hits model context limits, forcing compaction and degrading the agent's ability to reason about early context.
Fresh context window
Starting a new agent session with only the minimal context needed for a task, deliberately discarding prior conversation history to avoid confirmation bias in review or validation steps.
Resources

Things they pointed at.

00:00 · tool · Treehouse
19:53 · product · Linear
31:00 · tool · ProgramBench
10:00 · tool · OpenCode
Quotables

Lines you could clip.

00:05
If you review every single line of code, you become the bottleneck. So I don't review this first pass code from the agents.
Contrarian opener with immediate follow-through, no setup needed. (TikTok hook)
32:45
Eventually, I got to a point where I find myself never catching anything the agents don't catch.
Empirical validation of skipping code review with credibility from years of parallel testing. (IG reel cold open)
38:20
Our workflows and how our teams work were built at a time when we spent most of our time coding. But when you start to write 10 times more PRs, we are not ready for that.
Names an industry-wide structural problem in two sentences. (Newsletter pull-quote)
41:52
I feel liberated.
Three-word emotional payoff after a long setup about leaving big tech, clippable with context. (TikTok hook)
53:18
Build every single idea you have. Whenever you have some idea, send the prompts to the agents and see what it does.
Direct actionable advice, quotable as motivational content. (IG reel cold open)
Topic Map

Where the conversation goes.

01:04 - 06:22 · dense · Planning philosophy and spec quality
06:22 - 18:13 · dense · Lavish HTML artifact planning tool
23:21 - 31:00 · dense · Parallel agent sessions and worktrees
31:00 - 36:24 · steady · Sub-agents and context management
36:24 - 45:19 · dense · No Mistakes validation pipeline
45:19 - 50:18 · steady · Team processes at 10x PR velocity
50:18 - 56:18 · steady · Advice for getting better at agentic engineering
The Script

Word for word.

00:00If you review every single line of code, you become the bottleneck. So I don't review this first pass code from the agents. Eventually, I got to a point where I find myself never catching anything the agents don't catch.
00:12I typically have, like, at least five different sessions actively running. On average, there's, like, 20 to 30 agents running. Most of the time, uh, it's, like, 20 to 40 kind of PRs every day.
00:23Our workflows and how our teams work were built at a time when we spent most of our time coding. But when you start to write, like, 10 times more PRs, we are not ready for that. To really scale up how much we can get from the agents, we have to move ourselves out of the loop as much as possible.
00:44Hey, everyone. Today, I'm really excited to welcome my friend Kun, an L8 engineer from Meta and Microsoft who's now a solo AI builder.
00:52Kun is gonna show us exactly how he builds products using agents. I've been asking him a lot of dumb questions about all this, so we're really excited for him to show us live. So welcome, sir.
01:02Thanks for having me here, Peter. Alright. So, uh, let's get right into it.
01:05Maybe you can start, uh, by kind of walking through at a high level how you're building products with agents.
01:12Alright. That is my workflow. Plan, code, and validate.
01:16I don't think this is too different from what everybody does. So I'll probably talk through the parts where I think I'm doing something unique. So I think, typically, when we build something meaningful, we typically go through these phases.
01:29Right? We plan what the requirements are, and then we let the agent code, and then we have to do some validation to make sure the agent actually did what we want them to do.
01:39So this high level workflow, I think, is pretty standard. Where I think I do something different is how much time I spend in each phase. So currently, I think I spend more time in the planning phase.
01:53So planning is, like, mostly me with assistance from the agents. The coding phase is pretty much entirely the agents. So I once the requirements are planned very clearly, I trust the agents to do most of the work.
02:07And then in validation phase, I use agents a lot as well. And agents do most of the work with some judgment from me when things are ambiguous. And I think the the the parts the part about this is that if we actually start to delegate most of the coding to agents.
02:27Mhmm. What I the way I think I can get agents to do more for me is to try to increase the amount of time agents spends in this phase because this is entirely agents.
02:39Right? So if we can get the agents to do to go for longer, uh, then I'll get more done.
02:45So this is one area where I tried a lot of things to just scale up the amount of time I can let the agents run autonomously. Yeah. It's almost like, uh, the code and the validation is a loop that the agent can run itself.
02:58Right? And and so that it can actually code for a longer time period. Yeah.
03:02Yeah. And, also, I think it depends on how much time we spent in the planning phase. So if I spend a lot of time crafting a very detailed plan, then I can let the agents go for longer.
03:12If I only write a very short prompt, then what I'll find is that very quickly, the agents will get work done, and then I'll need to go back and prompt them again. So, uh, like, how much time we invest in the planning phase actually affects this a lot. Okay.
03:27That's a really good point because I've I've gone, like, super lazy with these agents. I don't actually
03:31like, I just give them, like, one line prompts and, yeah, it it never works for hours. So, yeah, we'll we'll love to kinda see each phase.
03:39Yeah. Yeah. So, yeah, I think the things that we we can do differently in the planning phase is, like, go from a short prompt to say, what is the next action you should take to something more like a spec where you write down a more a more comprehensive set of details of the requirements and then go from spec to a goal.
03:58So if you can actually craft a measurable goal, you can let the agents do a lot of experimentation.
04:04Okay. Okay. So can you show us how this works?
04:07Like, maybe we can start with the planning phase, like, some some example plans that you write. Yeah. Yeah.
04:12Actually, there is another
04:14dimension of how I optimize this flow as well, which is like, if you look at this this timeline, right, the parts that need me is only, like, this beginning and the end.
04:26Right? So what I do is, like, I make sure I can parallelize a lot of sessions. So that I'm always spending my time productively while the agents are doing the work.
04:37So I think increasing the the amount of concurrent parallel sessions, that's also a very important aspect of how I get more done. And do you parallelize sessions in the same project and product or, like, across prod products or both?
04:51Both. Both. So I have a hybrid of different projects.
04:54But even within the same project, I sometimes have multiple sessions doing different things. It's funny it's funny because we used to, uh, like, you know, both of us used to work in big tech, and
05:04it used to be a lot of context switching between meetings, but now you're context switching between different threads. Right?
05:10You know? It's it's actually it's actually faster context switching in some ways. Yeah.
05:14Yeah. Totally. I I think, uh, it's kind of like a, um, someone that's overseeing a very large scope.
05:20Right? There's always different things happening, and there are different things escalating to you. And you need to jump into different things depending on what is the where you are needed the most.
05:29So this is very much alike. Okay. This episode is brought to you by Linear.
05:34When engineers use tools like Cursor, Claude Code, and Codex, a lot of work happens invisibly.
05:40Someone can go from a bug report in Slack to a shipped fix without creating any record of what happened outside of the code editor. And that's fine for speed, but it makes coordination harder as you scale. Linear integrates with the very best agent coding tools directly, like Cursor and Codex.
05:56That way, anyone can see what an agent is working on and who assigned them to the task. You get the speed of agents without losing visibility across the team. Product teams at OpenAI, Ramp, and Block are all using Linear to collaborate with AI agents, and I use Linear myself to run my creator business.
06:13So check it out at linear.app/agents. That's linear.app/agents.
06:20Now back to your episode. Can you show us your, you know, AI stack or agentic coding setup? Yeah.
06:25Yeah. Yeah. Part.
06:26Yeah. Yeah. Let's do it.
06:27So this is my terminal. This is where I I do, like, all of my work pretty much. Occasionally, I I switch to a GUI or a browser, but most of the time, I'm spending here.
06:38So, yeah, I'm using a project here as an example to walk through it. So this is this is a project called Hivebit. This is the AI tutor I'm building for my son.
06:48It's an AI agentic harness for kids, basically. And I just built a new screen.
06:56So let me let me show you what that looks like. I revamped the the main screen a little bit, but this is very messy because I just did this this morning, and it's not looking good. This is like this is not how I want this to look like.
07:10Okay. So I what I'll do, like, a very typical workflow, I'll take a screenshot of this.
07:17Right? Took a screenshot, and then I come to my agent.
07:22I use OpenCode a lot, so I'm gonna just launch OpenCode in here. Mhmm. And you use it because you can use multiple models?
07:30Yeah. Yeah. Exactly.
07:31So I I can very quickly try different models when the new models come out. That is the big benefits I get from these open source tools. Makes sense.
07:39So yeah. So what I'll do here is I'll just say, hey.
07:43Look at this this screen. I'll paste the image here.
07:49And I'll say the things that we saw on the screen, the things that I'm I was not very happy about was there is too much technical details not that are not friendly for kids.
08:04Also, there is a big area of white space unused.
08:09Right? Those were the problems that we saw on the screen that's worth that were, like, clearly not ideal.
08:15So I'll I'll point out these problems, and I'll say, hey. Can you propose some options for how we improve?
08:27Right? So this is my the the request I sent to the agent. So because I sent the screenshots, the the model is gonna be able to see visually what is going on there.
08:37And then it's gonna look at the code base as well. So, yeah, it's very quickly came up with this plan.
08:44So it says, like, best direction, option one, option two. The thing with this plan is that it's not very easy to read. Right?
08:53So, like, when you look at this long wall of text, I like, this I I I I will spend so much time reading this text. So what I do instead let me just try a new session.
09:06What I actually do is I I use a visual editor to do the planning. So I'll say the same thing.
09:13Look at this screen. There is too much technical details. Same thing.
09:17Right? I will just add one bit to say use Lavish to discuss this with me along with any questions.
09:30So Lavish is a is a visual editor I built after I read the article about HTML
09:40over markdown. Have you seen that? Yeah.
09:42Yeah. The from the third the week. Yes.
09:45Yeah. Yeah.
09:46Initially, when I saw the article, I was not very sure about that because I I felt like HTML is gonna be so token inefficient. Right?
09:55The models will have to write a lot more than a simple markdown. But when I tried it, it's actually super useful. So I'll show you once once we have this result from here.
10:07The HTML as an artifact can be a lot richer in terms of, like, supporting this collaboration between human and agents. So it's not gonna be a long wall of text I have to read through. Yeah.
10:18It's gonna be, like, very visually things I can just interact with. So Lavish is a is is like a app that you build to create the HTML in the format that you want? Is that what Yeah.
10:28It's a, um, it's a tool I built. Uh, so what I do is, like, I, uh, every time I encounter any kind of a friction in my workflow and I don't find anything that can solve the problem for me, I just build something myself. Yeah.
10:41So, yeah, Lavish is a a tool I built. Uh, it's a tool for both generating the HTML artifact and also supporting the, uh, back and forth interactive experience between human and agents on that.
10:54Because what you what we could do is I can just ask the agents to generate a HTML file. Right? And I and then I can open up the HTML file in the browser and it works.
11:04The problem with that approach is that once the HTML file is open and I I look at the old HTML file and I see that there are some things I don't like, it's very hard for me to then tell the agent, hey. Please change this part. Please iterate on this aspect.
11:20Right? So that back and forth is what the Lavish editor is trying to solve.
11:26Awesome. Yeah. I'm really excited to see what what it is.
11:28Yeah. Yeah. Yeah.
11:29So now he's writing the HTML. It'll probably take a little while because that's usually a lot of contents to write. So let's see.
11:38Okay. What I maybe one thing I can show here is that while the agents are working, typically agents either coding or planning can spend quite some time doing this work. So what I do is I'll just spin up another parallel terminal tab, a window.
11:54I I use tmux. So this is a new tmux window. And in this window, I will do something else.
12:00And we can see it's in the same directory. The problem here is that if I spin up another agent to work in the same directory, they will run into each other.
12:09Right? So what this agent does in this session will, like, step on toes of the other agents that were that's already doing the work. Yeah.
12:18So this is where people started using work trees. So typically, what people do is, like, git worktree add and give another directory, like, Hivebit, and spend, like, five minutes thinking about the name.
12:30But I'm just gonna say, like, Hivebit two. So the problem with this approach is that once I create a work tree like this, next time I come to this work tree, I have to think about what is Hivebit two doing? Like, what is this work tree doing?
12:47Right? Is it still being worked on? Is it, like, okay to, like, use for something else?
12:52It's very hard to keep track of. Yeah. And the other problem is, like, when we create a new work tree, the dependencies are not installed in the in the work tree.
13:01So in this work tree, we have things like node modules. Right? Like, these are dependencies downloaded on the fly.
13:08And these dependencies won't exist in the new work tree until you install all all of them again. So there were many problems like that. And just for people who don't know, like like, what's the definition of the work tree?
13:20Is it like a copy of the code base? Right? Or Yeah.
13:22Yeah. So a work tree is basically like a you can think of it as a clone of your current git repo in another directory.
13:30So it's gonna be a parallel direct directory, and so they don't directly interfere with each other. So you can do diff a different kind of work, different set of work in the work tree, and it won't affect what you were doing in the main repo. Okay.
13:43But but you're saying that there's, like, many issues with the work tree. So what Yeah. Yeah.
13:47Yeah. Basically, there's a very heavy, like, cognitive load to maintain the work trees. You have to think about which work tree is which and which ones are okay to clean up, etcetera, etcetera.
13:59So what I did was I have a tool called Treehouse. So Treehouse is basically like a a a a no brainer, like a a very, like, dead simple way to manage work trees. So every time I have to spin up a new work tree to do something new, right, I don't need to think about, do I have an another work tree I can use?
14:20Do I create a new one? I just type Treehouse. And Treehouse will basically set up the work tree for me and drop me into the new work tree.
14:28So now it you can see it's set up a work tree in this directory. Right? Okay.
14:32And it dropped me into it. And the the good thing is that this directory is a is from a pool of managed work trees.
14:42So so the dependencies are already installed here because I have used this work tree before. So I don't have to, like, reinstall the dependencies, rebuild the project every single time.
14:54It also saves on the efficiency aspect. So, yeah, just, like, reduce the mental load a lot. I don't need to think about anything.
15:00I just type Treehouse every time I wanna start a new session. That makes sense. Okay.
15:04Alright, dude. Well, let let let's go back to the other tab. Yeah.
15:07So this is what's the HTML looks like. Mhmm. So it's saying, hey.
15:12Redesign discussion. It's basically there's a tiny icon here not available. Not sure what happened there.
15:19But, basically, it's it's wrote the proposal in in a visual artifact. Right?
15:26So what's feeling off? The screen is doing, like, grown up work in a kid's space. Exactly right.
15:34And these things there's a unused space.
15:38Yeah. And so this is easier to skim and and read for a human, basically.
15:42Yeah. Yeah. Yeah.
15:44If if there's something I I look at this artifact and I if I see something that doesn't feel right, I can just annotate. So bit has no visible body. I can say I just click on this and say, I don't care about this and give the feedback to the agent this way.
16:03Oh, I see. So this is your app. Okay.
16:05Got it. Got it. Okay.
16:05That makes sense. So this is a lot more difficult to do when it's a long wall of text. Right?
16:11When it's a wall of text, you have to say to the agent, hey. I I I I don't I'm not happy about this part of the spec, and you sometimes have to copy paste a lot.
16:21Got it. Yeah. So it basically proposed a bunch of things.
16:25Copy, clean up. Yeah. Some of the layout things is not ideal, but, yeah, I I get it.
16:31It's it's easier to read for sure. Yeah. Yeah.
16:33And I I I think there's probably, like, something that went wrong in this page. Let me let me let me just check. I can just ask the agent as well.
16:41Because when I look at this, I think the agent is trying to give me a visual representation of the layout. But because of the CSS, it's not quite working or something. It seems the CSS styles not working.
16:58Let me fix it. So, yeah, I can just send feedback back to the agent this way, and I don't have to keep switching between the HTML artifact and the agents in the terminal.
17:09Yeah. I can just talk to the agent here, and I can easily annotate everything and just points the
17:16pinpoints exactly where I mean. Can you show folks where they can download this tool? It's it's it's open source.
17:21Right? So it's in my GitHub repo,
17:25lavish-axi, in this repo. And it has it's actually very simple to start using it.
17:34Just tell your agent, use npx lavish-axi to write the technical plan or do whatever you want. Yep.
17:41And the agent will go invoke this and everything goes on from there. And you have to you have to hook up your own API key for the LLM? No.
17:51You just you just use whatever agents you are already using. This Lavish editor itself does not run another agent.
17:59It runs within your agent session. So I I can show Okay. Got it.
18:02Yeah. So you can see here. The agent Oh, it's shown here.
18:05Calling lavish-axi to pull, like, this this artifact. Okay.
18:10That makes sense. So let's come back to it. Yeah.
18:13So so now it it fixed the CSS problem. Right? This is what what it's supposed to look like.
18:19So you can see, like, this is a lot, like, more visual and easier to understand. Looks a lot better.
18:25Yeah. Yeah. So this is, like, pointing us the current layouts, current problems, and then it's probably, like, proposed a new thing.
18:33Okay. So it proposed four directions for using a space better.
18:38Option a looks like this. This is like this is so much easier to see, right, like than, like, the long wall of text we have in the in the terminal. Yeah.
18:49So here we can see, okay, it's moved layout a little bit. Now this is the chat. This is some other area.
18:56Okay. That's one option. And it even gave me buttons.
19:00So if I like option a, I can just click this button and I get the option a. Got it. So option b looks like this.
19:08Today's goal. Okay. Option c is this.
19:13Okay. Option c is very simple. I actually like this.
19:16Option d. Okay. Yeah.
19:19So let's say I like option c. I can just click this, and it basically queued a a piece of feedback to the agent saying I like option c.
19:28So it's just so easy to interact with. I don't have to keep typing every time I wanna tell the agent something.
19:34Everything can be done interactively.
19:36Okay. So and and this is the plan phase for, like, building a new feature on top of the existing app. Right?
19:42Yep. I'm curious and maybe you have to show this one. I'm just curious, like, how you plan something from scratch initially.
19:47Like, did you, like, spend a lot of time planning, like, the the milestones and the tech stack and that kind of stuff? Yeah. Yeah.
19:53So if it's something from scratch,
19:56I usually have to spend a little bit more time. So what I do is that I use the same Lavish editor. I tell the agent that I want to brainstorm a new idea with it.
20:07And I'll probably, like, talk through some of my initial thinking on what I think are the core parts of my idea. And then I'll ask the agent to criticize that and come up with areas of risk or weaknesses that maybe I haven't thought through yet.
20:27And then come back with its opinion. And the agent will then come back with an HTML artifact like that. And I can look at the artifact to basically work with the agent to refine the idea to a point where it becomes a spec, basically.
20:44Do you always include certain sections in your spec, like, build it in three phases, or here are the milestones, or here's the tech stack I want you to use, that kind of stuff? Yeah. Yeah.
20:54So for some projects, for some ideas, I already have some opinions on things to use and things to do. In those cases, I'll just write them down and say these are my preferences. But I always tell the agents that it's okay for you to push back if you see something that is not right because I want to give the agents the flexibility, and I want to see more options as well.
21:16So, yeah, I basically give my ideas to the agents, but let the agents give more back. So then do you have, like, a user-level AGENTS.md
21:25or something that has some of these best practices? Like, you know, you can push back on me, or is it just more natural through the conversation?
21:32Yeah. So I actually built a lot of those instructions into the Lavish editor.
21:38Okay. So whenever the agent is using the Lavish editor to work with me, the agent already knows a lot of those best practices.
21:48Got it. Okay. And how about if you're building, like, a user facing product,
21:52how do you think about the design? Do you have another tool for design, or do you just have some skills? For design, you mean visual design?
21:59Yes. Yeah. So for visual design, I like Claude Design a lot since it came out.
22:05I use that a lot. And very often, I'll use a lot of the quota they have for me. So if you look at this bar where I track my quota. Yep.
22:16Claude, I mostly used up my weekly quota already. I'm waiting for the reset.
22:21And Claude Design, I used, like, two thirds of it. Okay.
22:26Because, yeah, I just find it very useful. Especially for new projects, I use this a lot to build a new design system.
22:35Because once I get the design system built, I can apply that to many, many different components in my project very easily. Okay. Maybe you can show that later, but why don't we finish this work with your question first?
22:47Yeah. Cool. So, yeah, basically, we chose option c.
22:50Right? So now we can just say, hey. Build option c now.
22:56And because we already have the plan written in the HTML artifact, the agent already has the context on what that means and what choices were made. Right? So the agent can just go ahead and implement that now.
23:11Since you're just building solo now at home, how many of these agent building sessions do you have going? Like, agents actually building something for you at any given time.
23:21Yeah. Yeah. So I closed as many sessions as I could before I started this session.
23:28But I typically have, like, at least five different sessions actively running. Okay. And in each session, there are usually, like, a bunch of sub agents or different agents working.
23:39So in total, I never really counted, but I would guess, on average, there's, like, 20 or 30 agents running. Okay. Got it.
23:48Wait. Oh, as you mentioned, you have sub agents running.
23:50Like, did you actually specifically ask it to run sub agents, or does it just decide to? Like, when do you actually need a sub agent versus just using one agent? Yeah. Yeah.
23:59Great question. So I think most of the models today and the harnesses,
24:04they are not very great at proactively using sub agents. There are only a few cases where, like, Claude Code or Codex will proactively use sub agents.
24:13It's when they have their built-in agents like Explore. So when you ask a complex question, Claude Code will often run an Explore sub agent, right, to do some exploration in the code base and come back with some investigation results. Those are the cases where the models will proactively use a sub agent.
24:33But in a lot of cases, because the models, I think, are not trained enough yet to use sub agents in various different kinds of cases, you often have to prompt them to do so. Got it.
24:44Okay. What are some cases where you actually wanna prompt it to use sub agents? Like, validation? Yeah.
24:50So the main reason I would use a sub agent is to avoid the context window blowing up in the main agent's session. Oh, I see.
25:00Yeah. So I think the time when I choose to use sub agents is when I realize what I'm about to do is gonna use a lot of context, and most of that context is gonna be an investigation or exploration kind of scenario.
25:18And most of the exploration may not be meaningful for the main session.
25:22So in those cases, basically, I carve out sub agents to do those investigations and only come back with their conclusions.
25:29Okay. So it's like a like, hey. Hey.
25:31Spin up a sub agent to look at this code base or do some research on this topic and summarize it and give it back to the main agent,
25:38like, that kind of stuff. Right? Yeah.
25:39Yeah. Or, like, there are cases where I have, like, 10 experiment ideas to run. And each experiment can be done in isolation.
25:50Right? So in those cases, I also just say, hey. Like, spin up 10 sub agents to do that.
25:55If I do that all in the main agent, it's gonna just blow up the context window and take a lot of time and tokens as well. When you say experiment ideas, you mean, like, a b testing stuff or what? Like, different ways to build things?
26:08Yeah. So there are various kind of experiments I run. There's one example here I can show.
26:14So this is something I'm running. This is the one I didn't kill.
26:19So this is a benchmark I'm running to evaluate the effectiveness of different programming languages when given to agents.
26:29And there was this benchmark that was published, like, two weeks ago. It's called ProgramBench.
26:36It's built by the same people that built SWE-bench. And it's their new thing.
26:41And ProgramBench basically asks the agents to build a bunch of programs, like FFmpeg, these tools, from scratch and see whether the agents can actually get all the requirements done and pass all the test cases.
26:57So that was the benchmark. But I thought the benchmark can be very useful for evaluating different harness techniques and also different programming languages.
27:08So right now, what I'm evaluating here is I'm running ProgramBench on Codex. And I force Codex to use these programming languages like TypeScript, JavaScript, Python, and see, when they use different languages, do they get different results?
27:26Right? Is there a programming language that will lead to the agent getting more requirements done and passing more tests and using fewer tokens, etcetera, etcetera?
27:37So this is a very large number of experiments. Basically, there are, like, 200 multiplied by. Yeah.
27:46Eight. Right? So that's a lot of things to run.
27:50And in those cases, I basically have sub agents running. If I run all these in a single main agent, it's just gonna keep running compaction and not gonna be very efficient. That makes sense.
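The fan-out described here can be sketched in a few lines of Python. This is only an illustration, not the speaker's actual setup: `run_experiment`, `Summary`, and `run_matrix` are hypothetical names, the worker is a stub that stands in for one isolated sub agent run, and the numbers it returns are placeholders rather than benchmark results. The point it tries to show is that each worker keeps its long log to itself and hands back only a small summary, so the coordinating session never has to hold all 200 x 8 transcripts.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Summary:
    task: str
    language: str
    tests_passed: int
    tokens_used: int


def run_experiment(task: str, language: str) -> Summary:
    """Hypothetical stand-in for one isolated sub agent run.

    A real worker would launch an agent in its own context and produce a long
    log; only this small summary is handed back to the caller.
    """
    verbose_log = [f"{task}/{language}: step {i}" for i in range(1000)]
    # Placeholder numbers derived from the log size, not real results.
    return Summary(task, language, tests_passed=0, tokens_used=len(verbose_log))


def run_matrix(tasks, languages, max_workers=10):
    """Fan out every (task, language) pair and collect only the summaries."""
    pairs = list(product(tasks, languages))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda pair: run_experiment(*pair), pairs))


if __name__ == "__main__":
    tasks = [f"task-{n}" for n in range(200)]
    # Placeholder language names; the talk names only three of the eight.
    languages = [f"lang-{n}" for n in range(8)]
    results = run_matrix(tasks, languages)
    print(len(results), "summaries collected")
```

The verbose log is discarded when each worker returns, which mirrors the idea of sub agents coming back with conclusions rather than their full exploration.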
28:01Okay. Cool. Let's go back to the Kita app.
28:03Yeah. So it looks like it's running a bunch of tests right now. Right?
28:07So, like, is that just the model knows to run tests, or do you actually have some instructions to have it build unit tests and stuff like that? Yeah.
28:15Typically, in my AGENTS.md in each project, I will have some instructions for how to perform tests.
28:24So here, for example, I can show the AGENTS.md. So this is the AGENTS.md for the hybrid project we were looking at.
28:35And in here, we'll just have some, like, high level context on the structure of the project. And then I'll have some testing instructions. This is actually super helpful.
28:45So previously, I didn't do this, and I let the agent decide what to do. And the agent will just kind of do the minimum. They are trained to run some basic testing, but they are not gonna be comprehensive enough.
29:00So what I have here is, like, instructions for how to do end to end testing. This is important for, like, building front ends and UI kind of projects. Right?
29:10Well, we were looking at high bits, which had a GUI. So in this case, I tell the agents, hey. This is an Electron app.
29:20You can drive this app by running a browser and blah blah blah. How to do this in testing? How to actually test things end to end?
29:29So with that instruction here, once the agent is done with its work, it will actually validate things end to end for me. So that can save me a lot of time from running the app myself and visually validating whether that's actually what I want.
29:45Okay. So it's basically, like, using browser use and checking out the apps, see if it looks okay, maybe checking some browser errors? Yeah.
29:52Exactly. Yeah. Yeah.
29:53And take screenshots as well. Take screenshots and look at these things visually
29:57and see whether it's actually aligned with what we talked about. I think if you use the Codex app, I I think it does it by default. But, like, let let's say, like, I'm not very technical.
30:06Like, how do I even know to
30:08include the stuff? Should I just tell the agent to run a lot of tests? Or Yeah.
30:11Yeah. Yeah. So, typically, what I one thing I one thing that's really interesting I found is that by default, the agents like to write unit tests, like Yeah.
30:22Very purely code based unit tests. And those unit tests often don't actually validate things end to end.
30:30So for example, even in codecs, I think codecs by default likes to use the built in in that browser. Right? Yeah.
30:38So when you work on some front end changes, it it will use the in that browser to look at the change and have you look at that as well. But this is an electron app.
30:48It's a desktop app. So it actually requires a different set of, yeah, facilities to validate that. So the instructions here are basically how I would test this thing myself.
31:01Okay. Yeah. So, basically, like, the more the more things that I find myself doing that I can dedicate to an agent, I turn them into instructions and then let the agents do the work instead of me, like, operating the app myself manually.
31:16Okay. Got it. Okay.
31:17So I guess, for someone who maybe is not as knowledgeable as you, the general principle is, like, if you're doing something manually,
31:24like, you're manually opening an app and looking at the screens,
31:26just ask the agent, hey. Can you just automate this for me? Right?
31:29Just ask it, and, hopefully, it can figure something out too. Yeah. Yeah.
31:33So, yeah, if you are, like, not trying to dig into the technical details, then the high level principle is, like, if you find yourself manually doing something, then try to turn that into something the agent does for you. And with today's models, you can very likely just ask the agent to do what you were trying to do.
31:54And the agent will figure out, oh, I should do this. I should do that. Alright.
31:57Well, it looks like it's done now. Is it Yeah. It's done now.
32:00So so now good question. Right? Like, it's done.
32:04The agent says it's done, and we can look through what it did. Right? It said it changed this, changed that.
32:09How do we know this is actually a good change? Right?
32:12How do we know there is no, like, bugs and everything? So the validation phase is where, like, I see a lot of people spend a lot of their time.
32:21So the default approach is, like, people will open up their IDE and start to review the code. Like, they will start to review the diff.
32:28Right? Yeah. But the thing is that AI can write so much code.
32:34So if you review every single line of code, you become the bottleneck. So what I do here is I don't even review the code. I don't review this first pass code from the agent.
32:47I use something I call no mistakes. So no mistakes is another tool I built just to help make this part of my life easier. So what it does, I'll show you.
32:59I actually made an alias. So every time I've got some change, like, coding is done from the agent, I just run nm. And it will go through a few steps.
33:09First, it will ask the agent to create a branch for me so I don't even need to think about the branch name. Otherwise, I need to think about the branch name, the commit message, all those things.
33:21It's just wasting time. So I get the agent to do that. The agent basically did that fixed chitchat workspace.
33:28That's right. Right? Mhmm.
33:30And the agent is now analyzing my session to understand my intent.
33:36So the agent here, no mistakes, is reading the session where we did the work to understand my intent. So now it's understood what I was trying to do. It will do all these steps for me.
33:48So it will rebase my change on top of the latest main branch on the remote. So there's not gonna be much conflict later on. It's gonna review my change.
33:58So this is where I actually did a lot of prompt engineering to get the agents to really scrutinize the change very, very hard. Okay.
34:09So any kind of edge case or bugs, like logical errors, things like that will get caught. So this is a very high recall phase.
34:19When I initially built no mistakes, I did a lot of parallel testing where I let the agents review the change, and I also reviewed the change myself to see how often I catch something the agents don't.
34:34Right? And I used that phase to iterate on the prompts and the workflow within this phase. So eventually, I got to a point where I find myself never catching anything the agents don't catch.
34:49So in this case, the agents actually didn't find any material problems. So it just passed. But if it found some problems, it will categorize them into two categories.
35:03One is obvious bugs. So if it's just an obvious error, it will just auto fix it by itself.
35:09It won't even bother me. Mhmm. Another category is when it realizes there's an error, but fixing the error will have some product implications.
35:20And then it will ask me instead of just auto fixing that. So in those cases, it will escalate to me, and it will basically pause at this phase and ask me to judge,
35:31do I actually want to make that fix, or do I want something else? This is like the PR review, basically. The agent doing PR review.
35:37Right? Yeah. Yeah.
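The two-category split described above can be written down as a tiny triage function. This is a minimal sketch under assumptions, not how no mistakes is actually implemented: the `Finding` type and the `has_product_implications` flag are hypothetical, and in practice that judgment would come from the reviewing agent rather than a boolean someone sets by hand.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Finding:
    description: str
    # Hypothetical flag; assumed to be assigned by the reviewing agent.
    has_product_implications: bool


def triage(findings: List[Finding]) -> Tuple[List[Finding], List[Finding]]:
    """Split review findings into (auto_fix, escalate)."""
    auto_fix = [f for f in findings if not f.has_product_implications]
    escalate = [f for f in findings if f.has_product_implications]
    return auto_fix, escalate


def should_pause(findings: List[Finding]) -> bool:
    """The review pauses only when something needs a human judgment call."""
    _, escalate = triage(findings)
    return bool(escalate)


if __name__ == "__main__":
    findings = [
        Finding("off-by-one in pagination loop", has_product_implications=False),
        Finding("fix would change default sort order", has_product_implications=True),
    ]
    auto_fix, escalate = triage(findings)
    print("auto fix:", [f.description for f in auto_fix])
    print("ask the human:", [f.description for f in escalate])
```

With an empty list of findings, `should_pause` returns `False`, which corresponds to the case in the demo where the review simply passed.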
35:38PR review between the agent and the author. And this no mistakes is like a whole new context window. Right?
35:44It's like a new agent looking at your other conversation.
35:48Yes. So this is a fresh context window, and I actually did that deliberately.
35:53I think that's an important thing to do, which is to use a fresh context window to review the change that was done. Because a lot of people, what they do is, like, they will just ask, hey. Can you review the change in the same session?
36:07When you do that, the agent is very heavily biased by what was already done because it saw all the context. It saw every step along the way.
36:18So it's biased into believing that what was done was correct. And because of that, it will sometimes miss something. I tested this a lot.
36:30And when you use the fresh context window, you just get a lot more edge cases caught. I guess the only problem is, does the no mistakes agent have to look at your whole code base again to even understand what this app's about? That's what this intent phase was doing.
36:44So it basically analyzed your session to understand what was your original intent and some of the surrounding context as well. Okay.
36:52But it's not copying the entire session into this new context window. It's like, you know, some senior engineer builds some feature, and then you're asking the principal engineer to come in with fresh eyes to look through everything.
37:05Right? Yeah. With fresh eyes.
37:07But usually, you will ask the senior engineer to explain a little bit of context to the principal. Right? That's right.
37:13Yeah. Yeah. So this intent phase is basically that.
37:16It's basically, like, explaining the basic context
37:19of what this change is trying to do. Okay. And Yeah.
37:22Why don't we walk through the rest of the phases too? Like, documenting is what?
37:25Just writing what is observing?
37:28Yeah. So yeah. So each phase, what it does is, like, review is just reviewing the code, and test is running tests.
37:35And the test phase is very different from what the agent does by default. What the agent does by default is running some tests and validating locally, like, was the change tested and was that working?
37:52But this test phase is a little bit different. It's more like CI. It's validating, did this regress other things as well, etcetera, etcetera.
38:01And this test phase will actually present some evidence of the change actually working.
38:08It would paste screenshots or, like, sometimes a video to capture that this thing is actually working. So it's easier for me to review. I can just look at the artifact and see, oh, okay.
38:20It's actually working. Oh, that that's actually really interesting. So yeah.
38:23Because sometimes when I ship stuff with Codex, the stuff I'm shipping works, but then it breaks something else. It breaks, like, another core workflow in the app. So this test phase will actually look through all that and try all of it out.
38:36Yeah. And just, like, present very easily digestible artifacts
38:40for me to have confidence it's actually working as I expected. This is a bit of a dumb question, but, for example, I'm trying to build, like, a fitness app. Right?
38:48And there's, like, a few core workflows that I wanna make sure that it tests each time, like, creating a workout, tracking your workouts. So do you have to manually define this stuff, or is the AI smart enough to figure out how to test the stuff each time you make a change?
39:04Yeah. Yeah. Yeah.
39:05I typically, like, try to get AI, the agent, to turn those things into an automated end to end test. Okay.
39:12Yeah. Because then it will be very easy to run that every single time. Right?
39:16And the automated end to end test is basically, let's say it's like a browser app. So it just kind of, like, acts as the user and clicks through stuff. Right?
39:24And see if anything breaks. Yes. Yes.
39:26So there are various kinds of end to end browser testing tools like Playwright. Yeah. But, yeah.
39:32You can just ask the agent. You can say, hey. Like, write an end to end test for this scenario or this user flow and make sure it's actually working end to end.
39:42It will typically be able to figure out what kind of frameworks or tools need to be used. I think the trade off here, dude, is, like, it just takes a lot longer to actually ship a feature.
39:53Right? Because you're running all these stages.
39:56But I guess you have way more confidence that the feature you ship actually doesn't break anything. So I guess if you have a lot of users. Because a lot of stuff I work on, that doesn't have any users. It's just me.
40:06Uh-huh. But if you have a lot of users that you ship to prod, you wanna make sure it actually works. Right?
40:10It's like software engineering 101. Yeah. So I would argue, even if it's only for yourself, you can probably make the trade off.
40:19Right? How much you want to prefer just making changes very fast versus making sure things actually work. Because sometimes there's a little bit of a cost to you as well if things break.
40:31Yeah. So this phase taking a longer time is actually okay because I never look at this.
40:39Like, I never just stare at this screen and wait for every phase to pass. Right?
40:45Every time I launch no mistakes, I just immediately switch to another session. Like, I don't even look at this. What I have here, I will show you.
40:55Now I switched to another session. Right? I can just look at the terminal screen here to see what phase that no mistakes pipeline is at.
41:04So I can see it's working on the linting phase. And if it's waiting for me to, like, make a judgment or something, it will change the status here.
41:14So I can just, like, very easily see, do I need to jump back into that session? Do you run no mistakes after, like, almost every change?
41:21Or, like, because if you do that, then why don't you just automatically run it? Ah, yeah. Yeah.
41:26Yeah. So I run that on most changes, but not every single one. Because there are changes where, for example, I make a very simple documentation update, and I know, like, it doesn't need so much validation.
41:39Got it. It's gonna use a lot of my tokens as well. So I make some judgments on whether the change justifies this kind of a heavy validation phase.
41:47Yeah. It's kind of like when you work within a team, it's not that every PR will go to a QA team. Right?
41:56Yeah. Only some, like, milestones, some meaningful things will go there. Dude, do you think it feels weird, like, after spending, you know, your career in big tech?
42:04Because in big tech, when you push a change, you have, like, a teammate come and review your PR. Right? And then you run some tests.
42:09And now you're just by yourself. So I guess, like, you have all these agents.
42:15But, like, how do you feel? Do you feel, like, unshackled, or do you feel like you kinda miss the teammates?
42:22So it's a bit of both. But I would say, like, largely speaking, I feel liberated.
42:28Liberated.
42:29Yes. So I I think our teammates are great, especially in the brainstorming phase. So when we are, like, thinking about an idea, if it's just me, it's a very like, it's not a very diverse perspective.
42:43Right? So I may not think through everything and I may not realize problems others can see. AI can help to a degree, but I don't think AI is, like, quite there yet to replace, like, a really smart team that can ideate together.
43:00So that is, like, the one part I miss. The part I don't quite miss is, like, everyone's busy. And if I write, like, 20 PRs every day, no one's gonna review that.
43:11So that already happened before I left my last company. And what I found myself doing was, like, I had to write fewer PRs.
43:23Got it. And spend my time elsewhere because the bottleneck is, like, really on the rest of the team. Yeah.
43:29Because your teammates aren't actually reviewing the PRs. Like, they have a lot of other things going on. But, like, if you submit a PR to AI, it's always gonna start working.
43:37Right? Yeah. So this is something that I think is gonna, like, fundamentally change as we progress on AI adoption.
43:44So our workflows and how our teams work were built at a time when we spent most of our time coding.
43:54And the average stat for an engineer on an average software team is that an engineer will write 10 to 15 PRs every month. That's the velocity of an average software engineering team.
44:08So when that's the case, it's okay for everyone else to do code reviews and all these processes because the velocity is not that massive.
44:19But when you start to write, like, 10 times more PRs, we are not ready for that. Our processes and our human team composition and everything are not built with that assumption in mind.
44:32So what's gonna happen is, like, things are starting to break. A lot of teams are starting to change their practices in order to fight that.
44:42So some teams, especially smaller teams in startups, they basically stopped doing PR reviews.
44:49They still raise a PR, but mostly as, like, a formality or for leaving a record. They don't actually wait for another peer to review. They sometimes just merge the PR, and later on, if there's a problem, they can go back to it.
45:02That's the kind of change I'm starting to see. Yeah. They get the agents to review.
45:07Right? I do think it does lead to, like, a little bit more unstable
45:10products.
45:11Yeah. But yeah. Yeah.
45:13That's because they are not using no mistakes.
45:16Okay. Yeah.
45:18It looks like it's done. Yeah. This pipeline just completed.
45:20Alright? So it went through all these steps, and there was actually one thing fixed in documentation phase.
45:27Yeah. This is something we can look at, whether it's actually a legit change. But this is something I find super useful and something both me and my agents often don't do automatically.
45:40So it's like, when you make a change, can you actually find all the places in our documentation that can be affected by that change?
45:49Okay. Got it. Yeah.
45:50So documentation, linting, and pushed and create a PR.
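The phases walked through so far (intent, rebase, review, test, documentation, linting, then push and PR) behave like a sequential pipeline that stops when a phase needs a human. The sketch below is one way to model that in Python. It is an assumption-laden illustration, not the tool's code: the stage names, the `StageResult` type, the status strings, and the stub stages are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

PASSED = "passed"
FIXED = "fixed"              # the stage changed something on its own
NEEDS_HUMAN = "needs_human"  # the stage is waiting for a judgment call


@dataclass
class StageResult:
    status: str
    note: str = ""


Stage = Tuple[str, Callable[[], StageResult]]


def run_pipeline(stages: List[Stage]) -> List[Tuple[str, StageResult]]:
    """Run stages in order and stop at the first one that needs a human."""
    history = []
    for name, stage in stages:
        result = stage()
        history.append((name, result))
        if result.status == NEEDS_HUMAN:
            break
    return history


def current_status(history: List[Tuple[str, StageResult]]) -> str:
    """One-line status, similar in spirit to a terminal status indicator."""
    if not history:
        return "not started"
    name, result = history[-1]
    return f"{name}: {result.status}"


# Hypothetical stubs; real stages would call an agent or a tool.
demo_stages: List[Stage] = [
    ("intent", lambda: StageResult(PASSED)),
    ("rebase", lambda: StageResult(PASSED)),
    ("review", lambda: StageResult(PASSED)),
    ("test", lambda: StageResult(PASSED, "evidence attached")),
    ("document", lambda: StageResult(FIXED, "updated affected docs")),
    ("lint", lambda: StageResult(PASSED)),
    ("push_and_open_pr", lambda: StageResult(PASSED)),
]

if __name__ == "__main__":
    print(current_status(run_pipeline(demo_stages)))
```

Because the runner stops at the first `needs_human` result, the last entry in the history is always the phase a person would need to look at, which is what makes a single status line enough to decide whether to switch back to that session.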
45:54So I can just open up the PR, and let's look at what it does. So it created this PR. The PR summarized the intent that was understood from my original session in OpenCode.
46:07It summarized what changed. It did a risk assessment as well. This is very useful as well.
46:13When I look at a low risk change, I spend less time. When the agent is flagging this is a medium risk or high risk change, I spend more time on this PR. Right?
46:23So I can, like, decide where I spend my time more intelligently. And testing, yeah, it did some tests and had evidence.
46:33So let's see what this is. It renders the workspace. Okay.
46:40Yeah. Basically, this evidence is about presenting, like, actual results from the change.
46:48So we can look at this and see if that's what we want. Okay. And there is the pipeline, and there is the documentation phase.
46:55What did it find? It found that the design system example copy was not updated.
47:02Okay. Yeah. So it actually caught an inconsistency.
47:07So that's great. Yeah. So because it's a low risk change, I don't even go into the diff here.
47:13I don't go there. I just merge it. Okay.
47:16And when it's a medium risk or high risk change, then I go into the diff and start to look at things myself.
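The merge policy just described, where the agent's risk label decides how deep the human goes, fits in one small function. This is a hedged sketch: `review_plan` and the step strings are made up for illustration, and the three shared steps are simply the PR sections mentioned above (intent summary, risk assessment, test evidence).

```python
from typing import List


def review_plan(risk: str) -> List[str]:
    """Map an agent-reported risk level to what the human reads before merging."""
    steps = ["read intent summary", "read risk assessment", "check test evidence"]
    level = risk.strip().lower()
    if level == "low":
        # Low risk: scan the PR description, skip the diff.
        return steps + ["merge"]
    if level in ("medium", "high"):
        # Higher risk: the diff gets a human read before merging.
        return steps + ["read the diff", "merge if satisfied"]
    raise ValueError(f"unknown risk level: {risk!r}")


if __name__ == "__main__":
    for level in ("low", "medium", "high"):
        print(level, "->", review_plan(level))
```

Raising on an unknown label is a deliberate choice in this sketch: a PR with a missing or unexpected risk level should not silently fall into the fast path.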
47:23Okay. But you pretty much always somewhat look at the PR, scan the PR, and then you hit the button to merge.
47:30Right? Yeah. Yeah.
47:31I still look at this PR because I think looking through the risk assessment and what the agent actually did, what was the fix, those things are actually still useful. That's the mistake that I'm making, dude. I don't look at the PR.
47:42Sometimes I just hit merge. So I just need to be a little bit more thorough. Yeah.
47:48Also, after the agent made some code changes, you just get it merged? Well, I ask it to run some tests and stuff. I don't use no mistakes.
47:56And then I get it merged. And then, yeah, really, like, a day later, I'll find something else broke.
48:03So, yeah, it's probably not the most efficient way to do it. Yeah.
48:07Yeah. So, yeah, I think some validation
48:09and then some, like, review, but it's not a line by line code review. I think some review on what's changed and what kind of risks exist.
48:21That's still useful. And you're probably submitting ten, fifteen PRs a day, right, doing this? Yeah.
48:27So I actually do a lot. So, like, 26, 14, 27, 30.
48:36Yeah. That's like the average. So it's yeah.
48:39Most of the time, it's like 20 to 40 kind of PRs every day. Sometimes I do more.
48:45I can tell when you became unemployed. It's around March. Very clear on this chart.
48:53Alright. So that's I guess we just walked through the whole plan, build, validation process. Right?
48:57Like, that's basically it. Right? Yeah.
48:59So, yeah, we went through, like, building a plan interactively,
49:03implementing that with the agents, and then going through this validation pipeline. If you think about it, I basically didn't spend much time in the coding and validation phases at all.
49:15Right? Most of my time was actually on the HTML artifact, iterating with the agents. So that's kind of how I do these things now.
49:23And as soon as I sent the agents to do implementation,
49:26I just switched to something else and worked on that in parallel. Okay. So I guess we can provide the links to Lavish, the HTML planner, and also no mistakes, the validation.
49:37We'll provide that in the description of this episode. I guess, dude, let me ask you one last question. Yeah.
49:43I mean, you know, you're, like, an L8 engineer. You've been doing this for a while. There's, like, a lot more builders now.
49:48Right? Like, there's a lot more people trying to get into this stuff and learning how to build AI. Yeah.
49:54Do you have any advice for people to actually ramp up their technical skills? And also, what kind of technical skills do they actually need to learn?
50:01Like, obviously, testing and validating everything. But also, there's stuff like, for example, if you don't set up your database properly in the beginning, it's hard to change it later. Just stuff that you learn over time.
50:12So do you have any thoughts on how people can just scale up as they build more stuff? Yeah. Yeah.
50:18Good question.
50:19I think there's a few things that come to mind. One is, I think, just play a lot.
50:26Build a lot of things even if it's a throwaway toy. Build it. And through that process, you will often discover things you can do better or things the agents didn't quite do very well and start to reflect on that.
50:40So do a lot of things. I think that's probably the first step. What I see, at least from some people, is they spend a lot of time trying to decide what to do.
50:55And then they only do one thing, and that thing doesn't work. Then they stop. I think the mindset I would encourage is to just build every single idea you have.
51:05Whenever you have some idea, like, send the prompts to the agents and see what it does. And whenever you have some inspiration or idea you think might be interesting, just give that to the agent and have it run for you.
51:21Mhmm. I think through that, like, process, a lot of learnings can be derived. That's one.
51:27Another, I think, is to try to challenge yourself to use more tokens and to run more agents in parallel.
51:38I think that is a forcing function for people to upgrade their workflow. Because when we, by default, work with one agent at a time, we are still kind of being a bottleneck.
51:51We are putting ourselves into the loop too much. And I think to really scale up how much we can get from the agents, we have to move ourselves out of the loop as much as possible.
52:04So I think using more tokens and running more agents in parallel kind of forces us to do that. That's probably another thing I can think of. Got it.
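A minimal sketch of the point about parallel sessions, assuming a hypothetical `run_agent` coroutine that stands in for launching a real agent session. It is simulated here with `asyncio.sleep`, and no actual agent CLI is invoked. The task names are made up. It only illustrates why dispatching tasks concurrently takes roughly as long as the slowest task rather than the sum of all of them.

```python
import asyncio
import time


async def run_agent(task: str, seconds: float) -> str:
    """Hypothetical stand-in for one agent session; sleeps instead of doing real work."""
    await asyncio.sleep(seconds)
    return f"done: {task}"


async def run_sequential(tasks: dict[str, float]) -> list[str]:
    # One agent at a time: each session must finish before the next one starts.
    return [await run_agent(name, secs) for name, secs in tasks.items()]


async def run_parallel(tasks: dict[str, float]) -> list[str]:
    # All sessions are dispatched at once; gather returns results in input order.
    return await asyncio.gather(*(run_agent(n, s) for n, s in tasks.items()))


def timed(coro) -> tuple[list[str], float]:
    # Run a coroutine to completion and report its wall-clock duration.
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Made-up task names and durations, for illustration only.
    tasks = {"fix-login-bug": 0.2, "update-docs": 0.2, "add-tests": 0.2}
    _, seq = timed(run_sequential(tasks))
    _, par = timed(run_parallel(tasks))
    print(f"sequential: {seq:.2f}s, parallel: {par:.2f}s")
```

In a real setup each call would need its own isolated working copy, which is the role the episode assigns to Treehouse's pool of git worktrees.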
52:14Yeah. Maybe the last thing is to try to adopt AI in every part of your workflow, not only writing code.
52:23As we saw there, AI did a lot of the validation and documentation for me, as well as raising the PR and everything. I don't need to do anything there.
52:32When we work through a project, whenever we find something manual, like we talked about earlier, something we are spending time on ourselves, just try to think about whether we can delegate that to the agent as well.
52:45And through that, I think people will find a lot more useful workflows that can handle automation and reduce our workload. Yeah. Maybe there's some sort of a skill or something we can build where,
52:58because the AI remembers its conversations with you, maybe the AI can actually proactively suggest, like, hey, you should automate this.
53:05It's the second time we're talking about this.
53:07Yeah. Maybe you better than x. Yeah.
53:09That exists. That exists. So I can show you.
53:12Okay. So if I run Claude Code.
53:15Claude Code has this slash command called insights. It will basically analyze your Claude Code sessions and generate a report on what can be done better.
53:27Like, what kind of skills can you add? What kind of things can you tweak in your memory files, etcetera, to make Claude Code work more efficiently for you? Alright.
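The idea raised above, an assistant that notices when the same manual task comes up a second time and suggests automating it, can be sketched in a few lines. This is not how Claude Code's insights command works internally. The session log format, the `suggest_automation` name, and the threshold of two are all assumptions made for illustration.

```python
from collections import Counter


def suggest_automation(sessions: list[list[str]], threshold: int = 2) -> list[str]:
    """Return manual tasks that appear in at least `threshold` sessions.

    `sessions` is a hypothetical log: one list of task labels per session.
    """
    counts: Counter[str] = Counter()
    for session in sessions:
        # Count each task once per session so one noisy session cannot trigger a suggestion.
        counts.update({task.strip().lower() for task in session})
    return sorted(task for task, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    # Made-up session history, for illustration only.
    log = [
        ["rebase branch", "update changelog"],
        ["Update changelog", "rename variable"],
        ["rebase branch", "write release notes"],
    ]
    for task in suggest_automation(log):
        print(f"You should automate this: {task}")
```

Matching on normalized labels is the simplest possible choice; real session transcripts would need some way to recognize that two differently worded requests describe the same task.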
53:38Oh. Yeah. Yeah.
53:39So this is super cool, but it's gonna use a lot of tokens. I'm already out of tokens, so I'm not gonna demo that now. Yeah.
53:46Yeah. But this is something I definitely recommend people try. This is a very cool thing.
53:50Okay. Yeah. I'm gonna write it right now.
53:53Yeah. I think the token-maxing thing is kind of like a meme, but basically, just to summarize your advice:
53:58Number one is put in the reps: try different things, try to build different things. Number two is, if you use multiple agents, you can put in more reps. Right?
54:06Because you don't have to wait for one agent to do anything. Yeah. And then the third one is, sorry,
54:11What was the third one again? Part of your workflow? Yeah.
54:14Not only writing code. Yeah. I think the second one is especially hard, dude, because, like I don't know.
54:19Like, growing up as an Asian person, I have, like, a scarcity mindset. I just try to save money and stuff. Uh-huh.
54:25And just trying to burn through tokens, it doesn't feel right.
54:29Yeah. So most of us working as individuals have a subscription. Right?
54:36Yeah. Yeah. So at least try to make the most out of the subscription.
54:40Yeah. You can exhaust the quota. Okay.
54:42Yeah. So I guess it's kind of like going to a buffet and trying to eat all the crab legs. I guess I can.
54:48Yeah. But I would say, there's the token-maxing thing.
54:53I think we shouldn't just use tokens for the sake of using tokens. Right?
54:58We want to get actual work done. So I think it's more about pushing ourselves. My point about number two was more about pushing ourselves to figure out ways to scale up
55:11Yeah. And really get more done with agents instead of putting ourselves into the loop and only doing one thing at a time. That makes a lot of sense.
55:19That makes a lot of sense. Alright. Cool.
55:21Well, thanks so much, man. Where can people find all the free stuff you've been shipping, and also yourself?
55:26Yeah. Yeah. So I'm very active on X and YouTube.
55:30I plan to share a lot of my workflows and tools and setups over there. And my GitHub is also a good place to look at my projects.
55:41Your GitHub is just slash kunchen. Right? Kunchen UID.
55:45So I have this, let me move my window here. Oh, there it is. Yeah. Yeah.
55:49This is my handle almost everywhere. YouTube, X, GitHub, and LinkedIn all use this handle.
55:56Kunchen UID.
55:58Yeah. I think it's like a blessing to all of us that you're shipping all this stuff for free and we can all try it. So, yeah, I'm definitely gonna try No Mistakes and,
56:07you know, everything else that you've built. Cool. Cool.
56:09Thanks, Peter. Yeah. If you run into anything, let me know.
56:11I'm constantly trying to improve these tools as well. Cool. Alright, Kun.
56:15Take care, man. Alright. Thanks, Peter.
The Hook

The bait, then the rug-pull.

Most engineers running AI coding agents still act like software reviewers — reading every diff, approving every step, personally validating every change. Kun Chen stopped doing that. The ex-Meta L8 principal engineer now ships 20 to 40 GitHub PRs a day, and he does not review a single line of code himself. This is the system he built to make that possible.
