The argument in one line.
Agentic engineering is not about prompting a model once — it is about closing the feedback loop between code generation and automated review until the confidence score hits a threshold, then shipping.
Read if. Skip if.
- A developer actively using Cursor or another agentic IDE who wants a repeatable review-and-ship loop.
- Someone who has shipped AI-generated code but has no systematic quality gate before merging.
- A builder comfortable with GitHub PRs who wants to integrate an automated reviewer into their existing workflow.
- Anyone considering WhisperFlow or voice prompting and curious how it fits into a real coding session.
- You are looking for a conceptual overview of agentic engineering — this is a live build, not a framework lecture.
- You need a Greptile setup tutorial — the video assumes greploop is already installed and configured.
The full version, fast.
The workflow is three tools chained together: Cursor with GPT-4.5 xhigh fast writes the code, WhisperFlow handles all prompting via voice, and Greptile reviews every PR with a confidence score. When a score comes back below 4/5, the /greploop skill reads the comments, patches the code, pushes a new commit, and waits for the next review, cycling up to five turns until it hits 5/5. The key constraint the video surfaces is PR size: the reviewer agent breaks down on large diffs, so the author splits a 2,000-line PR into four stacked PRs under 1,000 lines each, greploops each one independently, and merges all four to staging.
Chat with this breakdown.
Modern Creator members can chat with any breakdown — ask for the hook, quote a framework, find the exact transcript moment. Unlocks at T2: refer 3 friends + add your own API key.
Create a free account →Where the time goes.

01 · Workflow overview
Three-tool stack introduced: GPT-4.5 xhigh fast in Cursor, Greptile /greploop for automated review, WhisperFlow for voice prompting.

02 · Pluto demo + feature goal
Tour of the Pluto app (chat, tasks, routines, files, finance, desktop computer). Feature goal stated: Claude-style artifacts panel.

03 · Prompting Cursor + plan mode
Voice prompt via WhisperFlow asks Cursor to research Claude artifacts, then generate a build plan. /code-structure skill explained. Sub-agents deployed.

04 · Plan review + PR strategy
Five-PR rollout discussed and accepted. Red Dead Redemption interlude while agent cooks.

05 · Scrimba sponsor
Scrimba full-stack developer path sponsored segment.

06 · First feature test
Agent finishes build. World War 2 artifact tested live — HTML preview renders in side panel. Bugs noted: streaming HTML visible, dark mode not updating.

07 · Iteration loop
Streaming suppressed (replaced with 'crafting artifact' animation), dark mode fixed, version history added, panel resize implemented. Feature declared working.

08 · PR to staging + Greptile review
Agent pushes branch, opens PR to staging. 2,000 lines added. Greptile returns 3/5 — draft-content leak bug and security issues flagged.

09 · /greploop cycle (3 to 4 to 3)
/greploop triggered. Score climbs to 4/5 then dips back to 3/5 as new issues surface. Decision point reached: PR is too large.

10 · Split into 4 stacked PRs
Author determines 2,000-line PR is too large for reliable review. Agent splits into four stacked PRs under 1,000 lines: fence contract, persistence, preview components, chat integration.

11 · Four PRs greploop'd to 5/5
Each PR runs its own greploop. All four reach 5/5 and are merged to staging sequentially.

12 · Staging test + wrap-up
Final test: artifact for best restaurants in Toronto. Sub-agent spawned for research; main thread stays responsive. Feature ships. Workflow recap delivered.
Lines worth screenshotting.
- The /greploop skill automates the entire fix-push-review cycle — you set it running and come back when it hits 5/5 or exhausts five turns.
- PR size is a hard constraint for AI code review: a 2,000-line diff overwhelmed Greptile and returned incomplete feedback.
- Splitting one large PR into four stacked PRs under 1,000 lines each lets the review agent cover every file rather than sampling.
- Voice prompting via WhisperFlow produces longer, more detailed prompts than typing — the author attributes richer agent outputs to this.
- Sub-agents in Cursor run on separate threads, so the main chat stays responsive while a background agent is doing research or generation.
- Stopping the greploop at five turns is deliberate — more iterations cause hallucination drift, not improvement.
- The plan generated in plan mode is primarily useful to the human, not the agent — it acts as a re-entry document for multi-session work.
- A 3/5 Greptile score after splitting PRs means the reviewer found real issues; a 5/5 on a large PR often means it missed things.
- Writing tests alongside every code change is what makes the greploop effective — the reviewer can verify intent without running the app.
- Code organized into a strict service-layer via /code-structure makes agent context windows more efficient because functions are localized and predictable.
Close the loop between generation and review before merging.
The moment you treat code review as a manual step, agentic development stalls — the workflow that actually ships is one where the reviewer and the generator cycle automatically until a quality threshold is met.
- Automated review tools return a confidence score that acts as a ship or no-ship signal — treat anything below 4/5 as a reason to run another fix cycle, not a reason to merge and hope.
- PR size is a hard constraint on review quality: a 2,000-line diff causes automated reviewers to miss issues, while splitting by concern into sub-1,000-line PRs produces complete, actionable feedback.
- Voice prompting produces longer and more specific instructions than typing — the quality of the prompt is directly proportional to how much context the agent gets, and speaking removes the typing bottleneck.
- Capping an automated fix loop at five turns is a safety rule, not a quality threshold — past that point the agent begins introducing new issues rather than resolving the original ones.
- Sub-agents running on separate threads keep the main conversation responsive; the ability to continue prompting while a background task runs is a workflow multiplier, not a cosmetic feature.
- A build plan generated before starting serves the human more than the agent — it provides a re-entry document for multi-session work and a shared vocabulary for follow-up prompts.
- Service-layer code architecture makes agent context windows more efficient because the model can scope changes to a single module without reading the entire codebase.
Terms worth knowing.
- greploop
- A Greptile skill that reads PR review comments, instructs the agent to fix them, pushes the updated code, and waits for the next automated review, cycling until the confidence score reaches 5/5 or five turns are exhausted.
- Greptile
- A code review tool that integrates with GitHub, reads every open PR, and returns a confidence score from 1 to 5 along with specific issue comments.
- WhisperFlow
- A local speech-to-text tool that captures voice input and types it into whatever the active application is, used here to prompt Cursor without typing.
- Stacked PRs
- A sequence of pull requests where each one depends on the previous being merged, used here to break one large feature into four reviewable chunks that can be validated and merged in order.
- Convex
- A reactive backend platform that handles database, real-time subscriptions, and server functions in a single deployment, used as the backend for Pluto.
- Artifacts (Claude-style)
- A UI pattern where a chat agent output that is visual or reusable is rendered in a separate panel beside the conversation rather than inline in the chat.
Things they pointed at.
Lines you could clip.
“It will keep going until it gets a five out of five.”
“You don't want the agent to keep editing — it's gonna start hallucinating.”
“I yap for a living. WhisperFlow just makes so much sense.”
Word for word.
The bait, then the rug-pull.
The workflow has changed. Not because the tools changed first, but because the experience of building complex features taught a different discipline — one where the model writes the code and an automated reviewer decides when it is good enough to ship.
Named ideas worth stealing.
The Greploop Cycle
- Push change to PR
- Greptile reviews and scores
- If score < 5/5: agent reads comments, patches code, pushes
- Repeat up to 5 turns
- Merge when score >= 4/5 or stuck
An automated quality gate where the review tool and the coding agent form a closed loop, iterating on the same PR until a confidence threshold is met.
PR Size Rule
Keep individual PRs under roughly 1,000 lines so automated and human reviewers can cover the entire change. Split large features by concern: data model, parser, UI, integration.
How they asked for the click.
“Would really appreciate a like, a comment, and subscribe.”
Soft ask at the very end after full value delivery. No mid-roll pitch other than the clearly labeled sponsor segment.




































































