Modern Creator Network
GosuCoder · YouTube · 11:05

Augment Code: The Upgrade You've Been Waiting For

An 11-minute tool review arguing Augment Code's new task list is the best in-context implementation yet — backed by a 63.2% to 67.5% eval-score jump.

Posted: 10 months ago
Duration: 11:05
Format: Review · educational
Channel: GosuCoder
§ 01 · The Hook

The bait, then the rug-pull.

The opening is a thesis statement masquerading as a casual review. Inside the first 18 seconds GosuCoder names the central problem of AI-assisted coding (keeping the agent on track in big projects), then makes the bold claim that Augment Code's new task list may be the best implementation he's seen — which is the entire rest of the video unpacked.

§ · Stated Promise

What the video promised.

Stated at 04:05: "I'm gonna go through some of the features on it, but the big thing I wanna cover is it went from 63.2% in my evals to 67.5." · Delivered at 05:20.
§ · Chapters

Where the time goes.

00:00–00:30

01 · The thesis

Names the hardest problem in AI-assisted coding (keeping AI on track at scale) and makes the bold claim about Augment Code's new task list.

00:30–02:12

02 · Two schools of thought: split context vs same context

Whiteboards the landscape — Roo Code's orchestrator mode (split context, sub-agents) vs the same-context approach where one chat manages itself.

02:12–03:40

03 · Text task lists and Taskmaster

Walks through how he used to solve this: ChatGPT/Claude-generated text task lists, then bolt-on tools like Claude Taskmaster (screenshot shown). Powerful, but it takes setup work.

03:40–05:00

04 · Claude Code and Augment Code's built-in task lists

Frames Claude Code's todo list and Augment Code's new task list as the in-flow same-context answer. No orchestration needed — the agent manages itself.

05:00–06:20

05 · The receipt: 63.2% → 67.5%

Shows his Best-AI-Agents leaderboard. Augment's eval score jumped from 63.2 (May 30) to 67.5 after the task-list update, a real, measurable boost that moves it in line with Cline and Roo Code.

06:20–08:20

06 · Live demo: manual add, run-all, status control

Screen-share inside VS Code. Adds a task manually, edits status manually, hits run-all-tasks. Argues this is incredible because you can edit individual tasks without spending tokens to make the AI redo a plan.

08:20–09:50

07 · What it gets right (and one nitpick)

Praises that Augment knows WHEN to generate a task list — it picks complex queries and skips simple ones. Nitpick: panel is binary open/closed, can't be resized to a partial state.

09:50–10:00

08 · Continue-in-new-chat + import/export

Lists the workflow extras — push a task list into a brand new chat (he uses this), import from markdown (untested), export (untested).

10:00–11:05

09 · Reining in the AI + closing prediction

Generalizes the lesson — task lists 'rein in the AI,' which is why Claude Code feels controllable. Predicts every other AI coding tool will copy this pattern because it makes too much sense.

§ · Storyboard

Visual structure at a glance.

00:00 · hook · whiteboard thesis
02:12 · promise · Taskmaster screenshot
03:50 · promise · Augment Code 63 → 67
04:50 · value · Task Lists make a difference
06:20 · value · VS Code demo
08:30 · value · feature walkthrough
10:00 · value · continue-in-new-chat
10:50 · cta · wrap-up
§ · Frameworks

Named ideas worth stealing.

00:30 · model

Split Context vs Same Context

  1. Split Context — orchestrator mode dispatches sub-jobs to specialized modes (Roo Code, Claude Subagents)
  2. Same Context — one chat manages its own task list in-flow (Claude Code todo list, Augment Code task list)

The two architectural approaches AI coding tools are converging on for keeping agents on track in large codebases.

Steal for: Any 'state of the industry' creator video. Pick the axis, place the players on it, name the trade-offs.
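The two architectures can be sketched as plain control flow. This is a minimal illustration only, not Roo Code's or Augment Code's real API; the function and parameter names are invented for the sketch:

```python
def split_context(big_job, decompose, run_subagent):
    """Orchestrator pattern (Roo Code style): break the job up and
    hand each piece to a fresh sub-agent with its own clean context."""
    return [run_subagent(task) for task in decompose(big_job)]


def same_context(big_job, decompose, run_step):
    """Same-context pattern (Claude Code / Augment Code style): one agent
    keeps a task list in-flow and works through it in a single chat."""
    todo = decompose(big_job)  # the built-in task list
    done = []
    while todo:
        task = todo.pop(0)
        done.append(run_step(task))  # context accumulates across steps
    return done
```

The trade-off the video names falls out of the shape: `split_context` gives each sub-job a clean slate, while `same_context` keeps everything in one conversation but needs the task list to stop it drifting as context fills up.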
01:40 · list

Five Things That Help Keep AI on Track (whiteboard list)

  1. Small surgical changes — don't let the AI do big sweeping things
  2. Use text-based task lists and have the AI work through them
  3. Add-ons like Claude Taskmaster
  4. Claude Code built-in todo lists
  5. Augment Code built-in task lists

The whiteboard slide that anchors the whole video — the progression of techniques he's used over the last year.

Steal for: Anchor a feature-review video with a 5-bullet whiteboard slide that shows the progression of solutions to a single problem. It frames the new thing as the natural next step.
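Item 2 in the list above (text-based task lists) is just a checklist file the AI generates and the coding agent reads and works through. A minimal sketch of such a file; the task names are invented for illustration:

```markdown
<!-- tasks.md: generated by ChatGPT/Claude, consumed by the coding agent -->
- [x] 1. Scaffold the project
- [ ] 2. Add a User model with email and password hash
- [ ] 3. Wire up the signup route
- [ ] 4. Write integration tests for signup
```

The agent is prompted to read the file, complete the next unchecked item, and tick it off, which is the manual version of what Claude Code and Augment Code now do in-flow.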
§ · Quotables

Lines you could clip.

00:00
One of the hardest things to do with AI assisted coding is keeping the AI on track in large coding projects.
Universal pain statement, no setup needed. · TikTok hook
05:20
Augment Code went from 63.2% in my evals to 67.5 — pretty huge boost very consistently because of the task list management.
Concrete number, before/after, immediately credible. · X / LinkedIn pull-quote
10:00
It really does rein in the AI. I felt this in Claude Code, and that's probably one of the reasons why I like Claude Code so much.
One-sentence thesis of the whole video. Perfect closing line for a clip. · IG reel cold open
10:38
I would actually be surprised if we did not see things like this in some of the other AI coding tools because it just makes way too much sense.
Confident prediction, sets up an 'I told you so' for the next news cycle. · Newsletter pull-quote
§ · Pacing

How they spent the runtime.

Hook length: 30s
Info density: high
Filler: 12%
§ · Resources Mentioned

Things they pointed at.

02:55 · tool · Claude Taskmaster
03:40 · tool · Claude Code
03:55 · tool · Augment Code
00:30 · tool · Roo Code (orchestrator mode)
05:27 · tool · Best AI Agents leaderboard (his own site)
09:35 · tool · OpenAI o3 (for PRD generation)
§ · CTA Breakdown

How they asked for the click.

10:45 · next-video
Have you had a chance to try this out? If not, you should just definitely go check it out because this thing is freaking awesome. Let me know what your thoughts are below.

Soft CTA — three asks bundled (comment, try the product, implied subscribe). No hard sell, no affiliate pitch despite a Scrimba affiliate link sitting in the description. The product itself is the call to action.

§ · The Script

Word for word.

Tags: HOOK (opening / re-engagement) · CTA (the pitch) · metaphor
00:00 · HOOK · One of the hardest things to do with AI assisted coding is keeping the AI on track in large coding projects.
00:10 · HOOK · And I will say probably something kinda bold, but I think Augment Code might have just released one of the best task orchestrators,
00:19 · HOOK · task list implementations that I've seen so far in any of the AI coding tools. And to back up a little bit, we can kind of see this sort of split direction that people are going that these companies are going.
00:33 · On one hand, we've got the orchestrator mode in Roo Code. What that does is it is it's kind of like a task list if you think about it. You give it a large thing. You give it a large amount of of work to do. It actually breaks it up and then orchestrates out the smaller jobs to,
00:52 · like, other modes. For example, you might need to do an architecture mode or you might need to go do some coding. But that actually is a really interesting thing because you're no longer, like, confined to a single context, and you're able to really keep the AI on track by that orchestrator mode, kinda dishing out the work.
01:12 · Where on the other side, there's we're gonna keep it all in the same context. One model chat log and so on. And one of the things that I've actually done probably for the last, I don't know, year now at this point, is I found that just small surgical changes, basically, not letting the AI do these big sweeping things, actually,
01:34 · has honestly helped me a lot because I can review the code easier. I can go in, and I can make sure that the AI doesn't get off track to really being surgical about it. But some other things have evolved over time. So one thing I did for a while was I was using kind of text based
01:52 · task list. So I would do I would have, you know, maybe ChatGPT or Claude generate a list of things that needed to be done, and then I would use the AI assisted coding tool to actually go through, read that file,
02:06 · and use that sort of thing as what it needed to do. But then we've had things kind of add on like Taskmaster. So Claude Taskmaster might be one of the more popular ones.
02:18 · So here's an image of kind of what that looks like. This thing's actually pretty sweet, honestly. But at the same time, I would say, like, it does take a little bit extra work to get it set up. So we've got these really cool, really great implementations
02:33 · that basically allow you to orchestrate and keep the AI on track. Because if if any of you know, like, if we can constrain the AI to do our bidding, we can actually accomplish some pretty amazing things with it. And a lot of people talk about the big complications with AI, and I think one of the biggest complications with AI is just making it stay on track. And as the context fills up,
02:59 · its likelihood of staying on the path that you want it to be on kind of goes off the rails a little bit. And AI has gotten a lot better at this from where it was even, let's say, eight months ago to today. It's gotten a lot better. So that that's where Claude Code and Augment Code have kind of come in. And there may be others too, but these are the two I know the most.
03:20 · Claude Code has this built in to do list function where you can actually get it to generate its own task list so you don't have to orchestrate anything. It just manages it, and it keeps itself on track.
03:35 · And what Augment Code has done is it has also just released its own version of the task list. And this thing is freaking awesome.
03:46 · And I'm gonna go through some of the features on it, but the big thing I wanna cover is it went from 63.2% in my evals to 67.5. And I've actually got an early version of my site up if you go to the best AI agents.
04:01 · HOOK · You can see here if we search by Claude 4 on it, the last time that I had ran Augment Code was sixty three point two o, and that was on May 30. The task manager alone
04:14 · HOOK · has improved its scoring quite substantially right in line with Cline and Roo Code, know, maybe slightly behind that. Still number seven spot, but it is all relatively within margin of error when you see, like, a fraction of a point difference between them. So pretty huge boost very consistently
04:33 · from Claude or from Augment Code because of the task list management. Now they just make a huge difference. Ever since Claude Code,
04:46 · and I've been working with that, I very rarely have Claude Code go off the rails. Where if I go back a few months ago, some of you might have known that I was venting a little bit about Claude 3.7 because it would just go do things I didn't ask it to do. I don't have that problem anymore. And in fact, I would say Claude Code, in particular, has kept my my, um, what I'm actually executing, like, really, really well constrained.
05:11 · And I've just totally enjoyed working with it because I don't have to worry about it doing things I don't want it to do. But let's talk about Augment Code a little bit because what what I said before is I think they have created one of the best, if not the best implementations for task list. So one thing to note is not everything is going to actually generate a task list.
05:35 · So in this particular one, I was actually had it debugging an error message for me. It did not generate a task list here. But if I go into some of my past ones let's take a look at this one. I think this one may have. It did not. Okay. So let me go back one more. K. Here's one that actually did. So this one actually generated a task list of things that it wanted to do.
05:56 · And I'm just gonna stay here for a second and kinda show some of the functionality that I've really enjoyed. So the first thing is you can just manually add a new task. And this is incredible, in my opinion, because I can come in and be like, I want this to do x y z.
06:13 · And I've used this quite a lot over the last day and a half. Because sometimes I don't even I like what the AI did or I wanna actually come in and I wanna change this to something else. I can do that without having to actually have the AI generate and use tokens to make that happen.
06:31 · I can go in and actually edit it myself. Now once the task list is in place, you literally can hit run all tasks, and it will just run through them. The other thing you can actually control is the status yourself, which I thought was really interesting. It will auto get updated as well as it actually gets completed,
06:49 · but you can also come in and you can say, oh, I don't wanna do that one. That one's complete, or you can just go ahead and delete them. So really, really very, very cool because when you think about Claude Code, for example,
07:02 · you know, you get, like, this plan, for example, that I could go ahead and approve or not. I can't really change individual
07:12 · things very easily. So for example, if I if I like phase one, but I wanna change something in phase two, the way I need to do that is communicate back to it to have it make that plan. And that has been fine for me. Like, honestly, I I haven't had a lot of issues or concerns with doing that. But as I've started using Augment Code's new task list, I've realized how much nicer it is to be able to come in and actually tune
07:39 · the task itself, remove a task, add a task. So this thing is incredibly powerful. And what I would say is, um, if you want it to generate a task list,
07:51 · it seems to do, I would say, a really good job of finding the right times rather to actually use a task list, and it finds the right times when it shouldn't use a task list. Because I've I've had somewhere when I put in a query, it knows that it's complex enough and it's like, boom. Here's my eight things. I'm gonna go ahead and do it. And it did that with every one of my evals that I put in.
08:15 · Where before, I think it scored poorly because a lot of times the agent would just kinda give up. Give up is the wrong word, but it wouldn't complete it to the fullest extent that it needed to. And, the task list is harnessing the AI to accomplish the particular goal that we actually have here. I would highly recommend giving this a try. It's this little icon here
08:38 · right beside the file change. Now the one thing that I would say is I I sorta wish I had a little bit more control on the size of this thing. I it's just either open or close, and I would love to be able to, like, bring it down to, like, two or three show. I know this is a minor nitpick,
08:55 · but I do keep trying to, like, drag it up and down this. And I want this a certain size, and I want this a certain size, kind of a minor a minor issue. But it is just worth kinda calling out. The other thing that I would say is you can actually take your task list and put it into a new chat. So I've done this a little bit where I was actually planning through something.
09:18 · And then then now that I have my task list, I just started a brand new context window with it in a brand new chat. I have not tested the import from markdown yet. I am very interested in checking that out because you could go in, you know, o three, for example, have it help you generate the PRD or the technical task list that you want and bring it in, and that could be kinda sweet as well.
09:43 · HOOK · And I haven't really played around with exporting yet because I haven't really had much of a need to. But continuing new chat is the one that I've I've used a couple times at this point now. So, anyway, I just wanted to get kinda touch on this real quick and just share that massive improvements on evals with this. It really does rein in the AI.
10:04 · HOOK · I felt this in Claude Code, and that's probably one of the reasons why I like Claude Code so much. I hadn't quite put that, you know, that into into words yet because Claude Code is so controllable. I've said that. It's so controllable. And I think a lot of it has to do with the way it kind of breaks out its own tasks that it's doing. And I think Augment Code, now that it has this ability,
10:27 · HOOK · CTA · it's just gonna become a lot more powerful for people. Combined with its amazing context engine, this thing is a beast, and I'm excited to kinda see where they go next because I was not expecting something like this. And in fact, I would actually be surprised if we did not see things like this in some of the other AI coding tools because it just makes way too much sense for the way they have this implemented.
10:49 · CTA · Really great implementation. Really excited to kinda see what they do next. Anyway, I'm gonna wrap it up there. Let me know what your thoughts are below. Have you had a chance to try this out? If not, you should just definitely go check it out because this thing is freaking awesome. Till next time, everyone. Have a wonderful day. Peace out.
§ · For Joe

Steal the format: feature review with a receipt.

GosuCoder playbook

Every JoeFlow / Mod Boss feature ship deserves a number — a before/after metric on a real workflow — not a vibes review.

  • Open with the universal pain, not the product. 'The hardest thing in AI-assisted coding is keeping it on track' lands before he ever says Augment Code.
  • Whiteboard the landscape first. Show where the new feature fits in a map of all existing options. Makes it feel inevitable, not random.
  • Always have a receipt. The 63.2 → 67.5 eval delta is what makes this a tool review and not a tool ad. Joe needs a number for every JoeFlow accuracy/speed claim.
  • Predict the future at the end. 'Every tool will copy this' creates an evergreen rewatch hook — when the next tool ships the feature, the video gets fresh relevance.
  • Keep the demo inside the editor where the audience already lives. No fancy cuts, no Premiere transitions — VS Code screen-share + face-cam PIP is enough.
§ · For You

What this means if you're picking an AI coding tool.

If you're choosing between Augment, Claude Code, Cursor, Roo Code

If your agent keeps going off-rails on big features, the unlock is built-in task lists — and right now Augment Code or Claude Code are the two tools doing it best.

  • If you've been frustrated with AI doing too much at once or losing the thread mid-feature, this is the category of fix to look at, not better prompts.
  • Augment Code's task list is editable mid-run — you can add, remove, or change individual tasks without spending tokens to re-plan. That's the killer feature.
  • Claude Code's built-in todo list does the same thing in a different shape, and is also worth trying if you don't want to switch tools.
  • Skip Cursor / vanilla Copilot for multi-step features — they don't have this pattern yet (though that'll likely change soon).
  • The 'continue in new chat' workflow is the secret weapon: plan in one window, execute in a clean one. Smaller context window = less drift.
§ · Frame Gallery

Visual moments.