Big Idea

The argument in one line.

The bottleneck in AI-assisted coding is not the model but the builder's comprehension, and these five tools are designed to raise that ceiling by surfacing what the model cannot tell you on its own.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…

A non-engineer builder shipping apps with Claude Code, Cursor, or similar tools who regularly hits walls the model alone cannot diagnose.
A solo developer who suspects their codebase is over-engineered but has no systematic way to audit it.
Anyone paying for voice-to-text tools like WhisperFlow and open to a free self-hosted alternative.
A builder who installs third-party agent skills without auditing them for security vulnerabilities.

SKIP IF…

You are already fluent in software architecture and have rigorous code-review processes in place.
You do not use Claude Code, Cursor, or agent-based development workflows.

TL;DR

The full version, fast.

Vibe coding raises a ceiling problem: the model can only go as far as the builder's understanding. This video demos five repos that attack that ceiling from different angles. draw.io-skill generates architecture diagrams from your codebase so you can point the model at the right layer instead of letting it wander. Ponytail audits for unnecessary complexity and YAGNI violations. Handy is a free local voice-to-text tool that gets more context into prompts faster. improve (shadcn) produces remediation plans without implementing, feeding a structured backlog for automated agent loops. SkillSpector (NVIDIA) scans skill libraries for security vulnerabilities before you install them, catching credential-exposure patterns and remote-code-execution vectors.

Chapters

Where the time goes.

00:00 – 01:33

01 · Hook + framing

Five repos, half under 10k stars, all solving real daily friction. Sets up the through-line: AI tools are only as good as your understanding of your own codebase.

01:33 – 07:43

02 · draw.io-skill

Architecture diagram generation from natural language. Demos a layered food-logging app. Core argument: knowing where a problem lives saves tokens and builds comprehension.

07:43 – 11:37

03 · Ponytail

Anti-complexity auditor with three commands: ponytail, audit, review. Live audit on Expo app surfaces deletable files, collapsible components, and YAGNI violations.

11:37 – 15:04

04 · Handy — free voice-to-text

Open-source WhisperFlow alternative running Parakeet or Whisper large locally. Raw transcription only, but sufficient for prompt drafting at zero cost.

15:04 – 22:23

05 · improve (shadcn)

Code audit skill that produces remediation plans, not implementations. Paired with Handy for a voice-prompted resolver audit; findings flow to GitHub Issues for agent loops.

22:23 – 27:09

06 · SkillSpector (NVIDIA)

Security scanner for skill repos. Live demo returns critical/do-not-install on a trending Chinese-language repo due to cookie-paste exposure and remote code execution vectors.

27:09 – 27:30

07 · Wrap

Subscribe CTA.

Atomic Insights

Lines worth screenshotting.

The model can only push as far as your expertise — vibe coders who never learn architecture will always hit the same wall.
Generating an architecture diagram before debugging tells the model exactly which layer to inspect, cutting wasted token exploration.
AI coding tools will over-engineer solutions by default; a dedicated simplicity auditor is the structural fix, not better prompting.
Three separate error-strip components that differ only in copy and color are the exact problem Ponytail catches — one component with props is the answer.
Speaking is roughly three times faster than typing, so spoken prompts tend to carry more context, and better context produces better model outputs without changing the model.
A free local Whisper model eliminates the $20/month WhisperFlow cost for builders who only need transcription, not AI rewriting.
Audit tools that only surface problems without implementing fixes are more valuable than auto-fixers because they let you decide priority.
GitHub Issues with labels are a low-friction system for routing audit findings to automated agent loops without complex project management tooling.
A deterministic lookup that already exists in one function but is not propagated app-wide means unnecessary LLM calls at every other call site.
Scanning a skill library with SkillSpector before installing costs 20 cents to $5 and prevents credential exposure and supply-chain compromise.
Cookie-paste patterns in skill repos are a critical vulnerability: any attacker who gets those cookies owns your Twitter, Reddit, and other sessions.
Piping a remote install script directly to your machine is a remote-code-execution vector regardless of the author's intent.
Using architecture diagrams, voice prompts, and issue-based backlogs together creates a compound system where each tool makes the others more effective.
YAGNI violations are the most expensive mistakes in vibe coding: abstractions built for scenarios that will never ship add permanent maintenance cost.
The real opportunity in AI development is not one-shotting new things but using each model upgrade to improve what you have already built.

Takeaway

Five tools that make your codebase legible to you.

WHAT TO LEARN

AI coding tools amplify whatever understanding you already have — these five repos are designed to raise that baseline before the model ever touches your code.

Before asking an AI to fix a bug, generating an architecture diagram from your codebase lets you point the model at the right layer, cutting wasted exploration and token cost.
Over-engineering is not a prompting problem — it is a structural one. A dedicated simplicity auditor surfaces deletable files, collapsible components, and YAGNI abstractions the model will never flag on its own.
People speak roughly three times faster than they type, so spoken prompts tend to carry more context; switching to local voice-to-text is one of the cheapest improvements available to any solo builder.
An audit tool that produces a plan without implementing it is more useful than one that auto-fixes, because it forces a deliberate decision about priority before any code changes.
Routing audit findings to a labeled GitHub Issue backlog lets background agent loops implement changes autonomously while keeping a human in the loop for review and merge.
A deterministic resolver that already exists in one function but is not propagated to the rest of the app represents the most common class of unnecessary LLM spend: problems already half-solved.
Third-party skill and plugin repos are an active attack surface. Cookie-paste patterns and unverified remote install scripts are the two highest-risk patterns to check before installing.
A 20-cent to $5 security scan on a new skill repo is cheap insurance against supply-chain compromise, especially for repos in languages you cannot read.
The most durable AI development loop is one that compounds your own expertise alongside each model upgrade, not one that replaces the need to understand what you built.
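The remote-install-script risk named above can be illustrated with a small check. This is only a sketch of the idea, not how SkillSpector works: the `findPipeToShell` helper and its regular expression are assumptions made for illustration, and a single pattern like this catches only the most obvious pipe-to-shell lines.

```typescript
// Hypothetical helper: flags lines that download a script and pipe it straight into a shell.
// This is an illustrative pattern check, not SkillSpector's actual method.
interface Hit {
  line: number; // 1-based line number in the scanned text
  text: string; // the offending line, trimmed
}

// Matches e.g. "curl ... | bash", "wget ... | sh", "curl ... | sudo zsh".
const PIPE_TO_SHELL = /\b(curl|wget)\b[^|\n]*\|\s*(sudo\s+)?(ba|z)?sh\b/;

function findPipeToShell(content: string): Hit[] {
  const hits: Hit[] = [];
  content.split("\n").forEach((text, index) => {
    if (PIPE_TO_SHELL.test(text)) {
      hits.push({ line: index + 1, text: text.trim() });
    }
  });
  return hits;
}
```

Running it over a skill's README or install docs before installing gives a quick first signal; anything it flags still needs a human read, and a clean result does not mean the repo is safe.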

Glossary

Terms worth knowing.

draw.io-skill: An agent skill that reads a codebase and generates editable architecture diagrams using the draw.io CLI, without requiring a server or MCP.
Ponytail: A Claude Code plugin that audits codebases for unnecessary complexity: files to delete, components to shrink, and over-engineered abstractions.
Handy: A free, open-source desktop voice-to-text app that runs local Whisper or Parakeet models and pastes transcribed speech wherever the cursor is.
improve: A Claude Code skill by shadcn that performs a targeted code audit and produces a structured remediation plan without implementing fixes.
SkillSpector: An NVIDIA-released security auditing toolkit that scans agent skill/plugin repositories for vulnerabilities such as credential exposure and remote-code-execution vectors.
YAGNI: You Ain't Gonna Need It — a software principle warning against building abstractions for hypothetical future scenarios that are unlikely to materialize.
Parakeet: A fast local speech recognition model used within Handy and similar tools, positioned as a speed-accuracy tradeoff alternative to Whisper large.
Agent loop: An automated workflow where an AI agent pulls open issues from a backlog, implements the fix, creates a pull request, and awaits human review before merging.
Deterministic resolver: A code path that resolves a user request using exact matching or lookup logic, bypassing the language model entirely for unambiguous inputs.

Resources

Things they pointed at.

01:33 · tool · draw.io-skill (Agents365-ai)

07:43 · tool · Ponytail

11:37 · tool · Handy

15:04 · tool · improve (shadcn)

22:23 · tool · SkillSpector (NVIDIA)

06:30 · product · WhisperFlow

Quotables

Lines you could clip.

07:00

“You will only ever be able to push these models as far as your expertise goes.”

Standalone thesis, no context needed, contrarian to the 'AI replaces expertise' take → TikTok hook

04:56

“If we could have pointed [the model] to what we knew was the area that the problem most likely lived, we're gonna end up saving on tokens in the long term.”

Concrete, practical, reframes token efficiency as an architecture problem → IG reel cold open

09:01

“Long ponytail, oval glasses. He's been at the company longer than version control itself. You show him 50 lines. He looks at them, shakes his head. He says nothing, and he replaces them with one.”

Vivid character sketch, funny, lands the tool value in a single image → newsletter pull-quote

The Script

Word for word.

There are five new GitHub repos that I've come across recently that are pretty awesome, and half of them don't even have 10,000 stars yet, which is pretty surprising to me because these things solve some pretty big problems that I deal with on a daily basis trying to build apps with AI.

So we're gonna go through each one. We're gonna demo it, and we're gonna talk about, like, where this can fit in your process and where some of these things can actually be used together. Starting with my favorite of the bunch that you might make fun of me for, draw.io.

So one of the things that really tends to suck about vibe coding things, especially if you're not an engineer by background, is that you tend to not know, number one, what you've built realistically, but number two, how those different things that you've built actually connect together.

And that can be a big problem because when you wanna go through and make improvements or just generally understand where an issue might be coming from or an area that should be improved, you're kind of relying purely on the language model to figure that thing out for you when in reality, you do need to have an understanding of how those things work.

And so what this skill does specifically is that it uses the draw.io command line interface, but it uses it to help you build actual architecture diagrams of your app.

So we can see that in an example here where we can see the mobile surface, the web surface, maybe the admin surface, it all gets routed through this API back end, which passes it off to all of these different services depending on what's happening. And then they have, you know, different databases depending on the service and how you've structured your app.

And so having this type of understanding can be super valuable because, again, it's gonna give you the understanding of, like, what is actually happening inside of your app, where do things live. And in my opinion, the best way to approach vibe coding or vibe engineering is to approach it from the perspective that we want to be learning at all times about how this stuff works so we can build better and better and better things in the future.

The worst thing that could possibly happen is that your skills stay where they are now, and this, in my opinion, is the type of tool that helps you avoid that. So let me show you how it works specifically. So the first thing that you need to do is just pop in a command depending on, you know, if you're on Mac, Windows, or maybe you're a complete chad and you're on Linux.

Make sure that you install this thing first. I'm not gonna go through that because it is very straightforward. And then after you've done that, all you need to do is install the skill.

So if you're not using Claude Code, you can just use the general skills add command and then pass this in. But if you wanna install this via, like, a Claude Code plugin or whatever, you can just use the slash plugin command, add the marketplace, and then install the skills out of the marketplace.

And so the first thing you're gonna do is you're gonna come down, you're just gonna invoke the skill, the draw.io skill, and then you're gonna give it a command. And so one of the things that's really nice about this is you can give it a natural language description of exactly what you want to see visualized, and it is gonna move through and do that.

So in this example that I'll show you first, I said, I'd like to visualize the different layers of the services inside of my repo. And so this thing moved through.

It explored the code base, which realistically, I mean, didn't take that many tokens. And then from that, it basically drew us an editable architecture diagram.

So if we were to go down now into draw.io and pop it open, we can see exactly that. So in this case, we can see the overarching architecture of an MVP that we've been building, and we're gonna be taking it to market inside of my paid community.

And so we can see exactly what that looks like. We have our presentation layer, which is purely responsible for just displaying things to the user. So we have the mobile service.

We have all of, like, the navigation and Chrome inside of this thing. All of our different, like, feedback states and icons and badges and the different overlays we have in our app, and everything that's responsible for actually presenting something to the user lives here. So if you continue to move through, we can see, like, the presentation layer then is interacting directly with our front end state management.

So we can think of that, like, where does all the data from the front end, like, actually live or exist when the presentation layer is calling it. This front end then is, like, resolving everything inside of it against our service layer.

So accessing the data, all of the business logic. Um, in this case, it's like a natural language food logging app, and so there's a lot of, like, resolving of the expressions that a user kind of chats in with.

So if someone says, like, log me my chicken and rice bowl, it needs to be able to resolve that expression and then go out and search it. And so then we can see all the different functions inside of this. We have this resolver.

We have this recipe function. Right? And we have a bunch of other, like, functions and services in this layer.

So the reason that something like this is valuable is I think what a lot of people do when they run into some sort of issue is we just say, hey, Claude Code, go fix this problem. And, you know, it can typically do that pretty effectively.

But to force it to go out and explore a bunch of stuff, at the end of the day, if we could have pointed them to what we knew was the area that the problem most likely lived, we're gonna end up saving on tokens in the long term, and we're gonna actually build an understanding of how things work. So for example, if I'm running into a bunch of issues with relation to, like, how the chats that a user is sending get actually parsed and whether or not it's calling, like, the agent effectively and things like that, I know that that, like, is most likely going to start with, like, hey, we need to go check this, like, resolver layer inside of our app.

So it would be really dumb if we sent it off to, like, read our analytics or telemetry files.
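One way to make "point the model at the right layer" concrete is to write the diagram's layers down as data and use them to scope a prompt. The sketch below is an assumption-heavy illustration: the layer names follow the walkthrough, but every path, every keyword, and the `layerFor` and `scopedPrompt` helpers are hypothetical, and naive keyword matching stands in for the judgment a builder gets from actually reading the diagram.

```typescript
// Layer names follow the walkthrough; every path and keyword is a hypothetical placeholder.
type Layer = "presentation" | "state" | "service" | "database" | "edge";

interface LayerInfo {
  layer: Layer;
  responsibility: string;
  paths: string[];    // assumed repo locations
  keywords: string[]; // symptoms that usually point at this layer
}

const ARCHITECTURE: LayerInfo[] = [
  { layer: "presentation", responsibility: "Displays things to the user", paths: ["src/components/"], keywords: ["badge", "overlay", "navigation"] },
  { layer: "state", responsibility: "Holds front-end data", paths: ["src/state/"], keywords: ["stale", "store"] },
  { layer: "service", responsibility: "Business logic, resolver, recipes", paths: ["src/services/"], keywords: ["parse", "resolver", "recipe"] },
  { layer: "database", responsibility: "Foods, entries, recipes, macro targets", paths: ["db/"], keywords: ["food entry", "macro target"] },
  { layer: "edge", responsibility: "Calls to external services", paths: ["functions/"], keywords: ["analytics", "telemetry"] },
];

// Returns the first layer whose keywords appear in the symptom description.
function layerFor(symptom: string): LayerInfo | undefined {
  const text = symptom.toLowerCase();
  return ARCHITECTURE.find((info) => info.keywords.some((k) => text.includes(k)));
}

// Builds a prompt that names the layer to start in, when one can be guessed.
function scopedPrompt(symptom: string): string {
  const info = layerFor(symptom);
  if (!info) return `Investigate: ${symptom}`;
  return `Investigate: ${symptom}\nStart in the ${info.layer} layer (${info.paths.join(", ")}).`;
}
```

For the chat-parsing problem described above, `scopedPrompt("User chats are not being parsed correctly")` would steer the model toward the service layer rather than the analytics or telemetry files.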

That would just be dumb and a waste of time. So then we can see, okay. Well, the service layer then interacts with, like, our back end database specifically.

So the different foods, different food entries, recipes, macro targets, like, all that stuff related to our app specifically, uh, live inside of this database layer. And so then the last piece is that we have these different edge functions where we're connecting to, like, external services.

So PostHog for, like, capturing product events. We're using OpenAI for, like, chat completions, and then we're using edge functions to handle some of the other things that happen.

So now this is just one example of what this could look like. If we wanted to see how these different services, for example, like, actually connect into very specific databases and, like, what that logic actually looks like, uh, we could ask this skill in this case to come through and actually, like, make that type of update to our diagram.

And so to drive this point home, there was a tweet recently from the CEO of Microsoft. And, basically, to wrap all of this up in a too-long-didn't-read, we tend to be, like, really focused on, you know, the best model and trying to, like, one shot things and do all of this, like, hand wavy type of, like, demonstrations of these tools.

But if you really want to succeed in the long term, you need to use these tools as learning loops because you will only ever be able to push these models as far as your expertise goes. That's why people that are, like, incredibly talented engineers already can build really complex things with these tools, whereas people that are more beginners, you know, it's pretty awesome that they can build their remind me to walk my puppy app.

But the reason that it can't really extend beyond that is because they don't have the language or the understanding to know where they can push these things in the first place. And so while it may seem basic, having a tool like this that can actually start moving you in the direction of, like, learning exactly what is going on inside of your projects, I think, is an incredibly, incredibly valuable thing that everybody should be doing.

But even with having something like this in place, you may inevitably run into the situation that maybe even what we have here is unnecessarily complicated.

Like, sure, we've built all of these things and technically it works, but the one thing that language models love to do is over engineer solutions to problems, and that is what the next tool helps us with. So the next tool up is called Ponytail, and it has one of the best avatars, I think, that I have ever seen for a GitHub repo.

And so like I said earlier, one of the biggest problems with AI coding tools is that they will 100% overengineer solutions to problems despite your best efforts, despite, like, trying, maybe you try to rein that in, maybe you don't care at all, but it will go out there and build, like, abstractions and things that you do not need for where your project is at now.

And so this library is meant to help us solve that problem. So you can kind of think of it like those caveman plugins that we see flying around everywhere where it's meant to have the models speak to you in, like, less lines.

But this is doing that, like, for the actual implementation. Like, can what you are doing be done away with entirely or be done a lot more simply?

So if you've ever worked inside of, like, a SaaS company, you might know someone like this. Long ponytail, oval glasses. He's been at the company longer than version control itself.

You show him 50 lines. He looks at them, shakes his head. He says nothing, and he replaces them with one.

That's what we're trying to do with this library. So let's go in and actually look at how it works. Obviously, you're gonna install this thing the way you would install a plugin.

So you add the marketplace and then install the plugin. So there's actually a few different, uh, commands inside of this. They have the straight up ponytail command, and this is gonna be helpful if you're, like, actually attempting to implement something.

Then they have this audit command that will look through everything you have and try to understand, like, where you have unnecessary things in place, where you could simplify, where you could delete things entirely. And then you can also run this for, like, actual code reviews.

So for example, if we were to come down inside of this project and run the audit command, this thing is gonna run through and hopefully, maybe hopefully, maybe hopefully not, tear my code base apart and tell me that I've done a horrible job. Alright, guys.

So now that this thing is done, we can see that it came up with a bunch of things to potentially fix. So one, two, three things that it would recommend deleting entirely, because they're either, like, not actually used.

So in this case, it's like an Expo project, and so we're importing, like, default Expo things that don't actually get used. So we would wanna remove those, um, along with some other files. There's a few areas.

I think this is the biggest find. We have one, two, three, four, five, six, seven things that should be shrunk down.

So an example of, like, what one of those things to shrink down might look like is in our case in this app. My philosophy with the planning process is that you should be planning for all of the different, like, error states way ahead of time before you start building. And so in this case, we have, uh, at least three different error states, uh, specifically as it pertains to this strip thing that we have in our app.

And so we have an error strip. We have a correction error strip, and then we have a save error strip. And these are all individual components.

And the only thing that's really different about them is the copy that's used and, like, the color and the handler that's inside of it. And so in this case, we should just have one error strip component that accepts these as, like, properties inside of them. So that's an example of what you might find with the shrink command.
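The consolidation described here can be sketched without any UI framework. The example below is a minimal illustration under assumptions: in the Expo app this would be a single React Native component, but a plain function keeps the sketch runnable on its own, and the type names, copy strings, and colors are all hypothetical.

```typescript
// Hypothetical names and values; the app's real components are not shown in the video.
type ErrorKind = "generic" | "correction" | "save";

interface ErrorStripProps {
  message: string;     // the copy shown to the user
  color: string;       // the strip color
  onPress: () => void; // the handler
}

interface ErrorStripView {
  text: string;
  backgroundColor: string;
  press: () => void;
}

// One strip that takes copy, color, and handler as properties,
// replacing three near-identical components.
function errorStrip(props: ErrorStripProps): ErrorStripView {
  return { text: props.message, backgroundColor: props.color, press: props.onPress };
}

// The three former components shrink to three presets.
const PRESETS: Record<ErrorKind, Omit<ErrorStripProps, "onPress">> = {
  generic: { message: "Something went wrong.", color: "#B00020" },
  correction: { message: "Could not apply your correction.", color: "#E65100" },
  save: { message: "Could not save your entry.", color: "#B00020" },
};

function errorStripFor(kind: ErrorKind, onPress: () => void): ErrorStripView {
  return errorStrip({ ...PRESETS[kind], onPress });
}
```

The design choice is that a new error state becomes one more preset entry rather than one more component file, which is the kind of shrink the audit recommends.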

And then we have this, uh, YAGNI, you ain't gonna need it, which are typically situations when you've, like, overengineered something that, like, yeah. Okay. You're maybe planning for, like, some way off in time thing that, like, is never gonna actually happen.

And so why create the complexity when you could just do something that, again, is simpler, which is, again, the point of this entire library. So in a bit, I'm gonna show you, like, a different take on this type of tool because this is incredibly valuable again, especially if you're not an engineer by background.

But even if you are an engineer and you wanna be able to, like, drive this thing in a certain direction, I personally think this type of tool is really, really, really awesome. You just need to find, like, based on your experience level where this fits best in your specific process, like what skills you use to build things, like what spectrum and tools you use, and, again, where this is gonna make sense in the context of all of that. But before we get to that other kind of, like, implementation of something like this, I wanna show you an open source tool that I found recently that is a huge quality of life improvement if you don't already do it, and it is free.

And so you've probably heard this stat before that we can speak things out, I think, three times faster than we can type them out. And so that's why you see a lot of people, like, going all in on tools like WhisperFlow, where they're just speaking into the model what they want to happen instead of sitting there and having to type the thing.

Because what ends up happening is that when you are typing things out, you tend to, like, cut down on the context that you probably would have otherwise given the thing if you were able to, like, get that context out of your brain a lot more quickly. And so there's a lot of, like, paid tools for this that I try. I currently use WhisperFlow, which we can see down here.

This is me using WhisperFlow down in the bottom, but it costs money. And so this tool, which is called Handy, is basically a completely free and open source version of something like WhisperFlow with, like, technically, like, a little bit less functionality inside of it.

Like, I don't think it has, like, AI rewriting capabilities and some things like that, which WhisperFlow does. But if you don't care about that and you just want, like, an easy way to be able to dump your thoughts, this is a really great tool for that. And so the way that we can install this thing, there's two options.

You can go through and, like, install it via Homebrew. You can also just go to their website, download it for whatever platform you are on.

Downloads very quickly. Then all you need to do is drag it into your application folder. Again, in this case, a Mac, you do whatever it is for your operating system.

So we can pop the thing open, accept the permissions. And now one thing that's pretty cool about this, if we look at it, is that we can choose the type of model that gets used, like, based on the machine that we actually have. So whether you care more about, like, the accuracy or the the speed, you can choose what you wanna actually use.

Parakeet seems to be a model that is getting a lot of traction for people lately. There's also Whisper large, which is slower to process things, but it's very accurate.

But we could come through and try, for example, something like Parakeet. And now just to demonstrate how this thing works, I didn't configure any other settings. We can just come through and do command space bar.

And now as we're sitting here typing out, we could be talking about anything, talking about what our feedback is on, like, a specific spec that we were looking at inside one of our projects. Maybe we didn't like the direction that it was taking. Maybe we wanted to zoom out and, like, explain things in a different way, whatever it might be.

We can see that we got that entire thing, and it is, uh, pretty accurate. In order to type it out, can be typing anything. Blah blah blah.

Again, pretty accurate. There are some differences between something like this and WhisperFlow. WhisperFlow will take out, like, fluff words and, like, filler words.

So if you're rambling or saying, like, uh, a lot, it will take those types of things out. But for purposes of using a language model, I don't really think that matters that much. So if you've been interested in using something like WhisperFlow, but you can't stomach paying the $20 per month or whatever it is for WhisperFlow, uh, this is a really cool open source option that you can use.

But it would be helpful in this situation to, like, actually put this to the test with something valuable. So how can we combine this with the next skill on the list to really improve our overall, like, efficiency as we're moving through and building things? So the next skill that I wanna show you is called improve, and it is developed by shadcn.

I still don't know how to pronounce it. Comment below and tell me the proper way to pronounce it so I stop butchering it on my YouTube videos. But basically, what this thing is is it is a code base auditor.

So if you remember all that time ago when we had access to Fable five for, like, two and a half days or whatever it was, uh, this was an example of one of the skills that I was using nonstop in order to improve my projects. Because in my experience, Fable five by itself was really good at improving things where you had, like, an existing project or there were existing patterns.

I know a lot of people were showing off, like, one shots of new things, which is cool, but I thought it was really great at fixing existing things. The fact of the matter is, I think, like, every model jump, we should be using those opportunities to, like, actually improve things we've already built. But until they decide to give that back to us, this skill is still pretty dope.

So let's look at how it works. So in this case, what we're gonna do is we're gonna call this improve command, but then we could come down and we could use a tool like Handy that we found in the, uh, in the last video. I need you to do a codebase audit of our app, but I want you to specifically look at the, like, effectiveness of our resolver functions.

So how are we parsing the information that comes through on the front end from the user via the chat and deciding whether or not that needs to actually be processed by a language model, and it needs to, like, go off to the agent or if that's something that can actually resolve down to something like a simple search. So this is primarily going to be a language model token optimization exercise, but it is important in whatever recommendations you make that we're still optimizing for the accuracy so that we are taking the proper actions when we need to.

Boom. And there we go. So if I was gonna have to type all of that out, I probably would have said, I need you to optimize our code base for language model calls.

Right? But now that I have this capability, I can get a lot more context in there and really, like, flesh out my ideas more. If I wasn't doing this, like, in a video, I probably would have spent, uh, more time doing, like, several of these thought drops into, like, a text file and then pasting it over.

But, again, one of the reasons all of these things kind of come together is because if we were to go back to the draw.io skill that we were looking at earlier, again, the reason that I was able to know, like, specifically where I think this, like, inefficiency actually comes from, it's because I have this type of architecture diagram and have, like, paid attention to the types of things that were being built inside of the app, reading the specs, and trying to understand what it was doing and why.

So I know that all of this logic lives inside of this resolver. And so if I'm able to point the model there, then we're gonna get a much better analysis, um, straight out of the gate.

And so one of the reasons I really like tools like this, it doesn't have to be this specific one, but any tool like this, is that it finds these edge cases. So what happened previously in the context of this app is that I already asked Opus to fix this thing for me.

And what it did was it fixed this in one spot. So the code base already has this thing that I'm basically asking it to optimize, but it only put it in one place.

It's only in the recipe composition function inside of this app.

So if someone's, like, on their phone talking via natural language and saying, hey, I need a chicken rice bowl. I had 100 grams of chicken.

I had 50 grams of rice, and I had my homemade buffalo sauce. That's, like, the only place that it gets used, and there's a lot of places in the app where this type of optimization should take place.

But in reality, what's happening is that most of this stuff is just being sent directly to the language model when in reality, a lot of this stuff could resolve deterministically, meaning we don't need to use a language model to do that. And so that is exactly what it found here, and now it's moving through, and it's enumerating, like, all of those things that it found.
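The routing idea here, trying a deterministic lookup before spending a language model call, can be sketched as follows. This is an illustration under assumptions, not the app's actual resolver: the `FOODS` table, the `normalize` rules, and the `Resolution` shape are all hypothetical, and the LLM branch only records that a model call would be needed.

```typescript
// Hypothetical data and types; the app's real resolver is not shown in the video.
interface Food {
  id: string;
  name: string;
}

type Resolution =
  | { via: "lookup"; food: Food }  // resolved deterministically, no model call
  | { via: "llm"; prompt: string }; // ambiguous input, would go to the agent

const FOODS = new Map<string, Food>([
  ["chicken and rice bowl", { id: "food-1", name: "Chicken and rice bowl" }],
  ["banana", { id: "food-2", name: "Banana" }],
]);

// Lowercase, strip punctuation, and collapse whitespace so trivial variations still match.
function normalize(input: string): string {
  return input.toLowerCase().replace(/[^a-z0-9\s]/g, "").replace(/\s+/g, " ").trim();
}

function resolve(input: string): Resolution {
  const food = FOODS.get(normalize(input));
  if (food) return { via: "lookup", food };
  return { via: "llm", prompt: input };
}
```

Placing one `resolve`-style function in front of every call site, instead of only the recipe composition function, is what turns the existing fix into an app-wide saving. Accuracy still matters: anything the lookup cannot match exactly falls through to the model.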

So we have one, two, three, at least four different things where it is pretty confident that what it found is an issue.

It's relatively low effort, and it's a relatively low risk area to go in and try to refactor. And now one of the things that I really do like about this tool is that it will not go through and implement things. Like, I think a lot of these tools, like, try to implement the thing in the same breath.

This is just going to build you a plan so that you can go and implement this in any tool that you could possibly want to implement it in. So now that it has these plans built out to fix all these different issues that it found, just to show you guys, like, what my concrete next steps for a workflow would be, is to create these as GitHub issues.
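As a rough sketch of that step, each plan can be turned into a GitHub CLI call. The `Plan` shape and the helper names here are hypothetical; the only real interface assumed is `gh issue create` with its `--title`, `--body`, and `--label` flags. The code only builds the command, it does not run it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Plan:
    """Hypothetical shape for one remediation plan produced by the audit."""
    title: str
    why: str
    what: str
    labels: List[str] = field(default_factory=list)


def issue_body(plan: Plan) -> str:
    """Keep the 'why' and the 'what' together so the issue carries the plan."""
    return f"## Why\n{plan.why}\n\n## What\n{plan.what}\n"


def gh_create_command(plan: Plan) -> List[str]:
    """Build the argument list for `gh issue create` without executing it."""
    cmd = ["gh", "issue", "create", "--title", plan.title, "--body", issue_body(plan)]
    for label in plan.labels:
        cmd += ["--label", label]
    return cmd
```

To actually create the issue, you could pass the list to `subprocess.run(cmd, check=True)` from inside the repo, assuming `gh` is installed and authenticated and the labels already exist.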

Now the reason that I like to do this specifically goes back to a video I did last week on agent loops, which get a lot of shit for some reason. But one of the really valuable workflows that I personally use is anytime I want to build something and I have a clear concept behind that thing, whether it's a bug, whether it's a feature request, an optimization that needs to be made, like, whatever it is, I create those as GitHub issues, and then I make sure that I'm really on board with what the plan is to implement against that issue.

And then I can create a giant backlog of these things. And then as I'm working throughout the week, I can just have something in the background moving through, pulling any issues that are open, implementing them, creating a pull request, reviewing the pull request, and then I can step in and do any sort of review that I need to before it actually gets merged into the main project.

And so for me, this is, like, where that type of thing starts. If I'm using a tool like Improve and I wanna be able to create that backlog, this is how I do it.

I see no reason not to use something like GitHub. There are other tools you can use like Linear and other project management tools. But since my repo is hosted on GitHub and it works for that, I don't need some super complicated solution.

That's what I use, and this is exactly how that process works. So later on today, after I'm done with this video, I'm gonna be kicking off agents to move through and actually implement these things. If you guys wanna see any other types of videos about the types of agent loops like this that I use, you can comment below and let me know, and I can put a video together on something like that.

But now, if we were to pop into the GitHub repo, we can see that we created all four of those issues right here. And then I would move through like, what I would do is add labels to them depending on, like, exactly what needs to be done. So anything with a backlog tag, for example, won't get implemented when I have these automated runs moving through.
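The label rule can be sketched as a small filter that an automated run applies before it picks up work. The issue dictionaries here are a hypothetical, simplified shape (`number`, `state`, `labels` as plain strings), not the exact GitHub API response, and `SKIP_LABELS` is an assumed convention.

```python
from typing import Dict, Iterable, List

# Labels that tell the automated run to leave an issue alone (assumed convention).
SKIP_LABELS = {"backlog"}


def ready_for_agents(issues: Iterable[Dict]) -> List[Dict]:
    """Return open issues that carry no skip label, oldest issue number first."""
    ready = []
    for issue in issues:
        if issue.get("state") != "open":
            continue
        labels = {label.lower() for label in issue.get("labels", [])}
        if labels & SKIP_LABELS:
            continue
        ready.append(issue)
    return sorted(ready, key=lambda issue: issue["number"])
```

Anything tagged backlog stays parked until you remove the label, which keeps the decision about what gets implemented with you rather than with the loop.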

So you can come up with your own system for how you wanna manage these things, but this is a really solid process. And now if we pop in, we know, like, exactly where the plan lives. We can see, like, exactly why we're making this change, what we are doing, and then you can come through if you want to, and you can have more of a back and forth chat.

You could tag Claude, for example, and tell it to go out and research something. A lot of different things you can do, but overall, this is a really solid process.

Now the last thing that I'm gonna show you guys is kind of unrelated to all of this, but I think it's a really dope tool that everybody should be using in certain situations. And so that skill, I guess, maybe it's not technically a skill.

It is a scanner of skills that was released by NVIDIA. So what this is, is a security auditing toolkit for scanning skill libraries specifically.

So I wanna give you a really concrete example of how I would use something like this. So there's this repo I came across recently that's going really hard and trending. The only issue that I have with it is that the entire thing is in Chinese.

And I'm the type of person that when I want to use a new library, I will typically go through and try to understand how it works first, before I dip in and decide to start building with it. You could try to maybe swap the repo to English and move through and try to read the thing.

But in this case, this is a prime example of where, like, I don't know what could live inside of this script library. I don't know what could be in here. Like, yeah, I could go through maybe and read some of this.

But, realistically, if it's in a language you don't understand, there's gonna be, like, so many potential, like, issues, realistically, that could crop up.

And so this is an example of where I would wanna scan a repo. And so what we can do is we can come through and copy the URL, and then we can pop down into our terminal. And in this case, we are inside of the SkillSpector project.

One thing that I will say, just popping back to this, is that the way you need to install this and get it to work is that you need to actually clone this repository, and then you need to have Python on your machine. And, I mean, you can have a language model help you with this if you're not comfortable with it, but you need to kick off a virtual environment inside of that project.

You need to install the dependencies. And then the last thing that you need to run this well is an API key for a language model. You can run it for free without one, but you will get a shit ton of false positives to the point that it's not even really helpful.

In this case, I am using an OpenAI API key to run this scan. So we're gonna pop down in here, and then I'm gonna run SkillSpector scan, and then I'm just gonna paste in the repo.

And now this thing is off and running doing this scan. And so this matters especially if you are, like, an OpenClaw Hermes chad, you know, that's automating their entire life, allegedly making a million dollars a day doing no work because, you know, the Hermes bot goes out and sells stuff for you.

But even if you just like to experiment with things, skills are a huge, like, attack surface for people that live in their parents' basements and eat Cheetos, and also for serious talented hackers that just want to take advantage of whatever it is that you've built.

And so being able to scan things like this is incredibly valuable. So in this case, I mean, I'm actually somewhat surprised. This says that this is a critical issue, this repo, and do not install it.

So that's interesting. Let's try to move through and understand why that's the case because, again, it could be something that maybe you are willing to, I don't know, bypass and just do it anyway. So I think the biggest issue is that it has a lot of functions inside of it that are executable, which, I mean, makes sense.

Right? So this thing can go out and use, like, X search and GitHub search and all these other search functions. You're giving executable access to these scripts on your machine.

Now in this case, there's, like, 63 different issues that it found. And for me, I mean, to be able to read through all of these things and parse that out, it's like, yeah, you could do it, but this is a ton of stuff.

And so for me, what I did in this case was I pasted all these into Claude and asked it to describe, realistically, based on the design of things, what the attack situations would actually be. Number one, they have these things where you need to paste your cookies in, and that is obviously a very, very risky thing to do because that can give any attacker that gets access to those things complete control over any of the things that you pasted in: your Twitter sessions, your Reddit sessions, any of that stuff they can gain access to.

Number two, which seems to me like a bigger issue, is that it allows remote code execution through unverified install and update scripts. So in this script file that they have, there's an external install script that you are basically downloading and piping straight into your machine.

And that, again, is a serious, serious issue. And I feel like we've seen enough of these, like, supply chain compromises recently that even if this person wasn't malicious, that just seems like a very easy way to get yourself completely pwned.
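To show what "downloading and piping straight into your machine" looks like in a script, here is a naive pattern check you could run yourself. This is not how SkillSpector works, just an assumption-laden sketch: the regex only catches the common `curl ... | bash` and `wget ... | sh` shapes on a single line, and `find_pipe_to_shell` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Flags a download command whose output is piped directly into a shell,
# e.g. `curl -fsSL <url> | bash` or `wget -qO- <url> | sudo sh`.
PIPE_TO_SHELL = re.compile(
    r"\b(?:curl|wget)\b[^|\n]*\|\s*(?:sudo\s+)?(?:ba|z)?sh\b"
)


def find_pipe_to_shell(script_text: str) -> List[Tuple[int, str]]:
    """Return (line number, line) pairs where a download is piped into a shell."""
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        if PIPE_TO_SHELL.search(line):
            hits.append((number, line.strip()))
    return hits
```

A hit does not prove the repo is malicious. It means the install step runs whatever the remote server returns at that moment, with no chance to read or pin it first, which is exactly the supply chain risk described above.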

So that being said, while SkillSpector wasn't as related to all the other ones, I do think this is an incredibly valuable thing. And if we were to pop in and see how much that actually cost to run, I've done a few of these scans, and I've been using my OpenAI stuff for other things this month.

So that did actually cost about $5 for me to run. I've done other ones on smaller repos, and it cost more like 20 to 30¢.

And so just for full context, that is what it cost to do that type of security scan on a relatively larger project, I guess. There we have it. Five relatively new repos, I think, and I hadn't really heard of any of these, so I think they're pretty dope.

If you found this video helpful, make sure to subscribe. But that's it for this video. I will see you in the next one.

The Hook

The bait, then the rug-pull.

Five GitHub repos, most under 10,000 stars. Each one solves a problem that shows up daily for anyone building with AI coding tools — and the creator demos all of them inside a real project.

Frameworks