Modern Creator
Mansel Scheffel · YouTube

I Made My Claude Skills Learn Without Going Rogue

A 17-minute systems walkthrough of building a five-stage skill refinement pipeline with a judge AI, a human gate, and a pointed critique of tools that skip both.

Posted
today
Duration
Format
Tutorial
educational
Views
407
23 likes
Big Idea

The argument in one line.

Self-improving AI is not magic -- it is a structured loop where real-world feedback flows through a routed evidence pipeline, an AI judge, and a mandatory human gate before any change lands in a skill file.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…
  • You build Claude Code skill systems for clients and keep patching the same output manually instead of fixing the underlying skill.
  • You use or sell AI automation to businesses and want a defensible framework for when to let AI auto-update vs. require human review.
  • You are evaluating tools like Hermes and want an honest second opinion on constraint-based vs. spawn-everything approaches.
  • You run an AI consultancy and need a concrete lifecycle model to explain why unrestricted auto-refinement is a liability.
SKIP IF…
  • You have not built any Claude skills yet -- the presenter flags this as an advanced concept and points to prerequisite videos.
  • You are looking for a one-click product rather than a system you configure and operate yourself.
TL;DR

The full version, fast.

The video teaches a five-skill pipeline -- signal capture, evidence router, skill self-update, context update, and weekly review -- where raw feedback from external systems flows through structured evidence cards before any change is proposed to a skill file. A judge AI scores proposals against the skill definition of done, but nothing ships until it clears the three Ms human gate: Megaphone (audience impact), Money (delivery impact), Meaning (system direction). Weekly cadence is the recommended default. The presenter closes with a critique of Hermes and tools that auto-spawn and auto-update skills without constraint.

Chapters

Where the time goes.

00:00 - 00:28

01 · The self-improving AI promise

Pattern-interrupt hook debunking the hype claim; promise to show what self-improvement actually looks like.

00:28 - 02:43

02 · What skill refinement actually is

Defines skill refinement as real-world feedback going into an evidence inbox to produce an improved skill file. Distinguishes from evals.

02:43 - 04:25

03 · The skill lifecycle

Seven-stage loop: build, define done, eval, use, capture, refine, re-eval. Emphasizes starting with a solid definition of done.

04:25 - 05:18

04 · The three-layer system

Signal capture / refinement engine / cadence -- the three structural layers of any refinement loop.

05:18 - 08:08

05 · Walking the pipeline

Five-skill pipeline walkthrough: signal-capture, evidence-router, skill-self-update, update-context, weekly-skill-review. Introduces the judge AI.

08:08 - 11:00

06 · The human gate and the three Ms

Blast radius thinking; Megaphone / Money / Meaning framework for deciding what requires human review vs. auto-pass. Cadence options.

11:00 - 16:42

07 · Live demo on mock data

Live Claude Code walkthrough of all four pipeline stages, using a mock LinkedIn rejection and a mock Acme Robotics call transcript.

16:42 - 17:49

08 · Constraint-based approach and the Hermes critique

Preference for building skills only when there is a repeating business need; critique of Hermes for unconstrained skill spawning.

Atomic Insights

Lines worth screenshotting.

  • Skill refinement is not the same as evals -- evals grade a skill against known examples, refinement teaches it from real-world usage.
  • A skill that lacks a clear definition of done cannot be meaningfully refined -- the judge has no standard to measure against.
  • The blast radius question -- what is the worst thing that happens if auto-refined information is wrong -- is the right mental model for deciding what needs a human gate.
  • One wrong update to a shared context file like your ICP can corrupt every downstream skill that reads from it.
  • Auto-refine where a mistake is a typo. Keep a human in the loop where a mistake affects an audience, a client, or a direction change.
  • Weekly review is the practical default for most teams -- real-time refinement creates noise and overhead that outweighs the speed gain.
  • The skill lifecycle is a loop, not a line: build, define done, eval, use, capture, refine, re-eval, curate.
  • Rejected outputs are data, not failures -- the richer the rejection note, the more the pipeline can learn from it.
  • Not every piece of incoming evidence belongs in a skill file -- the evidence router decides whether feedback updates a skill, a context file, a memory, or nothing yet.
  • Building a skill only when you have a repeating business need prevents your skill folder from filling with one-off agents that never run again.
  • A judge AI can only enforce quality if the skill it judges has explicitly defined what quality means -- vague skills produce vague judgments.
  • GitHub pull request review is a valid human gate mechanism for skill updates -- treat skill changes like code changes.
Takeaway

How to build a skill that fixes itself.

WHAT TO LEARN

A skill that keeps making the same mistake is not a bad AI -- it is an unfixed system, and fixing the system means building a feedback loop with teeth.

  • Evals and refinement are not the same thing: evals test a skill against known examples before deployment, refinement teaches it from real-world failures after it runs.
  • A skill without a clear definition of done cannot be meaningfully improved -- there is no standard for the judge or the human reviewer to measure against.
  • The blast radius question is the right filter for auto-refinement: the bigger the downstream impact of a wrong update, the more a human needs to be in the loop.
  • Not all feedback belongs in a skill file -- a router layer that sends evidence to skill files, context files, memories, or a no-op queue prevents context pollution.
  • Weekly review cadence beats real-time auto-refinement for most teams: it collects enough signal to act on without creating noise from every individual output.
  • One corrupted shared context file can break every skill downstream that reads from it -- treat shared context like production infrastructure.
  • The constraint-based approach -- build a skill only when you have a repeating business need -- prevents the skill folder from filling with one-off agents that add maintenance cost without recurring value.
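The blast radius point can be made concrete with a small dependency lookup. This is only a sketch: the skill names and context file paths below are hypothetical, and a real system would need its own way of recording which skills read which shared files.

```python
# Hypothetical map of which skills read which shared context files.
# None of these names or paths come from the video; they are placeholders.
SKILL_CONTEXT_READS = {
    "linkedin-content-writer": ["context/icp.md", "context/voice.md"],
    "proposal-generator": ["context/icp.md", "context/clients/acme.md"],
    "meeting-prep": ["context/clients/acme.md"],
}


def blast_radius(context_file: str, reads: dict[str, list[str]]) -> list[str]:
    """Return the skills that would be affected by a wrong update to one file."""
    return sorted(skill for skill, files in reads.items() if context_file in files)


if __name__ == "__main__":
    # Every skill listed here inherits the error if the ICP file is corrupted.
    print(blast_radius("context/icp.md", SKILL_CONTEXT_READS))
```

The longer the returned list, the stronger the case for putting a human in front of changes to that file.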
Glossary

Terms worth knowing.

Skill
A saved, reusable procedure that an AI agent follows every time a specific type of work comes up, capturing rules, examples, references, and evals in a single file.
Skill refinement
The process of improving a skill file over time using real-world feedback from actual usage, distinct from evals which test against known static examples.
Evidence inbox
A structured holding area where raw signals from external systems are validated and formatted into evidence cards before routing.
Evidence router
A skill that reads the evidence inbox and decides the correct destination for each piece of feedback -- skill file, context file, memory, knowledge base, or no-op.
The Judge
An AI gate in the refinement pipeline that scores a proposed skill update against the skill definition of done and assigns a confidence score before the update reaches a human.
The Three Ms
A human gate framework with three dimensions -- Megaphone (audience/public impact), Money (revenue or delivery impact), and Meaning (system direction or promise) -- used to decide whether a proposed change requires human approval.
Definition of done
An explicit description of what a good output looks like for a given skill, required before refinement or judging can be meaningful.
Blast radius
The scope of damage if an auto-refined piece of information turns out to be incorrect, used to calibrate how much human oversight a given update requires.
Hermes
A tool referenced as an example of unconstrained skill auto-generation, criticized for spawning skills for nearly everything regardless of business need.
Resources

Things they pointed at.

17:00 · product · Hermes
05:00 · tool · Fathom (call transcript source in demo)
Quotables

Lines you could clip.

00:00
So apparently, AI is supposed to rewrite itself now. Read your Slack, learn your voice, become you while you sip a flat white. Unfortunately, you've been missold.
Punchy cold open, complete thought, instantly frames the problem · TikTok hook
01:24
Skill refinement turns real-world feedback into better reusable AI behavior.
Clean one-liner definition, quotable standalone · newsletter pull-quote
01:59
Evals grade the skill. Refinement teaches the skill.
Perfect two-sentence contrast, no setup needed · IG reel cold open
08:14
What is the worst thing that can happen if the information that is auto-refined is incorrect -- the blast radius, if you will.
Introduces a useful mental model in one sentence · newsletter pull-quote
16:56
I prefer a constraint-based approach, meaning I only build a skill when I have a business need or a problem that arises, and I know that the work is going to be repeated.
Clear opinion, directly contrasts with competitor tools · TikTok hook
The Script

Word for word.

00:00So apparently, AI is supposed to rewrite itself now. Read your Slack, learn your voice, become you while you sip a flat white. Unfortunately, you've been missold.
00:07In this video, I'll show you what self improving actually looks like under the hood so that you know how to approach it for the systems that you build for your clients. Let's get into it. So first things first, if you don't know what a skill is, you need to go and check my other videos on skills to understand what they are before you get into this.
00:21This will be a little bit more of an advanced concept. Second thing here, there are many ways to solve this exact problem we are talking about. This is just one way to do it.
00:28Now that that's out of the way, we can finally get into this video. So a skill in its simplest form is literally just a procedure that the AI is going to follow every single time. A workflow, standard operating procedure, whatever it is.
00:38And we do that because it saves you time, it improves the quality over time, and it's easier to iterate for the exact same work that you're gonna be doing over and over again. But there comes a problem where sometimes you end up repeating yourself because you don't take the time to address the problems with your skill after you've actually built it.
00:54I have this problem all the time with several skills up until the point where I actually built the skill refinement system, particularly because they were often small things, where I would just solve the problem on the fly, but I did it in the system that it was ending up in instead of just getting Claude to go and fix it.
01:06So our goal here is to stop that stupid manual pattern that a lot of people have and not just get to a point where our skills are refined, but also get to a point where we can auto refine some of them because I certainly believe you shouldn't be auto refining all of them. More on that in a second. So from my point of view, skill refinement itself, it turns real world feedback into better reusable AI behavior.
01:27That is how I'm kind of framing this whole video. So we take a rejected output over here, perhaps the wrong voice, maybe it was missing a rule, maybe it wrote a little bit too much AI slop. We're then chucking it into this evidence box, and once we've gathered all of our evidence into a box, we need to do several steps inside that box to then ultimately improve our skill, with the goal being that we want a refined, clearer, stronger behavior within the specific skill that we are working on.
01:52The important thing to remember here is that we are getting real feedback from real systems inside our actual working environment, and then using that to change the behavior of our skill for the next time that it runs, and therefore improve it. Something to keep in mind, evals are not refinement.
02:06They are not the same thing even though there is some overlap. To me, evals grade the skill. So we might evaluate against known static examples of work while we are building the skill initially.
02:16We would run our evals over as many examples as we could, and it would grade it each time to see which one was better, and we might give it feedback as to what was wrong, why didn't it match our expectations, why is it not our definition of good. Skill refinement, in comparison, to me, would be something that teaches the skill over time with real world data, like I mentioned on the previous slide.
02:35It is something that is observed through usage. The usage informs the notes. The notes create the rules, and the rules improve the skill over time.
02:43To help ground this a little bit more, we need to look at the skill life cycle as a whole because it really is an entire process. We don't just build something and then refine it and job done. It all starts, obviously, by building our skill with a very, very clear definition of done.
02:56You need to come in there at least having an understanding of what you're trying to achieve. For instance, if we're trying to write LinkedIn content out there, you're going to have an understanding of at least what type of content you want to be writing, what type of voice you might want to be using, and you would give as much information as you can to AI upfront to go and build your initial skill.
03:13Once we've done that, we would then run some evals against where we are evaluating against the definition of done that we just created. Have we matched what we were trying to get to? After we've done that to the best of our ability, we would then use it.
03:24We would put it into real situations. Go out there and write based on X, Y, Z. Maybe write me 20 posts.
03:30After we've done that, we're going to be collecting feedback, the outcomes, the observations, any of the edge cases that might have come out from running it over so many iterations. After that, we then get into the refinement process because now we have all of the data that we need in order to actually make these refinements and learn from the things that we had from the previous runs.
03:46Then we would reevaluate to make sure that our behaviors have stuck and that the patterns are actually better than they were the first time that we set up the skill with our initial definition of done. And then finally, as a part of this, we might want to curate this, but I'll cover curation in another video. I wanted to bring up the skill life cycle because it is important to understand that you're not just going to be able to refine something if you've already given this thing a shitty definition of done from the minute that you started.
04:07And for those of you wondering, yes, of course, even if you don't have a complete definition of done upfront, you can use AI to actually help you understand what a good definition of done is in the first place. Many ways to research this before you dive into refinement. So always make sure that you're nailing the first step of this process very well because it means you're probably gonna have to do less refinement down the line.
04:26And that brings us on to our three layer system. Now we'll get into a prac in just a little bit, but first we need to look at a few more slides to understand how the system works. So our three layer system is quite simple, and there are various intricate layers that go inside this, and I'll break them down for you.
04:38So with signal capture, we are pulling in all of the information from our outside systems. We would then have our refinement engine, which runs through various processes in order to look at all of that information and decide what it needs to do with it before it can apply any quality changes to that or any refinements.
04:54Finally, as a part of this, we would need to understand the cadence behind it. So is this thing going to run daily or weekly? Is it going to be called at the end of a session inside your VS Code environment or Claude Code?
05:04Or are you just gonna have a weekly review that runs as a separate skill? There are many ways to skin this cat. You can even use hooks if you want to.
05:10But for the everyday user, hooks might not always be available. So I think for most businesses, having something like a daily or weekly review is probably the best cadence.
05:18Then next up, we can take a deeper look into the skills pipeline. When we understand what each skill is doing, it helps us understand the processes inside the system, which will make it a lot clearer for you guys. So like I said, the first skill that we wanna run is our signal capture.
05:31And if we take a look at what's inside there, we have our little evidence inbox, and you can see that we're just using it to gather all of that information from our outside systems over here. Rejected drafts from something that we wrote, maybe a call transcript from some Fathom sessions that we had, which could tie in with the session notes.
05:44What were the key takeaways, the questions, the updated client information? It would all go into our evidence box, customer comments, failed evals, whatever. It lives inside here so that we can use this to sort through it and ultimately manipulate it and refine it.
05:57Next skill we have is the evidence router, and that is doing exactly what it sounds like. It is routing the evidence that we have inside our signal capture box over here. It decides the destination of where all of this information that we have in the raw format gets sent to.
06:09Because not every piece of information that comes in from a system goes directly into a skill.md. There are different things that form our skill.md.
06:16There are references. There are context files.
06:18There are memories. Various things that go into our AI operating system as a whole that we might want to update. So we wouldn't just route everything to skills, and that's why we have this evidence router skill because it can intelligently decide based on the evidence that it's found inside that signal box where to send it to.
06:35For instance, if it learned a behavior about the skill that ran and it was specifically related to skill.md not doing something, it would then route the behavioral change to say, oh, hey. We need to make a change to the skill.md.
06:45If we learn some new information from some prospects that we spoke to about their company, it would say, this sounds a lot more like context. It should probably go into the context file for this specific client so that the next runs that we have for anything related to that client, it takes into account that new context instead of living off of the old one.
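The routing decision described above can be sketched as a simple lookup. This is a minimal illustration, not the presenter's implementation: the video describes the router as a skill where the model decides based on the evidence, whereas this sketch swaps in a fixed table. The `Destination` values follow the destinations named in the video; the signal type labels other than "changed fact" are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Destination(Enum):
    SKILL = "skill"
    CONTEXT = "context"
    MEMORY = "memory"
    NO_OP = "no-op"


@dataclass
class Evidence:
    summary: str
    signal_type: str  # hypothetical labels such as "rejected_output"


# Hypothetical fixed mapping; a real router skill would reason over the
# evidence text rather than look up a label.
ROUTES = {
    "rejected_output": Destination.SKILL,   # behavior the skill got wrong
    "changed_fact": Destination.CONTEXT,    # new durable facts about a client
    "preference": Destination.MEMORY,       # assumed label, not from the video
}


def route(evidence: Evidence) -> Destination:
    """Pick a destination, falling back to no-op for unrecognized signals."""
    return ROUTES.get(evidence.signal_type, Destination.NO_OP)


if __name__ == "__main__":
    print(route(Evidence("Draft used an AI-slop phrase", "rejected_output")))
    print(route(Evidence("Client added a procurement rule", "changed_fact")))
```

The no-op fallback matters as much as the routes: evidence that does not clearly belong anywhere stays out of every file.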
07:02You get the point by now. We are routing that raw information, sorting through all of it, and putting it in the right place so that later on when we get to the approval part, it knows exactly where it needs to go. Next up, we have the skill self update, and this is proposing the skill changes.
07:15It can't just do that off a whim. Of course, it is doing this off of actual evidence. And it's why I said in the beginning that we need to have a very clear definition of done because in this case, as a part of our workflow, we have this role of something called the judge.
07:25And what it's doing here, it's literally judging that the changes that are about to go through are actually of better quality or not. It is an AI gate in this case. In this case, just remember that the judge cannot know quality unless the skill defines what good is up front.
07:38So as a part of our skill, we have made very sure that ours has a very clear understanding of standards that we want for whatever the skill is that we're working with. So after step three, where we have suggested any adds or edits or things that we need to remove, we get on to step four where we update the context, and all of this stuff will end up in a folder called proposals.
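The dependency between the judge and the definition of done can be illustrated with a deterministic stand-in. To be clear about the swap: the judge in the video is an AI gate, while this sketch uses plain rule checks, and the two criteria shown are hypothetical. What it does show faithfully is that with no definition of done, there is nothing to score against.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Criterion:
    name: str
    check: Callable[[str], bool]  # returns True when the output meets the rule


def judge(output: str, definition_of_done: list[Criterion]) -> tuple[float, list[str]]:
    """Score an output as the fraction of criteria passed, plus the failures."""
    if not definition_of_done:
        raise ValueError("No definition of done: nothing to measure against")
    failed = [c.name for c in definition_of_done if not c.check(output)]
    score = 1 - len(failed) / len(definition_of_done)
    return score, failed


# Hypothetical criteria for a LinkedIn writing skill.
LINKEDIN_DONE = [
    Criterion("no AI-slop phrases", lambda text: "let that sink in" not in text.lower()),
    Criterion("under 1300 characters", lambda text: len(text) < 1300),
]

if __name__ == "__main__":
    print(judge("Most refinement loops fail quietly. Let that sink in.", LINKEDIN_DONE))
```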
07:56The reason that we do that is because we need a human gate as a part of this process. Like I said earlier, I don't think every single thing out there can just magically auto refine itself despite what a lot of people on YouTube say. There's definitely a need for a human in the loop here, and we need to take a look at why.
08:10Now instead of thinking to yourself, what kind of skills can I have running on autopilot? It's much better to look at the gate from the perspective of what is the worst thing that can happen if the information that is autorefined is incorrect, the blast radius, if you will, because that will help ground you when you're looking at auto refining your skills.
08:28I see a lot of people saying, oh, you know, I have this system where it pulls in all this information from my sales calls and automatically updates all my clients. Does that about five or six times a day. And I'm like, okay.
08:37Cool. So how do you know if any of that information is actually accurate for the clients that you're serving? Because the first thing that goes to my mind when I have discussions with people, if I can see that they're talking mostly based on hype, they don't really know what their systems are doing.
08:49So for me, this comes down to these three m's over here. The megaphone, this is how will it impact your audience, how is it going to impact your money, and how does it impact the meaning of your systems? Because having one wrong thing in one wrong place can destroy a whole cascading set of skills that you have.
09:04For instance, if you change the context for your ICP, how many skills use that ICP in order to fulfill whatever it is that part of that workflow does? So if you had to have some form of auto refinement on your ICP and nobody's checking this thing and the AI actually made an error in judgment, you're gonna screw up every single one of the workflows that rely on that.
09:21So you need to look at this from the perspective of who is affected, what value is at stake, does this change our promise or change our direction. If the answer is yes to any of those things, I would highly recommend pausing for a second, using a system like this, and literally just reading the proposed change, and then pushing it through.
09:37It will take you ten minutes and save you a ton of trouble. Great. So now that we understand the consequences of what can happen if we don't have a human in the loop, we need to find some form of cadence that works for us.
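The three Ms check above reduces to a simple rule: if any one of the three is touched, a human reads the proposal. A minimal sketch, assuming the three flags have already been decided by the router or a reviewer (the field names are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class ProposedChange:
    description: str
    affects_audience: bool   # Megaphone: who is affected?
    affects_money: bool      # Money: what value is at stake?
    affects_direction: bool  # Meaning: does it change the promise or direction?


def needs_human_review(change: ProposedChange) -> bool:
    """Require a human gate if any of the three Ms is touched."""
    return change.affects_audience or change.affects_money or change.affects_direction


if __name__ == "__main__":
    typo_fix = ProposedChange("Fix a typo in an internal note", False, False, False)
    icp_edit = ProposedChange("Change the ICP context file", True, True, True)
    print(needs_human_review(typo_fix))  # auto-pass candidate
    print(needs_human_review(icp_edit))  # goes to proposals for review
```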
09:46For me and my cadence, I just do this thing weekly. I don't need something to run every single day, but that's just for the style of my business. You can do yours differently.
09:53So if we take a look at our options, we can either do any of these over here. The first one being manual. Now if you were working with Claude and you were having a conversation one day and you noticed that it made a few mistakes, you could literally just say at the time, hey.
10:04I don't want this mistake to happen again. Review our skill. Take everything that we've spoken about in here and update our skill so it never happens again.
10:11That's the easiest way to do it, and often that's the best way to do it because you're right there. You can just reseed the output, then run it through some evals again to make sure it doesn't happen. But then there are obviously other times this can happen after a call is a very popular one at a session end.
10:23So, again, when I close this, that is a session end. A hook would then fire, and it could update a whole bunch of things. You could also manipulate any of the hooks that happen in between the sessions that you have in VS Code or Claude Code.
10:34There are many of them. I'll cover that in another video. But for most average users out there, the weekly or daily schedule is probably gonna be your best bet, as well as pulling in information from other systems as they happen in real time, which you can do either via a skill or you can just use a webhook that pushes something down, however you wanna make that thing work.
10:50But for the average user, the weekly or daily cadence is probably gonna be the way to go. You can get it to pull information down from other systems, or you can just get those systems to push the information directly into the folders that we're about to take a look at. Cool.
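For the scheduled option, the only logic a weekly or daily review really needs is a check on whether enough time has passed since the last run. A small sketch, assuming the last review date is stored somewhere the scheduler can read it (the function name and default are illustrative):

```python
from datetime import date, timedelta


def is_review_due(last_review: date, today: date, cadence_days: int = 7) -> bool:
    """Return True once at least `cadence_days` have passed since the last review."""
    return today - last_review >= timedelta(days=cadence_days)


if __name__ == "__main__":
    # Weekly cadence by default; pass cadence_days=1 for a daily review.
    print(is_review_due(date(2024, 1, 1), date(2024, 1, 8)))
    print(is_review_due(date(2024, 1, 1), date(2024, 1, 2), cadence_days=1))
```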
11:01So here we are in my environment, and I'm gonna run through it very quickly. I've just given Claude a very simple prompt to run through everything I just spoke about and then stop at certain sections so that you can see how it pans out. On the left over here, you can see we've got our evidence folder.
11:13And like I said, this is all just markdown stuff. And each of the skills that will run as a part of our refinement loop, they'll be dumping things into certain parts here. Remember, in our first instance, intake is where all of the raw events come in from our outside systems.
11:26It could be Fathom. It could be Notion. Could be Slack.
11:29Any sessions that we have where we want to pull in that outside information. So I'm just gonna hit enter on this, and it's gonna run through the first part over here. It's gonna generate some mock data for us, and it's gonna throw it into the intake folder as if it has just harvested a bunch of information from our outside systems for us.
11:43Okay. So stage one is complete, and this was just generating some mock data for us. So we can see how our information is now coming from our outside systems.
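The folder-per-stage layout used in the demo can be sketched with the standard library. The stage names (intake, inbox, routed, proposals) come from the walkthrough; nesting all four under a single `evidence` folder and the file naming are assumptions made for this example.

```python
from pathlib import Path
import tempfile

# Stage folders named in the demo, in pipeline order.
STAGES = ["intake", "inbox", "routed", "proposals"]


def create_evidence_tree(root: Path) -> dict[str, Path]:
    """Create one folder per stage under root/evidence and return them by name."""
    folders = {}
    for stage in STAGES:
        folder = root / "evidence" / stage
        folder.mkdir(parents=True, exist_ok=True)
        folders[stage] = folder
    return folders


def capture_raw_event(intake: Path, name: str, body: str) -> Path:
    """Write one raw event into intake as markdown, as a pull or a webhook might."""
    path = intake / f"{name}.md"
    path.write_text(body, encoding="utf-8")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folders = create_evidence_tree(Path(tmp))
        event = capture_raw_event(folders["intake"], "acme-procurement-call", "# Raw event\n")
        print(event.name)
```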
11:51We have a draft for LinkedIn that was rejected, and we have an Acme procurement call from our Fathom transcripts. So the first thing here, it gives us a little bit of information around the draft of what was written, why the user rejected it.
12:03Of course, you would want to have reasons in there giving as much information as possible when you reject something because the clearer you are, the more the AI will understand what to learn from that. Then it tells us why this matters and how it is going to affect future LinkedIn posts. Then for our call, this is based off of a discovery call with Acme Robotics.
12:20Gives us key moments from the call, and then it tells us why this matters. These are durable customer commercial facts, not one off task notes, and that means it's going to change several skills. For instance, if they had a rule like any deal over 25k at Acme now requires procurement review, that means you're going to have to change your proposal, your statements of work, and perhaps some of the ways that you actually reach out to them in the first place.
12:41It also says that Acme now requires SOC 2 and a signed DPA before production access, so that's gonna change delivery and a few other things that might affect the consultants who you'll be pushing to their site to go and actually do the work. It then also lets us know that multiple skills should know this, our proposal generator, our meeting prep, and our sales closer amongst a few other things.
12:59So capturing this raw information and distilling it where we can is very important. Next up, we have our signal capture skill, and for those of you who've been paying attention, that was the one that puts everything into our evidence box. We've taken all of our raw data.
13:11We've run it through the signal capture, and we've now built little evidence cards to prove that the information in there is actually valid and worthwhile going through our pipeline to make a change. So for instance, here, we've now moved out of intake into inbox. And if we look at our Acme procurement that we just read over, it gives us all the information that we need.
13:28Like, our source system was Fathom. The signal type was a changed fact. It gives a little bit of a summary, the observed problem, and the proposed lesson that we are hoping to learn as a part of this refinement.
13:39It lists the signal that we have, and then it shows the direct evidence from the raw events over here, verbatim moments. Anything over 25k now has to go through a formal procurement review. It would have pulled that out of the transcript.
13:50Our security team requires a SOC 2 report and a signed DPA before any vendor touches production data. That's a hard gate now. Again, direct evidence from that transcript.
14:00You get the point here. We are building that evidence box before we push it further through the system. For the LinkedIn thing, it's pretty much the exact same process.
14:07The evidence here is just a user rejection verbatim about what the user might have said when they rejected this thing. You get the point.
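The evidence card fields listed in the demo map naturally onto a small record type. This is a sketch under assumptions: the field names mirror what is read out on screen, but the exact file format, the validity rule, and the wording of the summary, problem, and lesson below are illustrative. Only the quoted evidence line is taken from the walkthrough.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceCard:
    source_system: str
    signal_type: str
    summary: str
    observed_problem: str
    proposed_lesson: str
    verbatim_evidence: list[str] = field(default_factory=list)

    def is_valid(self) -> bool:
        """Assumed rule: a card needs direct evidence and a lesson to move on."""
        return bool(self.verbatim_evidence) and bool(self.proposed_lesson.strip())

    def to_markdown(self) -> str:
        """Render the card as markdown, with evidence as block quotes."""
        quotes = "\n".join(f"> {quote}" for quote in self.verbatim_evidence)
        return (
            "# Evidence card\n\n"
            f"- Source system: {self.source_system}\n"
            f"- Signal type: {self.signal_type}\n"
            f"- Summary: {self.summary}\n"
            f"- Observed problem: {self.observed_problem}\n"
            f"- Proposed lesson: {self.proposed_lesson}\n\n"
            f"## Direct evidence\n\n{quotes}\n"
        )


if __name__ == "__main__":
    card = EvidenceCard(
        source_system="Fathom",
        signal_type="changed fact",
        summary="Acme changed its procurement requirements.",  # illustrative wording
        observed_problem="Client context predates the new rule.",  # illustrative
        proposed_lesson="Record the procurement threshold in client context.",  # illustrative
        verbatim_evidence=["Anything over 25k now has to go through a formal procurement review."],
    )
    print(card.to_markdown())
```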
14:13Cool. Now we're at stage three. And just to make it very clear, you wouldn't be doing all of this stuff manually stage by stage.
14:17That's ridiculous. It would be doing all of this on autopilot for you to the point where we would ultimately get to our proposals to review it in case you guys were wondering. But for stage three, what we have done here is our goal is to route the evidence to the specific place that it needs to end up in.
14:31So we've now gone to our inbox over here, and this skill has run. And you can see it's edited the files. So all of the same information is still in there, but it appended some stuff at the bottom, the router verdict, where this thing is going and why it's going there.
14:44It knows the destination is a skill, and it knows that the destination path is our LinkedIn content writer specifically for references and the anti AI writing guide, because in this case, this is where I rejected it for putting some form of AI slop in there. And its decision here was to propose, meaning there's going to be a human in the loop, and it's smart enough to know that because it's following those three m's that I spoke about.
15:04In this case, one change to this is a megaphone, meaning we are broadcasting something to an audience, so a human definitely needs to review this before we make a change to it. And it did the exact same thing for our Acme Robotics. It told us over here exactly where it needs to go.
15:18In this case, the destination type is context, and the path is context, the client's name, and then acme.md, which is all of our customer information for them.
15:26For this one, the decision is also human in the loop because, again, this is going to affect broad customer work where we might have multiple systems using this information in order to deliver services to them. In terms of what gets stashed where, you can see we now just have our routed folder, which has the same information from our inbox for Acme procurement, which goes into context, and then our skill over here will be for LinkedIn that gets updated as well.
15:48And at this point, we haven't made any change yet, so we need to move on to the next stage. And then as a final part of this process, we obviously have our proposal, which has our judge in it. It goes through this information that we have already looked at, and it puts it into a proposal for us before we go ahead and make this change as our final gate.
16:05It tells us the type of the change. In the case of our LinkedIn writer, we are adding a very specific rule. The reasoning behind it is because the rejected draft shows the writer reaching for the let that sink in AI slop.
16:15As a part of it being a judge, it also gives us a confidence score, and then we have the diff to see what is actually gonna change. So if we were reading this as the human in the loop, we would be able to see the change. And if we were doing this through GitHub, you would obviously use a pull request for this that you would review as a pull request and then accept the change, and it would go through just like code review.
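A proposal of the kind shown here (change type, reasoning, confidence score, diff) can be assembled with Python's standard `difflib`. This is a sketch, not the format used in the demo: the `Proposal` fields follow what is read out on screen, while the file path, the guide contents, and the 0.9 confidence value are hypothetical.

```python
import difflib
from dataclasses import dataclass


@dataclass
class Proposal:
    target_path: str
    change_type: str
    reasoning: str
    confidence: float  # in the pipeline this would come from the judge
    diff: str


def build_proposal(target_path: str, current_text: str, proposed_text: str,
                   change_type: str, reasoning: str, confidence: float) -> Proposal:
    """Package a proposed change with a unified diff for the human reviewer."""
    diff_lines = difflib.unified_diff(
        current_text.splitlines(keepends=True),
        proposed_text.splitlines(keepends=True),
        fromfile=f"a/{target_path}",
        tofile=f"b/{target_path}",
    )
    return Proposal(target_path, change_type, reasoning, confidence, "".join(diff_lines))


if __name__ == "__main__":
    current = "# Anti-AI writing guide\n- Avoid filler openers.\n"  # hypothetical contents
    proposed = current + '- Never use the phrase "let that sink in".\n'
    proposal = build_proposal(
        target_path="references/anti-ai-writing-guide.md",  # hypothetical path
        current_text=current,
        proposed_text=proposed,
        change_type="add rule",
        reasoning="Rejected draft reached for an AI-slop phrase.",
        confidence=0.9,  # placeholder value
    )
    print(proposal.diff)
```

Nothing is written to the target file here. The proposal only describes the change, which is what keeps the human gate meaningful.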
16:32And that's it. I realize it can get quite convoluted, but it is the framework behind this that is so important to understand. There are many different things that you can manipulate as a part of the system, and you certainly can do different forms of this.
16:42Hermes does this quite well, obviously, but I have a problem with Hermes in that it creates skills for nearly everything when that's absolutely not necessary. For me, I prefer a constraint-based approach, meaning I only build a skill when I have a business need or a problem that arises, and I know that the work is going to be repeated, not just a one-off thing where, all of a sudden, an agent goes and spawns something that will just die inside my skill folder and never get used.
17:05I also think that if you use this kind of loop inside your business or you're setting this up for your clients, it's a much better way to do it because it gives them the confidence to know that their systems aren't just being filled with information that they might not know is entirely accurate. Most people out there don't know how these systems work, and they're being misled by products like Hermes that will auto-update everything for you, and YouTube is claiming that it is just a magic solution when in reality, it's not.
17:28It will still make mistakes. Problems can still happen, but people aren't thinking about that because they aren't aware that they need to be looking at this stuff, which is why as automated as we can make this entire process in the video, we certainly need to have some point of a human gate. So I hope this video was helpful.
17:41Leave some comments down below. If you have any questions, I will get back to you. Otherwise, check out the videos on the screen now.
17:46They'll definitely help you in your journey. Thanks very much for watching. See you guys.
The Hook

The bait, then the rug-pull.

The pitch is irresistible: AI that reads your rejections, learns your voice, and updates itself while you do nothing. Mansel Scheffel spent 17 minutes explaining why that version does not exist yet -- and building the one that actually can.

Frameworks

Named ideas worth stealing.

02:43 · model

The Skill Lifecycle

  1. Build
  2. Define Done
  3. Eval
  4. Use
  5. Capture
  6. Refine
  7. Re-Eval
  8. Curate

A closed loop that governs the entire life of an AI skill from creation through continuous improvement.

Steal for: Any AI agent or workflow system where output quality needs to compound over time
08:08 · model

The Three Ms Human Gate

  1. Megaphone (audience impact)
  2. Money (revenue/delivery impact)
  3. Meaning (direction/promise impact)

Three dimensions to evaluate whether a proposed auto-refinement change is safe to apply without human review.

Steal for: Any team deciding which parts of an AI system can run on autopilot vs. need approval
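The three Ms gate above reduces to a simple rule: if a proposed change touches any of the three dimensions, it goes to a human. The sketch below is one possible reading of that rule, not the presenter's code. `ChangeImpact`, `triggered_ms`, and `needs_human_review` are hypothetical names, and the mapping of the demo items to specific Ms is an interpretation of the walkthrough.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeImpact:
    megaphone: bool  # the change reaches an audience
    money: bool      # the change affects revenue or delivery
    meaning: bool    # the change shifts direction or a promise


def triggered_ms(impact: ChangeImpact) -> list[str]:
    """Return the names of the Ms that this change touches."""
    flags = {
        "Megaphone": impact.megaphone,
        "Money": impact.money,
        "Meaning": impact.meaning,
    }
    return [name for name, hit in flags.items() if hit]


def needs_human_review(impact: ChangeImpact) -> bool:
    """Any single M is enough to require a human in the loop."""
    return bool(triggered_ms(impact))


if __name__ == "__main__":
    # A LinkedIn writer rule change broadcasts to an audience: Megaphone.
    linkedin_rule = ChangeImpact(megaphone=True, money=False, meaning=False)
    # A client context update affects delivery work: arguably Money.
    client_context = ChangeImpact(megaphone=False, money=True, meaning=False)
    for impact in (linkedin_rule, client_context):
        print(triggered_ms(impact), needs_human_review(impact))
```

Returning the list of triggered Ms, rather than only a boolean, gives the reviewer the reason a change was held back.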
05:18 · model

Five-Skill Refinement Pipeline

  1. Signal Capture
  2. Evidence Router
  3. Skill Self-Update (with Judge)
  4. Update Context
  5. Weekly Skill Review

A modular pipeline where each skill has exactly one job, keeping the system understandable and debuggable.

Steal for: Building client AI systems that need a defensible paper trail for every skill change
CTA Breakdown

How they asked for the click.

VERBAL ASK
17:29 · next-video
Check out the videos on the screen now.

Standard end-card outro. Also links to AI Native Skool community in description.

Storyboard

Visual structure at a glance.

00:00 · hook · hook
00:33 · context · skill definition slide
01:20 · value · what refinement does
01:59 · value · evals vs refinement
02:43 · framework · skill lifecycle
04:25 · framework · three-layer system
05:18 · framework · pipeline overview
07:30 · value · the judge
08:08 · value · the human gate
09:32 · value · pick a cadence
11:00 · demo · demo start
13:20 · demo · stage 2 evidence routing
15:00 · demo · stage 3 router verdicts
16:42 · cta · outro + Hermes critique