Big Idea

The argument in one line.

The quality of a Claude skill is determined almost entirely by how well you document the human workflow before you touch Claude — model choice is a distant second.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…

A consultant or solo founder who wants to automate business processes but does not know which workflow to start with.
Someone already using Claude who keeps getting mediocre output and assumes a bigger model will fix it.
A non-technical builder who wants a decision framework for Haiku vs Sonnet vs Opus and Low vs High effort.
Anyone who has heard of Claude skills but feels overwhelmed about where to begin.

SKIP IF…

You are a software engineer looking for code-level skill implementation — this is business-workflow focused, not a coding deep-dive.
You are already running evaluated, skill-chained production workflows at scale.

TL;DR

The full version, fast.

The most common failure mode in building Claude skills is skipping straight to Claude without first documenting the human workflow that needs automating. The presenter maps every business into four pods (Acquisition, Delivery, Operations, Support), audits the workflows inside each pod to find the highest-ROI process, then briefs the build using one of three modes: reverse-engineer if you know the steps, fill-the-blanks if you only know start and end, or go back to audit if you cannot explain the behavior at all. From there, skills progress through three stages: proof of concept (Sonnet medium, just prove it works), refinement (rubric plus evaluator loop), and decomposition into skill chains when context costs climb. Model escalation comes last: fix the prompt first, bump effort second, add a rubric third, then decompose before ever touching model tier.


Chapters

Where the time goes.

00:00 – 01:22

01 · Where people get stuck

Host names the problem: people dive in without a grounded starting point. Introduces the Four Pods framework as the pre-skill map for any business.

01:22 – 02:51

02 · Audit the workflows first

Workflow audit reveals Automate / Assist / Keep buckets, surfaces compliance gaps, and identifies highest-ROI process to tackle first. Argues Operations beats Sales as the starting pod.

02:51 – 04:49

03 · Three briefing modes

Mode 1: Reverse-engineer (walk backwards from goal). Mode 2: Fill-the-blanks (give what you have, Claude fills gaps). Mode 3: Not ready yet — go back to audit.

04:49 – 06:30

04 · Three stages of skill development

Stage 1: Proof of concept with lowest plausible tier. Stage 2: Refinement via rubric and evaluator. Stage 3: Decompose into skill chains when context overhead climbs.

06:30 – 10:24

05 · Live demo: Skill Creator in Cowork

Builds a LinkedIn DM outreach skill live. Shows how to install the skill creator plugin, submit a structured workflow brief, respond to qualifying questions, and review the generated SKILL.md.

10:24 – 17:00

06 · Model and effort level selection

The complexity ladder: No AI to Haiku to Sonnet-medium to Sonnet-high to Opus. Pricing table. Five effort levels with MAX flagged as a trap. Escalation order.

17:00 – 19:37

07 · Testing with evals

Write 3-5 concrete success criteria. Run 10 real inputs. Grade programmatically or LLM-as-judge. Failure above tolerance means escalate.
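The eval recipe in chapter 07 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the presenter's tooling: `fake_dm_skill` is a hypothetical stand-in for however you invoke the skill, the three criteria are made-up programmatic checks (an LLM-as-judge call could be wrapped in the same `(name, check)` shape), and the 10% tolerance is an example threshold, not a figure from the video.

```python
from typing import Callable

# A criterion pairs a name with a check that returns True when an output passes.
# These are hypothetical programmatic checks; an LLM-as-judge call fits the same shape.
Criterion = tuple[str, Callable[[str], bool]]


def run_eval(
    run_skill: Callable[[str], str],
    inputs: list[str],
    criteria: list[Criterion],
    max_failure_rate: float = 0.1,  # assumed tolerance; set your own
) -> dict:
    """Run the skill on every real input and grade each output on every criterion."""
    if not inputs:
        raise ValueError("an eval needs real inputs")
    failures = []
    for item in inputs:
        output = run_skill(item)
        failed = [name for name, check in criteria if not check(output)]
        if failed:
            failures.append((item, failed))
    failure_rate = len(failures) / len(inputs)
    verdict = "ship" if failure_rate <= max_failure_rate else "escalate"
    return {"failure_rate": failure_rate, "failures": failures, "verdict": verdict}


# Usage with a hypothetical stand-in for the LinkedIn DM skill.
def fake_dm_skill(profile: str) -> str:
    return f"Hi {profile}!\n---\nFollowing up on my note.\n---\nLast one from me."


criteria = [
    ("three messages", lambda out: len(out.split("\n---\n")) == 3),
    ("under 300 characters", lambda out: len(out) < 300),
    ("no template placeholders", lambda out: "[NAME]" not in out),
]
report = run_eval(fake_dm_skill, [f"lead-{i}" for i in range(10)], criteria)
print(report["verdict"])  # ship
```

Swapping the stand-in for a real skill call and the lambdas for your own 3-5 criteria keeps the same shape; an `escalate` verdict is the signal to work through the escalation order described in this breakdown.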

Atomic Insights

Lines worth screenshotting.

Most Claude skill failures are prompt failures, not model failures — the model gets blamed for a briefing problem.
Operations is usually the highest-ROI pod to automate first, not sales, because broken back-end processes compound silently.
If you cannot explain the behavior you want AI to take, you do not understand it well enough to automate it — go back to the audit.
The escalation order is fixed: prompt fix, then effort up, then rubric and evaluator, then decompose, then only then model up.
Sonnet handles the majority of real business workflows; Opus is for genuine structural complexity where you have to invent the procedure.
MAX effort is a trap — Anthropic says not to use it for most workflows, and in practice it overthinks and stalls.
Running a skill three times is not an eval — write 3-5 concrete success criteria, run 10 real inputs, grade programmatically or LLM-as-judge.
N8N and Make exist for a reason: if a task needs no judgment, use dumb plumbing — it is more reliable and cheaper than a skill.
A good proof of concept does not require knowing the right model or thinking level — just prove the idea can work.
Bad output is almost always caused by bad examples or missing guardrails, not by the model being too small.
Skill chaining is cost-driven, not complexity-driven — you decompose when context overhead makes the single skill too expensive.
Agents are for unknown paths; skills are for known, repeatable workflows — do not reach for agents when a refined skill will do.
Haiku earns its place when a task needs one simple decision rule, not judgment — tagging emails by label, not analyzing sentiment.
The Haiku-or-Sonnet test: if a junior could follow a one-page rulebook to do it, Haiku fits; if they need to write the rulebook first, Sonnet.
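The tier rules of thumb in these insights (no judgment means dumb plumbing, a one-page rulebook means Haiku, a rulebook that must be written first means Sonnet, a procedure that must be invented means Opus) can be written as a small Python decision function. It is an illustrative sketch: the three yes/no questions simplify the presenter's heuristics, the order of the checks is this sketch's choice, and real tasks may straddle the lines.

```python
from enum import Enum


class Tier(Enum):
    NO_AI = "dumb plumbing (N8N, Make, or a plain script)"
    HAIKU = "Haiku"
    SONNET = "Sonnet"
    OPUS = "Opus"


def pick_tier(needs_judgment: bool, junior_could_follow_rulebook: bool,
              must_invent_procedure: bool) -> Tier:
    """Rules of thumb from the insights above, asked as yes/no questions."""
    if not needs_judgment:
        return Tier.NO_AI   # no decision points: deterministic tools win
    if must_invent_procedure:
        return Tier.OPUS    # genuine structural complexity
    if junior_could_follow_rulebook:
        return Tier.HAIKU   # one simple decision rule
    return Tier.SONNET      # the rulebook has to be written first


# Tagging emails by label: one simple rule a junior could follow.
print(pick_tier(True, True, False).value)   # Haiku
# Analyzing sentiment: judgment a junior would need to work out first.
print(pick_tier(True, False, False).value)  # Sonnet
```

Effort level (Sonnet on medium versus high) is a separate dial from tier, which is why it does not appear here.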

Takeaway

The workflow comes before the model.

WHAT TO LEARN

Every Claude skill that returns slop has a briefing problem upstream — and fixing that problem follows a fixed sequence that never starts with upgrading the model.

Map your business into four pods (Acquisition, Delivery, Operations, Support) before touching any AI tool — this gives you a logical place to start instead of a blank page.
Audit the workflows inside those pods to triage each step into Automate, Assist, or Keep; the audit is where you discover the actual process Claude will need to follow.
If you cannot explain the behavior you want from AI step by step, you do not understand the workflow well enough to automate it yet — the audit is the fix, not a bigger model.
Start every skill as the simplest possible proof of concept at the lowest plausible model tier; you are proving the idea can work, not building the final version.
When output quality is insufficient, escalate in this order: fix the prompt with better examples and guardrails first, then bump effort level, then add a rubric with an evaluator loop, then decompose into skill chains, and only then consider a larger model.
Skill chaining is a cost and context management technique, not a complexity technique — you decompose when the context overhead of a single skill becomes too expensive, not because the task feels hard.
Testing a skill is not running it three times and eyeballing the result; write 3-5 concrete success criteria, run it on 10 real inputs, grade with a programmatic check or LLM-as-judge, and ship only when the failure rate is below your threshold.
Use deterministic tools (N8N, Make, plain scripts) for tasks that need no judgment — they are more reliable and cheaper than a Claude skill, and reserving AI for judgment-required tasks makes your entire system more predictable.
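The fixed escalation sequence reads naturally as an ordered list plus a lookup. A minimal Python sketch: the failure-rate `tolerance` and the `tried` bookkeeping are assumptions for illustration, and the rung labels paraphrase this breakdown rather than naming any Claude setting.

```python
# The escalation order from the takeaways above, cheapest change first.
ESCALATION_ORDER = [
    "fix the prompt (better examples, more guardrails)",
    "raise the effort level",
    "add a rubric and an evaluator loop",
    "decompose into a skill chain",
    "move up a model tier",
]


def next_action(failure_rate: float, tolerance: float, tried: set) -> str:
    """Return 'ship' if the skill is within tolerance, else the first untried rung."""
    if failure_rate <= tolerance:
        return "ship"
    for step in ESCALATION_ORDER:
        if step not in tried:
            return step
    return "ladder exhausted"


tried = {ESCALATION_ORDER[0]}
print(next_action(0.3, 0.1, tried))  # raise the effort level
```

Keeping the order in one list makes it hard to skip straight to a bigger model: that rung is only reached once everything above it has been tried.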

Glossary

Terms worth knowing.

Claude skill: A structured prompt or SKILL.md file that encodes a repeatable business workflow so Claude can execute it consistently on demand, without redescribing the process each time.
Four Pods: A business decomposition framework: Acquisition (getting clients), Delivery (doing the work), Operations (keeping the lights on), and Support (keeping clients happy). Used to identify which workflows to audit first.
Skill chaining: Breaking a long skill into sequential sub-skills, each running in its own context window, so that only the final answer is passed back to the orchestrating model. Primarily a cost and context management technique.
Evaluations (evals): A structured test suite for a Claude skill: write a concrete definition of good output (3-5 criteria), run the skill on 10 real inputs, grade each output programmatically or with an LLM-as-judge, and escalate if failure rate exceeds a threshold.
Effort level: A setting available where the presenter builds skills (Cowork, and Claude in VS Code) that controls how much extended thinking Claude applies. Five levels run from Low through Medium, High, and XHigh to MAX. The presenter runs most of his own workflows at High or XHigh, suggests starting lower and raising effort only when output falls short, and avoids MAX for most workflows.
Dumb plumbing: Automation tools like N8N or Make that execute deterministic tasks without AI judgment. Preferred over Claude skills when the process has no decision points that vary by input.
LLM-as-judge: Using a separate Claude call to evaluate whether a skill output meets the defined success criteria, as an alternative to programmatic grading when the quality criteria are subjective.
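Rubric: A written definition of what good output looks like for a skill, embedded in the prompt. Adding a rubric, paired with an evaluator loop, is the third escalation step when output quality is insufficient: it comes after fixing the prompt and raising effort, and before decomposing the skill or changing the model.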
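Cowork: An Anthropic workspace for Claude aimed at non-technical builders (the presenter uses it in the demo instead of VS Code) that ships with built-in skill creation, project management, and a plugin marketplace from which the skill creator can be installed; the skill creator is not installed by default.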

Resources

Things they pointed at.

06:30 · tool · Cowork

07:00 · tool · Anthropic Skill Creator plugin

08:30 · tool · N8N

08:30 · tool · Make

13:55 · tool · HeyReach

13:53 · tool · Apify

13:53 · tool · Relevance AI

13:53 · tool · Firecrawl

00:00 · product · skool.com/ainative community

17:12 · link · Evals deep-dive video

00:00 · link · Skill Chaining video

Quotables

Lines you could clip.

02:12

“If you cannot explain the behavior or the action that you want AI to take, that means that you do not understand it well enough, and therefore, you should not be automating it.”

Standalone rule, no context needed. Suggested use: TikTok hook.

14:20

“You do not just replace a model because you got a bad DM output. That would be ridiculous.”

Short, punchy, challenges a common mistake. Suggested use: IG reel cold open.

14:34

“If the thing is writing AI slop, that means you either have not given it enough guardrails or you have given it bad examples of what good is.”

Actionable diagnosis, no preamble needed. Suggested use: TikTok hook.

02:35

“The clearer we are upfront before we even look at Claude building a skill for us, the better the skill is gonna be from the get go.”

Core thesis in one sentence. Suggested use: newsletter pull-quote.

The Script

Word for word.


So you know what a Claude skill is, but the problem you have now is that you have no idea how to turn your business processes into a skill inside Claude. Where do you start? What model do you pick?

How do you plan it? Where does the idea even come from? People ask me these questions every week, so here's the answer.

So when it comes to skills, we need to start in a very important place, which is right at the beginning because you don't want to go into your business and just say, I need skills. Let's go and implement them. You wanna start with a grounded approach, and I like to start by breaking down the business into these four pods.

Now anything that I speak about in this video at a high level will have a detailed video link down below because a lot of these concepts, they have very deep dives that we need to go into, and that would just make this video way too long. But for your context here before we get into the skill building, when you're about to build your skill, you wanna start with mapping out your business into acquisition, delivery, operations, and support.

And, really, we do this because it helps your brain logically group what skills you might need. For instance, acquisition is just how you get your clients. Delivery is the work that you're going to be serving for your clients.

Is it a product? Is it a service? How do you deliver that to them?

Operations is the stuff that keeps your lights on in the back office, and, generally, this is where people should start, because this is where the broken internal processes live, the ones that shift between the various departments. Then you also have support, which is how you keep your clients happy.

So when you look at these four pods, you can easily look at your own business and understand, this is what our sales process looks like. This is what we are delivering to our client. This is what our back end looks like, and we need to address X, Y, and Z.

But you wouldn't just look at these four little pictures over here and magically know everything that's going on, and that's where we get into the next part over here, which is the audits. Now we do the audit particularly because we want to understand where we're gonna get the most return on investment from the first skills we build.

There's no point in building a skill that has absolutely no relevance to an important part of your business. A lot of people think sales is the best place to start, but like I said earlier, realistically, a lot of the time, it is the back end processes that need addressing first. So with an audit, it helps us uncover all of that stuff.

And, again, the deep dive walking you through this step by step is in the link below. But for here, all you need to know is that the audit helps deal with the confusion of how to actually build the skill in the first place. Because if you look at it from the perspective of the human sitting next to you doing your sales or whatever, and you sat down with them and walked through the process step by step, you would uncover the exact process that Claude needs in order to build the skill and automate this thing for you.

So the clearer we are upfront before we even look at Claude building a skill for us, the better the skill is gonna be from the get go. But also, in doing this, we get to uncover which work is the lowest risk with the highest impact, so we can tackle that first. We also find out where there are compliance problems and where people are actually burning time.
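One way to turn "lowest risk, highest impact" into a shortlist is to score each audited workflow and sort. A hypothetical Python sketch: the 1-5 scores, the tie-breaking rule, and the example workflows are all assumptions, not part of the presenter's audit template.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    name: str
    pod: str      # Acquisition, Delivery, Operations, or Support
    impact: int   # hypothetical 1-5 score, e.g. hours burned per week
    risk: int     # hypothetical 1-5 score, e.g. cost of a wrong output


def first_skill_candidates(workflows: list[Workflow]) -> list[Workflow]:
    """Highest impact first; ties broken by lowest risk."""
    return sorted(workflows, key=lambda w: (-w.impact, w.risk))


audit = [
    Workflow("LinkedIn DM outreach", "Acquisition", impact=4, risk=3),
    Workflow("Invoice follow-ups", "Operations", impact=5, risk=2),
    Workflow("Client onboarding checklist", "Delivery", impact=4, risk=1),
]
for w in first_skill_candidates(audit):
    print(w.pod, "|", w.name)
```

The scores only matter relative to each other, so a quick gut-feel rating per workflow during the audit is enough to get an ordering.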

There are so many valuable parts that come into this, and that's why I always tell people never to skip this. Another benefit of doing this is that we understand what we can actually automate entirely, what we can just have AI assist us with, and what we actually need to keep humans doing. So that immediately gets rid of the overwhelming "I don't know what to do next" feeling, because now you have a clear next step: you need to start speaking to Claude.
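The audit's Automate / Assist / Keep split can be expressed as a small triage function. This Python sketch assumes two questions per step (can you explain it step by step, and is a mistake costly); those criteria and the example steps are illustrative assumptions, loosely based on the "explain it or don't automate it" rule and the human-in-the-loop review used later in the demo.

```python
from enum import Enum


class Bucket(Enum):
    AUTOMATE = "Automate"
    ASSIST = "Assist"
    KEEP = "Keep"


def triage(can_explain_step: bool, mistake_is_costly: bool) -> Bucket:
    """Assumed rule: unexplainable work stays human; costly mistakes get a reviewer."""
    if not can_explain_step:
        return Bucket.KEEP      # can't brief it, so don't automate it yet
    if mistake_is_costly:
        return Bucket.ASSIST    # AI drafts, a human approves
    return Bucket.AUTOMATE


# Hypothetical audit answers: (can_explain_step, mistake_is_costly)
audit_steps = {
    "pull lead profile data": (True, False),
    "draft the DM sequence": (True, True),
    "decide which partners to pitch": (False, True),
}
for step, answers in audit_steps.items():
    print(f"{step}: {triage(*answers).value}")
```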

So you've done your audits and now you understand the steps or at least some of the steps that form part of this process. And I like to think of three modes towards building skills. The first one being that you can reverse engineer.

So you look at your goal, which is I want to write DMs to leads automatically. Then you would walk backwards step by step. Okay.

How do we get there? How do we get there? You can walk backwards through your process that way alongside Claude to make sure that it builds out the system for you before it tries to automate anything.

Doing reverse engineering is really good if you don't know all of the steps for your workflow because it helps you understand that goal and then figure out, okay, how would we get there by going backwards? That's why reverse engineering as a principle is so popular in software engineering. Mode number one is the one you want to be aiming for which is the clearest possible picture for Claude because that exact business process is what it's going to use and augment for your skill workflow.

But if you don't have that, and realistically, a lot of people don't have those exact steps or don't know those exact steps, that's where mode two comes in, where you give whatever you can, and Claude will help fill in the blanks with you. The great thing about this is that you can use it as your research buddy. For instance, if you don't know what tools are actually used to go out there and do automated lead gen and things like that, while you're building the skill or the workflow with Claude, you will have a discussion about what is available.

What are other people in the space doing? Something to note here though is you wanna be as specific as possible. Don't just say, what are people doing for lead gen?

Say, I'm a solo founder working in ecommerce. I have X, Y, Z clients. I'm trying to achieve this.

Can you tell me what people in this space are doing in 2026 to get leads? That will give you a much better answer, and we always want to be as specific as possible with AI to get the best possible results.

Then the third mode here is simply that you're just not ready to build a skill yet, because if you cannot explain the behavior or the action that you want AI to take, that means that you don't understand it well enough, and therefore, you shouldn't be automating it. You need to make sure that you understand the picture before you give something to Claude.

Otherwise, you're gonna end up with slop. I'm not saying you shouldn't experiment and do proof of concepts here. I'm just saying you need to have a very clear idea or work with AI to get to that clear idea before you try and automate anything.
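The three modes reduce to two questions about what you already know. A minimal Python sketch following the TL;DR's mapping (steps known means Mode 1, only start and end known means Mode 2, neither means Mode 3); the boolean inputs are a simplification, since knowledge of a workflow is rarely all-or-nothing.

```python
def briefing_mode(knows_steps: bool, knows_start_and_end: bool) -> str:
    """Map what you know about a workflow to one of the three briefing modes."""
    if knows_steps:
        return "Mode 1: reverse-engineer from the goal"
    if knows_start_and_end:
        return "Mode 2: give what you have, let Claude fill the blanks"
    return "Mode 3: not ready, go back to the audit"


print(briefing_mode(knows_steps=False, knows_start_and_end=True))
```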

Cool. So we're nearly ready to jump into the practical, but first, we need to understand how to actually start this process. We've now identified exactly what we need to do, but it still seems so overwhelming because there are so many different ways that we can build skills.

For me, what I like to do is ground things in doing the simplest thing possible first. You wanna focus on a standard proof of concept. You just wanna know that this thing can actually work and do the thing that you need it to do.

So at this stage, I don't care about the model or the thinking level that it has. I don't care about anything other than ensuring that we have all the connectors we need, that it's able to understand the process we have just developed and walk through it, and that the thing we actually want to achieve is doable.

So that is just our proof of concept, and you don't really need to put much thinking into that sort of thing. Obviously, if you've done this a lot and you do know what this thing is already capable of, then you can start off with a better proof of concept. But for most beginners out there who have no idea, just start.

Just do something and make sure that your idea can work based on the work that you've done in the previous steps. Once you've done that, you can move up a layer and start refining. You can use Claude to help you go back and forth and understand: how can we make this better?

How can we make it more efficient? Do I really need to be using AI for this part, or can I just use scripts? Stage two is all about that refinement.

And then stage three is all about decomposing the skill. So if you have a really long skill and you wanna break it down into chunks for efficiency and cost savings and several other reasons, you would want to start using skill chaining.

Now I have deep dives into each of these things as well that will be in the description below so that you can see what to do when you get to the skill chaining stage. But, realistically, this is mostly cost driven. So if you're a beginner now and you just wanna get your skills working, like I said, start with step one, move on to step two.

And then if you run into a problem because the skill is absolutely nuking your context window, that's when you want to start getting really, really efficient and use skill chaining. So I'm gonna be using Cowork for this part of the demo. Normally, I work in VS Code, but I'm trying to make this more accessible for nontechnical people as well.

Technical people will know how to use both. What we're gonna be using here is a plugin from Anthropic called the skill creator. It's not shipped by default, so you need to make sure that you get it.

Once you're inside Cowork, all you need to do is head on over to customize, and then you want to go to create new skills. Then you wanna click on this little plus over here, and you're going to go to browse skills.

You'll see that all of Anthropic's skills are in here. We just wanna grab the skill creator up at the top. I've already got it.

All you need to do is click add, and it will install it into your environment. We will be using this to just talk to Claude about the things that we've identified during our audit. Okay.

So now that we've got the skill creator, we're ready to create our skill because this thing will just walk us through it and build it with us. But like I said, if we come prepared to AI, we're gonna get much better output. So don't worry about all of this text over here.

This is just an example output of what AI can help you create once you've gone through the audits and the pod mapping and stuff like that. My goal over here is to give this thing as much information as possible about the exact workflow, which I would have gotten from sitting next to my sales guy.

All we're doing over here is we're telling Claude what the input is, which is a LinkedIn profile, and the output is a three message LinkedIn DM sequence with more specifics around it anchored in our actual voice. Then all we are doing is walking through how the human next to me would do this step by step.

Claude will understand these steps with the intent that it can automate these steps for you, obviously. And then further down over here, we would obviously need to characterize the voice and things like that so we can give it examples of what the voice looks like and really go into a lot of details so that our POC is a lot better straight out the door.

Don't worry. It really doesn't need to be this complicated upfront. This is just an example: if you really know your processes, this is the way I would start, because you will get a better answer.

For instance, if you don't know what tools are available, you can ask Claude. Like I said, what tools are good for this? You could ask it to mimic people who are really good at copywriting or really good at writing DM messages.

Think about what might have worked for you when somebody DM'd you and you're like, oh my god. I have to get back to this person. Why did that work?

Who was that person? Go and grab some of their writing. Get Claude to analyze it.

Make that your voice for the business. The point is over here, I want you guys to come prepared because when we have that kind of level, we're going to get much better output. Then down here, we're defining what good looks like so it understands a little bit about some of the things that we want.

Again, if you don't know that, just give it examples from people who you do like. On a side note here, if you are using Cowork and you wanna see how to turn this into an AI operating system with acquisition, operations, support, and all of that living inside projects and running autonomously, there are videos down below for that as well.

But for now, we're gonna run this thing, and I'm just gonna show you what Cowork will actually do. We've given it quite a lot of information, so it might not ask too many questions. But like I said, if you didn't know your entire process, Cowork is built to have step by step guidance with you to build the skills that you actually need.

If you give it vague context, it's going to ask for more because it needs that in order to understand what tools you might want, what the definition of good is, basically, all the stuff that I've already listed here for it. While that thing is cooking, I just wanna touch on something else. A lot of people seem to think that you need to use AI for everything, but N8N and Make and those tools were around before AI for exactly this reason: dumb plumbing.

So if you have a really simple task that does not need judgment or decision making from an AI model, just use dumb plumbing. Use N8N, use Make, get that thing done. Even a pure script can do it.

It's much more efficient, and it's much more reliable. It is going to do the same thing every time. If the process itself doesn't need to be repeatable, that is just a one-off prompt that you can save in case you ever need to run it again.

We only look at running skills when we need that repeatable workflow automated over and over again. We only look at skill chaining if the cost of running that skill becomes too large or it is overwhelming for that model to do it accurately. And then finally, if the path is unknown, that's where we would start looking at agents because agents are better at kind of yoloing their way and just figuring things out where there are steps missing as opposed to a refined skill, which is the best practice to use for your business workflows.
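The "lightest tool that fits" reasoning above can be written as a chain of checks. A Python sketch under assumptions: the four yes/no inputs are simplifications, and checking for an unknown path before repeatability is an ordering this sketch picks, not one the presenter states.

```python
def pick_approach(needs_judgment: bool, path_known: bool, repeatable: bool,
                  single_skill_overloaded: bool) -> str:
    """Choose the lightest tool that fits, following the reasoning above."""
    if not needs_judgment:
        return "dumb plumbing (N8N, Make, or a script)"
    if not path_known:
        return "agent"          # steps are missing: let it figure things out
    if not repeatable:
        return "one-off prompt (saved in case you rerun it)"
    if single_skill_overloaded:
        return "skill chain"    # too costly, or too much for one model to do accurately
    return "skill"


print(pick_approach(True, True, True, False))  # skill
```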

And at this point, you might be asking yourself, okay, but there are so many models. How do I know which one to use for my skill? Which one is the right one?

And what about the right level of thinking? So when we look at the different models out there, if I had to look at Haiku, that is gonna be the simplest model that they have, and I'd use that if I want the dumb plumbing that we just spoke about but need it to be able to make a very small decision or a very small judgment.

That's where Haiku comes into play, because it can cater for that simple decision, a rule or something like that. Think about it like you wanted to search through your Gmail inbox and just give each email some simple tags or labels based on its content. Haiku can take care of that.

It's only when you want to start doing sentiment analysis, analyzing batches of emails and all sorts of things, that the model would no longer be suitable at scale. Next up is Sonnet. It is a very capable model, and realistically, that's where most businesses are gonna be sitting for their workflows, especially once you've gotten to the refinement stage, or even if you're using skill chaining with forks.

Again, that's in another video, but you can really get away with these lower models if you set up these chains properly. But for people running a single skill, say writing DMs for the kind of sales-cycle workflow that we're building now, Sonnet can do most of that stuff quite easily with the right thinking level.

Finally, Opus. Obviously, everybody knows what Opus is. It's genuine complexity.

Historically, before we cared so much about all of these usage limits, Opus for everything was kind of the way to live your life, but that's really not the right approach when you're building these systems, and it's not sustainable. There is no way we were going to keep doing that forever unless computer hardware gets a massive upgrade in a very short period of time.

Anthropic could never keep up with it. So for me, I use Opus when I'm planning really complex tasks. I use it when I'm running complex workflows that I couldn't turn into skill chains that live in their own little sub agent fork.

It really is just the brain that takes care of my hardest tasks before I delegate down to Sonnet. Then finally, before we jump back into the practical, we need to look at the effort levels. So for me, I don't ever touch low or medium.

Of course, if you wanted to tinker that much, you could. Generally, that might be with mindless tasks that are really quick that we just want to get done, that don't require a lot of thinking. But for these other three, you need to be quite careful because if you don't make it smart enough, then it's not going to be able to cater for your workflow.

But if you make it too smart, then it's gonna get stuck in an overthinking loop. So you're kind of stuck between this thing that can't think properly and something that thinks too much and screws up your workflow because of that. Anthropic explicitly say not to use MAX for the majority of your workflows, and when I tried it, it got stuck in an overthinking loop for about eleven minutes, compared with when I was just using extra high.

So realistically, that puts the majority of the workflows that I run between high and extra high. Whenever I'm planning or building something, I use extra high. And for my everyday use, I'm currently using high.
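The effort guidance above amounts to a few defaults plus a ceiling below MAX. A Python sketch: the level strings and task categories are hypothetical labels chosen for illustration, not the names of any setting or API parameter, and the medium fallback reflects the "start lower" constraints approach rather than the presenter's own defaults.

```python
# Effort labels as this breakdown names them; assumed strings, not API values.
EFFORT_LEVELS = ["low", "medium", "high", "xhigh", "max"]
CEILING = "xhigh"  # MAX is left out on purpose: it tended to overthink and stall


def presenter_default(task: str) -> str:
    """Defaults described above; unknown tasks start at medium (an assumption)."""
    defaults = {
        "quick_mindless": "low",
        "everyday": "high",
        "planning_or_building": "xhigh",
    }
    return defaults.get(task, "medium")


def raise_effort(current: str) -> str:
    """Step up one level but never past the ceiling (MAX is pulled back to it)."""
    index = min(EFFORT_LEVELS.index(current) + 1, EFFORT_LEVELS.index(CEILING))
    return EFFORT_LEVELS[index]


print(raise_effort("medium"))  # high
print(raise_effort("xhigh"))   # xhigh
```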

Again, this might not be the most efficient path, but in my experience, if I'm already using Sonnet, I want it on high so that when it's writing my DMs, it has more creative input, and it's able to solve the problem a lot better and follow the instructions that are part of the skill I've built for it. So you could use that as a bar to sit between while you're building your skills, but equally, you could take the constraints approach that I take on this channel for everything we do and start with something much lower.

And when you run into a problem, that's when you can bring the thinking level or the model up. You don't just replace a model because you got a bad DM output. That would be ridiculous.

If the thing is writing AI slop, that means you either haven't given it enough guardrails or you've given it bad examples of what good is, because fixing those two things alone should dramatically improve this thing's ability to give you good output. Now back in the practical, we can see that Cowork is giving us some suggestions and asking us some qualifying questions based on the things that we put in here.

For this, it's asking us how to get the LinkedIn profile data. I think these examples here, they're probably given because I can see that I've got these MCPs hooked up in my environment already. So realistically for me, these are not useful.

I would probably wanna use Apify or maybe Relevance AI or a custom scraper that I would put in here. But for now, I'm just gonna say manual paste. Where does the final three-message DM sequence go?

See, I wasn't clear enough about where we actually want this thing to be stored. So the most common use case would be a Google Sheet or Airtable, because you want a human in the loop to review the DMs before we push them to HeyReach. Then over here, it's asking us which model and effort, and it has recommended Sonnet with medium effort for us.

Realistically, this is where I would start with this kind of thing just to see what we can get away with at this level by giving it really good examples. Can we get exactly what we want? Okay.

And while that part is cooking, we are going to take a look at what you can do if the output that you get the first time isn't actually the thing that you wanted. So the first thing that people do is they will obviously just think, okay. I need a better model.

But realistically, like I've been saying, that's not true. The first thing to do is to make sure your examples are as good as you think they are and see what output you get based on that. Then I might look at bumping up the effort that it has.

So you can see this thing chose Sonnet with medium thinking. I would probably shift that to high before I jump straight into Opus. If the output that it gives you is mostly right, let's say 80% right in this case, a rubric in the prompt plus an evaluator loop is probably the better approach to take, to say, okay.

Well, I've written this thing, and then an evaluator goes and looks at this thing and says, that's not right. We need to go back and fix it. And it does that loop until it's done.
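The rubric-plus-evaluator loop described here can be sketched in Python. The drafting function and the evaluator are hypothetical stand-ins for model calls, the rubric is invented for the example, and the three-round cap is an assumption so the loop cannot run forever; none of this is a Cowork or skill creator feature.

```python
from typing import Callable, Optional

Draft = Callable[[str, Optional[str]], str]    # (brief, feedback) -> output
Evaluate = Callable[[str], Optional[str]]      # output -> feedback, None if it passes


def write_with_evaluator(draft: Draft, evaluate: Evaluate, brief: str,
                         max_rounds: int = 3) -> tuple[str, bool]:
    """Draft, grade against the rubric, redraft with the feedback until it passes."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback = None
    output = ""
    for _ in range(max_rounds):
        output = draft(brief, feedback)
        feedback = evaluate(output)
        if feedback is None:
            return output, True
    return output, False  # cap reached: review by hand or escalate


def fake_draft(company: str, feedback: Optional[str]) -> str:
    # Hypothetical stand-in for a Claude call that improves once it gets feedback.
    if feedback is None:
        return "Hey, want to chat?"
    return f"Hi, loved what {company} is doing with onboarding. Open to a quick chat?"


def make_evaluator(company: str) -> Evaluate:
    # Hypothetical rubric: name the lead's company and stay under 60 words.
    def evaluate(output: str) -> Optional[str]:
        if company not in output:
            return "Mention the lead's company."
        if len(output.split()) >= 60:
            return "Keep it under 60 words."
        return None
    return evaluate


text, passed = write_with_evaluator(fake_draft, make_evaluator("Acme"), "Acme")
print(passed, text)
```

Returning the pass flag alongside the text means a capped-out run is visible rather than silently shipped.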

Oftentimes, having that simple little loop in there can fix up the problem alongside those good examples that you've given it. If whatever skill you're building is really heavy on context and it produces a ton of output across many steps, that's where we want to decompose it using skill chaining and different context forks.

Because then all of that context gets isolated in its own little steps, and there are a few other tricks that we can do in there to make sure that we're really efficient with our context, and realistically, what comes back to the main model is almost nothing other than the answer that it needs. But you'll know when you're at that stage, and that's when you go watch the other videos I've made.
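The context isolation idea can be illustrated without any model at all. In this Python sketch each sub-skill is an ordinary function standing in for a forked sub-agent (an assumption; it is not how Cowork or Claude Code implement forks), and only each step's final answer is appended to the orchestrator's context.

```python
from typing import Callable

SubSkill = Callable[[str], str]


def run_chain(steps: list[tuple[str, SubSkill]], first_input: str) -> tuple[str, list[str]]:
    """Run sub-skills in order; the orchestrator keeps only each step's answer."""
    orchestrator_context = [first_input]
    payload = first_input
    for name, step in steps:
        payload = step(payload)  # everything the step did internally stays inside it
        orchestrator_context.append(f"{name}: {payload}")
    return payload, orchestrator_context


def research_lead(profile: str) -> str:
    # Hypothetical sub-skill: bulky intermediate work lives only in this scope.
    scratch = [f"raw scraped chunk {i} about {profile}" for i in range(500)]
    return f"{profile} runs an agency ({len(scratch)} chunks reviewed)"


def draft_dms(summary: str) -> str:
    # Hypothetical sub-skill: sees the summary, never the raw chunks.
    return f"DM 1/3 based on: {summary}"


answer, context = run_chain([("research", research_lead), ("draft", draft_dms)], "Jane Doe")
print(len(context))  # 3
```

The 500 scratch entries never reach `context`, which is the cost argument for chaining in miniature.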

After those questions, it gives us the POC it's going to build: the proposed name, the proposed front matter for our skill, everything it will use to achieve the goal, and why it made these choices, which is really useful to understand. Make sure you read this, even if you don't understand all of it.
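If you want to sanity-check the proposed front matter yourself, here is a small sketch. It assumes the usual skill.md layout of a YAML front matter block between `---` lines with `name` and `description` fields. The parser only handles simple `key: value` lines (no nested YAML), and the sample skill name and description are hypothetical.

```python
def read_front_matter(skill_md: str) -> dict[str, str]:
    """Parse simple `key: value` lines from a leading `---` ... `---` block."""
    lines = skill_md.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill.md should open with a '---' front matter block")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Split on the first colon only, so values may contain colons later.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    raise ValueError("front matter block is never closed")


def missing_fields(fields: dict[str, str], required=("name", "description")) -> list[str]:
    """List required fields that are absent or empty."""
    return [field for field in required if not fields.get(field)]


SAMPLE = """---
name: outreach-dm-drafter
description: Drafts personalised outreach DMs and saves them to a Google Sheet for review.
---
# Steps
1. Read the prospect list.
"""

fields = read_front_matter(SAMPLE)
print(fields["name"], missing_fields(fields))  # prints: outreach-dm-drafter []
```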

I hear a lot of you saying, I don't know what I don't know. But in here, you have the opportunity to uncover those unknowns by asking clarifying questions. If you do this once, do it properly, and take the time to learn as you go, you won't have those unknowns anymore, and you'll be able to do this much more efficiently.

Then it tells you what's missing from your environment. We don't have a Google Sheets MCP in here, so we would need to connect one. It doesn't have any voice examples or a definition of what good looks like, so we need to give it those. And it has no idea what we're actually selling, so it needs that too.

So you can see that where I did leave things out, even though I was really specific with my ask, it still worked out the context it needs to build us a much better skill. That's why I like Cowork: they're specifically building it for people who don't have the knowledge to think through all of these steps.

They've made it very easy for you guys. Then it just went ahead and wrote our skill for us, our skill.md. It didn't invent a voice.

It didn't generalize anything, and it's still waiting for our research. Later, we can come back and add any functionality or tools we discovered during that research that work really well for this, and it will walk you through building out your skill just like it has here.

It always leaves you with a list of any open steps, so you reach that refinement level as you run through your POC. Then you just need to save the skill wherever you want. Of course, you could save it directly in this environment so that you can run it.

That brings us to the next part, where we actually test this thing. Testing is a really important part of this, but it's not as simple as running something one, two, or three times. Passing a few runs doesn't mean it will work the same way every single time.

Like I harp on all the time, we need determinism and reliability in our business workflows, just as we would want from an employee who's consistent and really good at closing sales or whatever their job is. We want the same thing from AI.

So we need to run something called evals, and I have a whole video on how to do that with your skills, which I'll link below. But most importantly, like I've been saying throughout this video, you need a very clear definition of done. If you cannot explain that to the AI, or work with the AI until it understands what you're trying to convey, you're never going to get a successful skill and workflow, because even you don't know what done looks like.
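To make "more than one or two runs" concrete, an eval boils down to running the skill many times and scoring every run against your definition of done. A minimal sketch: `run_skill` stands in for actually invoking the skill, the definition-of-done checks are hypothetical examples, and `flaky_skill` exists only so the code runs.

```python
from typing import Callable

Check = Callable[[str], bool]


def pass_rate(run_skill: Callable[[int], str], definition_of_done: dict[str, Check],
              trials: int = 20) -> tuple[float, dict[str, int]]:
    """Run the skill `trials` times; a run passes only if every check passes.

    Returns the overall pass rate and how many runs failed each check.
    """
    failures = {name: 0 for name in definition_of_done}
    passed_runs = 0
    for trial in range(trials):
        output = run_skill(trial)
        run_ok = True
        for name, check in definition_of_done.items():
            if not check(output):
                failures[name] += 1
                run_ok = False
        if run_ok:
            passed_runs += 1
    return passed_runs / trials, failures


DONE: dict[str, Check] = {
    "greets the prospect by name": lambda out: out.startswith("Hi Sam"),
    "ends with a question": lambda out: out.rstrip().endswith("?"),
}


def flaky_skill(trial: int) -> str:
    """Stand-in skill that forgets the closing question on every fourth run."""
    if trial % 4 == 3:
        return "Hi Sam, loved your post on hiring."
    return "Hi Sam, loved your post on hiring. Are you hiring SDRs this quarter?"


rate, by_check = pass_rate(flaky_skill, DONE, trials=8)
print(rate, by_check)
```

The first three runs of `flaky_skill` all pass, yet over eight runs it scores 0.75. That is the gap between "it worked when I tried it" and "it's reliable".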

That brings us back to why the pod mapping and the auditing are so important: they uncover all of those gaps for you. Once you've done that, you can work with the AI to get exactly what you want. And when you've gotten there, the first thing to do is flip back over to Cowork, run the skill it built, and see if it runs, meaning it literally goes into the systems and does the things it's supposed to do.

We don't care about quality yet. We don't care whether it's the perfect DM. We just care that every single step in the skill.md actually works the way you need it to.

That forms our first level of refinement. We need to get the basics working before we can realistically look at anything else.
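Level one is essentially a smoke test: does each step run at all, in order? A minimal sketch: the step names mirror the DM workflow, the failing Google Sheets step reflects the missing Sheets connection mentioned earlier, and every step function here is a hypothetical stand-in.

```python
from typing import Callable


def smoke_test(steps: list[tuple[str, Callable[[], None]]]) -> dict[str, str]:
    """Run steps in order and report which ran, which failed, which never ran.

    Stops at the first failure, because later steps usually depend on it.
    Output quality is deliberately ignored at this level.
    """
    report: dict[str, str] = {}
    for index, (name, action) in enumerate(steps):
        try:
            action()
            report[name] = "ok"
        except Exception as exc:  # any error means the step isn't wired up yet
            report[name] = f"failed: {exc}"
            for later_name, _ in steps[index + 1:]:
                report[later_name] = "skipped"
            break
    return report


def write_to_sheet() -> None:
    # Hypothetical: simulates the missing Google Sheets connection.
    raise ConnectionError("no Google Sheets connection configured")


steps = [
    ("read prospect list", lambda: None),
    ("draft DM", lambda: None),
    ("write row to Google Sheet", write_to_sheet),
    ("push approved DM to HeyReach", lambda: None),
]

for name, status in smoke_test(steps).items():
    print(f"{name}: {status}")
```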

Once we've got that out of the way, then we get into the refinement stage where we're looking at the voice and saying, well, it used all this corporate slop over here. I don't want any of that crap in my workflow. Get rid of it.

So we change that, or we put in a guardrail that stops it from happening. If it's not funny enough, and you're not funny yourself, go find someone who is.

Put an example of something funny in there so it understands what a funny post or DM looks like. Point is, that's level two of testing our skills. Then you would get into things like evals, where you're running evaluations on this.
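A guardrail against corporate slop can be as simple as a banned-phrase check that runs before a draft is accepted. A minimal sketch; the phrase list is a hypothetical example, and you would build your own from the drafts you reject.

```python
import re

# Hypothetical list: grow it from real drafts you've rejected.
BANNED_PHRASES = [
    "i hope this finds you well",
    "synergy",
    "circle back",
    "leverage",
]


def slop_hits(draft: str, banned: list[str] = BANNED_PHRASES) -> list[str]:
    """Return every banned phrase that appears as whole words in the draft."""
    text = draft.lower()
    return [p for p in banned if re.search(r"\b" + re.escape(p) + r"\b", text)]


draft = "I hope this finds you well! Let's leverage some synergy."
print(slop_hits(draft))  # prints: ['i hope this finds you well', 'synergy', 'leverage']
```

When `slop_hits` returns anything, the draft goes back for a rewrite, so the same check can also serve as one more rubric item in an evaluator loop.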

And again, the video is linked below. Watch it when you need to. Don't start by watching it and overloading your brain before you've even built something basic.

The whole goal here is to iterate as fast as possible to prove your concept, then refine, and finally make it more advanced if you need skill forking, cost savings, and all of that. That pretty much wraps up this video. Actually doing the work is super simple, as long as you can explain and elaborate on your thoughts with AI as your copilot.

So I hope this video was helpful. If it was, leave some comments down below. Otherwise, check out the videos on the screen now.

They'll definitely help you in your journey. If you do need extra help with this or you wanna build your own AI operating system, you can check out my community where we're solving that problem every single day. Thanks very much for watching.

See you guys later.

The Hook

The bait, then the rug-pull.

Every builder who has spent an afternoon prompting Claude and gotten slop back has blamed the model. The presenter argues that the model is almost never the problem; the briefing is. This video is a systems-level cure for that reflex.

Frameworks

Named ideas worth stealing.

00:22 · model

Four Pods

Acquisition
Delivery
Operations
Support

Maps every business into four functional areas to identify which workflows to audit first.

Steal for: any business process audit or AI implementation roadmap

02:00 · model

Automate / Assist / Keep

Automate (AI does it, no judgment needed)
Assist (AI helps, human decides)
Keep (human owns it, needs brain)

Triage framework applied to each workflow step during the audit.

Steal for: AI readiness workshops, change management decks
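During an audit, the triage can be run as two yes/no questions per workflow step. A minimal sketch: the two questions are one hypothetical reading of the three labels, not the presenter's exact criteria, and the audited steps are placeholders.

```python
def triage(needs_judgment: bool, ai_can_prepare: bool) -> str:
    """Sort one workflow step into Automate / Assist / Keep."""
    if not needs_judgment:
        return "Automate"  # AI does it, no judgment needed
    if ai_can_prepare:
        return "Assist"    # AI drafts or researches, a human decides
    return "Keep"          # human owns it, needs a brain


# Hypothetical audit: (needs_judgment, ai_can_prepare) per step.
audit = {
    "copy new leads into the CRM": (False, False),
    "draft a first-touch DM": (True, True),
    "negotiate pricing on a discovery call": (True, False),
}

for step, answers in audit.items():
    print(f"{triage(*answers):8} {step}")
```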

03:00 · model

Three Briefing Modes

Mode 1: Reverse-engineer
Mode 2: Fill-the-blanks
Mode 3: Not ready yet

Decision tree for how to approach Claude when building a new skill.

Steal for: onboarding documentation for any AI workflow build process
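The decision tree is small enough to write out. This sketch follows the three modes as the TL;DR describes them (you know the steps, you only know the start and end, or you can't explain the behavior at all); the function and parameter names are hypothetical.

```python
def briefing_mode(knows_every_step: bool, knows_start_and_end: bool) -> str:
    """Pick how to brief Claude when building a new skill."""
    if knows_every_step:
        # You can walk through the process, so Claude reverse-engineers it.
        return "Mode 1: Reverse-engineer"
    if knows_start_and_end:
        # You know the input and the desired output; Claude fills in the middle.
        return "Mode 2: Fill-the-blanks"
    # You can't explain the behavior yet: go back and audit the workflow first.
    return "Mode 3: Not ready yet"


print(briefing_mode(knows_every_step=False, knows_start_and_end=True))  # prints: Mode 2: Fill-the-blanks
```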

05:00 · list

Three Stages of Skill Development

Stage 1: Proof of concept
Stage 2: Refinement (rubric + evaluator)
Stage 3: Decompose into skill chains

Sequential build stages that prevent over-engineering.

Steal for: any iterative AI product development process

10:30 · model

Complexity Ladder

No AI: dumb plumbing
Haiku - Low: one simple decision rule
Sonnet - Medium: reading + producing
Sonnet - High: real decision space
Opus - Rare: genuine complexity

Maps task complexity to model tier. Haiku if a junior can follow a one-page rulebook; Sonnet if they have to write the rulebook; Opus if they have to invent the rubric.

Steal for: any AI system design document or model selection guide
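The ladder reads naturally as a top-to-bottom checklist. A minimal sketch: the yes/no flags paraphrase the rungs and the junior-employee test above, their names are hypothetical, and per the TL;DR you would only move up a tier after better examples, more effort, a rubric, and decomposition have failed.

```python
def pick_tier(
    *,
    needs_judgment: bool = True,
    one_simple_rule: bool = False,
    real_decision_space: bool = False,
    must_invent_rubric: bool = False,
) -> str:
    """Map a task to a rung on the complexity ladder."""
    if not needs_judgment:
        return "No AI"            # dumb plumbing: move data, no decisions
    if one_simple_rule:
        return "Haiku - Low"      # a junior could follow a one-page rulebook
    if must_invent_rubric:
        return "Opus - Rare"      # genuine complexity: the rubric must be invented
    if real_decision_space:
        return "Sonnet - High"    # writing the rulebook, with real decisions
    return "Sonnet - Medium"      # reading + producing


print(pick_tier(needs_judgment=False))    # prints: No AI
print(pick_tier(one_simple_rule=True))    # prints: Haiku - Low
print(pick_tier())                        # prints: Sonnet - Medium
print(pick_tier(real_decision_space=True))  # prints: Sonnet - High
print(pick_tier(must_invent_rubric=True))   # prints: Opus - Rare
```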

14:10 · list