Marketing Against the Grain · YouTube

Automate Boring Tasks With Codex & Claude Code in 26 Minutes

A 26-minute interview where a creator walks through the exact skills-and-evals system he uses to run his podcast and newsletter on near-autopilot.

Posted

June 17th

1 month ago

Duration

26:03

Format

Interview

educational

Views

613

30 likes

Big Idea

The argument in one line.

The difference between AI that saves you time and AI that produces slop is not the model — it is the system you build around it: reusable skills that encode your workflow, and pass/fail evals that enforce your standards automatically.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…

You use ChatGPT or Claude for recurring work tasks and keep re-entering the same instructions every session.
You produce weekly content — newsletter, podcast, social posts — and spend most of your time on assembly rather than on ideas.
You are a non-technical creator or marketer who wants to build automations without writing code.
You have tried AI projects or prompt libraries but found them too manual to maintain over time.

SKIP IF…

You need technical Codex API documentation — this is a strategic walkthrough, not a coding tutorial.
You already run a mature personal OS with skills and evals — the core concepts here will be familiar.

TL;DR

The full version, fast.

Chat-style AI starts from zero every session; agent-mode tools like Codex and Claude Code let you store reusable plain-text skills that compound over time. Chain a podcast-prep skill to a thumbnail-title skill to a post-production skill and a full day of copy-pasting collapses to a supervised run. The eval layer is the unlock: a second AI agent with a pass/fail checklist derived from your best existing work rewrites drafts until every check passes. The honest counterweight is that AI brain fatigue is real — both hosts admit they have lost the ability to start a draft from scratch — and the system only works when human taste and original ideas stay in the loop.

Voices

Who's talking.

00:38 · guest · Peter Yang

00:00 · host · Kieran Flanagan

00:00 · cohost · Kipp Bodnar

Chapters

Where the time goes.

00:00 – 01:00

01 · Hot Take: Stop Using ChatGPT

Peter's opening claim: chat-mode AI holds you back; Codex/Claude Code actually execute recurring tasks

01:00 – 02:00

02 · What Evals Are

Quick definition of evals as a mechanism to teach AI to improve its own output

02:00 – 04:00

03 · 3 Steps to Automate Anything

Reflect on weekly tasks, document every step, use Codex to turn it into a system

04:00 – 05:05

04 · What a Workflow Really Is

Most non-systems-thinkers don't see their own workflows; podcast workflow documented step by step

05:05 – 06:20

05 · Inside a Personal OS

Peter's README — skills as plain text files; chaining podcast prep to thumbnail to post-production

06:20 – 07:00

06 · Why He Switched to Codex

Browser use and computer use more robust; software harness beats model quality in daily practice

07:00 – 08:10

07 · Browser Use and The Harness

The software around the model matters as much as the model; Codex fast mode enables higher throughput

08:10 – 09:30

08 · How to Build a Skill

Describe idea to AI, let it ask questions, review output, run it manually first

09:30 – 10:20

09 · Killing AI Slop With a Skill Editor

Meta-skill compresses skills to one page, strips repetitive instructions and AI jargon

10:20 – 12:00

10 · Real Impact: Thought Partner

No longer working alone; Codex handles manual assembly; frees mind for higher-order decisions

12:00 – 13:20

11 · AI Psychosis

Risk of becoming too dependent to think independently; personal adviser skill keeps principles front-of-mind

13:20 – 14:15

12 · AI Brain Fatigue

HBR March 2025: running multiple agents creates cognitive overload; people get migraines, lose focus

14:15 – 16:03

13 · Writing From Scratch Still Matters

Kieran admits he could not start a Grammarly draft; the risk of becoming an editor not a creator

16:03 – 17:00

14 · Live Demo: Draft + Eval

Codex writes a draft post, second agent runs pass/fail eval with checkboxes, rewrites until all pass

17:00 – 17:45

15 · Building a Good Eval

Derive eval from your best examples; refine it through manual iteration runs, not from scratch

17:45 – 18:20

16 · Pass/Fail Beats Scoring

AI cannot reliably distinguish 3/5 from 4/5; binary checks are robust and actionable

18:20 – 20:10

17 · AI Prefers Its Own Writing

Subjective evals always favor AI output; keep criteria formulaic not taste-based

20:10 – 21:15

18 · AI Shaming Is Dumb

Judge outcomes not process; the idea must still originate from a human to avoid drifting to the median

21:15 – 23:50

19 · How Non-Technical People Start

One-week ritual: narrate daily work to Codex, ask it to identify the top workflow to automate first

23:50 – 25:25

20 · Letting Codex Suggest What to Automate

Live: Codex reviews its own memory of the session and recommends workflow improvements unprompted

Atomic Insights

Lines worth screenshotting.

Skills are just text files — there is no coding, only describing your workflow to the AI and asking it to draft the instructions.
The software harness (Codex, Claude Code) matters as much as the underlying model; switching from chat to agent-mode is a bigger leverage point than switching models.
Pass/fail evals are reliable; AI scoring (3/5, 4/5) is not — the model cannot distinguish adjacent scores even with detailed rubrics.
A skill-editor meta-skill that compresses every new skill to one page eliminates the AI slop sprawl that causes personal OS systems to collapse over time.
When evals grade taste and voice rather than formula compliance, AI will always prefer its own writing — keep eval criteria formulaic, not subjective.
The eval layer works because it forces you to clearly commit to what good looks like; most AI output is mediocre because most people never define their standard.
AI brain fatigue is documented (HBR, 2025): running multiple agents simultaneously creates cognitive overload that leaves people unsure what they are working on.
Once you build a capable content system you risk becoming an editor who can no longer generate a first draft from nothing.
High-quality context — an audience profile built from what has worked — is worth more than elaborate prompt engineering.
A non-technical person can discover their own automation candidates in one week by logging daily tasks in Codex and asking it to identify what to automate first.
Computer use in Codex covers browser-based tasks with no API — if it happens in a browser, Codex can do it.
The idea must still come from you; AI-generated content without a human-originated premise drifts to the median and reads as slop.
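The pass/fail insight above can be made concrete. What follows is a minimal Python sketch, not Peter's actual eval: in the episode a second AI agent runs the checklist, while here each check is a plain deterministic function so the example runs on its own. The three checks follow the newsletter eval described in the transcript (opens with "Dear subscribers", short hook line, no em dashes). The 12-word hook limit and every function name are illustrative assumptions.

```python
from typing import Callable

EM_DASH = "\u2014"
MAX_HOOK_WORDS = 12  # assumption: the episode only says "short enough"


def hook_line(draft: str) -> str:
    """Return the first non-empty line after the greeting, or '' if missing."""
    lines = [ln.strip() for ln in draft.splitlines() if ln.strip()]
    return lines[1] if len(lines) > 1 else ""


def opens_with_greeting(draft: str) -> bool:
    return draft.lstrip().lower().startswith("dear subscribers")


def hook_is_short(draft: str) -> bool:
    words = hook_line(draft).split()
    return 0 < len(words) <= MAX_HOOK_WORDS


def no_em_dashes(draft: str) -> bool:
    return EM_DASH not in draft


# Every check answers a yes/no question; there are no 3/5 or 4/5 scores.
CHECKS: dict[str, Callable[[str], bool]] = {
    "Opens with 'Dear subscribers'": opens_with_greeting,
    "Hook line is short": hook_is_short,
    "Free of em dashes": no_em_dashes,
}


def run_eval(draft: str) -> dict[str, bool]:
    """Run every check and return a name -> pass/fail mapping."""
    return {name: check(draft) for name, check in CHECKS.items()}


def as_checkboxes(results: dict[str, bool]) -> str:
    """Render results as a plain-text checkbox list."""
    return "\n".join(f"[{'x' if ok else ' '}] {name}" for name, ok in results.items())


draft = (
    "Dear subscribers,\n\n"
    "Stop chatting with AI. Start delegating.\n\n"
    "Skills are text files \u2014 nothing more.\n"
)
results = run_eval(draft)
print(as_checkboxes(results))
# [x] Opens with 'Dear subscribers'
# [x] Hook line is short
# [ ] Free of em dashes
```

Because every check is binary, a failed box names exactly what to fix, which is what makes an automatic rewrite step possible.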

Takeaway

Skills plus evals is the system that makes AI output reliable.

WHAT TO LEARN

The gap between AI that impresses in a demo and AI that reliably does your work is not the model — it is the infrastructure you build around it.

A skill is a plain text file with instructions — building one means describing your workflow to the AI, not writing code; anyone can do this regardless of technical background.
Chat-mode AI starts from zero every session; agent-mode tools persist skills and context so the system compounds each iteration rather than resetting.
Pass/fail evals are reliable; AI scoring (3 out of 5, 4 out of 5) is not — the model cannot distinguish adjacent scores even with detailed rubrics, so binary checks are the practical standard.
A skill-editor meta-skill that compresses every new skill to one page prevents the sprawl that causes personal automation systems to degrade and get abandoned over time.
Evals derived from your best existing work beat evals written from scratch — feed the AI examples of your strongest output and refine the checklist iteratively through real usage.
When evals judge subjective quality like voice or authenticity, AI will always prefer its own version of the draft; keep eval criteria formulaic and binary to make them enforceable.
AI brain fatigue is a documented side effect (HBR, 2025): running multiple agents simultaneously creates cognitive overload that leaves you unsure what you are personally responsible for.
Once you outsource your first-draft generation to AI, you risk losing the ability to start from nothing — protecting your raw generative capacity is worth the deliberate practice.
High-quality context — an audience profile built from what has and has not worked — matters more than prompt sophistication; garbage context guarantees mediocre output regardless of model quality.
A non-technical person can discover their top automation candidates in one week by narrating their daily work tasks to Codex and asking it to identify the highest-leverage workflow to automate first.
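The skill-editor point above can be illustrated with a small linter. In the episode the skill editor is itself an AI skill that rewrites the file; the sketch below is only a deterministic stand-in that flags the same kinds of problems (length, repeated instructions, slop terms). The 60-line "one page" budget, the slop-term list, and the function name `review_skill` are all assumptions made for illustration.

```python
from collections import Counter

ONE_PAGE_LINES = 60  # assumption: rough stand-in for "one page"
SLOP_TERMS = ("delve", "leverage", "seamlessly")  # assumption: illustrative list


def review_skill(text: str) -> list[str]:
    """Return human-readable warnings for the text of a skill file."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    warnings: list[str] = []

    # Length budget: flag skills that have grown past roughly one page.
    if len(lines) > ONE_PAGE_LINES:
        warnings.append(f"too long: {len(lines)} lines (budget {ONE_PAGE_LINES})")

    # Repetition: the same instruction appearing more than once.
    counts = Counter(ln.lower() for ln in lines)
    for line, n in counts.items():
        if n > 1:
            warnings.append(f"repeated {n}x: {line}")

    # Slop terms: simple substring match against the illustrative list.
    lowered = text.lower()
    for term in SLOP_TERMS:
        if term in lowered:
            warnings.append(f"slop term: {term}")

    return warnings


skill = (
    "Research the guest.\n"
    "Pull recent YouTube transcripts.\n"
    "Research the guest.\n"
    "Leverage insights to draft the guide.\n"
)
for warning in review_skill(skill):
    print(warning)
# repeated 2x: research the guest.
# slop term: leverage
```

A linter like this only reports; the compression itself (rewriting the skill to say the same thing in less text) is the part the episode hands to the AI.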

Glossary

Terms worth knowing.

Skill (in Codex / Claude Code): A plain text file containing step-by-step instructions that an AI agent can follow to complete a specific recurring task, functioning like a reusable function in a personal automation system.
Eval: A pass/fail checklist run by a second AI agent on a draft produced by the first; if any check fails, the second agent revises the draft and re-runs the checklist until all checks pass.
Personal OS: A creator's collection of named skills and workflows stored in Codex or Claude Code, organized like a personal operating system to handle recurring tasks across their content business.
Skill editor: A meta-skill that automatically compresses any newly generated skill to approximately one page and removes redundant or slop-style instructions, preventing the sprawl that degrades personal automation systems.
AI brain fatigue: A cognitive side effect documented by Harvard Business Review in 2025 where workers running multiple AI agents simultaneously experience overload, migraines, and confusion about what they are personally responsible for.
Computer use: A Codex capability that lets the agent control a browser or desktop application directly, enabling automation of tasks that have no API — the agent can browse pages, click elements, and extract information on its own.
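The Eval entry above describes a loop: check the draft, hand failures to a second agent, and repeat until everything passes. A minimal Python sketch of that loop follows, under stated assumptions: `revise` is a hypothetical stand-in for the second agent (here a plain function, not a real Codex or Claude Code call), and the `max_rounds` cap is an added safeguard that the episode does not mention. The "at most two em dashes" check is the example given in the transcript.

```python
from typing import Callable

Check = Callable[[str], bool]
Reviser = Callable[[str, list[str]], str]


def failed_checks(draft: str, checks: dict[str, Check]) -> list[str]:
    """Return the names of all checks the draft does not pass."""
    return [name for name, check in checks.items() if not check(draft)]


def revise_until_pass(
    draft: str,
    checks: dict[str, Check],
    revise: Reviser,
    max_rounds: int = 5,  # assumption: a cap so the loop always terminates
) -> tuple[str, list[str]]:
    """Revise the draft until every check passes or the round limit is hit.

    Returns the final draft and the list of checks that still fail.
    """
    for _ in range(max_rounds):
        failures = failed_checks(draft, checks)
        if not failures:
            return draft, []
        draft = revise(draft, failures)
    return draft, failed_checks(draft, checks)


checks: dict[str, Check] = {
    "at most two em dashes": lambda d: d.count("\u2014") <= 2,
    "under 50 words": lambda d: len(d.split()) < 50,
}


def toy_reviser(draft: str, failures: list[str]) -> str:
    """Hypothetical reviser: only knows how to fix the em-dash check."""
    if "at most two em dashes" in failures:
        draft = draft.replace(" \u2014 ", ", ")
    return draft


draft = "AI helps \u2014 a lot \u2014 when you define good \u2014 and check it."
final, remaining = revise_until_pass(draft, checks, toy_reviser)
print(final)      # AI helps, a lot, when you define good, and check it.
print(remaining)  # []
```

Keeping the checker and the reviser as separate callables mirrors the episode's advice that the agent that wrote the draft should not be the one grading it.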

Resources

Things they pointed at.

00:30 · tool · Codex

00:30 · tool · Claude Code

12:05 · link · HBR AI brain fatigue article

03:18 · tool · Riverside

Quotables

Lines you could clip.

01:15

“Stop using ChatGPT and Claude and switch your main workflows to Codex and Claude Code.”

Contrarian hot take, standalone, no context needed → TikTok hook

07:11

“The software around them matters as much or more.”

Counters the model-obsession narrative; tight and quotable → IG reel cold open

11:16

“If you don't have the human taste review in any of the work you're doing with AI, the work's not gonna be great.”

The counterweight to the automation thesis; pairs with the hot take → newsletter pull-quote

12:53

“You turned yourself into an editor.”

Most uncomfortable line in the episode; visceral, no setup needed → TikTok hook

16:25

“AI is very bad at giving scores. Keep your eval simple. Just do simple pass/fail checks.”

Concrete actionable rule, standalone → IG reel cold open

15:23

“Your skills are only as good as the context they have.”

Portable one-liner, works as a standalone principle → newsletter pull-quote

Topic Map

Where the conversation goes.

00:00 – 04:00 · dense · Why chat-mode AI fails for recurring work

04:00 – 09:30 · dense · Building a personal OS with skills

09:30 – 20:20 · dense · The eval system: grading and rewriting automatically

12:00 – 14:15 · steady · AI brain fatigue and protecting human judgment

20:10 – 21:15 · steady · AI shaming and the role of the idea

21:15 – 25:25 · steady · Getting started if non-technical

The Script

Word for word.

On this episode, we are gonna show you an incredible use case with Codex. Codex is the tool of choice for any real AI practitioners at the moment, and Peter Yang is one of the best at it.

He's gonna come on. He's gonna help you build your very first workflow. He's gonna give you an eval skill that Codex will build that will make your workflow better every single time you use it.

And why does this even matter? If you pick the right workflow, you are gonna get a ton of time back to do the work you actually enjoy doing. You can get that all here first and exclusive on Marketing Against the Grain.

Peter, thanks very much for coming onto the show. You are going to go through one of the topics I'm fascinated by, which is not only how do you create skills to help you build reusable automations within Codex or Claude, but how do you do evals?

How do you know these things are good, gonna drive outcomes?

Yeah. I'm really excited to be here. Yes.

Yeah. For people who might not be familiar, evals is, like, very much like a software AI product term, but it's something that is really just a mechanism to help teach AI how to make its output better.

And we're gonna talk about that. We're gonna talk about skills today. And, Peter, before we get started, you you had a pretty hot take

Yeah. That, uh, that maybe we started out with. My number one advice for all of you is to basically stop using ChatGPT and Claude and switch your main workflows to Codex and Claude Code.

ChatGPT and Claude is like, you just chat with it, it replies, and you copy and paste some information back and forth. But Codex and Claude Code can actually get stuff done for you. So that has a difference.

Right? And I think anyone can, like, save so much time in their week in three steps. And the three steps are, like, number one, just reflect on your past week.

What are the tasks that take up the most time or are the most annoying to do? Like, the most manual work they have to do. Right?

Step two is just, like, literally list out every single step of that manual workflow, like, the stuff they have to do, and then use Codex or Claude Code to turn it into a system.

So, basically, just, like, copy and paste all your manual steps into one of these apps and be like, hey. Let us work on streamlining this. And I promise you you'll kinda be blown away by what it can actually do through skills or through APIs and through other integrations.

Peter, one thing I think nonsystems builders struggle with. So if you're not really a systems thinker and you're just doing your work, you might not even think of your work in terms of workflows.

You might not even really know what a workflow is. Right? Like, I go here.

I do some research. I create a blog post. I publish it.

I go get some data to see if it's performing pretty well, and then I iterate and improve on it. But I suspect a lot of content creators would not even think that's workflow. So can you maybe help explain how you think of a workflow, and then what is a workflow?

How do I actually document those things or figure out what workflows I'm even doing day to day? I think a workflow is basically just like a task

that you have to do, like, let's say more than once a week. And, like, documenting is literally just like listing all the steps that you take. So I have this, like, pretty fancy slide for, you know, how I used to run my podcast manually.

But you don't have to make a fancy slide. You just literally just have to list all all the things that you do. Right?

So for example, for running a podcast, before you even record anything, you have to prep the interview guide. You gotta research the guest, you know, manually go through a bunch of YouTube interviews and online research. You gotta figure out what the structure of the interview is.

You gotta write the guide, and then you send it to the guest, and then you record it in Riverside. Right?

And then send the recording to the video editor. Then you gotta pull the transcript because with the podcast, there's so many different other assets they have to generate.

So I have a newsletter post tied to it. I have social media posts. I have the thumbnail copy and title takes forever to figure out for for YouTube.

Right? And then there's other stuff like show notes and other stuff. You gotta, like, make all all this other stuff and then go back and forth and then finally schedule all this stuff or post other stuff to the different platforms.

And, like, I repeat this workflow, like, you know, once or twice every every week because I usually have one or two guests every week. And it it's just, like, incredibly manual and time consuming.

So

yesterday, I spent the whole day working with Codex to figure out how to streamline and automate all this stuff through skills. I think people who are just used to using Claude or ChatGPT are used to really quick tasks, you know, like a quick back and forth, a quick copy and paste. I think something really important what you said is, like, you spent a lot of your day yesterday

in Codex building this. It's like when you're building systems, it does take a while and take a lot of iteration. Right?

Yeah. Exactly. I used to have all this stuff in projects.

So I have a show notes project that has, like, a bunch of prompts and examples, and I usually just copy and paste stuff back and forth into different platforms. Right?

And this wasn't bad. It still saved me a lot of time, but I think Codex and Claude Code just took it to the next level. So this is Codex.

Don't get intimidated by this stuff, guys. Like, it's basically just the chat with AI in different platform. Right?

So these are all just, like, different chats with AI. People talk about, oh, I I built a personal OS and, like, I built this and that. And what the hell is a personal OS?

If you wanna build a personal OS, then just think about what are your main workflows during the week. Like, my main workflows as a creator are, like, you know, writing my newsletter posts, writing my community, sponsor stuff, and podcast.

And and then just kind of map out the steps. Right? So this is the README file for my various work workflows in my personal OS, and let's look at the podcast one.

So the podcast one, I have these skills that I built, and and a skill is pretty much just a text file. It's like text file with a bunch of instructions for AI. But I have a skill that prepares the podcast for for me.

And by the way, all these skills, it's not just like one one shot and you're done. You still have to go back and forth with AI and apply your human taste.

But, like, it just makes it so much easier to kinda back and forth if you build these skills. Right? So I have a podcast prep skill.

I have a thumbnail skill to kinda just gut check because when you make a podcast, you wanna make sure the thumbnail title and copy is actually interesting. Otherwise, you probably should not make the podcast at all. No.

And then after I record it, I have all these, like, post-production skills that make all the individual assets so I can go back and forth with it.

So by chaining all these skills together, I'm able to kind of, like, really save a lot of time in my podcast flow. Hey, guys. We are covering a ton of ground with Peter today, and I know you're probably feeling a little overwhelmed and you wanna learn even more.

Well, good thing. We've got a ton of resources for you. You can scan that QR code or you can click the link in the description below, and we'll get all of those for free.

Are you a recent convert to Codex?

Because you showed us Claude projects. So maybe talk about why Yeah. You migrated to Codex for these systems.

I I guess as a creator, I have, like, unlimited usage of both apps. That's why, You know, I wanna first say that I love both products, but I will say that Codex has really won me over recently, and there's couple things that Codex does really well.

The thing that it does the best, I think, is, like, browser use and computer use. If there's no, like, API to pull information, it can just use the browser and go, like, browse a bunch of pages and get the information for you. I find that it's, like, a lot more robust than what Claude Code has, and I always turn on fast mode for Codex.

I'm not gonna hit the limits anyway.

So I turn on fast mode, and I'm able to get a lot more work done. You know, so much of the Internet, especially in the early adopter, and it talks about, like, model and model quality, and that matters. But

the software around them matters as much or more, You know? And, like, a lot of why we wanna use Codex right now is because of the actual software harness that they've built more so than the core underlying model. Could you then maybe run through how you use this system?

So you've kind of documented your workflow. You've gone into Codex. When you've built a skill, do you have to worry about where the skill is saved?

Like, maybe talk a little bit about just build the kind of how easy it is to build skills in Codex and then how you can kinda run them. So let let's open one of these skills. So I have a podcast prep skill to research the guest.

Right? It's just a text file. Yeah.

And it looks pretty complicated, but I did not type any of this stuff. Like, I basically got Codex to do it, and then I reviewed it. The process of building a skill is, like, you have an idea, but, hey.

You know, I I wanna do research for my guests, and I wanna build a skill around it. Let's work on it together. Ask me any questions that that you have.

Right? And then it'll ask a bunch of questions. And then the important thing is after you build it, you wanna run through the skill manually.

You wanna test it with something. Quick question on skills because I build a lot of skills.

I would be really honest that I have stopped kind of reading them, and I think what you're gonna show is I use them, and then I give feedback to Claude to redo the skill. And I'll read them if it doesn't get to the standard that I want, but I try to do it through iteration of usage.

So I think there is a risk here that you just eventually leads to AI slop sprawl. Yeah. Right?

You just have so much sprawl everywhere. So I actually built a skill called skill editor.

Each each time it builds a skill, it runs skill editor, and it tries to make the skill as concise as possible, like, ideally to one page, and it cuts out all the AI slop terms. It looks for, like, repetitive instructions just to keep it a little bit more tight. Yeah.

Because it over complicates everything. Right? And so I have I have a very similar skill that are running in skills and tries to make them as short

Yeah. The least amount of text for the same amount of power.

Exactly. Yeah. Yeah.

So what's this doing right now? So it's running podcast prep, and then it's running another skill called thumbnail, title, and copy to figure out the angle. And it's researching a bunch of your past appearances in YouTube to get the transcripts and and, like, kind of try to figure out, you know, what kind of interesting angles we can talk about.

Right? The thing to remember here with Codex and Claude Code is, like, if you just ask it, like, hey. Can you actually just, like, pull the transcripts from the recent guest from YouTube and, like, just save it?

Just ask the question. Ask the question, and they'll probably figure out how to do it for you. Like, any kind of, like, online work or activity, it'll probably figure out how to do it for you.

So you just have to ask. Especially with computer use in Codex

and giving Codex the ability to use your computer, you can get so so much done. Exactly. If I'm a marketeer or someone watching this and I want to build this workflow,

could you maybe talk about how has that changed your ability to do podcasting? Like, are you able to figure out what has been the core impact or upside to you?

I have just a couple of things. Number one, I feel like I'm not doing this work alone anymore. I I think as a creator, it can be kind of a lonely experience.

Right? It's it's just my my brain trying to figure things out. But now I feel like I have a really good thought partner to figure out the thumbnail titles and, like, figure out the angle with me.

So that's, like, super helpful. Like, lot of the work is, like, manual copy and pasting back and forth through Google Docs and formatting stuff, and I do all that stuff. And and, like, you can pretty much just teach Codex to do all that manual work.

So, hopefully, it frees up my mind to, you know, either play more video video games or or to think more about about to grow the overall thing. You know? So so, yeah, it unlocks kind of like a higher order level of thinking.

I do think there is kind of like AI psychosis a little bit, and I almost feel like if I don't have Codex or Claude Code to think with me, like, I I'm almost a little bit too lay lazy to to think anymore. You know? Which is probably not a good thing.

So I guess that's what I would jump in here is, like, I think Kieran and I violently agree with something you said earlier where if you don't have, like, the human taste review in any of the work you're doing with AI, the work's not gonna be great. But then, I guess, my follow-up would be if, like, if this AI psychosis is real and you're constantly using AI, do you think that, like, dilutes your taste?

I think it can help you refine your taste because, like, you know, for example, I have a personal adviser skill where like, there's some principles that really wanna remember. You know, like, example, one of the principles is keep the main thing the main thing because it's like as a creator, you can easily, like like, I can go make a course.

I can go write a book. I can go do I can go to, like, a bunch of AI conferences. But, like, the main thing is, like, making a podcast and newsletters that are good.

So so it kinda reminds me of that. Like, sometimes I forget. But, yeah, you you do kinda get a little bit lazy.

You you kinda, like, ask the AI bunch of questions and rely on it to give you a bunch of answers that you kind of make a judgment on as opposed to kind of thinking through the answers yourself. I think that is something to be mindful of because I have experienced the same thing. There's two things you kind of start to experience.

Harvard Business Review came out with that article, I think, in in March of this year Mhmm. Where they talked about something called, I think it was AI brain fatigue. But one of the

things in there, they said, because you kind of feel like you should be running a multitude of tasks at any one time. Right? So like in a pre AI world, you might be multitasking a little bit, but you're kind of like trying to do one task at a time because you can only do one task at a time.

Whereas with AI, you feel like you're underproductive if you're not running a multitude of different things at once because the agents are running things. And they had this pretty cool article that said people were getting migraines, brain fatigue, and really wondering like, what am I even working on because they have so many things going at once.

One of the best things I built out in the past six months in Claude Code was like an entire content system, and it works really well. And I got really addicted to using it, and I realized I went to, like, Grammarly just to write something by myself, and I couldn't figure out how to start the thing.

Like, I just couldn't start from scratch. You turned yourself into an editor. Yeah.

And it's not too dissimilar to critical thinking, because you've made that point. I see that as well where, like, you're alright. I'm gonna try and solve this problem, and your brain just doesn't naturally kick in the way it used to because I'm like, okay.

Well, the first thing I would do is I tell Claude, here's the problem. Frame it up in these three ways and do some first principle thinking. And so you definitely have to be very conscientious of offloading your skills to an AI machine.

The AIs get so good that it could just be that the calculator is like, I don't wanna use calculator because I wanna be doing math. Well, you're gonna be behind everyone else, but I do get nervous about that. I feel like we've had three years to learn stuff without AI.

So I just owe people, like like me, still have some critical thinking, but, like, I I worry about the next generation because they have AI access from day one. To be a creator, you need to be the genesis of information, I really do believe. And so I and I think that's why you kinda went back to be like, hey, Kieran.

You're like, I'm gonna write a first draft. And it might be a messy first draft, but it's my first draft. And then I can work with AI from that.

Right? It feels freeing as well. Like, I've I've just forgotten how much I just love to write and not ask AI, do you think this is good?

How would you make that better? I'm not saying AI isn't great as a writing assistant. I still would recommend that, but I don't know.

There's something, like, pretty freeing about just writing words and not having to ask an AI assistant how it thinks about these things and would it improve them.

Like, the reason I said, well, generate the stuff is I gave it just a bunch of really good examples of really good thumbnails and titles. Right? But sometimes they can go overboard.

Once again, it's not like one shot. I have to give a fee feedback.

Be like, hey. You know, don't go too overboard, and then go back and forth and apply my taste to it. The thing I find that has worked really well for these kind of things is it first reads some sort of context file or, like, a couple of files in a context layer or, like, a foundational layer.

So, like, an example for this would be you have your audience profile, and that audience profile that's being built from what has worked and what hasn't worked. And so it first checks that, and then that's how it's determining what are episode ideas and then how does it rank those ideas.

So does it do anything like that?

Yeah. I I have a couple of lines about, hey. Here's my background, and my audience really likes practical, no bullshit tutorials and just, like, stuff they can implement right away.

So, like, try to make it more practical focus. Your skills are as good as the context they have. And so the more information you can give them in terms of what outcome you're looking for, the more valuable they're gonna be.

It's all about managing the context window at the end of the day. Yeah. Yeah.

High quality context is what matters. Okay. So let's switch gears.

Why don't you write a post based on your research of Kieran about how to use AI for marketing? Just write a draft post and put it in chat. Alright.

Cool. So I was gonna write a draft, and then I think what we'll do is we we can run an eval

on the draft. Okay. It's very cool.

Yeah. So this is a draft. And I wanna say, can you run the eval on this draft and also put the whole eval table with checkboxes in chat?

An eval is like an evaluation, which is basically getting the AI to check its own work. You don't want the original agent that wrote the draft to run the eval because there's some bias there.

If you ask me to, you know, review my own work, I'll I'll say, look. It looks great. Right?

So there's different types of evals. The most straightforward eval is just, like, pass or fail. There's other evals where, like, hey.

You wanna give it, like, a score, like, four out of five, three out of five. But, like, I'll I'll tell you, AI is very bad at giving scores. If you're like, hey.

Give me an eval of how authentic this post is. Three out of five, four out of five. It has no clue the difference between a three out of five and four out five, even if you give a bunch of instructions.

Yeah. Yeah. So I just recommend keeping your eval simple.

Just do simple pass fail checks. The great thing about that is once you have these checks in place, if it fails any of these checks, then you can just ask it to keep editing a draft until it passes all of the checks.
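The edit-until-it-passes loop described here can be sketched in a few lines. This is a minimal sketch, not the speaker's actual setup: `run_eval`, `revise_until_pass`, and `toy_revise` are hypothetical names, the two checks are illustrative, and `toy_revise` is a stand-in for the call to the writing agent.

```python
from typing import Callable, List, Tuple

EM_DASH = "\u2014"
# A check is a (name, predicate) pair; the predicate returns True on pass.
Check = Tuple[str, Callable[[str], bool]]


def run_eval(draft: str, checks: List[Check]) -> List[str]:
    """Return the names of the checks the draft fails."""
    return [name for name, passes in checks if not passes(draft)]


def revise_until_pass(draft, checks, revise, max_rounds=5):
    """Ask `revise` to fix failed checks until all pass or rounds run out."""
    for _ in range(max_rounds):
        failed = run_eval(draft, checks)
        if not failed:
            return draft, True
        draft = revise(draft, failed)
    return draft, not run_eval(draft, checks)


# Illustrative checks (assumptions, not the speaker's real eval).
checks = [
    ("no em dashes", lambda d: EM_DASH not in d),
    ("at most 50 words", lambda d: len(d.split()) <= 50),
]


def toy_revise(draft: str, failed: List[str]) -> str:
    """Stand-in for the writing agent: applies mechanical fixes only."""
    if "no em dashes" in failed:
        draft = draft.replace(f" {EM_DASH} ", ", ")
    if "at most 50 words" in failed:
        draft = " ".join(draft.split()[:50])
    return draft


first_draft = f"AI saves time {EM_DASH} if you build a system around it."
final, passed = revise_until_pass(first_draft, checks, toy_revise)
print(passed, final)
```

The `max_rounds` cap is a design choice added here so the loop cannot run forever if the writer never satisfies a check.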

Right? So then it can kinda, like, self-iterate and improve itself. I guess your eval will be based upon what is the perfect podcast episode for your channel.

How did you create the eval, and how did you know what were the right things to look for? You can get it to create an initial eval just based on your examples. Like, if it's like a content editing skill, you wanna give it a bunch of examples of your best content, your best output.

Right? And then you can be like, hey. Just like can you just create an eval pass fail based on the examples?

But I think the real way to create evals and make them good is to actually just run the skill through a bunch of manual passes. Right?

So let's say I have an edit-newsletter skill, and then I use it to edit one of my newsletter posts. And then through the conversation, it's like using too many em dashes or, like, you know, it's, like, writing too long.

So then I'm like, okay. So based on this conversation, can you update the eval to include some things that we learned in this conversation?

And then it'll be like, okay. I'm gonna check maximum two em dashes. That's like a pass fail.

And then you can either say, yes. That sounds good, or you can say, no. That does not sound good.
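As a rough illustration of that exchange, the proposed check and the yes/no approval could look like the sketch below. The names `max_em_dashes`, `propose_check`, and `eval_checks` are hypothetical; in the workflow described, the AI edits the eval file itself and the human simply accepts or rejects the change.

```python
EM_DASH = "\u2014"

# The eval, modeled as a mapping of check name -> pass/fail function.
eval_checks = {}


def max_em_dashes(limit: int):
    """Build a pass/fail check: at most `limit` em dashes in the text."""
    def check(text: str) -> bool:
        return text.count(EM_DASH) <= limit
    return check


def propose_check(name: str, check, approved: bool) -> bool:
    """Add a proposed check to the eval only if the human said yes."""
    if approved:
        eval_checks[name] = check
    return approved


# "Yes, that sounds good."
propose_check("maximum two em dashes", max_em_dashes(2), approved=True)
# "No, that does not sound good." (a hypothetical rejected proposal)
propose_check("always use bullet points", lambda t: "\n- " in t, approved=False)

two = f"one {EM_DASH} two {EM_DASH} three"
three = two + f" {EM_DASH} four"
print(eval_checks["maximum two em dashes"](two))    # passes
print(eval_checks["maximum two em dashes"](three))  # fails
```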

So for example, here's the eval that I have for my newsletter post. All my newsletter posts have a specific format. Does it open with dear subscribers?

Is the hook line short enough to kind of, like, catch people's attention? Is the post free of em dashes? Is it, like, no slop?

There's a bunch of other stuff. So it drafts the post, and then it checks each of these things. Yeah.
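A small sketch of what a checklist like this might look like when rendered as a table with checkboxes. Only the three mechanically testable checks are included; the "no slop" check needs a reviewer agent and is left out. The 12-word hook limit and the idea that the hook is the first line after the greeting are assumptions, since neither is specified in the conversation.

```python
from typing import Dict

EM_DASH = "\u2014"
MAX_HOOK_WORDS = 12  # assumed threshold; no number is given in the interview


def newsletter_eval(post: str) -> Dict[str, bool]:
    """Run pass/fail checks on a newsletter draft."""
    lines = [ln.strip() for ln in post.splitlines() if ln.strip()]
    greeting = lines[0] if lines else ""
    hook = lines[1] if len(lines) > 1 else ""  # assumed: line after greeting
    return {
        "Opens with 'Dear subscribers'":
            greeting.lower().startswith("dear subscribers"),
        "Hook line is short": 0 < len(hook.split()) <= MAX_HOOK_WORDS,
        "Free of em dashes": EM_DASH not in post,
    }


def as_checkbox_table(results: Dict[str, bool]) -> str:
    """Render results as one checkbox row per check."""
    return "\n".join(
        f"[{'x' if ok else ' '}] {name}" for name, ok in results.items()
    )


post = (
    "Dear subscribers,\n"
    "Stop pasting prompts.\n"
    f"Here is the body {EM_DASH} with a dash it should not have."
)
results = newsletter_eval(post)
table = as_checkbox_table(results)
print(table)
```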

Yes. No. And if anything fails, it it, like, gets the other agent to

edit the post again. I think this is a really important point and why so much of the work that people do with AI is not good.

It's not just because a lot of people don't use evals. It's because if you look at what an eval is, it's actually somebody being very clear and committing to what they want and what is good.

And I think so many people have a problem with that. Right? Like Yeah.

It's like, oh, I don't know what I'm actually creating. So I'm just kinda all over the place where you're like, you were like, I know the formula.

And because I know the formula, I can have a real

complete system to evaluate whether I'm meeting that bar every single time or not. You can pretty much build evals for anything. Right?

If you're trying to pull some stats or do some online research, you could be like, hey. Did you research at least 10 sources? Yes.

No. Or did you research the recent interviews from YouTube in the last thirty days? Yes.

No. You can pretty much build evals for anything. Another lesson I think is the skill and the eval that's tied to a skill is like a live working document.
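The two research checks mentioned above translate directly into pass/fail functions. This is a sketch under assumptions: the source-record shape (`url`, `kind`, `published`) and the function names are hypothetical, and the sample data is made up purely for illustration.

```python
from datetime import date, timedelta


def enough_sources(sources, minimum=10):
    """Pass if at least `minimum` distinct URLs were consulted."""
    return len({s["url"] for s in sources}) >= minimum


def has_recent_interview(sources, today, days=30):
    """Pass if at least one interview was published within `days` days."""
    cutoff = today - timedelta(days=days)
    return any(
        s["kind"] == "interview" and s["published"] >= cutoff
        for s in sources
    )


# Made-up research log: ten articles, one of them a recent interview.
today = date(2025, 1, 31)
sources = [
    {"url": f"https://example.com/{i}", "kind": "article",
     "published": date(2024, 6, 1)}
    for i in range(9)
]
sources.append({"url": "https://example.com/interview", "kind": "interview",
                "published": date(2025, 1, 15)})

print(enough_sources(sources), has_recent_interview(sources, today))
```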

Right? You know, I've been a PM, for example, for, like, over a decade, and, like, a lot of engineers ask me, hey.

When is the PRD gonna be done? When is the spec gonna be done? And the spec is, like, never done because you're always learning something new.

So just same thing with the skill and the eval. As you run through, like, manual passes, as you use it, if you learn something new, just ask the AI to improve the skill and the eval. Right?

And double check its work, and then slowly but surely, your skills get more and more effective

over time. I have done evals for that content system, and one of the kinda gotchas is if the eval is trying to do any kind of judgment or taste, then it primarily likes itself.

Like, one of the things I It does. Claude is I kinda tried to do this eval, and I was like, here's what, like, a great writing sounds like. And it would kinda map my audience profile.

And, like, interestingly, whenever I would update the post with my own free-form content versus Claude updating it itself, it always preferred its own writing, which I kinda thought was funny. So when it's doing an eval of, like, judgment and taste, it prefers itself.

And you get into this kind of cycle of, like, does it really understand judgment and taste? I think the human is better. But I think what's fascinating is your eval is more around the kind of formulaic way that you wanna create that content.

Like, things that you can actually it's not so much judgment and taste. It's like, there's a formula, and there's, like, core criteria in that formula. And you're, like, evaluating to see if those things exist or not.

And it's actually a pretty interesting thing for creators because if you're a business and your team are creating skills, I think it would be a pretty good service to actually have creators create your evals because creators would know what the criteria is for, like, a winning YouTube video for you.

They could, like, figure out what the winning formula is for YouTube video, or they could figure out what the winning formula is for, like, a LinkedIn post. And they could actually write your eval for you, and it would actually help up level your team pretty quickly, actually. There's a lot of ghostwriters out there.

Maybe there's, like, an eval writer instead. I also think AI shaming is so dumb because AI is gonna be intrinsic to how anything works, and so it's a tool.

And it doesn't matter if you use AI if the outcome is still, like, good. Right? You judge something on the outcome, not how it was created.

We are in the messy middle of, like, AI hate a little bit. There's just, like, shaming of anyone who uses AI in a creative way, which I think is pretty dumb. I feel like, Kieran, that there is a lot of shaming around this notion that, like, oh, you just used AI and didn't do any work yourself.

You know? And I think that's where a lot of the sentiment comes from. And I think one of the themes that we've talked about the last couple minutes is that if you're creating something, if you're working on something with AI, you kinda have a partner.

And there are some things that AI is really good at, like evals and systems, and things that humans are really good at, like taste and judgment. And if you don't have the human side of it, then it is going to feel very average and, you know, shift to the median. It's not going to actually stand out, and it's going to feel like AI slop.

And so it's like, yeah, if you just outsource everything to AI, it's not gonna be good. But if you work with AI, there should be no backlash against that. Right?

Well, the idea still matters. Like Yes. That's the big thing I think is there was a piece of content that went really viral on X where someone wrote a post that everyone quoted that is like, wow.

Things are moving much farther than I had expected,

and here's all of the repercussions because of that. And it was quoted everywhere. And Opus wrote that post.

Because I do so much creating with AI, I could, like, decipher the patterns. It didn't stop it being a great post. Right?

No. And so I still think the idea is

intrinsic to, like, having a good outcome. That's kinda like a balance, like, with anything else. For example, if creators like, they follow a bunch of top accounts and just set up, like, automatic AI replies to all accounts, which is, like, all of our X these days.

Right? Yeah. Yeah.

And, like, I find that, like, super annoying. Like, I have to block all these people. But, like, is it probably getting the results?

It probably is. Yeah. Yeah.

Just coming back to your workflow, maybe as we round out this conversation,

one of the things I probably would do now after watching you work through that, which is like, hey. I'll get the workflow, build skills, do evals, and I still kinda come back to, okay. Well, I would love to get started.

I'm gonna, like, get my Codex set up, and I'm gonna start building some of these skills and automating some of my workflows. Okay. What are some of the things I do each and every day?

I think if you just build a little ritual for yourself across one week where you went in to Codex and had a conversation at the end of the day and told Codex about your day. I did this. I did that.

These are the things I did. And then at the end of that week, it could take all of those transcripts and say, okay.

Based upon what I've heard, here are, like, the top things that I'm gonna start to build skills for. And that would be an easy way for someone who is nontechnical, but really wants to start doing this stuff, I think, to get started on, like, what workflows to prioritize.
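To make the end-of-week step concrete, here is a deliberately simplified sketch. In the ritual described, Codex reads the week's conversations and suggests skills itself; this version only tallies how many days each task appeared, which is one plausible signal for what to prioritize. The function name, the log format, and the sample week are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def rank_recurring_tasks(
    daily_logs: Dict[str, List[str]], min_days: int = 2
) -> List[Tuple[str, int]]:
    """Count the days each task appears on; most frequent first."""
    days_seen = Counter()
    for tasks in daily_logs.values():
        # A set so a task mentioned twice in one day counts once.
        days_seen.update({t.strip().lower() for t in tasks})
    ranked = sorted(days_seen.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(task, n) for task, n in ranked if n >= min_days]


# Made-up week of end-of-day notes.
week = {
    "Mon": ["Write newsletter", "Podcast prep"],
    "Tue": ["podcast prep", "Reply to email"],
    "Wed": ["Podcast prep", "write newsletter"],
}
candidates = rank_recurring_tasks(week)
print(candidates)
```

Tasks that show up on several days are the natural first candidates for a skill; one-off tasks fall below `min_days` and are dropped.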

Yeah. I totally agree. In fact, why don't we quickly see what Codex says about this?

So it has memory of everything that we do. So I just asked it, based on our conversations, what else can we streamline? So it says, like, there's too many evals.

You should just have one eval for everything. Yep. Turn podcast prep into social newsletter ideas.

So okay. So you do research about Kieran, and then maybe you can turn that into a newsletter angle too. There's a skill for live demo safe mode, not sharing any financial information.

There's a skill to clean stuff up. Yeah. So, yeah, it'll give you a bunch of ideas.

Yeah. I I think it's pretty incredible when you really start to integrate it into all of your personal kinda data as well. Yeah.

I've integrated it into my scale, and I built, like, a fitness app to track workouts. So I asked it to build an MCP for that too. So now it can tell me, like, hey.

What's your body fat? And, like, you made some progress on these lifts and so on and so forth. And, uh, yeah.

It feels like a very helpful partner. Yep. Yeah.

Peter, you're living in the future, my friend. And thank you for coming and sharing with our audience a little bit of what the future looks like. Because I think there are a lot of people who could honestly just even do

one basic workflow that they hate every week with, like, a few skills in an eval that would be a life changing experience for them. Don't don't you think so, Kieran? Even if you just do one thing.

Yeah. I think if you watch this, you have a conversation with Codex, you pick one workflow,

you build a skill, you build the eval. That's an incredible way to get started on your journey of using AI to integrate into your work. And you pick the right thing, you're gonna save yourself a lot of time.

Yeah. Just remember if this stuff is intimidating, just like when you say you build it, it's really you asking the AI to build it. So I just have to ask the AI.

You are just a data giver.

Data giver. Pay pay case giver. You know?

Yeah. AI is doing all the work. I don't know if that's a good thing or a bad thing, but it does it.

So Yeah. It's both. I know that.

Yeah. But with all of that, Peter, thank you so much for joining us today, and we'll see everybody really soon on the next episode of Marketing Against the Grain.

The Hook

The bait, then the rug-pull.

Peter Yang opens with a blunt instruction: stop using ChatGPT and Claude as chat boxes. If you are copying and pasting from a chat window into your real work, you are one architectural shift away from getting your time back.

CTA Breakdown

How they asked for the click.

MENTIONED ON CAMERA

00:30 · tool · Codex

00:30 · tool · Claude Code

03:18 · tool · Riverside

FROM THE DESCRIPTION

PRIMARY CTA

Where the creator wants you to go next.

Workflow for building skills with Claude Code & Codex
