Modern Creator
Duncan Rogoff | Learn Claude Code · YouTube

Claude Code + YouTube = $11,769/Month

A 12-minute tutorial that reverse-engineers a faceless YouTube channel earning almost $12K/month and rebuilds its entire production pipeline inside Claude Code.

Posted
yesterday
Duration
Format
Tutorial
educational
Views
3.9K
231 likes
Big Idea

The argument in one line.

Faceless YouTube channels earning $10K+/month win on content formula, not production quality, and that formula can be extracted, replicated, and automated into a one-prompt Claude Code skill.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…
  • You want to start a YouTube channel without appearing on camera.
  • You are already using Claude Code and want to extend it into content production.
  • You are curious how to reverse-engineer what makes a viral channel tick using AI tools.
  • You want a reusable system for producing simple animated explainer videos at low cost.
SKIP IF…
  • You are looking for a live-action or talking-head YouTube strategy — this is specifically for faceless doodle-style content.
  • You have no interest in recording your own voice; the workflow requires human audio to avoid YouTube TTS flags.
TL;DR

The full version, fast.

The Zen YouTube channel built 156K subscribers and $11.7K/month in two months posting simple hand-drawn doodle videos — no camera, no editing, just narrated illustrations. The tutorial shows a six-stage Claude Code pipeline that reverse-engineers that formula: Claude extracts the script structure from Zen's top video via yt-dlp, builds a topic-driven skill that writes new scripts in that style, Whisper transcribes your recorded voice into timestamped phrases, Higgs Field's MCP generates one doodle image per phrase, and FFmpeg stitches images and audio into the final video. The result is a reusable slash-command you can run on any topic.

Chapters

Where the time goes.

00:00-01:10

01 · Hook + proof

Zen channel stats shown on screen: 156K subs, $11.7K/month, 19 videos in 2 months. Host intro and skill CTA.

01:10-01:50

02 · Validation + monetization math

Nick Invest channel (185K subs) as second example. YouTube Partner Program thresholds: 1K subs + 4K watch hours.

01:50-02:53

03 · Style reverse-engineering

Five Zen video screenshots fed to Claude Code with a style-extraction prompt. Positive + negative prompts produced.

02:53-04:27

04 · Style test + Higgs Field intro

Single test image generated for 'elephant at watering hole' — passes. Higgs Field introduced as the image generation platform.

04:27-07:54

05 · Script extraction + skill Stage 1

yt-dlp on Zen's 7M-view video extracts full transcript. Claude builds hook/intro/format breakdown. Stage 1 skill created with iterative clarifying questions. Test run on 'why do we dream' produces a full script.

07:54-08:56

06 · Voice recording

Host records two paragraphs of the generated script in Mac Voice Memos. File dropped into the project folder.

08:56-09:49

07 · Whisper transcription (Stage 2)

Skill updated to detect new audio in folder and run Whisper locally. Per-phrase timestamps output shown.

09:49-11:40

08 · Image generation via Higgs Field MCP (Stage 3)

Higgs Field MCP connected in two clicks. Claude Code generates 16 images at 2 credits each (32 credits total) in the project's doodle style. Results shown — high visual consistency.

11:40-12:41

09 · FFmpeg stitch + payoff (Stage 4)

Final skill stage stitches images in timestamp order with voiceover using FFmpeg. Finished video plays back. Skill is now reusable with any topic via /faceless-video.
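
The credit figures quoted in chapter 08 reduce to simple arithmetic. A minimal sketch, assuming a flat per-image price and one image per timestamped phrase; `estimate_credits` is a hypothetical helper for budgeting, not part of any Higgs Field API:

```python
def estimate_credits(num_phrases: int, credits_per_image: float = 2.0) -> float:
    """Estimate image-generation cost: one image per timestamped phrase.

    The default of 2 credits per image is the figure quoted in the video;
    real pricing depends on the model, quality, and resolution you pick.
    """
    if num_phrases < 0:
        raise ValueError("num_phrases must be non-negative")
    return num_phrases * credits_per_image


# The run shown in the video: 16 phrases at 2 credits each.
print(estimate_credits(16))
```

The same helper covers the low-quality test setting mentioned in the transcript (about half a credit per image) by passing `credits_per_image=0.5`.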

Atomic Insights

Lines worth screenshotting.

  • A faceless YouTube channel with 19 videos and 156K subscribers earns $11.7K/month — the formula, not the face, drives the income.
  • You need only 1,000 subscribers and 4,000 public watch hours (counted over the past 12 months) to monetize on YouTube; a single video getting millions of views can clear both thresholds.
  • Extracting a channel's script format via yt-dlp and Claude takes a few minutes and gives you the hook structure, transition sentences, and body format of any viral video.
  • Negative image prompts ('not photorealistic, not a 3D render, not a photograph') are as important as positive ones for maintaining style consistency across 16+ generated images.
  • Separating style from subject in an image prompt ('focus on style characteristics only, not subject matter') makes the style reusable for any topic without redescribing it each time.
  • Building a Claude Code skill in four incremental stages — with Claude asking clarifying questions at each step — produces a more reliable pipeline than one monolithic prompt.
  • Whisper transcribes audio with per-phrase timestamps locally for free, giving you the image-switch timing without any API cost.
  • The host reports that AI-generated voices can get a channel flagged or even banned by YouTube; human voiceover is the safer path even in an otherwise automated pipeline.
  • Higgs Field's MCP connector lets Claude Code call image generation directly — no copy-paste, no context switching, just a Claude prompt that triggers renders.
  • The same-folder pattern — all Claude Code sessions write to one shared project folder — makes outputs discoverable across separate chat sessions without manual handoff.
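
The transcript-extraction step can be sketched outside Claude Code as well. This is an illustration under stated assumptions, not the host's actual commands: the video only says yt-dlp was used, so the flag selection, the output name `reference`, and the helper names `subtitle_command` and `vtt_to_text` are all hypothetical choices. The flags themselves are standard yt-dlp subtitle options.

```python
import re

# Strips inline cue tags such as <c> or <00:00:01.000> from caption text.
_TAG = re.compile(r"<[^>]+>")


def subtitle_command(url: str, out_template: str = "reference.%(ext)s") -> list[str]:
    """Build a yt-dlp call that fetches auto-generated English captions only."""
    return [
        "yt-dlp",
        "--skip-download",      # captions only, no video file
        "--write-auto-subs",    # YouTube's automatic captions
        "--sub-langs", "en",
        "--sub-format", "vtt",
        "-o", out_template,
        url,
    ]


def vtt_to_text(vtt: str) -> str:
    """Flatten a WebVTT caption file into plain prose for script analysis."""
    kept: list[str] = []
    for raw in vtt.splitlines():
        line = _TAG.sub("", raw).strip()
        if not line or "-->" in line:
            continue  # blank lines and cue timing lines
        if line.startswith(("WEBVTT", "Kind:", "Language:", "NOTE")):
            continue  # file header and comments
        if kept and kept[-1] == line:
            continue  # auto-captions often repeat the previous line
        kept.append(line)
    return " ".join(kept)
```

Running the command list through `subprocess.run` and feeding the resulting `.vtt` file to `vtt_to_text` gives the plain transcript that Claude is then asked to break down into hook, intro, and body format.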
Takeaway

Six steps from reference channel to finished faceless video.

WHAT TO LEARN

Any YouTube channel's content formula can be extracted and replicated in a single session — the script structure, the image style, and the edit timing are all derivable from existing popular videos.

  • Style prompts need negative constraints as much as positive ones: blocking 'photorealistic, 3D render, photograph' is what keeps 16 doodle images visually consistent across different subjects.
  • Separating style description from subject matter — 'how it looks, not what it shows' — makes a single style prompt reusable for any topic without rewriting it each time.
  • Building a multi-step automation as a Claude Code skill with iterative clarifying questions at each stage produces a more reliable pipeline than trying to specify everything upfront in one prompt.
  • Whisper runs locally with no API cost and produces per-phrase timestamps, which is exactly the timing data needed to sync image switches to narration.
  • The same-folder pattern — every Claude Code session writing outputs to one shared project directory — eliminates manual handoff between pipeline stages and makes each step's output immediately available to the next.
  • Human voice recording is worth the friction: AI-generated speech risks YouTube account flags, while your own voice adds authenticity that the channel's simple visuals don't supply.
Glossary

Terms worth knowing.

Faceless YouTube channel
A YouTube channel that never shows the creator on camera, using narrated images, animations, or screen recordings instead. The Zen channel referenced here uses simple hand-drawn doodle illustrations over voiceover.
yt-dlp
A free command-line tool that downloads video metadata, transcripts, and audio from YouTube and other platforms. Used here to pull the full transcript of a reference video for script analysis.
Higgs Field (higgsfield.ai)
A platform that bundles access to multiple image and video AI generation models in one place. Offers an MCP server that lets Claude Code trigger image generation directly without leaving the terminal.
Whisper
OpenAI's open-source speech-to-text model that runs locally on your machine. It produces word- and phrase-level timestamps, which are used here to determine when each image should appear in the final video.
Claude Code skill
A reusable slash command in Claude Code that encodes a multi-step workflow. Once built, typing /faceless-video followed by a topic runs the pipeline, pausing only for you to record the voiceover.
FFmpeg
A free, open-source command-line tool for processing audio and video. Used here to combine a sequence of timestamped images with a voiceover audio file into a finished MP4.
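
The stitching step can be sketched with FFmpeg's concat demuxer, which reads a text file of images and durations. The video does not show the exact command Claude generated, so treat this as one plausible approach: the file names, the 30 fps output rate, and the `libx264`/`aac` codec choices are assumptions, and `concat_list` and `ffmpeg_command` are hypothetical helpers.

```python
def concat_list(slides: list[tuple[str, float]]) -> str:
    """Build the text of an FFmpeg concat-demuxer list: one image per slot.

    `slides` is a list of (image_filename, seconds_on_screen) pairs in
    playback order.
    """
    if not slides:
        raise ValueError("need at least one slide")
    lines = []
    for name, seconds in slides:
        if "'" in name:
            raise ValueError(f"unsupported quote in file name: {name}")
        lines.append(f"file '{name}'")
        lines.append(f"duration {seconds:.3f}")
    # The demuxer needs the final image listed once more, otherwise the
    # last duration is not honoured.
    lines.append(f"file '{slides[-1][0]}'")
    return "\n".join(lines) + "\n"


def ffmpeg_command(list_path: str, audio_path: str, out_path: str) -> list[str]:
    """Combine the slide list with the voiceover into a single MP4."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0", "-i", list_path,  # timed images
        "-i", audio_path,                                # voiceover track
        "-r", "30",
        "-c:v", "libx264", "-pix_fmt", "yuv420p",        # widely playable video
        "-c:a", "aac",
        "-shortest",
        out_path,
    ]
```

Write the string from `concat_list` to a file such as `slides.txt`, then pass the command list to `subprocess.run`. Note that `yuv420p` output with `libx264` expects image dimensions to be even numbers.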
Resources

Things they pointed at.

01:11 · channel · Nick Invest
03:30 · tool · GPT Image 2
07:50 · tool · yt-dlp
08:56 · tool · Whisper
11:40 · tool · FFmpeg
Quotables

Lines you could clip.

00:02
No face, no camera, no editing, just simple images that look like something a five year old could draw.
Visceral contrast between the simplicity described and the income implied; a self-contained hook in one sentence. Use as: TikTok hook.
02:35
Focus on style characteristics only, not subject matter.
Tight, actionable instruction that most people skip; short enough to stand alone. Use as: IG reel cold open.
07:46
Everything is going to be done in Claude code and it only takes a couple of minutes.
Payoff statement after a full pipeline overview; works as a proof-of-concept closer. Use as: newsletter pull-quote.
The Script

Word for word.


00:00 I just found a faceless YouTube channel that started posting two months ago and is already making almost $12,000 per month. No face, no camera, no editing, just simple images that look like something a five year old could draw.
00:12 This channel is called Zen and they already have a 156,000 subscribers and they've only posted 19 videos. Their most popular video has over 7,000,000 views and if we sort this by oldest, you can see their oldest video is only two months old.
00:25 So in this video, I wanted to see if we could recreate a channel like Zen using Claude Code. It'll handle the entire script, all of the image generation, and it'll even stitch everything together in a single video automatically. So if you wanna run a YouTube channel where you never have to show your face that could potentially earn you tens of thousands of dollars per month, this video is for you.
00:44 And if we haven't met yet, my name is Duncan Rogoff. I'm a former art director for brands like Apple, PlayStation, and Nissan, and I now run one of the top communities for learning Claude Code and building income. So focus in, close all your open tabs, and let's build.
00:57 If you wanna get access to this entire system already built for you, I packaged everything up into a Claude skill that you can install in a single click. Just check the link in the description. Come into the community, find a post that looks like this, open this up, and click download.
01:11 So like I said, this channel is called Zen and it's only a couple of months old, but there's other channels like Nick Invest that do this same style of like really simple animation. They have a 185,000 subscribers.
01:22 They're making a little bit less per month from YouTube, but they do drive traffic to a Skool community where they're making thousands of dollars over there. So if you're wondering how to get monetized on YouTube, it's really simple. All you need is a thousand subscribers with 4,000 public watch hours.
01:35 So on a channel like this getting millions of views, you can easily get monetized from a single video. So let's just watch one of these videos real quick so we can get a quick sense of it. Tonight, when the sun goes down, you're going to flip a switch.
01:46 Light will flood the room and you won't think twice about it. But for 99.9% of human history Okay.
01:52 So when I look at this, there are really only two main components that matter. One is the script and the other is the images. So this is what we are going to try to recreate today.
02:00 I think the first thing I wanna do is just to see if we can recreate the style of image because I think that's kind of the most fun and then we'll do the script next. So the first thing I'm gonna do is I'm just gonna take a couple of screenshots like throughout this video, and I'm gonna bring them into Claude Code. So I'm inside of the desktop app.
02:15 We're inside of Claude Code. You just wanna make sure that you're like not in chat, you're not in Cowork, you're just inside of Claude Code. And so I'm gonna take a few screenshots just kinda moving through, like trying to find like a couple of different styles so we can get a good sense of things.
02:28 So I just gave it five images. I tried to do like a mix of kind of like abstract things like this light switch, a mix of people, like a group of people, and then this one here which is just like some sort of chart. Right?
02:38 And this is the prompt that I use. Analyze this image's visual style for use in an AI image generator. Give me one, a short positive style description I can paste as a prompt, and two, a list of negative prompts to prevent the generator from drifting towards polished or realistic output.
02:53 And then I said focus on style characteristics only, not subject matter. Because I didn't really want it to describe like any of the specific characters in the scene, like how to draw a fire, right? Like, because I wanna be able to use this for literally anything.
03:04 So here we can see some of the positive things. A hand drawn marker doodle illustration, thick uneven black felt tip outlines, etcetera. Things that it isn't are photorealistic.
03:13 It's not a 3D render. It's not a photograph. And so I think this is pretty cool.
03:16 So before we commit to generating like a 100 images for a full video, I just wanna test out to see if this style works for like one image. Using this styling, can you just create a prompt for the scene, the elephant goes to the watering hole. So here's the prompt that it gave me.
03:29 And what I'm going to do is I'm just gonna come over here and I'm going to go into Higgs Field. If you don't know, Higgs Field is a platform that gives you access to all of the latest image and video generation models in a single place. One of the things I love about Higgs Field is that it connects really nicely with Claude Code.
03:43 I'm gonna show you how to do that in just a couple of clicks. But first we might as well just do this the manual way since we're just testing. So I'm just gonna come into image.
03:50 I'm gonna go into GPT Image 2. You can see I was already testing this out a little bit and I'm just going to paste in that description. Right now I just set this to low quality and 1K resolution so we can save on credits.
04:00 This literally cost me half a credit to do. If you are doing this for YouTube, I recommend bumping this up to at least 2K. So this is the image that the prompt generated.
04:08 This is super fun, super cute. I am totally happy with this. So we know that this piece works.
04:13 So the next thing that we need to do is we need to be able to generate a script. So we're gonna do this in a couple of stages. One, like we already created our first illustration.
04:20 The next thing I wanna do is I want to use Claude Code to actually extract the format of the scripts from the Zen channel. Because the hook and the intro and the transition sentences and the overall length, all of these things actually matter when it comes to going viral or getting a lot of views on your video. So once we understand the format, we can actually write a brand new script from any topic.
04:40 We're going to record the script with our own voice because I found that if you're using an AI voice generator, YouTube might actually flag that and get your account banned. Once we have that audio, we can actually transcribe the video for free to get all of the timestamps, which we're then going to use to generate all of our images.
04:55 And then again for free, we're going to stitch all of the images together with our voiceover track to create the final piece. Everything is going to be done in Claude Code and it only takes a couple of minutes. So I'm gonna come into Claude Code and I'm just gonna do this in a new session because I like to keep things clean.
05:10 So I'm just gonna go over to Zen's YouTube channel, I'm gonna sort this by popular and I'm just gonna hop into this first. And this is that video with over 7,000,000 views. So I'm gonna copy the link from here, I'm gonna come into Claude Code and I'm gonna paste this in. And I'm just going to say using yt-dlp.
05:26 This is just the name of a free service that will allow you to get transcripts from any YouTube video. I need you to extract the script from this YouTube video. I need you to figure out the hook, the intro, the format of the overall script, any transition sentences or things like that.
05:40 Our goal is to be able to recreate new scripts from a new topic in this style. Okay. So this took a couple of minutes and it went through the entire script and did a full breakdown for us.
05:50 So the entire format of it, a section that breaks down exactly how the hook happens in these like four separate little beats, like the core promise of the video, the three pillars of the body. And so what's cool about this is that we are working inside of the same folder that we're using for all of our images, and so it actually just put this script breakdown inside of this folder, so now we can reference it in the different Claude Code chats.
06:11 So we've gone ahead and extracted the script format, and now we can actually create our first script. But what's even cooler is we can actually start building our skill to turn this into a repeatable process. I want to turn this script framework into a new skill.
06:25 The whole goal is to be able to give you any topic and you're going to write a script following this specific format. I'm just gonna use a really simple prompt like this. So this is actually stage one of the skill.
06:35 There's going to be a second stage for generating all of the images and a third stage for actually editing the final polished video. So Claude's gonna ask me a couple questions like whose voice should this be in? Is this in my voice like based off my current brand guidelines or should it match the existing style?
06:49 We wanna match the existing style. The format's authority comes from real named studies, dates, and hard stats. So that's really interesting.
06:56 So it's asking me like if the skill should do any research first or just build the structure and I definitely wanted to research real evidence. The whole point of this is to mimic what's already working. This is asking me should the skill give me like script and then anything that tells me like what the visual should be or just the clean script.
07:12 And right now, I just want the clean script. So this took a couple of minutes and it created an entire skill for us along with a test script. What's really cool about Claude Code, and one of the best ways to learn it, honestly, is to click into any of these different pieces.
07:24 Like this thing right here, it says this ran an agent to test the skill. So we can actually open this up to see what Claude Code did. It created the topic like why do we dream.
07:33 It told it to follow the skill's workflow precisely like the actual structure that we pulled from the reference video, how to save all of the files, and what to return back to us. And then this is the full script that it actually wrote out to us. And so on the topic of why we dream, like this is the script.
07:48 Tonight, when you close your eyes, your brain is going to build an entire world. That's already like a cool hook. People who don't exist, places you've never been, a story you'll move through as if it's real.
07:59 So like this is already sounding really awesome and I can already start to visualize this in my head, which is really the whole point. We can cross this off. We've created our script.
08:06 So the next thing to do is for us to just record this manually. I'm just gonna use voice memos on my Mac, and then we can use a free tool called Whisper to transcribe this with time stamps. And I'll show you why the time stamps are important in just a couple of minutes.
08:18 So I'm just gonna open up voice memos on my machine, and I'm just gonna read like the first like two paragraphs of the script because I don't wanna take a whole bunch of time like doing the whole thing. Tonight, when you close your eyes, your brain is going to build an entire world. And so what I'm gonna do is I'm just gonna right click on this, I'm gonna go into show in finder to actually open up our audio file, and I'm just gonna create a new folder inside of our CCC folder which just lives on my desktop, and I'm just gonna drop in this clip that we just recorded.
08:47 And so now this brings us to the second part of the skill. The skill needs to wait for us to record the audio, and then it actually needs to transcribe the audio with timestamps. Great.
08:56 So we now need to update the skill that we built. After stage one, which is writing the script, we need to wait for the user to record the audio clip. Once the user tells you that the audio clip has been recorded, search in the folder for the new audio, I need you to use Whisper to transcribe that clip, and I need you to add timestamps.
09:14 What's cool about using Whisper is that it runs locally on your machine so it's absolutely free to use. And you can run this as many times as you want. Okay.
09:21 So this is awesome. So Whisper actually did timestamps per phrase. So each one of these different beats is a separate timestamp.
09:29 This is very cool. So now we've transcribed our script with appropriate timestamps. And now it is time to put the pieces of the puzzle together.
09:37 We need to generate images for each of the beats in our script. So in order to do this super efficiently, we're going to connect Higgs Field to Claude Code. It takes a couple of seconds.
09:46 So what you wanna do is come into higgsfield.ai and then click on this MCP and CLI. All you need to do is just copy this link right here.
09:55 This is the link to the MCP. So inside of Claude Code, if you click the plus and you come into connectors and you go to manage connectors, you can click the plus to add a new connector.
10:06 You want to add a custom connector, name this Higgs Field, paste this in, click add, it's going to ask you to sign in with your Higgs Field name and password, and you are all good to go.
10:17 And once you've signed in, you will see all of the different tools that you have access to. And I would just switch these to always allow so that you don't have to approve every single thing that it does. So you can see here we can generate all of our images and all of our videos if we need them.
10:30 So I sent off this new prompt to Claude to tell it that once this has been successfully transcribed, the next stage of the skill is to actually generate the images using Higgs Field. And so it's going to generate one image for every timestamp in the script. So it has to read the script carefully, create a separate image for each timestamp.
10:47 If the timestamps look like this, each timestamp needs a separate image, being super, super clear. And then if you remember, I had this image prompt from when we first figured out what these Zen style images looked like, and I just said every image must match this styling, and then I pasted all of that in. So we can see here that this is rendering everything in our style.
11:06 It costs two credits per image, so about 32 credits for all 16 images. And this is relatively cheap. It's really up to you to figure out how much you wanna spend on something like this.
11:15 But if you are generating millions of views and earning thousands of dollars, it might be worth it for you to spend on the credits. It's your call. So look at this cute little illustration that we started with.
11:24 I love this. So these all just finished rendering, and honestly, look at how awesome they look and how consistent everything is. What's cool is it even saved everything to the folder on our desktop.
11:33 And so there's nothing left to do but take all of these images, stitch them together end to end using a free service called FFmpeg and layer in our voice over. So let's add stage four to our skill.
11:44 We just need one final stage, stage four that stitches everything together. So it's going to take the images in timestamped order, edit them end to end based off of the timestamp when they appear, and layer in the audio clip that has our voiceover. This should all be done using FFmpeg.
12:00 Tonight, when you close your eyes, your brain is going to build an entire world. People who don't exist, places you've never been, a story you'll move through as if it's real. This is awesome.
12:12 Not only do we have our first faceless video, we have this as a Claude skill, which is a reusable pipeline that we can use to create this style of video on autopilot anytime we want. All you have to do is type in slash faceless video, give it a topic like why are cheetahs so fast, and the system is gonna do the rest. If you wanna get access to the skill already built for you, just check the link in the description.
12:34 If you wanna see how I use Claude Code to make some pretty insane cinematic AI ads, just check out this video right here. I'll see you
The Hook

The bait, then the rug-pull.

A two-month-old YouTube channel with no face, no camera, and no editing — just hand-drawn doodle images over narration — has 156,000 subscribers and $11,700 in monthly ad revenue. This tutorial reverse-engineers how it works, then automates the entire formula into a single Claude Code slash command.

Frameworks

Named ideas worth stealing.

04:16 · list

Six-Stage Faceless Video Pipeline

  1. Create an illustration (style-extract)
  2. Extract script format (yt-dlp + Claude)
  3. Create script (topic-driven skill)
  4. Transcribe with timestamps (Whisper)
  5. Generate all images (Higgs Field MCP)
  6. Create video (FFmpeg stitch)

The complete Claude Code workflow from reference channel to finished faceless video.

Steal for: Any content production pipeline that needs to be repeatable and topic-agnostic
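
Because every stage writes into one shared project folder, the folder's contents are enough to tell which step comes next. The following is a rough sketch of that idea only; the skill in the video is driven by Claude's instructions, not by this code, and every file name here (`script.md`, `timestamps.json`, `images/`, `final.mp4`) is a hypothetical convention chosen for illustration.

```python
from pathlib import Path

# Voice Memos exports .m4a; the other extensions are included as a convenience.
AUDIO_EXTS = {".m4a", ".mp3", ".wav"}


def next_stage(project: Path) -> str:
    """Return the next pipeline step based on which outputs already exist."""
    if not (project / "script.md").exists():
        return "write_script"
    if not any(p.suffix.lower() in AUDIO_EXTS for p in project.iterdir()):
        return "record_audio"      # the one manual step: read the script aloud
    if not (project / "timestamps.json").exists():
        return "transcribe"        # Whisper, run locally
    images = project / "images"
    if not images.is_dir() or not any(images.glob("*.png")):
        return "generate_images"   # one image per timestamped phrase
    if not (project / "final.mp4").exists():
        return "stitch_video"      # FFmpeg
    return "done"
```

A fresh, empty folder returns `"write_script"`; once a script exists, the function returns `"record_audio"` until a recording is dropped into the folder, which mirrors the pause the host builds into the skill.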
02:50 · concept

Style Extraction Prompt

Analyze a reference image for (1) a positive style description and (2) negative prompts, focusing on style characteristics only — not subject matter. Separating style from subject makes the description reusable across any visual topic.

Steal for: Any workflow where you need consistent AI-generated images across many different subjects
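
Keeping style and subject separate is easy to express in code. A minimal sketch, assuming the style text is stored once and combined with each scene description; the style wording is abbreviated from what the host reads on screen, and folding the negatives into an "Avoid:" clause is an assumption, since image generators differ in whether they accept a separate negative-prompt field. `StyleSpec` and `build_prompt` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StyleSpec:
    """A reusable look: what the image should be, and what it must not be."""
    positive: str
    negatives: tuple[str, ...]


# Abbreviated from the style description read out in the video.
DOODLE_STYLE = StyleSpec(
    positive="hand-drawn marker doodle illustration, thick uneven black felt-tip outlines",
    negatives=("photorealistic", "3D render", "photograph"),
)


def build_prompt(subject: str, style: StyleSpec) -> str:
    """Combine a per-scene subject with the fixed style description."""
    avoid = ", ".join(style.negatives)
    return f"{subject.strip()}. Style: {style.positive}. Avoid: {avoid}."


print(build_prompt("An elephant walks to the watering hole", DOODLE_STYLE))
```

Only the `subject` argument changes from image to image, which is what keeps a batch of 16 images visually consistent.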
CTA Breakdown

How they asked for the click.

VERBAL ASK
00:50 · product
If you wanna get access to this entire system already built for you, I packaged everything up into a Claude skill that you can install in a single click. Just check the link in the description.

CTA fires at 00:50 — before any value is delivered — then repeats at 12:30. Both reference a community post link. A next-video CTA closes the video at 12:35.

MENTIONED ON CAMERA
07:50 · tool · yt-dlp
08:56 · tool · Whisper
11:40 · tool · FFmpeg
Storyboard

Visual structure at a glance.

00:00 · hook · hook open
00:25 · proof · Zen channel stats
04:16 · promise · 6-step checklist
07:54 · value · script skill building
09:56 · value · checklist progress
11:40 · value · image generation running
12:00 · cta · final video plays