Modern Creator
Nimish (Empulse Labs) · YouTube

Claude Can Now Watch Videos | Full Tutorial

A 9-minute setup guide that turns any YouTube video into a Claude-queryable knowledge source with exact timestamps.

Posted: 3 weeks ago
Format: Tutorial (educational)
Views: 2.2K
Likes: 32
Big Idea

The argument in one line.

A local video-download pipeline paired with Claude's visual and text reasoning turns any YouTube video into a queryable archive you can interrogate by timestamp, replacing passive rewatching with active querying.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…
  • You watch YouTube tutorials regularly but rarely finish them.
  • You want to extract a step-by-step workflow from any video without manual rewinding.
  • You use Claude Code daily and want to extend it to visual and audio content.
  • You want to pre-evaluate whether a long video is worth your time.
SKIP IF…
  • You do not use the Claude desktop app. This skill requires the plugin marketplace and will not work on claude.ai.
  • You are looking for a browser-based or zero-install solution.
TL;DR

The full version, fast.

The /watch skill installs in one step inside Claude Code: paste a GitHub link and Claude handles the rest. Under the hood, yt-dlp downloads the video, ffmpeg extracts around 80 frames, and Groq's free Whisper API transcribes the audio when the video has no captions of its own. Claude then reads the structured frame-and-transcript dump and can answer questions tied to specific timestamps. Nimish runs three live demos showing the output is substantially more navigable than Gemini's flat summaries.

Chapters

Where the time goes.

00:00 - 01:40

01 · Hook and problem statement

Opens with the claim that Claude can replace passive video watching; explains why Gemini and other tools fall short with no timestamped segment breakdown.

01:40 - 03:35

02 · How the skill works

Walks through the README: yt-dlp download, ffmpeg frame extraction (around 80 frames for 3 to 10 minute videos), and transcript pull.
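As a rough illustration of the frame budget mentioned in this chapter, the sampling interval can be derived from the video length. This is only a sketch: the function names and the fixed budget of 80 are assumptions made for the example, and the skill itself may pick frames differently.

```python
from __future__ import annotations


def frame_interval(duration_s: float, budget: int = 80) -> float:
    """Seconds between sampled frames so that `budget` frames span the video."""
    if duration_s <= 0 or budget <= 0:
        raise ValueError("duration_s and budget must be positive")
    return duration_s / budget


def frame_timestamps(duration_s: float, budget: int = 80) -> list[float]:
    """Evenly spaced sample times, starting at 0 seconds."""
    step = frame_interval(duration_s, budget)
    return [round(i * step, 3) for i in range(budget)]
```

Under these assumptions, an 8-minute (480-second) video sampled at a budget of 80 yields one frame every 6 seconds.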

03:35 - 03:57

03 · Transcript engine options

Explains Groq's free Whisper tier (around 40 minutes of transcription per day) versus the paid OpenAI option. Clarifies that Groq is not Twitter's Grok.
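A small arithmetic sketch of how the daily free allowance might be tracked. The 40-minute figure is the approximate number quoted in the video, not an official limit, and both function names are hypothetical.

```python
# Approximate free allowance quoted in the video; treat it as an assumption.
FREE_MINUTES_PER_DAY = 40.0


def remaining_minutes(used_today: float, daily_limit: float = FREE_MINUTES_PER_DAY) -> float:
    """Minutes of free transcription left today (never negative)."""
    return max(0.0, daily_limit - used_today)


def fits_free_tier(video_minutes: float, used_today: float = 0.0,
                   daily_limit: float = FREE_MINUTES_PER_DAY) -> bool:
    """True if transcribing this video stays within the daily allowance."""
    return video_minutes <= remaining_minutes(used_today, daily_limit)
```

By this estimate, a video of under ten minutes fits comfortably on a fresh day, but not after 35 minutes have already been used.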

03:57 - 05:04

04 · Installation walkthrough

Pastes the GitHub link into a Claude Code session, installs the skill, navigates to groq.com, creates an API key.

05:04 - 06:32

05 · Demo 1 YC robotics video

Runs the skill on a YC interview about robot training data. Shows timestamped segment breakdown. Compares output side-by-side with Gemini.

06:32 - 08:26

06 · Demo 2 SEO tutorial

Runs /watch on a Nico AI Ranking SEO tutorial. Claude generates a step-by-step workflow. Nimish creates an MCP config; it fails, and Claude debugs the credential format error live.

08:26 - 09:44

07 · Demo 3 Codex comparison and close

Runs /watch on a Riley Brown Codex video from X/Twitter. Asks Claude how Codex compares to Claude Code. Closes with a call to share use cases.

Atomic Insights

Lines worth screenshotting.

  • Claude does not watch a video stream. It reads a structured dump of frames and a transcript, which is why timestamp-level answers are accurate rather than hallucinated.
  • Groq provides around 40 minutes of free Whisper transcription per day, making the skill effectively zero-cost for most single-video sessions.
  • The skill only works in the Claude desktop app via the plugin marketplace. It cannot be used on claude.ai.
  • Gemini produces flat summaries. The /watch output breaks a video into timestamped segments, which is the practical difference for anyone following a tutorial.
  • When an MCP config failed mid-demo, Claude diagnosed the credential format error in real time. The live debug is the most instructive moment in the video.
  • yt-dlp supports YouTube, TikTok, Instagram, Loom, and local mp4/mkv files, so the skill works on nearly any video source.
  • The skill processes everything in a temporary directory and deletes files after analysis. No storage accumulation.
  • A knowledge graph extension is possible: run multiple videos through the skill and ask Claude to expand a shared document each time rather than creating isolated summaries.
  • Frame budget scales with duration. A 3 to 10 minute video gets around 80 frames and the extractor adjusts automatically.
  • The entire pipeline requires zero manual steps beyond pasting a URL into a Claude Code session.
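The temporary-directory behavior noted in the list above can be sketched with Python's standard library. This is an illustration of the pattern, not the skill's actual code; `analyze_in_temp_dir` and the `process` callback are hypothetical names.

```python
import tempfile
from pathlib import Path


def analyze_in_temp_dir(process):
    """Run `process(workdir)` inside a scratch directory that is deleted afterwards.

    Returns the result together with the (now removed) directory path.
    """
    with tempfile.TemporaryDirectory(prefix="watch_") as tmp:
        workdir = Path(tmp)
        # Downloads, frames, and transcripts would be written under workdir.
        result = process(workdir)
    # Leaving the `with` block removes the directory and everything in it.
    return result, workdir
```

Because cleanup happens when the `with` block exits, nothing accumulates on disk even if `process` raises an exception.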
Takeaway

Any video becomes a searchable document in four steps.

WHAT TO LEARN

Claude cannot watch video natively, but a lightweight local pipeline (download, frame-extract, transcribe, query) closes that gap at essentially zero cost.

  • Claude reads a structured dump of frames and a transcript, not a live stream. That is why its timestamp answers are grounded rather than guessed.
  • Groq's free Whisper tier (around 40 minutes per day) makes transcription cost-free for most single-video sessions. You only pay if you exceed the daily limit repeatedly.
  • The skill only works in the Claude desktop app via the plugin marketplace. Attempting to use it on claude.ai will fail.
  • Gemini's flat summaries omit timestamped segment breakdowns. The difference matters most when you need to locate a specific step in a tutorial.
  • Running a video through the pipeline before watching it lets you decide in under a minute whether it contains anything you do not already know.
  • When the pipeline produces a step-by-step workflow, you can paste errors directly into the same Claude session and debug without switching tools.
  • The same pipeline works on TikTok, Instagram, Loom, and local files, not just YouTube, because yt-dlp handles the download layer.
  • A knowledge graph is a natural extension: run multiple videos through the skill and ask Claude to expand a shared document each time rather than creating isolated summaries.
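To make the "structured dump of frames and a transcript" idea from the list above concrete, here is one possible way to interleave the two by timestamp. The `Segment` type, the `frame_NNNN.jpg` naming, and the line format are all assumptions for illustration; the skill's real output format is not shown in the video.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the start of the video
    end: float
    text: str


def fmt(seconds: float) -> str:
    """Format seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def build_dump(frame_times, segments) -> str:
    """Merge frame times and transcript segments into one timestamped listing."""
    events = [(t, "frame", f"frame_{i:04d}.jpg") for i, t in enumerate(frame_times)]
    events += [(s.start, "speech", s.text) for s in segments]
    # Stable sort: at equal timestamps, frames stay ahead of speech.
    events.sort(key=lambda e: e[0])
    return "\n".join(f"[{fmt(t)}] {kind}: {payload}" for t, kind, payload in events)
```

A single listing ordered by time is what lets a question about a given moment be answered from both what was on screen and what was said.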
Glossary

Terms worth knowing.

yt-dlp
An open-source command-line tool that downloads videos from YouTube and hundreds of other platforms. The /watch skill uses it to fetch the video file and pull any available captions before falling back to Whisper transcription.
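A minimal sketch of what the download step might look like as a yt-dlp invocation that also requests captions. The flags are standard yt-dlp options, but the exact command the skill runs is not shown in the video, so the function name, the English subtitle choice, and the output template are assumptions.

```python
from __future__ import annotations


def ytdlp_command(url: str, out_dir: str) -> list[str]:
    """Build (but do not run) a yt-dlp command that fetches a video plus captions."""
    return [
        "yt-dlp",
        "--write-subs",         # uploader-provided subtitles, if any
        "--write-auto-subs",    # auto-generated captions as a fallback
        "--sub-langs", "en",    # assumption: English captions are wanted
        "-o", f"{out_dir}/%(id)s.%(ext)s",  # name files by video id
        url,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; if no subtitle file appears afterwards, that is the case where a Whisper fallback would be needed.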
ffmpeg
A widely used open-source video processing tool. The skill uses it to split a downloaded video into a set of JPEG frames at regular or scene-change intervals.
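For the regular-interval case, frame extraction could be expressed as an ffmpeg command like the one built below. It uses ffmpeg's `fps` filter; the function name, file naming, and JPEG quality setting are assumptions, and the scene-change variant is not shown.

```python
from __future__ import annotations


def ffmpeg_frames_command(video_path: str, out_dir: str, every_s: int = 6) -> list[str]:
    """Build (but do not run) an ffmpeg command that saves one JPEG every `every_s` seconds."""
    if every_s <= 0:
        raise ValueError("every_s must be positive")
    return [
        "ffmpeg",
        "-i", video_path,
        "-vf", f"fps=1/{every_s}",  # one output frame per `every_s` seconds
        "-q:v", "2",                # high JPEG quality
        f"{out_dir}/frame_%04d.jpg",
    ]
```

As with the download step, the list is meant for `subprocess.run(cmd, check=True)` once ffmpeg is installed.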
Groq
A model inference provider at groq.com, unrelated to Twitter's Grok, that hosts OpenAI's Whisper speech-to-text model with a free daily transcription tier of approximately 40 minutes.
Claude Code skill
A plugin installable via the Claude desktop app marketplace that extends Claude with new slash commands. Skills are typically installed by pasting a GitHub repository link into a Claude Code session.
MCP (Model Context Protocol)
An open protocol that lets Claude connect to external data sources and tools. In the video, Nimish installs a DataForSEO MCP and debugs a credential error live.
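The video does not spell out the exact credential error, but a common class of MCP config mistake is an empty or wrongly typed credential value. The check below is a hypothetical sketch: it assumes the usual `mcpServers` layout with `command` and `env` keys, and the server and variable names in the usage note are made up.

```python
from __future__ import annotations


def find_credential_problems(config: dict) -> list[str]:
    """Return human-readable problems found in an MCP-style config dict."""
    servers = config.get("mcpServers")
    if not isinstance(servers, dict):
        return ["missing 'mcpServers' object"]
    problems = []
    for name, server in servers.items():
        if not isinstance(server, dict):
            problems.append(f"{name}: entry must be an object")
            continue
        if not isinstance(server.get("command"), str):
            problems.append(f"{name}: 'command' must be a string")
        # Credentials are typically passed as environment variables.
        for key, value in server.get("env", {}).items():
            if not isinstance(value, str) or not value.strip():
                problems.append(f"{name}: env '{key}' is empty or not a string")
    return problems
```

For example, a config whose `env` contains `"EXAMPLE_PASSWORD": ""` would be reported, while one with both placeholder credentials filled in would return an empty list.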
Resources

Things they pointed at.

02:41 · tool · Groq
05:04 · link · YC: Robots Don't Need More Compute. They Need This.
06:33 · channel · Nico AI Ranking: SEO with Claude tutorial
08:27 · link · Riley Brown: Codex capabilities video
Quotables

Lines you could clip.

00:00
I don't watch YouTube videos anymore. I ask Claude Code to go through it and give me entire summaries.
Punchy opening claim that reframes a universal behavior, instant hook. · TikTok hook
06:20
The detail that you get with using the skill is much better than Gemini.
Direct head-to-head verdict, quotable for AI productivity content. · IG reel cold open
08:29
All of this is happening for free. We are not spending anything on the APIs or the tools that we are using.
Strong closing value claim, resonates with cost-conscious builders. · newsletter pull-quote
The Script

Word for word.

00:00 I don't watch YouTube videos anymore. I ask Claude Code to go through it and give me entire summaries. It goes through the entire video frame by frame and through the entire transcript.
00:11 But this does not come out of the box within Claude Code. You have to get a specific configuration for this to happen. If you use Claude Code for getting YouTube video summary, you might be generating the transcript first using all the free tools on the Internet, or you might be using Google Gemini for getting summaries of the video or NotebookLM or something else.
00:31 But why this is much more better than all the other options is because, in this case, I can reference exact timestamps within the video. I can ask what is being discussed over there, and my Claude Code extracts the exact frame from the entire video.
00:45 It extracts the specific segment from the entire transcript and tells me what is happening. And that's why this is much more powerful than any other way of going through YouTube videos.
00:56 In this video, I'm gonna show you the best way of setting this up on your own Claude Code, and I would be showing you a couple of use cases that I use this skill for. And Claude Code can natively not understand videos, and the reason for that is they decided to make the best text models rather than focus on everything out there.
01:15 They wanted to focus on only one thing, making their text models the best ones, and they have actually nailed it very well. I've been using it for a couple of weeks. Now you will see that all the YouTube videos that I watch, I first put them through the skill, and then I actually go through the entire video.
01:32 That saves me a lot of time, and I don't have to go through videos that already contain something that I know. So let's get started. This skill is created by Brad.
01:43 Shout out to him. This is a great repository. If I scroll down, it will also explain us to us that how does this skill work.
01:51 The first thing is you paste a video or you drop a question. It uses a package called yt-dlp for downloading the entire video. This supports YouTube, Loom, TikTok, Instagram, or any other platform.
02:05 This also supports local path, and you can upload an MP4 video or MKV video or a bunch of different formats. And all of this happens on a temporary directory, so it deletes the file once it has gone through it. It uses a package called ffmpeg.
02:22 This splits the video into multiple frames. If it's three to ten minutes long, it splits it into around 80 frames, and this depends on how long the video is.
02:32 And the fourth thing is it extracts the transcript. So if the transcript is natively available on the video, it gets it directly from the yt-dlp package that it used earlier for downloading it.
02:44 If it's not available there, then it uses Whisper from OpenAI. Whisper is their model for converting speech into text, and it uses using Groq.
02:56 And cool thing about it is that Groq has a free tier. So if you are transcribing under a certain limit, which is actually pretty considerate, they give you around forty minute of transcription for free every day.
03:10 So you can use it from there directly, or you also have an option of using OpenAI's key directly here. So you can use the model directly from OpenAI's platform, in which case, it would be paid, but still a very marginal cost.
03:24 Next thing it does is it gives all those frames and transcripts to Claude. And then Claude goes through them systematically and gives you a summary of everything that is in the video.
03:36 And then you can use it for asking specific questions and understanding individual concepts. This is very, very useful if you are trying to learn something new or if you are watching a tutorial. And in that case, everything that is on the screen along with what is being spoken is of importance.
03:54 So that's about this entire repository. And all you have to do for installing it is you just copy the link of this repository. You go to a new session in your Claude Code.
04:06 Please note, this does not work with claude.ai. Like, you cannot use it on the website.
04:13 It only works with the Claude desktop app because it uses a plugin from the marketplace. So you can use it only on the desktop version of Claude.
04:23 So all you have to do is you just drop the link and ask it to install the skill. And, yeah, it's as simple as that. Once you do it, then it will ask you a couple of questions.
04:34 It will ask you how you want to use their transcription model. Once that happens, you have to go to Groq, groq.com.
04:42 This is not Twitter's Grok. This is a different Groq, which is a model provider. You go over here.
04:48 You go to your own dashboard, and you go to API keys, and you create a new API key for specifically this. And then you come back and paste it here once it asks for it.
05:02 And that's it. That's the entire installation. Now let's go back and check our results that we got from running the scan.
05:10 It scanned through the entire video. It gives us a summary of different key moments and what was being discussed in which part of the video. This video was from YC, and it discusses that robots don't need more compute.
05:26 They need this, and that this is what would make people to watch this video. Let's start with that. What is meant by this in the title?
05:39 Since it understands the entire video, it can ask it can answer any question that you have around it. By this, they mean high quality, well labeled training data, which is requirement of every LLM.
05:51 That's there. Now you can dive deeper into different sections of the video, but you got a gist of what was in the video. And on the other hand, if we look at the summary that was created by Gemini, it's good for a summary, but it's not as detailed.
06:06 It does not mention different segments of the video and specifically what that covers. So you clearly know that the detail that you get with using the skill is much better than Gemini.
06:18 And you can arrange it locally on your files, so you can have a file structure where you're saving summary of different videos. You can ask it to even create a knowledge graph where it expands it every time you ask it to go through a new video. Another cool thing is that this becomes really interesting when you are using this video to go through a tutorial.
06:40 For example, one of the tutorials that I recently wanted to go through is by Nico. Nico created a video around how to do SEO using ChatGPT.
06:52 It was interesting, but I never had time to go through the entire thing. So I will just copy this link of this video, and I will come back, and I will open up a new session.
07:05 I will just use the same skill, /watch, and I will drop the link of the video. So this video is a tutorial in which this guy actually uses Claude and shows that how exactly he uses this for SEO.
07:19 And in this case, this is interesting because I can extract each of these frames using the skill, and then I can just follow the entire process.
07:28 And wherever I'm stuck, I can directly ask Claude, and it will tell me exactly how to use it.
07:36 So I have the steps right in front of me. I see that it has created a step by step workflow for me to follow. I will start from the top.
07:45 I will start creating a DataForSEO account, and I can go all the way until step third and start implementing this. And it has also created these visual highlights of the frames where something important has been mentioned. If I get stuck at one of these places, for example, once I have created the MCP, if it's not working, so I can mention the problem that I'm facing.
08:09 I have created MCP, but it isn't working. So it asked me a couple of questions, and then it also told me what exactly needs to be done. It mentioned that after filling in my credentials, I need to restart this.
08:22 And then if I share the error message, it can help me debug this further. So this way, makes learning anything new using a YouTube video becomes really simple.
08:33 That's one of the interesting use cases. Along with that, another cool thing is this also works with video from other platforms. And the thing is that not all of them work directly with any platform.
08:45 But with this, you can run any video. For example, this video from Riley Brown where he explains how to use Codex. So I don't want to go through the entire video, but I want to understand what is the gist of this and how does it compare with Claude Code.
09:01 So I will just copy the link of this video, and I will paste this here. I will first mention the skill, /watch, and then I will paste the link here. And then I will ask it, can you please explain basis this video that how Codex is better than Claude Code?
09:17 And it will do the same thing over here again as well. It will download the video, extract the frames. It will also extract the transcript of the video, and then finally give me the entire output.
09:28 And all of this is happening for free. We are not spending anything on the APIs or the tools that we are using along with this. That's all about the skill.
09:37 Do let me know if you try it out and what are you trying it out for. And I will see you in the next one. Bye bye.
The Hook

The bait, then the rug-pull.

Most people still scrub through YouTube videos the old way. Nimish opens by announcing he has stopped watching videos entirely, delegating the job to Claude Code and a community-built skill that reads frames and transcripts so he can query the content like a database.

Frameworks

Named ideas worth stealing.

01:40 · model

Video-as-database pipeline

  1. yt-dlp download
  2. ffmpeg frame extraction
  3. Groq/Whisper transcript
  4. Claude analysis + Q&A

Four-step pipeline that converts any video into a structured artifact Claude can query with timestamp precision.
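Querying "with timestamp precision" amounts to mapping a time to the segment that contains it. A minimal sketch using the chapter start times from this breakdown follows; the function names are hypothetical, and the real skill presumably works on transcript segments rather than chapter titles.

```python
from bisect import bisect_right

# Chapter start times (in seconds) and titles taken from the chapter list.
STARTS = [0, 100, 215, 237, 304, 392, 506]
TITLES = [
    "Hook and problem statement",
    "How the skill works",
    "Transcript engine options",
    "Installation walkthrough",
    "Demo 1 YC robotics video",
    "Demo 2 SEO tutorial",
    "Demo 3 Codex comparison and close",
]


def parse_ts(ts: str) -> int:
    """Convert 'MM:SS' to seconds."""
    minutes, seconds = ts.split(":")
    return int(minutes) * 60 + int(seconds)


def segment_at(ts: str, starts=STARTS, titles=TITLES) -> str:
    """Return the title of the segment that contains timestamp `ts`."""
    index = bisect_right(starts, parse_ts(ts)) - 1
    if index < 0:
        raise ValueError(f"{ts} is before the first segment")
    return titles[index]
```

With this lookup, `segment_at("06:20")` resolves to the first demo, which is where the Gemini comparison quote appears.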

Steal for: Any workflow where you need to extract structured knowledge from video without watching it manually
CTA Breakdown

How they asked for the click.

VERBAL ASK
09:36 · next-video
Do let me know if you try it out and what are you trying it out for. And I will see you in the next one.

Soft verbal close with no subscribe push or product pitch, just curiosity about viewer use cases.

Storyboard

Visual structure at a glance.

00:00 · hook · hook
01:40 · promise · README
03:57 · value · install
05:04 · value · demo 1
06:32 · value · demo 2
08:26 · value · demo 3
09:36 · cta · CTA