Modern Creator
Creating with Conor · YouTube

The Only AI Filmmaking Workflow You'll Ever Need

A 15-minute step-by-step walkthrough of building a complete cinematic AI short film — story, assets, continuity, and edit — using Claude and Higgsfield Cinema Studio.

Posted: 2 weeks ago
Format: Tutorial, educational
Views: 35.3K
Big Idea

The argument in one line.

AI-generated videos look like random clip collections because creators skip the three invisible layers every real film requires: a story that gives every clip purpose, a saved asset library that locks visual consistency, and video references that chain each generation to the last.

Who This Is For

Read if. Skip if.

READ IF…
  • You have tried AI video generators and every clip looks like it came from a different film.
  • You want to make short cinematic AI films with consistent characters and continuous narrative flow.
  • You are comfortable with AI tools and want a structured, repeatable workflow you can reuse.
  • You are a content creator or filmmaker exploring what AI production pipelines can deliver today.
SKIP IF…
  • You want a platform-agnostic AI video tutorial — this walkthrough is built entirely around Higgsfield Cinema Studio.
  • You are looking for free tooling; Higgsfield is a paid and sponsored platform.
TL;DR

The full version, fast.

Coherent AI films require three things before a single frame is generated: a story with structure, a complete asset library of saved characters, locations, and props, and video references that chain each clip visually to the previous one. The tutorial walks through building a 13-clip action short using Claude for the story, Higgsfield Cinema Studio for character generation via auto mode and AI Cast, and emotion controls to shape each scene. Feeding the last 5 seconds of each clip as a reference into the next is the mechanism that produces consistent lighting, world, and character continuity. The final edit in any video editor replaces per-clip audio with one music track.
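The chaining mechanism can be sketched in a few lines of Python. In the walkthrough the last 5 seconds are saved inside Cinema Studio's own interface; the `ClipPlan` structure, the function names, the file names, and the ffmpeg command below are illustrative assumptions for anyone who wants to plan the chain or cut a tail segment outside the tool, not part of Higgsfield.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

REFERENCE_SECONDS = 5.0  # tail length the walkthrough reuses as a video reference


@dataclass
class ClipPlan:
    index: int                     # 1-based position in the film
    duration: float                # seconds; the walkthrough uses 15-second clips
    reference_clip: Optional[int]  # clip whose tail is reused, None for the opener


def build_chain(clip_count: int, duration: float = 15.0) -> List[ClipPlan]:
    """Plan clips so each one references the clip generated just before it."""
    return [
        ClipPlan(index=i, duration=duration, reference_clip=i - 1 if i > 1 else None)
        for i in range(1, clip_count + 1)
    ]


def tail_window(duration: float, tail: float = REFERENCE_SECONDS) -> Tuple[float, float]:
    """Return (start, end) in seconds of the tail segment used as the reference."""
    return (max(0.0, duration - tail), duration)


def ffmpeg_tail_command(src: str, dst: str, tail: float = REFERENCE_SECONDS) -> List[str]:
    """Build an ffmpeg command that copies the last `tail` seconds of `src`.

    -sseof seeks relative to the end of the input file. With stream copy the
    cut lands on a keyframe, so the result may be slightly longer than `tail`.
    """
    return ["ffmpeg", "-sseof", f"-{tail:g}", "-i", src, "-c", "copy", dst]


chain = build_chain(13)              # the film in the video has 13 clips
window = tail_window(15.0)           # which seconds of a 15-second clip get reused
command = ffmpeg_tail_command("clip01.mp4", "ref01.mp4")  # hypothetical file names
```

Only the first clip has no reference; every later clip points at its predecessor, which is why an inconsistency early in the chain tends to carry forward.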

Chapters

Where the time goes.

00:00 - 00:45

01 · Cold open: finished film excerpts

Opening and cliffhanger clips from the completed AI short spliced together as the hook, before any tutorial context.

00:45 - 02:28

02 · Story first

Why skipping story is the number one AI video mistake. Using Claude to generate a contained story: father, daughter, combat drone, remote desert farm.

02:28 - 03:53

03 · Asset building: characters 1 and 2

Photo-reference auto mode to generate Connor (the host) and Leah (the daughter). The save-to-elements flow every character follows.

03:53 - 05:34

04 · Asset building: characters 3 and 4

Switching to AI Cast for Hara (caregiver archetype, weathered ranch hand) and Boone (jester archetype, comic relief). Why archetype, budget, and era settings matter more than physical descriptions alone.

05:34 - 06:35

05 · Asset building: locations and drone prop

Cinematic location model for the farm exterior and barn interior. Soul Cinema model for the threatening military drone prop.

06:35 - 08:47

06 · Generating clips 1 and 2: continuity via video reference

Opening wide shot with serenity/joy emotion settings. Chaining clip 2 by using the last 5 seconds of clip 1 as a video reference to maintain world and character consistency.

08:47 - 10:47

07 · Key clips: emotion shift and drone POV

Switching to terror/rage/fear for the drone attack scene. First-person POV from inside the drone targeting HUD as a deliberate directorial choice.

10:47 - 11:52

08 · Prompt precision and cliffhanger clip

Concrete before/after demonstration of vague vs. specific action descriptions. The final cliffhanger clip: "They sent one to find us."

11:52 - 12:31

09 · Assembly: CapCut edit

Arranging 13 clips, trimming, replacing per-clip audio with one consistent background music track.

12:31 - 15:28

10 · Finished film and workflow recap

Full playback of the completed AI short. Closing summary of the four-layer framework and CTA to Higgsfield.

Atomic Insights

Lines worth screenshotting.

  • The reason AI video clips look disconnected is not the generator but the absence of a story that defines what every clip must belong to.
  • Saving characters, locations, and props as named elements before generation is the equivalent of a production designer building the film world before cameras roll.
  • Feeding the last 5 seconds of each clip as a reference into the next generation is what creates lighting and world consistency across shots.
  • Emotion settings in Cinema Studio act as directorial instructions: faces, body language, and energy visibly shift when you change them.
  • The AI Cast workflow produces more controlled character results than freeform prompts because it structures the specific inputs the model responds to.
  • Prompt precision is the highest-leverage editing skill in AI video: vague physics produces weak generations; explicit consequences produce accurate ones.
  • Cinema Studio generates different background music per clip; keeping that audio ruins cinematic feel and a single consistent track is mandatory.
  • A cliffhanger ending works in an AI film for the same reason it works in any film: everything before it was built with intention.
  • The budget setting in AI Cast is a signal to the model about production quality and visual register, not a literal dollar figure.
  • Using an AI writing tool to generate the story first means the premise has structure before any video generation runs.
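The prompt-precision insight above can be made concrete with a small check. This is a rough illustrative heuristic, not anything the video or Cinema Studio provides: the `VAGUE_VERBS` list and the `vague_verbs_in` function are hypothetical, and only the two prompts come from the host's own before/after example.

```python
import re
from typing import List

# Hypothetical list of weak action verbs; not taken from the video or any tool.
VAGUE_VERBS = {"falls", "hits", "goes", "moves", "gets", "does"}


def vague_verbs_in(prompt: str) -> List[str]:
    """Return the vague verbs found in a prompt, in order of appearance."""
    words = re.findall(r"[a-z']+", prompt.lower())
    return [word for word in words if word in VAGUE_VERBS]


# The host's underwhelming prompt and the rewrite that worked.
before = "The water tank falls and hits the drone."
after = (
    "The iron tank drops from its mount and slams down onto the drone, "
    "crushing it completely against the ground."
)

flagged_before = vague_verbs_in(before)
flagged_after = vague_verbs_in(after)
```

A word list cannot judge whether consequences are explicit, so a check like this only supplements reading every prompt before running it.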
Takeaway

Three layers every AI film needs before you generate

WHAT TO LEARN

Coherent AI video comes from story structure, a saved asset library, and video references chaining each clip to the last — skip any one layer and the result is a demo reel, not a film.

  • Lock the story before generating a single frame: a 3-sentence premise with characters, escalating tension, and a twist gives every clip a reason to exist.
  • Build your asset library first: save every character, location, and prop as a named element so faces, lighting, and world stay consistent across all 13 clips.
  • Chain clips with video references: use the last 5 seconds of each generated clip as the reference input for the next to maintain lighting and visual continuity.
  • Emotion settings are directorial instructions, not labels: switching four characters from serenity and joy to terror, fear, and rage produces a visibly different clip.
  • Prompt precision multiplies on every generation: vague action verbs underdeliver; explicit physics and consequences produce accurate results.
  • Replace per-clip audio in the final edit: AI video tools generate a different music track per clip and keeping it breaks cinematic continuity across cuts.
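The audio replacement in the last point can also be done from the command line. The host uses CapCut and notes that any editor works; the sketch below is an assumed ffmpeg alternative, with hypothetical file names and helper functions, that joins the clips in order, drops their generated audio, and lays one music track underneath.

```python
from typing import List


def concat_list(clip_paths: List[str]) -> str:
    """Return the text of an ffmpeg concat-demuxer list file, clips in order."""
    return "".join(f"file '{path}'\n" for path in clip_paths)


def assemble_command(list_file: str, music: str, output: str) -> List[str]:
    """Build an ffmpeg command that joins the clips and swaps in one music track."""
    return [
        "ffmpeg",
        "-f", "concat", "-safe", "0", "-i", list_file,  # input 0: clips in order
        "-i", music,                                     # input 1: the single music track
        "-map", "0:v",   # keep only video from the clips, dropping per-clip audio
        "-map", "1:a",   # take the audio from the music track
        "-c:v", "copy",  # assumes all clips share codec and resolution
        "-shortest",     # stop when the shorter of the two streams ends
        output,
    ]


clips = [f"video_{i:02d}.mp4" for i in range(1, 14)]  # hypothetical file names
listing = concat_list(clips)  # write this text to clips.txt before running ffmpeg
command = assemble_command("clips.txt", "music.mp3", "film.mp4")
```

Trimming individual clips, which the host also does in the editor, is left out of this sketch.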
Glossary

Terms worth knowing.

Cinema Studio
Higgsfield's dedicated filmmaking environment with a full production suite including image generation, video generation, emotion controls, camera movement settings, and a saved elements library — built around the idea that you are directing something, not just prompting.
Auto mode
A generation mode in Cinema Studio that follows the prompt closely without the AI layering its own creative decisions, producing the most faithful result from a reference image or detailed description.
AI Cast
A structured casting interface in Cinema Studio where you build a character by selecting genre, budget, era, archetype, and physical attributes rather than writing a freeform prompt — producing more controlled and specific character results.
Soul Cinema
Cinema Studio's dedicated model for high-quality cinematic image generation, used for props and assets that need to look sharp, detailed, and photorealistic rather than character-portrait quality.
Video reference
A previous video generation fed into a new generation as a reference clip, allowing Cinema Studio to maintain visual and narrative continuity — same lighting, same world, same character register — across successive shots.
Caregiver archetype
One of several character archetypes in the AI Cast interface; selecting it produces a character with a warm, approachable energy that affects facial expression and body language in generated clips.
Quotables

Lines you could clip.

01:55
This is the step that almost every AI video creator skips, and it is the exact reason their content ends up looking like a collection of random clips.
Diagnostic hook that names the problem precisely, no setup needed · TikTok hook
08:40
That is continuity and that is what separates a film from a mood board.
Tight one-liner punchline · IG reel cold open
10:50
When I wrote "the water tank falls and hits the drone," the result was underwhelming. When I rewrote it as "the iron tank drops from its mount and slams down onto the drone, crushing it completely against the ground," the generation came back exactly as intended.
Concrete before/after that proves the point without theory · TikTok hook
11:03
Read every prompt, tighten the action verbs, and make the consequences explicit.
Three-rule prescription, standalone and immediately actionable · newsletter pull-quote
15:00
The story created direction, the assets created consistency, the video references created continuity, and Cinema Studio gave us the control to direct it all.
Clean four-part closing framework summary · newsletter pull-quote
The Script

Word for word.

00:00It's crooked.
00:02Crooked is fine. Hand me another one.
00:05There's no more.
00:10The hell is that? Drone.
00:38Yeah. We made it.
00:42They sent one to find us. They found us.
00:45AI films have been on the rise lately, but have you ever wondered how they're made? Because if you try to create one yourself, you'll just end up generating a bunch of disconnected clips that look nothing like a real film. That's why in this video, I'm gonna walk you through the exact workflow I use to go from a blank page to a complete AI film that looks like it came out of a real production studio.
01:04Before we generate anything, we need a story. So the very first thing I'm doing is opening up Claude and generating my story. You can absolutely write your own if you already have something in mind, but I'm going to paste this prompt in and let it do the work.
01:16What we've got here is a tight contained story of a father and his daughter living on a remote desert farm when a combat drone strays into their valley and starts hunting them. This contains all the important elements a film needs. It has characters you care about, escalating tension, a moment of victory, and then an ending that leaves you wanting more.
01:33And that structure is what's gonna make every clip we generate feel intentional. This is the step that almost every AI video creator skips, and it's the exact reason their content ends up looking like a collection of random clips. When you're building a film, every single clip needs to feel like it belongs to the same world with the same characters and the same narrative thread.
01:52Without that foundation, nothing you generate is going to hold together no matter how good the individual shots look. That's the difference a story makes. So lock yours in before you generate a single frame.
02:02Now that we have our story, we need to build the world around it. And that means heading into the tool that makes this entire workflow possible. For this, I'm using Higgsfield because it has a unique tool that's crucial for AI filmmaking.
02:13Once you're inside, click on Cinema Studio. This is the feature that makes Higgsfield completely different from anything else in this space right now. Cinema Studio gives you a dedicated filmmaking environment with a full production suite built around the idea that you're directing something, not just generating.
02:28Once you're inside Cinema Studio, make sure you're on image mode. This is where we're going to build all of our assets before we touch the video generator. Four characters, two locations, and one prop.
02:37And I wanna be clear about why we do this first. Every element we create here gets saved and referenced directly inside our video prompts later. That's what gives the whole film visual consistency.
02:47So the characters, locations, and objects stay consistent across every single clip. Let's build them one by one. The first character is based on me, so I'm going to upload a photo of myself, set the model to auto, and paste in this prompt.
02:58Auto mode basically tells Cinema Studio to follow the prompt closely without the AI layering its own creative decisions on top of it. It's the most faithful way to generate from a reference. That's a clean photo real result.
03:10Details like the skin texture, the asymmetry, and the stray hairs make this feel like an actor on set, not a generated face. Once you're happy with your results, go to video mode, add reference media, then element, then new element. Upload the image, set the name to Connor, set the category to character, and save.
03:26Every character follows this exact same save flow, so remember it because we're going to be doing it a few more times. Now let's make our second character the same way. Still in auto mode, paste in the prompt, and generate.
03:36This is the daughter in our film. Save her the same way and name her Leah. For our other two characters, we're switching the method.
03:42Instead of auto, select AI Cast and then click build your cast. This opens up a completely different workflow where you're not writing a prompt, but you're making selections across a full casting interface. With the options we get, it genuinely feels like casting a real actor for a real production.
03:57The reason we used auto for Leah instead of this method is that AI Cast doesn't give us the option to generate child characters. So auto with a detailed prompt is the right call for her.
04:09For Hara, I'm selecting drama as the genre, an 80,000,000 budget, the 2020s era, and the caregiver archetype. Then for the physical side, I'm going with male, white, 55 years old, stocky build, average height, short beard. And in the details, I'll select custom and type "weathered ranch hand, sun-creased" with a workwear outfit.
04:25Every single one of these is crucial and affects the final output way more than you think. The caregiver archetype will make him feel friendly and more approachable, which is perfect for his role in the story.
04:35And the budget is not too high because we want this to be clean but not look like a Hollywood blockbuster. We're creating a film with grounded and emotional storytelling, so 80,000,000 is the perfect spot. Let's generate.
04:45He looks really good. That's exactly the kind of result you get when you're specific about who this person is rather than just describing what they look like. With AI Cast, you get so much more control because there's only so much you can write in a prompt and you end up missing something.
04:59This way you can control every single detail. Save him and name him Hara. It's the same AI Cast workflow for our final character.
05:05I'll select these settings. I'm keeping drama and 80,000,000, but switching the archetype to jester. Then male, white, 25 years old, athletic build, average height, just some stubble, suntanned in the details, and workwear for the outfit.
05:18This is a younger character with lighter energy, kind of like the comedic relief that every film like this needs, so that the viewer gets room to breathe. You can already feel the contrast between him and Hara just from the faces alone. That contrast is gonna do a lot of work in the film.
05:31Save him and name him Boone. Now we're moving on to our locations. Still in image mode, change the model to cinematic locations, set the resolution to 2K so we get that film-like quality, and paste in this prompt.
05:43We get back this cinematic still of a modest farmhouse with a water tower standing tall on the left side of the frame. That water tower is going to matter a lot later. Save it under the location category and name it farm.
05:53It's the same process for our second location. So I'll paste in this prompt and generate. The way the lighting reacts with the environment and the texture of the wood make this look like a real set.
06:02This is our interior, so go ahead and save it. I'll name it barn. For our last asset, we're switching the model to Soul Cinema.
06:08This is Cinema Studio's dedicated model for high quality cinematic image generation and it's the right call for a prop that needs to look sharp, detailed, and genuinely threatening. I wanna create the drone that will attack our characters, so I'll paste in this.
06:21This looks clean with a military aesthetic. It's genuinely threatening. And let's just say it's not something you would want hovering over your farm.
06:28Save it and name it drone. These are all of our assets made and saved, and having them is super important. If we were to skip straight to generating, we wouldn't be making a film, just a random collection of high quality clips.
06:39But when making a coherent story, having the same characters and settings is crucial. And now that we have our assets, we can start creating our film. Stay in Cinema Studio and switch from image mode to video mode.
06:50The interface shifts when you do this and it's genuinely one of the most impressive things about this tool. You get a full set of production controls that feel like you're actually directing something. You can select your saved elements, set the emotions of your characters, set the genre, the camera movement, and so much more.
07:07And every one of these affects how your final output will turn out. So you're not just prompting, you're directing.
07:12For this film, I'm generating 13 clips in total. I'm not gonna walk through every single one because the process is identical each time, but I'm gonna take you through the key ones so you understand exactly how it works. Stick around to the end to see the result.
07:25I'll start with our opening shot and it's the simplest generation in the whole film because we're only working with images at this point. Here's how I set it up. I'm dropping in Connor and Leah as my saved characters, selecting the farm as the location.
07:37And then for emotions, setting Connor to Serenity and Leah to Joy. This is important because for our opening shot, we want to establish a happy moment with our characters before the action kicks in. Genre is action.
07:48Duration is 15 seconds. Resolution at 1080p. Aspect ratio 16:9, audio on, and shot control set to smart.
07:56Then I paste in the prompt and hit generate. And look at that. We get this wide shot of the farm at sunrise with our characters walking across the yard.
08:03It already feels like the opening of a real film. And a big part of that is the emotion settings. That serenity on Connor, that joy on Leah.
08:11You can feel how peaceful and normal life is for these people right now, which is exactly what makes everything that comes after hit so much harder. Now let's make the second video. But before generating this one, click add reference media, then video generations, find the clip we just made, and save the last five seconds of it.
08:26Because this time, we're using it as a video reference. And this is where the workflow gets a lot more powerful. Cinema Studio lets you feed your previous generation in as a reference for the next clip.
08:35It uses that to maintain visual and narrative continuity across shots and keeps the same lighting, the same world, and the same vibe. This is what makes the film hold together rather than looking like a series of separate generations. Settings this time include all four characters, the farm location, and video one as the reference.
08:52For the emotions, I'm keeping them the same for Connor and Leah and selecting Serenity for both Hara and Boone. Every other setting stays the same. I'll paste in this and generate.
09:01We now see all of our characters in the same setting, all interacting with each other and the environment. And everything has remained perfectly consistent with all the elements being carried from the first clip. That's continuity and that's what separates a film from a mood board.
09:14For the next one, I wanna jump ahead to the first major turning point in the story and it's a good example of how the emotion settings completely change the energy of a generation. We're still referencing the previous clip, but look at the emotion shift. Connor is now set to fear, Leah to terror, Hara to rage, and Boone to terror.
09:30Compare that to the serenity and joy from the opening shot. That's the entire emotional arc of the film compressed into four settings, and Cinema Studio actually reads those. The faces, the body language, and the energy of the clip all reflect what you set.
09:44Let's generate and see what we get. It's just a That's a brutal scene in the best way possible.
09:56The drone appears and disturbs the calm and peace of our characters. The emotions we selected really come through and Boone meets his end. The timing between his reaction and the drone firing isn't the best, but it's not that noticeable.
10:08Let's jump ahead a bit more. This one is worth showing because the prompt structure is completely different from everything else we've done. Instead of a standard third-person shot, this is a first-person POV from inside the drone's targeting heads-up display.
10:19It's honestly one of my favorite ideas to try out. Only Connor is selected as a character and I'll set his emotion to rage. He's obviously angry after what just happened and wants to take the drone down.
10:30I'll paste in this. That shift in perspective, suddenly we're inside the machine that's been hunting them, changes the entire feeling of the sequence. It's a directorial choice.
10:38And the fact that you can make that choice inside Cinema Studio is exactly what makes this tool different from everything else.
10:47Come on.
10:50The first-person POV looks really good and everything remained intact. This is also a good moment to talk about prompt precision because it matters more than almost anything else in this workflow. I wrote all of these prompts with Claude, but I reviewed every single one before running it.
11:04There is a real difference between vague action descriptions and specific ones. When I wrote "the water tank falls and hits the drone," the result was underwhelming. When I rewrote it as "the iron tank drops from its mount and slams down onto the drone, crushing it completely against the ground," the generation came back exactly as intended.
11:21Read every prompt, tighten the action verbs, and make the consequences explicit. Let me also show you the final clip. All three remaining characters are on the farm.
11:29Connor set to vigilance, Leah to terror, and Hara to fear. And the prompt describes what happens after they think it's over.
11:41They sent one to find us. They found us. This is an incredible cliffhanger.
11:45And it works because everything before it was built with intention. The characters, the world, the story, the continuity between clips. Every decision we made from the moment we opened Claude feeds directly into why that final shot lands the way it does.
11:58Now we have everything we need, so let's put it all together. Open your video editor. I'm using CapCut, but any editor works.
12:04Arrange your clips from video one to video 13, trim anything unnecessary, and add one consistent background music track across the entire edit. This matters more than it sounds. Cinema Studio generates different music for every clip.
12:17So if you keep the original audio, the soundtrack changes at every cut and ruins the cinematic feel. Now let's look at our final result.
12:31It's crooked.
12:33Crooked is fine. Hand me another one.
12:36There's no more.
12:45Leah, you sure you're not nine going on 40?
12:48You're not funny.
12:50He's really not.
12:55The hell is that? Drone.
13:02Don't move. It's just a boom.
13:05Don't Horn, go.
13:15Cover.
13:33We can't outshoot it. I know. Then what?
13:40Tower.
13:43Hey. Over here.
13:55Climb. Don't look back.
13:57Dad. Climb.
14:37Yeah. We made it.
14:41Dad? I'm here.
14:55They sent one to find us. They found us. What you end up with is a complete AI film with a clear story, consistent characters, and scenes that flow naturally together.
15:05That consistency comes from planning the whole film from the beginning. The story created direction, the assets created consistency, the video references created continuity, and Cinema Studio gave us the control to direct it all.
15:16And the best part is that this workflow works with any story you have in mind. So if you wanna get started and make your own cinematic film, click the link in the description to sign up to Higgsfield. Thanks for watching, and I'll see you in the next one.
The Hook

The bait, then the rug-pull.

The video opens not with an explanation but with the finished product: a cinematic wide shot of a desert farm at golden hour, two characters working, a drone appearing on the horizon, a shotgun raised, a question shouted into the sky. Forty-five seconds of AI-generated film before the creator speaks a single instructional word.

Frameworks

Named ideas worth stealing.

15:00 · list

The 4-Layer AI Film Foundation

  1. Story creates direction
  2. Assets create consistency
  3. Video references create continuity
  4. Cinema Studio provides the control to direct it all

The four elements the host identifies as necessary for a coherent AI film, named explicitly in the closing recap.

Steal for: Any AI video or content production workflow that needs to move from individual generated clips to a coherent narrative output.
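The "assets create consistency" layer can be pictured as a small registry that every clip must draw from. The sketch below is a mental model only: the `Element` and `ElementLibrary` classes are hypothetical and do not reflect how Cinema Studio stores elements, and the "prop" category name is an assumption (the video names only the character and location categories). The seven element names are the ones saved in the walkthrough.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Element:
    name: str
    category: str  # "character", "location", or "prop" (the last is assumed)


@dataclass
class ElementLibrary:
    elements: Dict[str, Element] = field(default_factory=dict)

    def save(self, name: str, category: str) -> None:
        """Store an element under its name, mirroring the save-to-elements flow."""
        self.elements[name] = Element(name, category)

    def missing(self, names: List[str]) -> List[str]:
        """Return the names a clip wants to use that were never saved."""
        return [name for name in names if name not in self.elements]


library = ElementLibrary()
for character in ("Connor", "Leah", "Hara", "Boone"):
    library.save(character, "character")
for location in ("farm", "barn"):
    library.save(location, "location")
library.save("drone", "prop")

# A clip that asks for an unsaved element would break consistency.
unsaved = library.missing(["Connor", "farm", "tower"])
```

Checking a clip's element list before generating is the code equivalent of building the asset library first.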
03:53 · model

AI Cast Casting Interface

  1. Genre
  2. Budget in millions
  3. Era
  4. Archetype
  5. Physical attributes
  6. Details / custom

A structured casting workflow that replaces freeform character prompts with a cascading selection interface, producing more controlled and expressive character results.

Steal for: Any workflow where consistent AI characters need to carry emotional register across multiple generations.
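Writing the casting selections down as structured data makes it easy to see what actually separates two characters. The `CastSelection` class and its field names below are hypothetical and are not a Higgsfield API; the values are the ones the host reads out for Hara and Boone, with Boone's era assumed unchanged because the video does not state it.

```python
from dataclasses import asdict, dataclass, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class CastSelection:
    genre: str
    budget: int      # a signal of production register, not a literal spend
    era: str
    archetype: str
    sex: str
    ethnicity: str
    age: int
    build: str
    height: str
    facial_hair: str
    details: str
    outfit: str


hara = CastSelection(
    genre="drama", budget=80_000_000, era="2020s", archetype="caregiver",
    sex="male", ethnicity="white", age=55, build="stocky", height="average",
    facial_hair="short beard", details="weathered ranch hand, sun-creased",
    outfit="workwear",
)

# Boone keeps genre and budget; era is assumed to stay the same as well.
boone = replace(
    hara, archetype="jester", age=25, build="athletic",
    facial_hair="stubble", details="suntanned",
)


def changed_fields(a: CastSelection, b: CastSelection) -> Dict[str, Tuple[object, object]]:
    """Return field -> (value in a, value in b) for every field that differs."""
    first, second = asdict(a), asdict(b)
    return {key: (first[key], second[key]) for key in first if first[key] != second[key]}


differences = changed_fields(hara, boone)
```

Five of the twelve selections change between the two men, and the archetype switch is the one the host credits for the contrast in energy.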
06:35 · model

Emotion Control System

  1. Serenity
  2. Joy
  3. Terror
  4. Rage
  5. Fear
  6. Vigilance

Per-character emotion dropdowns that function as directorial instructions, shaping faces, body language, and overall clip energy.

Steal for: Scene-level emotional contrast in any AI video production.
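The emotional arc the host describes can be laid out as per-clip settings and compared clip to clip. The `Emotion` enum covers only the six values named in the video (the interface may offer more), and the variable names and the `emotion_shift` helper are hypothetical; the character-to-emotion assignments come from the transcript.

```python
from enum import Enum
from typing import Dict, Tuple


class Emotion(Enum):
    SERENITY = "serenity"
    JOY = "joy"
    TERROR = "terror"
    RAGE = "rage"
    FEAR = "fear"
    VIGILANCE = "vigilance"


Settings = Dict[str, Emotion]

opening: Settings = {"Connor": Emotion.SERENITY, "Leah": Emotion.JOY}
second: Settings = {**opening, "Hara": Emotion.SERENITY, "Boone": Emotion.SERENITY}
attack: Settings = {
    "Connor": Emotion.FEAR, "Leah": Emotion.TERROR,
    "Hara": Emotion.RAGE, "Boone": Emotion.TERROR,
}
drone_pov: Settings = {"Connor": Emotion.RAGE}
final: Settings = {
    "Connor": Emotion.VIGILANCE, "Leah": Emotion.TERROR, "Hara": Emotion.FEAR,
}


def emotion_shift(before: Settings, after: Settings) -> Dict[str, Tuple[str, str]]:
    """Return name -> (old, new) for characters in both clips whose emotion changed."""
    return {
        name: (before[name].value, after[name].value)
        for name in before
        if name in after and before[name] is not after[name]
    }


turning_point = emotion_shift(second, attack)
ending = emotion_shift(attack, final)
```

All four characters change between the second clip and the attack, which is the contrast the host points to when calling the settings directorial instructions.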
CTA Breakdown

How they asked for the click.

VERBAL ASK
15:11 · product
Click the link in the description to sign up to Higgsfield.

Single clean line after the finished film playback. No hard pitch; the 2-minute film demonstration does the persuasion work. Affiliate link in description.

Storyboard

Visual structure at a glance.

00:00 · hook · open
00:49 · promise · Claude story
01:04 · hook · host explain
02:52 · value · Cinema Studio
03:09 · value · Connor char
04:25 · value · AI Cast
05:42 · value · farm location
06:28 · value · drone prop
07:01 · value · elements lib
07:14 · value · 13-clip grid
08:42 · value · clip 1 result
09:51 · value · continuity demo
10:30 · value · drone POV prompt
10:50 · value · drone HUD clip
11:09 · value · prompt precision
12:02 · value · CapCut edit
15:11 · cta · CTA