Modern Creator
AI Samson · YouTube

Google Omni Does What Every AI Creator Has Been Waiting For

A 23-minute walkthrough of every capability Google's new thinking video model unlocks — from reference-based editing to character consistency to a live demo of the Google Flow toolset.

Posted
4 days ago
Duration
Format
Tutorial
educational
Views
75.6K
3.9K likes
Chapters

Where the time goes.

00:00 - 01:22

01 · Cold open — what Omni can do

Fast montage of Omni demo clips: physics sim, text animations, audio-synced motion. Host intro and promise statement.

01:22 - 03:35

02 · Physics engine and cinematic realism

Rube Goldberg machine demo, action sequences, real-world physics understanding. Amino acids visual explainer as the thinking-model payoff.

03:35 - 06:40

03 · Reference inputs — images, video, audio

Mirror-touch special effects, hand-telescope zoom, music-synced light animation, image-over-hand compositing. References as a new creative language.

06:40 - 09:20

04 · Iterative turn-based editing in Google Flow

Chat-based video editing: swap objects, change environments, make violin invisible, alter camera angle. Firefly lighting and deep-fake character transforms.

09:20 - 12:00

05 · Sketch-to-video and motion paths

Pen-and-paper sketches as AI input. Motion path control — bird flying in a circle, seedlings blown by wind. Summary of Omni's full capability stack.

12:00 - 13:58

06 · Character consistency and voice attachment

Google Flow's Characters tab. Upload a photo, name a character, reference with @ in prompts. Demo: self as Olympic sprinter. Voice attachment for consistent vocal delivery.

13:58 - 13:58

07 · Built-in agent and community tools

Omni's agentic assistant for brainstorming, prompt refinement, multi-image generation. Explore Tools gallery: Simple Sketch, Mockup, Scene Explorer, Shot Explorer, Converge.

13:58 - 17:45

08 · Sponsor — Artlist AI Agent

Demo of Artlist's conversational AI creative assistant: meta-prompting, image generation, describe-image-to-prompt, video creation from chat.

17:45 - 21:00

09 · Prompting framework for Google Omni

Five-element prompt structure: shot framing + motion, style, lighting, location, action. Screenshots from Google's own guidance with keyword callouts.

21:00 - 23:35

10 · Advanced editing — camera work and storyboarding

Edit camera angles via natural language. Complex action references. Cinematic moves: push in, dolly zoom, tilt. Storyboarding with a grid of reference stills. Outro with next-video CTA.

Takeaway

Own the creative language, not just the tool.

Creator playbook

Samson's real argument is that AI democratized production — the new moat is original ideas and the ability to brief AI precisely.

  • Use references (images, sketches, video clips) instead of word-only prompts — the AI closes the gap between what you show and what you want far better than text alone.
  • Build characters once in Google Flow's Characters tab, then reference them by name across every video you make — this is the consistency unlock most creators are sleeping on.
  • Structure prompts with the five-element framework: framing/motion, style, lighting, location, action — each element is a cinematic decision, not a description.
  • Edit iteratively via chat: establish your base scene, then swap one element per turn. Don't regenerate from scratch.
  • The sponsor integration (Artlist AI Agent) models a clean pattern: demonstrate the capability, then show the tool solving the same problem. Worth borrowing for any sponsored tutorial.
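The character workflow in the list above (create once, then reference by name with @) can be sanity-checked before a prompt is sent. The sketch below is a local helper only: the `characters` registry, the `unknown_mentions` function, and the `@Coach` handle are hypothetical, and the `@Name` pattern is an assumption about how handles look, not a documented Google Flow format.

```python
import re

# Hypothetical local registry: handle -> description typed when the character was created.
characters = {"Samson": "British man, athletic"}


def unknown_mentions(prompt: str, known: dict[str, str]) -> list[str]:
    """Return @handles used in the prompt that are not in the registry."""
    mentioned = re.findall(r"@(\w+)", prompt)
    return [name for name in mentioned if name not in known]


# "@Coach" is an invented handle, included to show a mention that was never created.
prompt = "@Samson winning an Olympic 100 meter final, with @Coach cheering trackside"
print(unknown_mentions(prompt, characters))
```

Catching a missing handle locally is cheaper than discovering it after a generation has been spent on an unresolved name.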

Quotables

Lines you could clip.

02:48
Previously, you would have to pay thousands of dollars for animators to create something meaningful.
Crisp cost comparison that lands the democratization argument in one line · IG reel cold open
06:05
When we can say we want something to look like this, it goes away from having to use words as a complete abstraction for what we're looking for.
Clean articulation of why references beat text prompts · TikTok hook
09:08
We need to have an original idea and we need to be able to communicate that effectively to the AI.
Reframes the AI era: the bottleneck is human creativity, not the model · Newsletter pull-quote
16:57
We're getting to the point in the AI art life cycle where we need to go beyond creating individual images that work, and we need to create systems that allow us to create complex and meaningful pieces of art.
Strong thesis on the current inflection point: systems thinking over single outputs · Newsletter pull-quote
23:18
The deeper question becomes, not what the tool can do, but what do you want to say with it?
Tight closer that reframes AI video from capability to intent; strong standalone clip · TikTok hook
The Script

Word for word.

00:00Google's new video model, Omni, allows us to do things that were never before possible like this. We can add interesting special effects whilst maintaining perfect cohesion to the video.
00:11Now Google has unlocked a whole host of impressive features for this new tool, including much better character consistency, incredible possibilities with text to create animations that perfectly match the audio that you input, and so much more.
00:26In this video, I'm gonna break down exactly what Google Omni can do, why it matters, and exactly how you can use it too. If you're new here, I'm AI Samson. Now Gemini Omni has a whole host of exciting features, and these are built around the omni idea, which means we can use multiple different outputs from different sources to create almost anything.
00:46But it's not just the capabilities that are vastly expanding. It's also the quality of the video that we're getting out. Google Omni is quite possibly the most cinematic and real AI video we've seen, specifically under intense physics situations.
01:00Let me show you exactly what I mean. You can now create output that follows real world physics. Omni has an intuitive understanding of forces like gravity, kinetic energy, and fluid dynamics for more realistic movement.
01:13Now here you have an example of a Rube Goldberg machine, which requires a consistent chain of events to achieve an objective. And we're seeing a huge amount of different influences impacting the way that this ball navigates this obstacle course.
01:27Now this has major implications, specifically for when we're trying to create complex moving scenes.
01:34Now I've been experimenting with this myself, and here is an action sequence that I created. And as you can see, we've got myself here going through an incredibly dramatic series of movements.
01:47But one of the most exciting capabilities of this is enhanced text understanding because that means we can create useful, meaningful motion graphics that can be applied to videos. Now here's another example of this from Google itself, and this example is particularly interesting.
02:02The video shows items of the alphabet. An unusual item starting with each letter is shown sitting on a table, like a capybara for c, disco globe for d, and lava lamp for l. Now this is a complex and interesting type of video that was not before possible to have this amount of intelligence integrated deeply into the model.
02:20And what makes this truly different is that Google Omni is a thinking model, and it's leveraging the intelligence of Gemini underneath to make intelligent decisions for every part of the video. That means we can reduce hallucinations in the videos that we create and actually get it to create much more complex and meaningful works of media.
02:40For example, here, you can see this visual explainer of amino acids. And what Google Omni says is that it combines an intuitive understanding of physics with Gemini's knowledge of history, science, and cultural context, bridging the gap from photorealism to meaningful storytelling.
02:57And it truly unlocks a whole host of possibilities if you're looking to create a YouTube channel, for example, that is talking about specific scientific concepts and you need accurate and useful visuals to explain these.
03:10Previously, you would have to pay thousands of dollars for animators to create something meaningful. Now one of my favorite YouTube channels is The School of Life, and they painstakingly animate individually every single video that they create.
03:22But one of the most interesting ways that we can use this creatively is by referencing anything. And by that, it means that we can take multiple images or video or audio and use them to influence the creations.
03:35Let me show you what I mean. Here, the individual has uploaded a video, and then he asked Google Omni to add this effect where he touches the mirror, and it changes the entire reality.
03:45Now this is where we're able to leverage real footage and apply cinematic special effects over the top. And this truly unlocks a wonderful hybrid area of not replacing real footage and real acting, but actually using AI to enhance it and create possibilities that would have cost a whole host more before.
04:04Now you can see another example of this where he's touching the mirror and it's completely changing the cinematic effect. Now what's interesting about this is that we get a whole host of possibilities. You can really use your imagination.
04:15Here we see this individual turning into this cute little character. Now this is an interesting concept where the individual is using their hand as a telescope, and Google Omni is able to create a zoom in effect on the circle in the middle of his hand.
04:31And this is what's giving us the opportunity to create really engaging, surprising video concepts, and it's democratizing the possibility for people to create meaningful and absurdist works of art. Here is another example.
04:43And what happens is the individual can touch these little wooden sculptures, and they will make the sound of the animal that they are representing.
05:01Now this is quite simply one of my favorite examples, and that's because the user has uploaded an image and a piece of music. And you're able to create this cinematic video that is perfectly in time to the music.
05:24So the lights switch on as we have each note played, and it creates a mesmeric effect. Here you can see a user uploading a view of their hand out in front of their face, and then adding in an image and asking the image to be placed above the hand.
05:39So this way, we can take a reference video and a reference image and combine them in a realistic way. And of course, you can do this for different images. And this is the beauty of it, that it still maintains the motion of the original video but changes the addition.
05:53Here you can see this solar system, and here is a little plane, and it gives us a lot of opportunity for fun. Now what I find beautiful about this is that we're able to leverage the power of references. References allow us to speak and communicate with the AI in a much more specific and creative way.
06:11When we can say we want something to look like this, it goes away from having to use words as a complete abstraction for what we're looking for. And this allows us greater control, greater creativity, and greater fun in the whole process.
06:24Now Google Omni allows you to go through a turn based approach to this. And there are a couple of ways to use Google Omni, and one is directly inside of Gemini, which allows you a chat interface to converse with AI and update your videos turn by turn. Now another way to use it is in the Google Flow tool, which is my preferred way because it's specifically engineered just for visual creation, whereas Gemini is obviously also a large language model chatbot, whereas this is much more specifically aimed at just creating images and video.
06:59But we're gonna come on to that a little bit more later, and I'm gonna show you exactly how to use it. But the only reason I'm telling you this is because there is this turn based approach that we can use in the chat where we are iteratively improving. And you can see, first of all, they put an input video, they change the environment, they make the violin invisible, and they change the camera angle.
07:19But what's great is that there is a homogeneity between all of these different shots, but it gives us the opportunity to update small objects inside of a video. For example, here you can change a plane, you can change a spaceship to a seed, to a flying clock, or a red frisbee, or even a raven.
07:34Now what I particularly love about this is the way that we can work so creatively without adding in extra elements and also intelligent sound. Now this example is absolutely beautiful because the individual has asked to add in these fireflies, but also add in the sound of a harp synchronized to him brushing it across the leaves of this fern.
08:01Now you can also notice just how well that the piece has been implicated with the lighting effects of these fireflies. So you can see the original, and you can see that the fireflies have added in their own lighting influence to the piece.
08:14And one way that we can do this is by looking at almost deep faking ourselves. So you can see here, there is an input video of this woman and then it's completely changed to being this anime character. Here you can see the woman changed into a raccoon.
08:26Then I can even do it myself here where I changed myself into a horse. Now another creative way that we can work with Google Omni is by using our own little drawings and sketches to communicate what we're looking for.
08:40And I personally love this. As a very visual and tactile person, I truly am excited about the possibilities of using the rudimentary form of pen and paper to influence and create the idea, the concept. It gives us this ability to paint with broad brush strokes and have them refined by the power of AI.
08:59And I honestly believe that this is going to be an important part of the process of creativity moving forward. The ability to communicate something in a very abstract and rough form and then giving that to the AI with enough information that it can understand exactly what we want.
09:15The challenge with AI now is that we can do anything. We can create anything with AI, but we need to be able to do two things. We need to have an original idea, and we need to be able to communicate that effectively to the AI.
09:26And this gives us a lot more control over things like motion paths. As you can see, we've accurately been able to get this little bird to fly in a circular motion. You can also get this person to blow and have these seedlings fly out.
09:39So Google Omni offers us cinematic realism, the ability to input audio, music, and video references.
09:46We have the power for video to video editing where we can take original footage and change specific elements of it, like characters, objects, or even add in special effects. But there are some other remarkable features that I want to explain to you because they really give us many of the tools that we've been asking for for years.
10:03And the first of those is complete character consistency, and there's an entirely new way to work with characters inside the Google Video Models. If you're in Google Flow, you can come to the characters tab.
10:14From here, you can build and reuse characters for consistent videos. Use a sample prompt below or create from scratch. Now there's even a second opportunity to do this with yourself where you can create a motion capture video and use that as the influence for your creations.
10:30But for now, I will show you the other method, which is where you can create a current character of anyone simply with an image. All you have to do is upload an image.
10:38I'll pop myself in for now, a little smiley photo from my Tinder profile photos. Oops. Is that too much information?
10:45You know, sometimes I don't know how much to reveal all here. Then you can describe your character. I'm British man, athletic, and add them in.
10:51You name your character, and from there, you're able to reference them for whatever video you'd like. So now you can simply type in the at, and it will bring up your characters.
11:01So I can select myself, add a prompt, so I can have the prompt show me winning an Olympic 100 meter. I was actually always a bit more of a middle distance runner than a 100 meters, but you know there's certainly an allure to sprinting that one cannot deny and we can get out a video like this. Yes.
11:16I did it.
11:22Now character consistency gives us the power to repeatedly use the same characters scene after scene, which allows us to create much more complex projects.
11:32Now the cool thing about Google Omni is we can also attach a voice to this character so it maintains not only a consistent likeness, but also a consistent vocal delivery. Now you have to make sure that you've selected the Omni model in the drop down inside of Google Flow to use it. Now the other cool new feature is an agent built directly into Google Omni that you can leverage for a number of different tasks.
11:53You can ask it to brainstorm new concepts, refine prompts, polish dialogue, and make different adjustments with its own intelligence.
12:01The agent allows us to perform more complex tasks at once, so we can ask for multiple images of a sudden scene, for example. But that's not all because Google has really treated us.
12:11It's a little bit like AI Christmas today because there are so many new features to explore. And another one is tools. And these are tools that individuals can build inside of Google that they can then release and help each other with their creative processes.
12:27The thought is that an idea and a description are all it takes to make whatever you need. Now let's go through some of these tools that are available. And for example, one is a simple sketch, which is where you can take a simple drawing and turn it into a realistic and refined image.
12:42So I might just put in a happy little face, and you can see this is the image we get out. But let's look at some of the more interesting tools here. One is a mock up generator, which allows you to create product mock ups for different items.
12:55There's also an image generator allowing you to transform objects, add text, and adjust image sizing. There is a specific shot explorer, which allows you to generate multiple shots from different angles of the same scene.
13:08Converge allows you to create photorealistic renderings of your sketches. Now these tools are separated into different categories. We have image tools, video tools, prompting tools, and experimental tools.
13:19Now some of these are really exciting, like the 360 degree environment from an image creator, which allows us to explore a 360 degree environment from just an image.
13:30And this is really going from image to three d. Now let's take a look at how we can prompt effectively inside of Google Omni because the prompting mechanics have changed. But first, I want to show you something else, and that is about the power of agentic AI creative assistance.
13:45Now what we can do with creative media is getting more and more complex, and understanding how to leverage agentic AI for creative purposes is an extremely valuable skill. And for that, I want to introduce you to today's sponsor, which is Artlist AI agent, which introduces a much more conversational way for us to create our projects.
14:04Now what this does is it leverages the power of an LLM to help us with our creative process. We can have a chat window to collaborate and iterate together with an LLM to create our work. This means we have a number of interesting features, and you can simply access it in the AI toolkit inside of Artlist.
14:23You go to AI agent. And from here, we can have a nice conversational discussion with our AI. So there are a number of prebuilt templates that we can use to create our prompts.
14:34First of all, we can use the help me write a prompt. Now this technique is known as meta prompting, and that's where we get the AI to help us write our prompts so they deliver more accurate results. So what do we need to ask for in this circumstance?
14:46We need to input the idea that we have so that AI can enhance it for us. So I've given it a basic prompt that encapsulates the idea that I have, a realistic cinematic image of a woman on a cliff at sunset in full Victorian costume.
14:59I can go ahead and send that in, agree to the terms, and we get out a much more refined prompt, a cinematic full body portrait of a woman standing on the jagged edge of a sea cliff wearing an intricate dark velvet Victorian morning gown with a lace detailing. Now if we're happy with this, we can immediately instruct the agent to go ahead and make our image.
15:20Now the model is built to select the correct model for your intention and helps us iterate rapidly and quickly in a chat interface. It helps us to take a rough idea and refine it into something that can become usable and professional in minutes. Now what's great about this is it remembers everything that we've said in the conversation.
15:39So if we're working on more complex projects where we're trying to retain details of a world or certain consistent elements, it helps us do that by applying it to its memory. So once we got the image out, this is the one that we have.
15:52You can see that we've got this beautiful cinematic scene and that it used the Nano Banana two model. It outputted this in two k quality, and now we can simply even ask it to make it into a video.
16:03This saves us having to drag and drop things into different interfaces and selecting exactly what we need to happen. You can simply use a basic text prompt and get out what we're looking for. Now this model works on using our existing AI credits to generate and also gives us some other interesting capabilities.
16:21And one of those that I love is the describing image. And what this does is it means we can take a reference image and get a workable prompt from that. So I might go find an image that I like.
16:30For example, this one of a woman with some dogs. I can take a screenshot of that. Then you can go ahead and upload the file.
16:36Then it will give us an image prompt for this, and we can go ahead and ask it to create the image. Now this is great for rapidly taking inspiration and turning it into our own work. Now what's interesting about this is we're getting to the point in the AI art life cycle where we need to go beyond creating individual images that work, and we need to create systems that allow us to create complex and meaningful pieces of art and design.
16:58The area where people can really stand out creatively right now is by creating much more complex works of art. It's extremely easy to make individual images or clips that have a sense of taste and aesthetic, but to go beyond this and create true stories or true brands is something that we are exploring in much more depth now.
17:19And I think tools like agents are allowing us to do that more sincerely and more comprehensively. So here's the image that we got out, and you can see we instantly got out a fantastic representation of a very similar scene with our own taste applied.
17:35And we can also take a look at the video that we created of the one standing on the cliff edge, and you can see which model it used as well as how long it is and the quality. Artlist agent is available for you to try today, and I'll leave a link to that in the description below. And I'd also like to say a big thanks to Artlist for sponsoring this segment of the video.
17:54Now the key part with prompting in Google Omni is that the more detail you add, the more control you'll have over the final output. Use a mix of the elements below to create results that better reflect your imagination. And this is the art of going from idea to execution.
18:09And the better we get at this, the more profound the works we can get out are. Now the first thing to consider is shot framing and motion. This is about how do you want to frame your shot, wide angle, medium, or close-up?
18:21How do you want your camera to move? Should it glide gently or rush suddenly? Experiment to find the right approach for your scene.
18:27Now you can see they've selected this part of the prompt here, which is a wide angle tracking shot glides gently across a serene lake. The next part to consider is the style, which is how your scene should feel.
18:39Is it realistic or cinematic, grounded or majestic? Tell Gemini Omni the effect you want to create, and leave the model to work out all the details.
18:48Next up is lighting. Lighting is a crucial element for any type of emotional resonance inside of your work. So the questions you've gotta be thinking about here is how should your scene be lit?
18:57Where does the light come from? The sun, a street lamp, or off screen? And what effect does it create?
19:02Is the lighting crisp, warm, or ethereal? And in the example they presented, you have brilliant sun crests behind the floating anomaly, bathing the entire scene in crisp, ethereal daylight.
19:14And, of course, location. Where is your scene set? Tell Gemini Omni the landscape you imagine, like an alien landscape with clear azure water.
19:23But you don't need to describe every single little detail as Omni will work with your overall intention. And that's the key here is as long as we can communicate intention, we can get out much more detailed outputs.
19:36Gemini does not need every single intricate detail of where one particular rock is going to be, but it needs to get a gist of where you're going, what is the direction that we are looking for. And this is where the art of communication becomes so important. As you can see, the keywords that have been pulled out for this are serene lake, majestic cliffs, alien landscape.
19:56And, of course, with video, the key is action. We don't have a video without movement.
20:01And so the questions you have to consider for this part is, what is happening in your scene? Who are the characters and objects? How are they moving and interacting?
20:09A colossal, reflective, chrome-like, bean-shaped object levitating effortlessly above, rotating slowly to reveal its distorted reflections. Now we can also edit iteratively. You can ask for a specific update like a background change or a new caption.
20:24And you can do this without needing to prompt the entire scene again because Omni will preserve your video across multiple amends, keeping what works and allowing us to focus on what isn't.
20:34So you can see here, the individual has used just very simple prompts. Change the butterfly to a bee and then change the bee into a small swarm of fireflies. There are a lot of fireflies appearing in these examples.
20:45Now, what I love is that we can also edit how your camera works. You can change the camera angle, point of view, and movement through natural conversation. So here is the example of playing the violin and then changing the angle to be over the shoulder.
20:58And if we just pay particular attention, you'll see that even the subtle finger movements remain consistent. Now you can really push this to its limits and reference complex actions. When you refer to a complex action, Gemini understands your intention and how this action should be applied across your video.
21:15You don't need to describe it across every frame. So the example here is an edit keeping everything the same and adding animated motion effects coming out of the skateboard. And it works so beautifully well.
21:27What absolutely helps add a cinematic quality to our AI videos is controlled camera work. And it's very much worth learning some of the most frequently used cinematic camera techniques. Different movements like push in, punch in, or dolly zoom give a real cinematic feel to our pieces.
21:46They also communicate a lot about what is going on in the pieces. Often, we zoom in closer when things become more intense, more intimate, or we pull out if we're looking to take a wider perspective or allow things to breathe. So here you can see in this example, the prompt was change the camera angle, a close-up on his shoes, quickly tilting up to medium shot then widening.
22:07And a lot of what happens in cinematic storytelling is based on how we communicate camera work. Now another great way to work with Google Omni is with storyboarding, and this is where we upload a number of different stills that define the exact heartbeats of a sequence.
22:26So here, the individual has uploaded six images with a short description of what's going on in each and also an input image.
22:36Now I do have a full video breaking down how to create complex grid image prompts below, which allows you to do this in an even more advanced way. That's Google Omni. It allows you to use any input and get any output, and it takes the possibility of creative AI media making to new heights.
22:55I'm extraordinarily excited to explore this in more depth, and we'll create more videos over this in the coming days. And now the deeper question becomes, not what the tool can do, but what do you want to say with it? Now one of the keys to getting better with AI is writing better prompts, and the best way to do that is to leverage an LLM to do it for you.
23:14And in this video, I explain the best process for writing complex prompts with Claude, and I suggest you watch that next to be able to write the best prompts possible for Google Omni. Now most of all, I wanna say thank you so much for being here.
23:28Thank you for watching till the end. Please subscribe if you've enjoyed this video. And most of all, I want to wish you a delightful day.
The Hook

The bait, then the rug-pull.

The thumbnail promise is blunt: every AI creator has been waiting for this. Samson wastes no time — five seconds in, he's already showing physics-accurate special effects applied to real footage, characters deep-faked into anime versions of themselves, and text animations that sync perfectly to input audio. The implication is clear before he even says his name: the ceiling on what one creator can produce alone just moved.

Frameworks

Named ideas worth stealing.

18:07 · list

Omni Five-Element Prompt Structure

  1. Shot framing and motion
  2. Style
  3. Lighting
  4. Location
  5. Action

Google's own breakdown of what to include in an Omni video prompt. Each element answers a specific cinematic question: how is it framed, what feel, how is it lit, where is it set, what is happening.

Steal for: Any AI video prompt template; it maps directly onto a storyboard brief
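As a template, the five elements can be treated as named fields that are joined in order. The following is a minimal sketch, not anything from Google's tooling: the `ShotPrompt` class and its `render` method are hypothetical names, and the sample values paraphrase the lake example quoted in the video.

```python
from dataclasses import dataclass, fields


@dataclass
class ShotPrompt:
    """Hypothetical container for the five elements; not part of any Google API."""
    framing_and_motion: str
    style: str
    lighting: str
    location: str
    action: str

    def render(self) -> str:
        # Join the non-empty elements in the order the framework lists them,
        # ending each one with a period so the result reads as short sentences.
        parts = [getattr(self, f.name).strip() for f in fields(self)]
        return " ".join(p if p.endswith(".") else p + "." for p in parts if p)


# Sample values paraphrased from the example shown in the video.
prompt = ShotPrompt(
    framing_and_motion="A wide angle tracking shot glides gently across a serene lake",
    style="Cinematic and majestic",
    lighting="A brilliant sun crests behind the floating anomaly, bathing the scene in crisp, ethereal daylight",
    location="An alien landscape with clear azure water",
    action="A colossal, reflective chrome bean-shaped object levitates above the water, rotating slowly",
)
print(prompt.render())
```

Keeping each element in its own field makes it easy to vary one decision, such as lighting, while holding the other four constant.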
06:05 · concept

Reference as Creative Language

The idea that uploading images, video clips, audio, and hand-drawn sketches replaces word-based abstraction. References give the AI a concrete target rather than forcing the creator to describe everything in text.

Steal for: Framing for a tutorial on how to brief AI tools: 'stop describing, start showing'
07:00 · model

Turn-Based Iterative Editing

Chat-first editing loop: generate a base, then refine one element at a time in natural language. Omni preserves unchanged elements across turns. Enables complex final results through small, controlled steps.

Steal for: Teaching AI tool workflows; applies to image gen, video, and copy equally
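The loop can be planned on paper before any generation is run. Below is a rough sketch of that habit as a local log; `EditSession` is a hypothetical helper that calls no video model, the base prompt is invented for illustration, and the two edit instructions are the ones quoted in the video.

```python
from dataclasses import dataclass, field


@dataclass
class EditSession:
    """Hypothetical local log of a chat-based edit; it does not call any video model."""
    base_prompt: str
    turns: list[str] = field(default_factory=list)

    def add_turn(self, instruction: str) -> None:
        # Enforce the one-change-per-turn habit with a crude check:
        # reject instructions that chain several requests with " and then ".
        if " and then " in instruction.lower():
            raise ValueError("Split this into separate turns, one change each.")
        self.turns.append(instruction.strip())

    def transcript(self) -> list[str]:
        # The full message sequence: the base scene first, then each edit in order.
        return [self.base_prompt, *self.turns]


# The base prompt is an invented placeholder; the edits are quoted from the video.
session = EditSession("A butterfly lands on a fern in soft morning light")
session.add_turn("Change the butterfly to a bee")
session.add_turn("Change the bee into a small swarm of fireflies")
for step, message in enumerate(session.transcript()):
    print(step, message)
```

The chained-request check is deliberately simple. Its only purpose is to nudge a combined instruction into separate turns so each change can be reviewed on its own.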
CTA Breakdown

How they asked for the click.

23:10 · next-video
In this video, I explain the best process for writing complex prompts with Claude, and I suggest you watch that next to be able to write the best prompts possible for Google Omni.

Clean next-video CTA with a specific reason to click — ties the outro directly to actionable follow-through for the viewer. Executed in the last 25 seconds alongside a subscribe ask.

Storyboard

Visual structure at a glance.

hook · cold open · 00:00
promise · amino acids demo · 00:22
value · reference concept · 06:05
value · character system · 10:04
cta · sponsor · 13:58
value · prompt framework · 18:07
cta · closer + CTA · 22:40