Modern Creator
Matt WhoisMatt Johnson · YouTube

Text to Speech in DaVinci Resolve: Full Tutorial

An 8-minute walkthrough of DaVinci Resolve 21's built-in AI voice cloning — from a clean 10-second audio sample to a convincingly matched line replacement in under two minutes.

Posted: 2 days ago
Format: Tutorial, educational
Views: 4.9K
Likes: 249
Big Idea

The argument in one line.

DaVinci Resolve 21's on-device AI Speech Generator lets video editors fix misspoken lines in about two minutes using just 10 seconds of clean audio from the subject and a text box: no third-party tools, no cloud, no reshooting.

Who This Is For

Read if. Skip if.

READ IF…
  • You film weddings, events, or interviews where talent messes up a line and reshooting is not an option.
  • You edit in DaVinci Resolve 21 and want to use its built-in AI voice tools without third-party plugins.
  • You are curious whether local on-device AI voice cloning is convincing enough to use professionally.
  • You want a step-by-step reference for the Speech Generator workflow you can follow in real projects.
SKIP IF…
  • You are not working in DaVinci Resolve — this workflow is entirely specific to Resolve 21's Speech Generator.
  • You need lip-sync matching: this tool only replaces audio, and the mouth movement in the footage does not change.
TL;DR

The full version, fast.

DaVinci Resolve 21 includes a built-in AI Speech Generator that clones a person's voice from as little as 10 seconds of clean audio and lets you type a corrected line that plays back in their voice. The workflow has two stages: first export a WAV audio-only file from the Deliver page using in/out range, then use Timeline > AI Tools > Speech Generator in the Edit page, load the WAV as a Custom Voice model, type your corrected text, and click Generate. The result drops onto a new timeline track in 10-30 seconds. The one hard limitation is that lip movement in the footage does not update — cover the shot with B-roll so the audio-video mismatch is invisible.

Chapters

Where the time goes.

00:00–01:09

01 · Hook + demo + ethical disclaimer

Before/after audio demo of a flubbed groom vow fixed by voice cloning. Acknowledges the ethical risks of making anyone say anything and notes the tool runs entirely on-device with no cloud upload.

01:09–02:32

02 · Sponsor: Crowdreel

Crowdreel is a guest photo/video gallery service for wedding videographers. Guests scan a QR code at the reception and upload their photos and videos, and the couple gets the gallery the next day. Plans start at $29/mo; coupon MATT30 takes 30% off the first year.

02:32–03:35

03 · Why line-fixing matters

Real-world scenario: a groom flubs a line in his vows and the editor does not catch it until post. Voice cloning restores what was intended. Host frames this as ethical when you have the person's permission.

03:35–04:38

04 · Step 1 — Export audio sample (WAV)

Mark in/out points around 10+ seconds of clean solo-voice audio. Go to Deliver page, uncheck Export Video, set format to WAV / Linear PCM / 48kHz / 24-bit, render In/Out range.
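
The export settings above can be sanity-checked before loading the file as a Custom Voice. This is a minimal sketch using Python's standard `wave` module, not part of Resolve; the function name `check_voice_sample` and its defaults are assumptions for illustration. It also assumes a plain PCM WAV, and other WAV variants may raise `wave.Error` on some Python versions.

```python
import wave


def check_voice_sample(path, min_seconds=10.0, want_rate=48000, want_bits=24):
    """Return a list of problems found in a WAV voice sample (empty if none)."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()          # samples per second
        bits = wav.getsampwidth() * 8      # bytes per sample -> bit depth
        seconds = wav.getnframes() / rate  # total duration
    if rate != want_rate:
        problems.append(f"sample rate is {rate} Hz, expected {want_rate}")
    if bits != want_bits:
        problems.append(f"bit depth is {bits}, expected {want_bits}")
    if seconds < min_seconds:
        problems.append(f"only {seconds:.1f} s long, need at least {min_seconds:.0f} s")
    return problems
```

An empty list means the file matches the 48kHz / 24-bit / 10-second guidance. The check cannot tell whether only one person is speaking, so that still needs a listen.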

04:38–06:22

05 · Step 2 — Speech Generator setup

Return to Edit page, position playhead at insert point. Open Timeline > AI Tools > Speech Generator. Type corrected line, select Custom Voice, load WAV file. Tune Speed, Variation, Pitch. Check Add to Timeline + New Track. Click Generate.

06:22–07:02

06 · Result: before vs. after playback

Side-by-side comparison of original flubbed line vs. AI-generated clone. Host reaction: 'Dude, it sounds the same.' Notes the 1.6 GB one-time model download required on first use.

07:02–07:53

07 · Limitation + outro CTA

Lip movement does not update with new audio — cover with B-roll. CTA: subscribe, playlist of Resolve tutorials, Edit Videos Like A Pro guide.
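
To see where the time goes in numbers, the chapter timestamps above can be turned into lengths and shares of the runtime. This is a small sketch; the helper names `to_seconds` and `chapter_lengths` are made up for illustration, and the data is copied from the chapter list.

```python
# Chapter titles with start and end stamps, copied from the list above.
CHAPTERS = [
    ("Hook + demo + ethical disclaimer", "00:00", "01:09"),
    ("Sponsor: Crowdreel", "01:09", "02:32"),
    ("Why line-fixing matters", "02:32", "03:35"),
    ("Step 1: Export audio sample (WAV)", "03:35", "04:38"),
    ("Step 2: Speech Generator setup", "04:38", "06:22"),
    ("Result: before vs. after playback", "06:22", "07:02"),
    ("Limitation + outro CTA", "07:02", "07:53"),
]


def to_seconds(stamp):
    """Convert an 'MM:SS' stamp to whole seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def chapter_lengths(chapters):
    """Return (title, length_in_seconds) pairs."""
    return [(title, to_seconds(end) - to_seconds(start))
            for title, start, end in chapters]


total = sum(length for _, length in chapter_lengths(CHAPTERS))
for title, length in chapter_lengths(CHAPTERS):
    print(f"{length:>4d} s  {length / total:5.1%}  {title}")
```

The two tutorial steps together run 167 of the 473 seconds covered by the chapters, and the sponsor read runs 83.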

Atomic Insights

Lines worth screenshotting.

  • DaVinci Resolve 21's voice cloning runs entirely on your computer with no cloud server, no data upload, and no ongoing API cost.
  • You only need 10 seconds of clean, single-voice audio to create a convincing voice clone.
  • The first time you click Generate, Resolve downloads about 1.6 GB of model data — build that one-time wait into your first project using this tool.
  • Voice cloning does not alter lip movement — always plan to cover the replaced line with B-roll, not just the audio swap.
  • If there are multiple people speaking in your audio sample, the cloning tool will fail — isolation of one voice is required.
  • With the New Track option checked, the Speech Generator places the result on its own timeline track, so it does not overwrite your original audio.
  • Each generation is assigned a Generation ID; if you find a voice you like, note the number so you can keep using it.
  • The Speed, Variation, and Pitch sliders let you tune the generated voice to better match the original performance — do not skip them.
Takeaway

How to fix a flubbed line without a reshoot.

WHAT TO LEARN

DaVinci Resolve 21's AI Speech Generator clones a speaker's voice from 10 seconds of audio and lets you fix a misspoken line directly in the timeline with no outside tools and no cloud upload.

  • You only need 10 seconds of clean, single-voice audio to create a convincing voice clone — the shorter the sample requirement, the more rescue scenarios this tool becomes useful for.
  • Running entirely on-device means no audio ever leaves your machine, which matters when the footage is of a private individual at a wedding or interview.
  • The model download of about 1.6 GB only happens once; after that, generation takes 10-30 seconds per line, depending on your computer's speed and the size of the audio sample.
  • Lip-sync does not update with the new audio — always plan B-roll coverage over any replaced line so the mouth-movement mismatch is never visible.
  • The Generation ID system lets you reuse the same voice character across multiple sessions, so a single audio sample can fix multiple mistakes in a long project.
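
The video suggests remembering the Generation ID of a voice you like. One way to do that outside Resolve is a small notes file, sketched below. Everything here is an assumption for illustration: the `SpeechTake` fields, the function names, and the JSON layout are hypothetical, and Resolve does not read or write this file.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class SpeechTake:
    """One generated line, written down by hand from the Speech Generator dialog."""
    generation_id: str  # the Generation ID shown for the take
    text: str           # the corrected line that was typed
    sample_file: str    # the WAV loaded as the Custom Voice
    speed: str          # slider settings, noted however the dialog displays them
    variation: str
    pitch: str
    note: str = ""


def save_takes(takes, path):
    """Write a list of SpeechTake records to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump([asdict(take) for take in takes], handle, indent=2)


def load_takes(path):
    """Read the JSON file back into SpeechTake records."""
    with open(path, encoding="utf-8") as handle:
        return [SpeechTake(**row) for row in json.load(handle)]
```

Keeping the slider settings next to the ID records which combination produced the take you kept.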
Glossary

Terms worth knowing.

Speech Generator
DaVinci Resolve 21's built-in AI tool (found under Timeline > AI Tools) that generates a spoken line in a cloned voice from a text input and a short audio sample. All processing runs locally on the user's machine.
Custom Voice
A voice model option in Resolve's Speech Generator that uses a WAV file you provide as the basis for voice cloning, rather than a preset voice. Requires a clean, single-voice audio sample of at least 10 seconds.
Generation ID
A unique identifier assigned to each generated voice output in Resolve's Speech Generator. Saving this ID lets you reproduce the same voice character in future sessions.
Linear PCM
An uncompressed audio encoding format. The recommended export settings for the voice clone audio sample are WAV / Linear PCM / 48kHz / 24-bit depth.
In/Out range
A segment of the DaVinci Resolve timeline marked with I (in) and O (out) key presses. Setting an in/out range tells Resolve to export or act on only that portion of the timeline.
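
Because the sample needs at least 10 seconds, it can help to confirm the length of an In/Out range from its timecodes. This is a minimal sketch under stated assumptions: timecodes are non-drop-frame `HH:MM:SS:FF` strings, the frame rate is a whole number such as 24, 25, or 30, and both function names are made up for illustration.

```python
def timecode_to_seconds(timecode, fps):
    """Convert a non-drop-frame 'HH:MM:SS:FF' timecode to seconds."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    return hours * 3600 + minutes * 60 + seconds + frames / fps


def range_is_long_enough(in_tc, out_tc, fps, min_seconds=10.0):
    """True if the In/Out range covers at least min_seconds."""
    length = timecode_to_seconds(out_tc, fps) - timecode_to_seconds(in_tc, fps)
    return length >= min_seconds
```

For example, `range_is_long_enough("01:00:05:00", "01:00:15:00", 24)` covers exactly 10 seconds and passes, while an out point one frame earlier does not.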
Quotables

Lines you could clip.

00:35
You can now literally make anyone say anything, which I agree is terrifying.
Ethical gut-punch in one sentence; stops the scroll. Suggested use: TikTok hook
00:53
The AI technology behind this voice cloning feature is all completely run and handled on your computer. There's no servers or data center or water being wasted to use this technology.
Addresses the single biggest AI objection (environmental + privacy) in two sentences. Suggested use: IG reel cold open
06:57
Dude, it sounds the same. Isn't that wild? This is so impressive.
Authentic reaction moment at the reveal; makes the demo feel real, not scripted. Suggested use: TikTok hook
The Script

Word for word.

00:00 I'm about to show you how to clone any voice in DaVinci Resolve 21 just by typing what you want them to say in a text box. And all you need is at least 10 seconds of a person speaking to go from audio like this where they flubbed a line. I love you, Jillian.
00:13 I can't wait to take you as wife. To this. I love you, Jillian.
00:17 I can't wait to take you as my wife. As you can hear, this is really impressive and it only takes two minutes of work, which is way faster and easier than it was to do in the previous version of DaVinci Resolve, which I also made a tutorial about. And I'm sure there are certain people preparing to comment how terrifying this is too, either because you can now literally make anyone say anything, which I agree is terrifying, and which I also recommend that you be very careful whenever you're doing this, making sure that you have permission from a person to do it.
00:45 In addition, another reason you may be terrified is that this is using AI, and a lot of creative people are understandably very against AI. If there's a bit of good news about this tool though, it's that the AI technology behind this voice cloning feature is all completely run and handled on your computer. There's no servers or data center or water being wasted to use this technology.
01:06 It's all running locally on your machine, which I think is a good thing. Also, this video is sponsored by my friends at Crowdreel. If you want to make extra money with every wedding that you film, Crowdreel is a service that you can offer that is basically passive income.
01:20 Here's how it works. You join Crowdreel and they give your wedding couple a QR code that they can print out and put around their wedding reception. Guests scan the code and then can upload photos and videos to the event that they take.
01:33 And there are usually over 500 photos and videos that are uploaded per event. Crowdreel handles all of the hosting and you get to deliver a full gallery of all these photos and videos to the couple the day after their wedding, making them extremely happy and possibly even giving you access to some extra cell phone angles that the guests filmed that you can sprinkle throughout the full wedding video.
01:54 For all this, Crowdreel starts at only 29 bucks per month complete white labeling, meaning that you can fully customize the branding, colors and website to match your brand. You can then charge your couples anywhere from 300 to 600 bucks or more for this service, easily making 10 times what you paid for it and it only takes you five to 10 minutes to set up and run.
02:14 Like I said, it's basically passive income. You can check out Crowdreel at the link in the video description. And when you use that link, enter coupon code MAT30 for 30% off your first year of Crowdreel.
02:25 If you want to learn more about Crowdreel, I also have a deep dive video on the service on my YouTube channel, which I will link to down in the video description as well. Now back to the tutorial. Now in this example, we're gonna be fixing a line that a person was speaking that was flubbed.
02:38 This was a groom at a wedding reading his letter to the bride and he messed up the line. And this is a very common mistake that you're gonna run into whenever you're filming weddings or interviews or anything like that. You have someone you're filming who says a very important line, but they mess it up and you don't notice until you get to the editing and you basically have to edit around this messed up line, which can be very difficult to do.
02:59 In an ideal world, the person would have just spoken the line correctly in the first place. Well, now with this voice cloning tool, and DaVinci Resolve, you can fix that line that they messed up, restoring the original content of what they were saying. This feels ethical to do, but please, I would still always recommend getting a person's permission whenever you do it.
03:18 Now to use this tool, all you have to do is have your video project open with your video clip of the person saying the line that you want to change. And the first thing you're gonna want to do is select a portion of your timeline by pressing the I key at the start of the clip and the O key at the end of the clip, so that way your in and out points are created.
03:35 And you want this clip to be at least 10 seconds of audio of the person speaking whose voice you want to clone. And you're gonna wanna make sure that their voice is the only voice that you can hear when you play back the video, because trust me, if you have multiple people speaking, this tool is gonna get very confused and it's not going to work.
03:53 In this case, I have very clear audio of the groom speaking from a lav mic that he was wearing. You are then going to want to go over to the Deliver page in DaVinci Resolve, uncheck the Export Video box, title your file, in this case we'll call it Groom Letter Demo, choose the location where you want to save it. And then under Audio here, you're gonna want to choose WAV for your format, Linear PCM is great, 48,000 with a bit depth of 24.
04:19 Then for Render, make sure that In/Out range is selected and go ahead and click Add to Render Queue and hit Render, which will take a few seconds and result in you having a fancy new WAV audio file on your computer waiting for you to use it. With that done, that is the only ingredient that you need for this voice tool to work.
04:38 So then you're gonna wanna go back to the Edit page in DaVinci Resolve, move your playhead to right about where you want to begin cloning your voice, because that is where the newly generated audio file is going to go and let's hit play on this audio so we can hear how it sounds before we generate the speech. I love you Jillian, I can't wait to take you as wife.
04:57 That sounds a little boring, doesn't it? Take you as wife, well, let's fix that. So I'm gonna go up here to Timeline, AI Tools, Speech Generator, and then for text, I'm going to type, I love you Jillian, I can't wait to take you as my wife.
05:15 There we go, we gotta have the my, let's not capitalize either, might do something weird if we capitalize it, we're just gonna leave it lowercase. Then for a voice model, you have some presets here, but we're gonna wanna go down here to Custom Voice, because we're gonna be cloning a voice. You're then going to want to click Load and navigate to where you saved that WAV audio file that's going to be used for this voice clone.
05:34 Open that, and then there are more settings down here, like speed that will adjust how quickly the line is read, a variation slider that will literally vary how the audio sounds, and a pitch slider to make the voice deeper or higher. Anytime you generate something, it's gonna be given a generation ID, so if you find a voice that you like and you wanna keep using it, you can remember that generation ID number.
05:54 Lastly, for file name, you can leave that set to the default and make sure this Add to Timeline box is checked, because that's gonna add this audio to your timeline right where your playhead is at, and I would also have it use a new audio track for this generation, and it's gonna keep it from overwriting any other audio on your timeline.
06:10 With that done, go ahead and click Generate, and the first time you click this button, an extra Download Manager window will pop up, telling you you need to download an additional 1.6 gigabytes of data for the AI speech generator to work. Go ahead and let that download, and then come back to this Speech Generator window and click Generate again, and in anywhere from 10 to 30 seconds, depending on the speed of your computer and the size of the audio file that you're using as a voice clone model, a new AI-generated voice clip will then appear on a new track right down at the bottom, and yeah, let's go down here and check it out.
06:44 We'll play the original first so you can remember how it sounds. I love you, Jillian. I can't wait to take you as wife.
06:50 Okay, now let's play the new voice clone. I love you, Jillian. I can't wait to take you as my wife.
06:57 Dude, it sounds the same. Isn't that wild? This is so impressive.
07:02 If there is one con to this tool, though, it's that the speech generation tool will not alter the person's lips when they're speaking to have them better match up with the new audio. So when you use this tool to fix a line that has a mistake, you may also want to cover up the footage with some B-roll as it's playing, and then no one will know.
07:18 Now, this is just one tool of many that Blackmagic added to the latest version of DaVinci Resolve, and I have more tutorials coming showing you how to use all of the other new tools that Blackmagic has added as well. Please consider subscribing. If you want to see all those videos, I will link to a playlist of all of my Resolve tutorials down in the video description, as well as link to my Edit Videos Like a Pro guide, which is gonna show you the rules that I follow as a video editor to create better videos.
07:43 This guide is gonna be super helpful to you, and I feel like I have to say this now, AI didn't write this guide. Me, a real human, wrote it. It's linked down in the video description for you to check out.
07:51 Thanks so much for watching, and have a great day. (upbeat music)
The Hook

The bait, then the rug-pull.

A wedding groom reads his vow letter and drops the word 'my', a flubbed line no reshoot will fix. In this tutorial, Matt Johnson walks through DaVinci Resolve 21's built-in AI Speech Generator, showing how a single WAV export and a text box can replace any flubbed line with a convincingly cloned voice that lands on a new track in under 30 seconds.

Frameworks

Named ideas worth stealing.

03:35 · list

Voice Clone Workflow (2 stages)

  1. Stage 1 — Export: mark in/out on 10s+ clean solo audio, Deliver page WAV export at 48kHz/24-bit, In/Out range render
  2. Stage 2 — Generate: Edit page playhead at insert point, Timeline > AI Tools > Speech Generator, Custom Voice + load WAV, type corrected text, tune Speed/Variation/Pitch sliders, Add to Timeline on New Track, click Generate

Two-stage workflow for cloning a voice and inserting a corrected line into a Resolve timeline.

Steal for: Any event or interview edit where a speaker messes up an important line that cannot be reshot
CTA Breakdown

How they asked for the click.

VERBAL ASK
07:29 · link
I will link to a playlist of all of my Resolve tutorials down in the video description, as well as link to my Edit Videos Like a Pro guide.

Soft triple CTA at end: subscribe, playlist of Resolve tutorials, own guide. Low friction, clearly sequenced.

Storyboard

Visual structure at a glance.

00:01 · hook · hook: Speech Generator UI preview
00:13 · hook · demo footage: groom flub close-up
00:28 · context · title card: old way of voice cloning
01:00 · context · host talking-head: ethical disclaimer
01:49 · sponsor · sponsor: Crowdreel step graphic
03:35 · tutorial · timeline: in/out points on audio clip
04:18 · tutorial · Deliver page: export audio settings
05:00 · tutorial · Speech Generator dialog: text input
05:32 · tutorial · Speech Generator: Custom Voice + WAV loaded
06:22 · tutorial · Speech Generator: New Track option
06:58 · result · timeline: new AI-generated audio track
07:36 · cta · outro CTA: Edit Videos Like a Pro guide