Modern Creator
Brendan Jowett · YouTube

Build A Website AI Voice Agent In 10 Minutes

How xAI’s self-describing Agent Instructions let a vibe-coder ship a page-navigating website voice assistant in a single Replit session.

Posted: 4 days ago
Format: Tutorial (educational)
Views: 1.2K
Likes: 64
Big Idea

The argument in one line.

xAI’s Grok Voice Think Fast 1.0 ships a self-describing implementation guide that any AI coding agent can read and execute autonomously, collapsing the entire voice API setup into a single copy-paste step.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…
  • You want to add a conversational, voice-first assistant to a business website without writing backend code.
  • You are evaluating real-time speech-to-speech models and want a practitioner's quick take on how Grok compares with the Gemini and OpenAI options.
  • You use Replit or similar vibe-coding platforms and want a concrete example of what a single plain-English prompt can produce.
  • You are building AI agency demos and need a fast, visually impressive proof-of-concept for clients.
SKIP IF…
  • You need production-grade error handling, custom branding, or tight CMS integration — this is a proof-of-concept workflow, not a production architecture.
  • You are already comfortable with real-time voice APIs and WebSocket-level implementation details.
TL;DR

The full version, fast.

xAI’s Grok Voice Think Fast 1.0 API console ships an Agent Instructions template — a self-describing setup guide any AI coding agent can read and execute without the human touching a single config parameter. Copy those instructions, paste them into Replit’s agent alongside a plain-English site brief, and the vibe-coding platform builds and deploys the full site with a working voice widget in one shot. The resulting agent navigates between pages in real time while holding a natural conversation. Minor rough edges — transcription text in the widget, page-name echo on navigation, voice ID confusion — are each resolved with a follow-up natural-language prompt, no code editing required.
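The copy-paste step amounts to joining two pieces of text before sending them to the coding agent. A minimal sketch, assuming a hypothetical helper named `build_agent_prompt` (neither Replit nor xAI provides it); the explicit reminder to use the pasted block follows the advice given in the video:

```python
def build_agent_prompt(site_brief: str, agent_instructions: str) -> str:
    """Combine a plain-English site brief with the pasted Agent Instructions.

    Hypothetical helper for illustration only.
    """
    if not agent_instructions.strip():
        raise ValueError("Paste the Agent Instructions from the xAI console first.")
    return "\n\n".join([
        site_brief.strip(),
        # Tell the agent explicitly to use the pasted block as its setup guide.
        "Follow the setup instructions below for the voice agent.",
        "--- AGENT INSTRUCTIONS (pasted from the xAI console) ---",
        agent_instructions.strip(),
    ])


brief = (
    "Build a multi-page website for a house building firm with portfolio, "
    "services, and contact pages. Add a voice agent widget in the bottom-right "
    "corner that can answer questions and navigate between pages."
)
prompt = build_agent_prompt(brief, "<contents of the Agent Instructions block>")
```

The placeholder string stands in for the real instruction text, which is not reproduced in this breakdown.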

Chapters

Where the time goes.

00:00 - 01:00

01 · Demo first

Live voice agent demonstrated navigating a multi-page website via spoken commands. End state shown before tutorial begins.

01:00 - 02:31

02 · Stack overview

Single provider: xAI Grok Voice Think Fast 1.0. Real-time voice model leaderboard comparison. Grok chosen for ease of install.

02:31 - 02:58

03 · The key unlock: Agent Instructions

xAI console ships a copy-pasteable Agent Instructions block in the Implement tab. Pasting it into any AI agent eliminates all manual voice API configuration.

02:58 - 05:08

04 · Replit intro and prompting

Replit introduced as the vibe-coding and hosting platform. Full spoken prompt dictated: multi-page construction firm site with bottom-right voice widget and page navigation.

05:08 - 06:27

05 · First build result

BuildCraft site rendered live. Grok auto-generated portfolio images. First voice agent test: contact page navigation confirmed working.

06:27 - 07:27

06 · Iteration 1: remove transcription text

Widget transcription text swapped for animated audio waveform via follow-up prompt. Cancellation error noted.

07:27 - 09:19

07 · Second test and iteration 2

Waveform widget working. Services navigation clean. Issue: agent reads page name aloud on transition. Follow-up prompt requests male voice and silent navigation.

09:19 - 10:20

08 · Third test and voice ID reality check

Orion voice not found in xAI console (Replit agent hallucinated the ID). Nav echo appears fixed. Contact navigation confirmed cleaner.

10:20 - 11:13

09 · Wrap and CTA

Summary of what was built. Replit affiliate link. Subscribe pitch: 90% of viewers are not subscribed.

Atomic Insights

Lines worth screenshotting.

  • xAI ships an Agent Instructions template inside its API playground that any AI coding agent can read and execute, eliminating all manual voice model configuration.
  • A website voice agent that navigates pages in real time is more valuable than a chatbot because users never have to stop speaking to read a response.
  • The entire Grok voice agent setup is a copy-paste: one text block from xAI’s console, pasted into the coding agent alongside your project brief.
  • Replit’s built-in cloud hosting means a vibe-coded site with an embedded voice agent goes live without touching servers, GitHub, or deployment pipelines.
  • Grok auto-generated portfolio images using the xAI API key embedded in Replit — the image generation happened without an explicit prompt for it.
  • Vibe-coding deploy loops let you fix UI issues with follow-up natural-language prompts rather than code edits.
  • Real-time voice models are production-viable today, but ease-of-install varies significantly — Grok wins on zero-config among the options compared.
  • Voice IDs must be verified against the provider’s console: vibe-coding agents will hallucinate plausible-sounding but nonexistent voice names.
  • Showing the broken version before the fix adds credibility that a polished single-take demo cannot.
  • Website voice agents can be extended to take actions (add to cart, submit forms) using the same real-time model, not just answer questions.
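Page navigation and other actions are typically exposed to a voice model as tools it can call. The sketch below shows one way such a call could be mapped to a site route. The tool name `navigate_to_page`, the route paths, and the schema field names are all assumptions for illustration; the video does not show the code Replit generated, and the xAI API may expect a different shape.

```python
from dataclasses import dataclass
from typing import Optional

# Routes for the demo site's pages; the paths themselves are assumed.
PAGES = {
    "home": "/",
    "services": "/services",
    "portfolio": "/portfolio",
    "contact": "/contact",
}

# A JSON-schema style tool description. Field names are illustrative and
# may differ from what the real-time API actually expects.
NAVIGATE_TOOL = {
    "name": "navigate_to_page",
    "description": "Move the visitor's browser to one of the site's pages.",
    "parameters": {
        "type": "object",
        "properties": {"page": {"type": "string", "enum": sorted(PAGES)}},
        "required": ["page"],
    },
}


@dataclass
class ToolResult:
    ok: bool
    route: Optional[str] = None
    error: Optional[str] = None


def handle_tool_call(name: str, arguments: dict) -> ToolResult:
    """Map a tool call emitted by the voice model to a site action."""
    if name != "navigate_to_page":
        return ToolResult(ok=False, error=f"unknown tool: {name}")
    page = str(arguments.get("page", "")).strip().lower()
    if page not in PAGES:
        return ToolResult(ok=False, error=f"unknown page: {page!r}")
    return ToolResult(ok=True, route=PAGES[page])
```

Extending the agent to other actions (add to cart, submit a form) would mean adding more tool descriptions and more branches in the handler, with the same validation of arguments before acting.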
Takeaway

You can ship a website voice agent without touching code.

WHAT TO LEARN

When an AI provider ships a self-describing setup guide, pasting it into a vibe-coding agent is the entire integration step — configuration disappears.

  • xAI’s Grok voice API console includes an Agent Instructions template that an AI coding agent can read and execute autonomously, removing all manual model and voice configuration.
  • Real-time speech-to-speech models are categorically different from chatbots: the user never stops speaking to read a response, making the experience feel conversational rather than transactional.
  • Vibe-coding platforms collapse the build-test-fix cycle into natural-language prompts: each bug or UI issue is resolved by describing the desired outcome, not by editing code.
  • Voice agent rough edges — transcription noise, page-name echo on navigation, wrong voice ID — are expected in a first-pass vibe-coded build and each takes a single follow-up prompt to address.
  • Voice IDs must be verified against the provider’s own console before trusting a vibe-coding agent’s choice: plausible-sounding but nonexistent names will be hallucinated.
  • A website voice agent that navigates pages on spoken command is a stronger AI agency demo than a chatbot because the navigation action is visible and immediately impressive to non-technical clients.
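The voice ID point can be turned into a small guard: accept a voice name only if it appears in a list copied by hand from the provider's console. A minimal sketch, assuming a hypothetical `resolve_voice` helper; the names `voice-a` and `voice-b` are placeholders, not real xAI voice IDs.

```python
from typing import Iterable


def resolve_voice(requested: str, console_voices: Iterable[str], fallback: str) -> str:
    """Return `requested` only if it is listed in the provider console.

    `console_voices` should be copied by hand from the console, not taken
    from the coding agent's output.
    """
    known = {v.strip().lower() for v in console_voices}
    if fallback.strip().lower() not in known:
        raise ValueError("fallback voice must itself be a console voice")
    choice = requested.strip().lower()
    return choice if choice in known else fallback.strip().lower()


# Placeholder names standing in for whatever the console actually lists.
CONSOLE_VOICES = ["voice-a", "voice-b"]

# "Orion" is the name the Replit agent reported in the video; it was not
# found in the console, so the guard falls back to a known voice.
selected = resolve_voice("Orion", CONSOLE_VOICES, fallback="voice-a")
```

A silent fallback keeps the widget working; raising an error instead would surface the hallucinated name sooner.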
Glossary

Terms worth knowing.

Grok Voice Think Fast 1.0
xAI’s flagship real-time speech-to-speech voice model. Processes spoken input and returns spoken output with low latency, designed for agentic and multi-step conversational workflows.
Real-time voice model
A model that ingests live audio and returns audio directly without an intermediate text step, enabling fluid conversation rather than the request-response pattern of text chatbots.
Agent Instructions
A copy-pasteable setup guide xAI provides inside the Grok voice playground Implement tab. When pasted into an AI coding agent, it configures the voice model autonomously with no manual parameters.
Vibe coding
Building software by describing what you want in plain English to an AI agent, which writes and runs the code. No manual coding or deployment required.
Replit
A cloud-based vibe-coding and hosting platform. Users describe what they want to build in natural language; Replit’s AI agent generates, runs, and publishes the application without any local setup.
Speech-to-speech
An AI pipeline where audio input is processed and audio output is returned directly, bypassing text transcription and synthesis as separate steps. Produces more natural conversational flow than STT + LLM + TTS chains.
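The difference between a cascaded chain and a speech-to-speech model is structural, and can be sketched with plain functions. Everything below is a stand-in: the `stt`, `llm`, and `tts` callables are toy stubs, not real models, and the point is only where the intermediate text appears.

```python
from typing import Callable

Audio = bytes


def cascaded_pipeline(
    stt: Callable[[Audio], str],
    llm: Callable[[str], str],
    tts: Callable[[str], Audio],
) -> Callable[[Audio], Audio]:
    """Chain three separate stages; text is materialised twice along the way."""
    def respond(audio_in: Audio) -> Audio:
        text_in = stt(audio_in)    # audio -> text
        text_out = llm(text_in)    # text -> text
        return tts(text_out)       # text -> audio
    return respond


def speech_to_speech(model: Callable[[Audio], Audio]) -> Callable[[Audio], Audio]:
    """A single model maps audio to audio with no separate text stage exposed."""
    return model


# Toy stubs so the sketch runs; real stages would call actual models.
cascade = cascaded_pipeline(
    stt=lambda audio: audio.decode("utf-8"),
    llm=lambda text: text.upper(),
    tts=lambda text: text.encode("utf-8"),
)
direct = speech_to_speech(lambda audio: audio.upper())
```

Both callables have the same outer signature, which is why a widget can swap one approach for the other without changing its own interface.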
Resources

Things they pointed at.

01:45 · product · Gemini 3.1 Flash Live
02:58 · tool · Replit
Quotables

Lines you could clip.

01:07
It’s much more conversational than a chatbot. We don’t have to sit there chatting to it and reading the responses. It just talks straight back to us.
Clean chatbot-vs-voice-agent positioning, standalone punchline, no setup needed · TikTok hook
02:48
We literally don’t have to do any of the config ourselves. We don’t have to tell it which model to use or how to set it up. We literally just paste these instructions for agent.
The core unlock of the video distilled to three sentences · IG reel cold open
03:55
If you are somebody that isn’t technical, this is gonna be the easiest way to go about this.
Audience filter and promise in one sentence · newsletter pull-quote
The Script

Word for word.


00:00Today, I'm gonna show you how to build a Website AI voice agent just like this one right here, which is sitting in the bottom right corner of this website, and I'm gonna be able to ask this voice agent anything about all of the content on this website, any questions that I have. It's also got the complete capability to navigate throughout this entire website.
00:17So if I wanted to ask it, uh, about the contact section, it's gonna take me to the contact section. Hey. How would I get in contact with you guys?
00:26You can reach us through our contact page. I've brought you there now. What's on your mind?
00:31A project in mind or just a quick question?
00:34Could you actually show me a little bit more about your services first?
00:38Sure. I'll take you to the services page, where you can see the full picture.
00:44Anything specific you're curious about? Can you take me to your portfolio now?
00:49Sure. Head into the portfolio. Let me know if you want to dive into any particular project.
00:55Or if you have a certain style in mind, I can point you to something similar.
01:00And there we have it. I was able to ask it those questions. It navigated throughout the web page.
01:04I could obviously ask it more in-depth questions if I wanted to actually have a conversation with it as well. But, ultimately, the goal is to have a really awesome personal assistant on the website to be able to navigate. It's much more conversational than a chatbot.
01:16We don't have to sit there chatting to it and reading the responses. It just talks straight back to us, and we can sort of just flow through this web page. We can also build quite a lot on top of this as well if we wanted this to take action, add items to our cart, buy things for us, the whole lot, we can really do as much as we want, and it's probably a lot simpler than you think to get this system up and running.
01:36So for the stack that we're using to build this, we're actually only using one provider, and that is gonna be xAI's Grok Voice Think Fast 1.0 model. Ultimately, this is just a speech to speech real time model that we're gonna be able to add to our website. Now I've played around with the OpenAI speech to speech models and the Gemini speech to speech models as well.
01:54And just from what I personally found, this Grok model is, to be honest, the easiest to install, and I'll get into that in just a minute. But it's also definitely one of the faster models as well. The Gemini models are also pretty good.
02:04They've actually got a comparison here, I think, as well. You can have a look at the leaderboard and some of the stats that they're showing on their benchmarks here. So if you don't wanna use the Grok model, the Gemini 3.1 Flash Live model, I would definitely say is the second best, very similar to what they're showing here.
02:18But really all of these real time models are pretty good. If you are familiar with the models that we use for building out phone systems, we don't use any of these real time models usually when we're building out a phone system. But also super conveniently, one thing that I really do like about this Grok model is that on their API console here, if we go into API, we go into voice.
02:37There's a playground here if you wanna test out the different voices. What I do really like is that in the top right here, you'll see implement. If we click into this, you'll see that there is some code as well as some agent instructions.
02:48And so literally, all that we need to do is just copy these agent instructions. You can see here, this is a set of instructions for the xAI voice agent real time API guide.
02:58So this is a specific guide that we give to any of our AI to be able to read through this to understand exactly what the setup process is for this real time voice model. So we literally don't have to do any of the config ourselves. We don't have to tell it which model to use or how to set it up.
03:14We literally just paste these instructions for agent, and it's able to read through this and understand exactly what it needs to do. So just go ahead and copy this instruction set and we're good to go. Now when it comes to getting the website set up as well as the voice agent on the website, the platform that I use to build out this website and host it live on the Internet is a platform called Replit.
03:33If you're not familiar with Replit, ultimately, it's a vibe coding platform. Just through plain English, I'm gonna go ahead and tell it to build out this website and build out this voice agent connected into it. And the real advantage of Replit is that they've got cloud hosting built into their platform.
03:47So as soon as we build this website, as soon as we build the voice agent, we're gonna be able to deploy and publish this to a live website. We're not gonna have to do any sort of manual configuration of setting up some servers or pushing it off to GitHub or anything like that. So if you are somebody that isn't technical, this is gonna be the easiest way to go about this.
04:03There's a lot of other benefits as well in terms of data and security if we ever have our website crash or our application crash that we build through Replit because they're storing a lot of that data themselves through databases that are natively built into this app. Replit has a more secure database, so there is a certain amount of time after your data might get lost that you can actually recover it through Replit.
04:22Whereas locally building an application or website yourself doesn't have any of that security and you are running a pretty big risk. Anyways, I'm gonna jump in. I'm gonna tell it to go ahead and build me the website and then I'm gonna provide it with the instructions for the Grok voice model and we'll see what happens.
04:34Hey. Are you able to build me a website for a house building firm? I want you to build a website in which has multiple different pages for showcasing their portfolio and their services and maybe a contact page as well.
04:45On this website, I wanna have in the bottom right corner a voice agent in which any user can come and click into and talk to about any of the services that this business has, and also I want it to be interactive. So I want it to be able to navigate through the different pages on our website. So if somebody wants to ask about getting in contact, it will automatically move itself to the contact page as they're speaking conversationally with the voice agent.
05:09I've also now provided you with a set of instructions on how to get the Grok voice agent real time model setup. So please follow along with the instructions that I've provided for the setup of that voice agent. Alright.
05:20I've just gone ahead and spoke directly to my computer. I'm just using a speech to text tool right there. Just make sure to come to the agent instructions, copy this, jump back to Replit.
05:28If we paste it in, you'll see at the bottom, it's now pasted this right here, which is just the instruction set. It's pasted in as text because it's quite big. But just make sure to tell it to reference what you've actually pasted here so it's got that context.
05:40I'm gonna go ahead and send this off and we'll see what happens. Alright. So that prompt has gone through and completed.
05:45I haven't touched anything after that prompt. I've just moved myself so you can see the bottom left and bottom right. Scroll through the page, can see this is the website that it just created.
05:53Obviously, a lot more that I could do in terms of design when it comes to this website. There's a lot more that I could have done to prompt it. I didn't give it much of a actual brand kit to work off of, but good enough for now to demonstrate exactly what building with the voice agent.
06:05And if I go ahead and click on the portfolio part of this website, I scroll down a bit. So these images have actually also been generated by Grok. I was actually looking at the process that it was going through.
06:14I could see that it actually used my xAI, uh, API key in order to generate these images. So, uh, obviously, got some nice images that we could obviously use to start us off. But, ultimately, let's see if this voice agent actually works.
06:26Hey. How are you?
06:29I'm doing great. Thanks. How can I help you today with your building or renovation plans?
06:34Feel free to ask about our services, portfolio, or how to get in touch.
06:39Okay. No worries. Are you able to get me to the contact page?
06:43Sure. I'd be happy to help you get in touch. There you go.
06:48You're now on the contact page. If you need anything else, just let me know.
06:54Okay. So there are definitely some things that we need to fix here. But for the most part, it is working, and it was actually able to navigate us over to this contact page as well, which is pretty cool.
07:02One thing that I would change is that I probably don't need this transcription happening in the voice agent. In the voice agent chat box itself, it's not necessary for me to actually read this out, I don't think. Like my previous example, I'd probably rather just have kinda like a voice icon. So it's as simple as just jumping into Replit here and just telling it to make that update, and it's gonna make that change for us.
07:20Hey. Are you able to remove the transcription text from myself and from the AI? Just add a sort of generic, um, audio, uh, graphic instead of having text actually read out, um, on the actual chatbot itself.
07:33I also see there's an error that says cancellation failed. No active response found. Uh, the voice agent seems to work.
07:38Not sure what that error is about if you could take a look at that. And now I'll go ahead and send it off, and hopefully, that will make some updates. Alright.
07:45So that is now finished up. We can see that the voice agent is looking a bit different. It's got graphic for the audio instead of the text conversation, so I'll give it a test.
07:53Hey. How are you?
07:56I'm doing great. Thanks. How can I help you today with your home building or renovation needs?
08:01If you're looking for custom home building, renovations,
08:04or anything else, just let me know. Perfect. Yeah.
08:07I just want to learn a bit more about your services.
08:10I'd be happy to help you explore our services. Well, I've navigated to our services page for you. There, you'll find details on our key offerings,
08:19including custom home building from initial concept to final Okay. Working much better. Doesn't have all the text on the page, and it seems like it was able to navigate me quite quickly and respond back to me.
08:29Couple things I probably would change. Seems like when we get it to navigate to a different page, reads out the name of that page. As you can see, it kinda messes up and says portfolio, um, you know, an extra time than it should.
08:38So, um, anyways, seems to be working quite well. I should now be able to also ask it general questions about the properties, about the services, whatever, and be able to have a natural conversation with the website. So working pretty well.
08:48And because this is connected to that Grok voice agent API, we should be able to change the voice as well. There's a few different voices. We can head over to the console to check out the different voices that they have.
08:57I could just tell it to change it to a male voice if I wanted to, and that might work. It should be able to find that voice ID and just start using that. Hey.
09:04Are you able to swap over the voice agent voice to a male voice? And also when it is changing pages, it seems to read out the name of that page or there's like a little bit of a glitch when it does that transition. So if you could look into that and make that update to it just stays as a seamless conversation, um, you know, when you're navigating between these pages, that'd be great.
09:22So I'll send that off very quickly. It should be able to easily fix that for us, and, uh, we should be good to go. Alright.
09:28So those updates should be made. Let's give it a try. Hey.
09:30How are you?
09:33I'm doing great. Thanks. How can I help you today with your building project?
09:37Alright. So it seems like that voice is not a male voice. Not sure what happened there.
09:41I might have to head over to the xAI console here and actually have a look at the different voices and just see, you know, which one is the actual male voice. Looks like it says on Replit it swapped it over to the Orion voice. If I head over here, we can see that the Orion voice, if I search it up, doesn't actually come up with an Orion voice.
09:58I'm not sure which one it managed to swap to here. It seems to have switched to an American accent. So anyways, just have to look at the specific name of the voice and then you can swap it over, whatever.
10:07And it seems like the animation for navigating pages supposedly is fixed as well. Hey. Are you able to quickly navigate me to the contact page?
10:15Sure. I'll take you there right away.
10:20Okay. That looks a bit better. It didn't repeat the name of the page.
10:23So, anyways, looks like we're pretty much done here. We've got a voice agent that's able to navigate the entire page, answer any questions that I have, and have a natural conversation with to really help out the user. Once again, I'll have Replit linked down below in the description and pinned at the top of the comments if you do wanna check Replit out for building something like this.
10:38If you are gonna be building a brand new website or building a website with a voice agent like this, Replit's really good because you can just host it right away, so I would recommend it. It really does just save you the time and money from having to go ahead and get external services in order to do that. And then, obviously, the backup and everything like that is pretty helpful as well.
10:55Just for my last note, I've recently discovered that 90% of you watching right now are actually not subscribed to the channel at all, which is pretty insane. So make sure you're subscribed. Just scroll down below and click on the subscribe button.
11:05I've got a lot more great content just like this coming out, especially voice agent related. So if you don't wanna miss that, you're gonna be one of the first people to know about it if you do subscribe.
The Hook

The bait, then the rug-pull.

The demo runs before the explanation. A voice agent is already live in the corner of a real website, navigating from contact to services to portfolio on spoken command — and only then does the tutorial begin.

Frameworks

Named ideas worth stealing.

02:31 · concept

Agent Instructions as zero-config setup

xAI ships a self-documenting implementation guide inside their API playground Implement tab. Any AI coding agent can read it and configure the voice model autonomously.

Steal for: Any tutorial where you need to onboard an AI agent to a specific API without manual configuration

02:58 · model

Vibe-coding deploy loop

  1. Prompt
  2. Build
  3. Test
  4. Follow-up prompt for each fix

Iterative build cycle using only natural-language prompts. No direct code editing. Each problem solved by describing the desired outcome to the agent.

Steal for: AI agency demos, rapid prototyping, client-facing build walkthroughs
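The four-step loop above can be sketched as a small driver function. This is an illustration only: `send_prompt` and `test_build` are hypothetical stand-ins for typing into the coding agent and manually testing the result, and the example issues are the ones observed in the video.

```python
from typing import Callable, List


def deploy_loop(
    initial_prompt: str,
    send_prompt: Callable[[str], None],
    test_build: Callable[[], List[str]],
    max_rounds: int = 5,
) -> List[str]:
    """Prompt, build, test, then send one follow-up prompt per observed issue."""
    sent = [initial_prompt]
    send_prompt(initial_prompt)
    for _ in range(max_rounds):
        issues = test_build()      # a human (or test script) reports problems
        if not issues:
            break                  # nothing left to fix
        for issue in issues:
            follow_up = f"Please fix this: {issue}"
            send_prompt(follow_up)
            sent.append(follow_up)
    return sent


# Simulated test results for three rounds, mirroring the video's iterations.
pending = [
    ["widget shows transcription text"],
    ["agent reads page name aloud on navigation"],
    [],
]
log: List[str] = []
history = deploy_loop(
    "Build the site with a voice widget.",
    send_prompt=log.append,
    test_build=lambda: pending.pop(0),
)
```

The `max_rounds` cap is a design choice added here so the loop cannot run forever if the agent keeps introducing new issues.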
CTA Breakdown

How they asked for the click.

VERBAL ASK
10:50 · subscribe
90% of you watching right now are actually not subscribed to the channel at all, which is pretty insane. So make sure you're subscribed.

Stat-based guilt play at end of video. Replit affiliate link also dropped in description and pinned comment.

Storyboard

Visual structure at a glance.

00:00 · hook · open
00:13 · hook · demo site
01:31 · value · Grok announcement page
02:31 · value · Agent Instructions
03:36 · value · Replit dashboard
05:43 · value · BuildCraft live
06:30 · value · voice agent first test
08:00 · value · waveform widget
10:20 · cta · wrap