Modern Creator
Better Stack · YouTube

Developers Finally Got an Open-Source Voice AI Platform (Dograh)

A 6-minute dev tutorial reverse-engineering the open-source VAPI alternative that gives you visual workflow building, full observability, and self-hosting — without the platform tax.

Posted
1 month ago
Duration
Format
Tutorial
educational
Views
74.9K
2.4K likes
Big Idea

The argument in one line.

Dograh gives developers a self-hostable voice AI platform with visual workflow building and full observability, eliminating the platform fees and vendor lock-in of hosted services like VAPI.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…
  • A developer building voice AI agents who currently pays platform fees on VAPI, Bland, or Retell and wants to reduce costs through self-hosting.
  • An engineer managing live voice AI systems in production who needs call-level observability — transcripts, traces, tool call logs, and recordings — to debug agent failures.
  • A backend developer comfortable with Docker and GitHub who wants to build voice workflows visually without writing orchestration glue code.
  • A team lead evaluating voice AI infrastructure who needs full system control, provider flexibility (LLM, TTS, STT swaps), and the ability to self-host on their own infrastructure.
SKIP IF…
  • You're a non-technical founder or product manager looking for a plug-and-play voice AI solution — this requires Docker, GitHub, and comfort in the terminal.
  • You need production-grade support, SLAs, and vendor accountability — this is open-source software you'll maintain and troubleshoot yourself.
  • Your primary goal is rapid prototyping with minimal setup — the self-hosting and infrastructure work upfront adds friction compared to managed platforms.
TL;DR

The full version, fast.

Voice AI agents look simple on paper but break in production because real calls involve interruptions, silence, tool calls, and provider fees stacked on top of LLM, TTS, and telephony costs, and hosted platforms like VAPI, Bland, and Retell leave you without ownership or visibility when things fail. Dograh is an open-source, self-hostable alternative that bundles three layers usually duct-taped together: a voice engine connecting telephony, STT, LLM, and TTS; a visual workflow builder for mapping prompts, branches, API calls, and human transfers without orchestration code; and a platform layer with tracing, recordings, tool-call logs, and analytics. You bring your own providers, inspect the code, and swap models when pricing shifts: control that hosted platforms cannot offer.
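To make the "stacked fees" point concrete, here is a minimal arithmetic sketch. Every rate in it is a made-up placeholder, not a real price from any vendor, and the class and function names are hypothetical. It only shows how a platform fee adds to the provider costs you pay either way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerMinuteRates:
    """Per-minute costs in whole US cents. All values below are placeholders."""
    llm: int
    voice: int          # speech-to-text plus text-to-speech
    telephony: int
    platform_fee: int = 0  # zero when you self-host the platform layer

    def total_cents(self) -> int:
        return self.llm + self.voice + self.telephony + self.platform_fee


def monthly_cost_usd(rates: PerMinuteRates, minutes: int) -> float:
    """Total monthly spend in dollars for a given call volume."""
    return rates.total_cents() * minutes / 100


# Hypothetical rates, chosen only to keep the arithmetic easy to follow.
hosted = PerMinuteRates(llm=2, voice=3, telephony=1, platform_fee=5)
self_hosted = PerMinuteRates(llm=2, voice=3, telephony=1)

minutes = 20_000  # illustrative monthly call volume
print(monthly_cost_usd(hosted, minutes))       # 2200.0
print(monthly_cost_usd(self_hosted, minutes))  # 1200.0
```

The sketch deliberately ignores the server and maintenance cost of self-hosting, which you would need to add back in for a fair comparison.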

Chapters

Where the time goes.

00:00 - 00:25

01 · Stop Renting Your Voice AI Stack

Hook: stacked fees and no ownership. Sets up the core developer pain before the product is named.

00:26 - 00:59

02 · Why AI Phone Agents Get Expensive Fast

Animated pipeline diagram (phone call to STT to LLM to TTS). Looks simple from the outside — reality is messier.

01:00 - 01:28

03 · Voice AI Is Not Just ChatGPT With a Phone Number

Real calls: interruptions, silences, topic pivots, weird questions. When it breaks, 'the bot gave a bad answer' is not enough.

01:29 - 01:56

04 · Dograh Demo: Build a Voice AI Agent Locally

Clone GitHub then cd then docker compose up. Docker-first as a developer credibility signal.

01:57 - 03:35

05 · Creating a Lead Qualification AI Phone Agent

Visual workflow builder: prompt node, qualification step, API tool call, branch, transfer. Live test call with AI agent Sarah. Post-call observability: transcript, trace, tool call log, recording.

03:36 - 04:10

06 · What Is Dograh?

Three things: Voice Engine plus Visual Workflow Builder plus Platform Layer (testing, tracing, recordings, analytics).

04:11 - 04:34

07 · Voice AI Agent Workflow

Animated: Map the flow. Skip the boilerplate. BYOP — bring your own LLM and TTS providers.

04:35 - 04:46

08 · Testing, Tracing, Recordings, Analytics

Open source means inspect, change, self-host. Low GitHub stars signals an early-stage find.

04:47 - 05:22

09 · VAPI, Bland, Retell: Fast but Less Control

Hosted platforms move fast but pricing, limits, and deployment options are out of your hands.

05:23 - 05:49

10 · Pipecat and Vocode: Flexible but More Glue

Raw frameworks give control but require building everything — no UI, no workflow editor.

05:50 - 06:25

11 · Where Dograh Fits for Devs

Write code where code matters, use the builder where your flow matters. Subscribe CTA.
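The lead qualification flow built in chapter 05 (prompt node, qualification step, API tool call, branch, transfer) can be pictured as a small graph of nodes. The sketch below is not Dograh's data model or API. The `Node` class, the node names, the budget rule, and the CRM stub are all hypothetical, chosen only to show what the visual builder saves you from writing by hand.

```python
from dataclasses import dataclass
from typing import Callable, Optional

State = dict  # mutable per-call state shared by all nodes


@dataclass
class Node:
    name: str
    action: Callable[[State], None]               # side effect on the call state
    next_node: Callable[[State], Optional[str]]   # next node name, or None to stop


def run_workflow(nodes: dict, start: str, state: State) -> list:
    """Walk the graph from `start` and return the names of the visited nodes."""
    visited = []
    current = start
    while current is not None:
        node = nodes[current]
        node.action(state)
        visited.append(node.name)
        current = node.next_node(state)
    return visited


def noop(state: State) -> None:
    return None


def qualify(state: State) -> None:
    # Hypothetical qualification rule: any budget of at least 1000 USD qualifies.
    state["qualified"] = state["budget_usd"] >= 1000


def create_crm_lead(state: State) -> None:
    # Stand-in for a real HTTP call to a CRM; it only logs the tool call.
    state["tool_calls"].append({"tool": "create_crm_lead", "company": state["company"]})


def mark_transferred(state: State) -> None:
    state["transferred"] = True


LEAD_WORKFLOW = {
    "prompt": Node("prompt", noop, lambda s: "qualify"),
    "qualify": Node("qualify", qualify, lambda s: "crm_tool"),
    "crm_tool": Node("crm_tool", create_crm_lead, lambda s: "branch"),
    # The branch node routes on the result of the qualification step.
    "branch": Node("branch", noop, lambda s: "transfer" if s["qualified"] else "goodbye"),
    "transfer": Node("transfer", mark_transferred, lambda s: None),
    "goodbye": Node("goodbye", noop, lambda s: None),
}

state = {"company": "Acme", "budget_usd": 5000, "tool_calls": []}
path = run_workflow(LEAD_WORKFLOW, "prompt", state)
print(path)  # ['prompt', 'qualify', 'crm_tool', 'branch', 'transfer']
```

Changing the flow means editing the `LEAD_WORKFLOW` table rather than the runner, which is roughly the separation a visual builder gives you.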

Atomic Insights

Lines worth screenshotting.

  • Voice AI agents are not chat with a phone number — they are live systems with speech-to-text, LLM, text-to-speech, state management, tool calls, and real-time interruption handling.
  • When a voice call fails, 'the bot gave a bad answer' is not enough — you need a trace, a recording, and tool call logs to know whether the problem was the prompt, the model, or the handoff.
  • Dograh's visual workflow builder lets you design branch logic, API tool calls, and transfer conditions without writing orchestration code.
  • Self-hosting a voice AI platform removes the platform fee that hosted services stack on top of LLM, voice provider, and phone call costs.
  • Bring-your-own-provider means swapping LLM or TTS vendors when costs change, without rebuilding your workflow logic.
  • The three paths to voice AI are: hosted platforms (fast, less control), raw frameworks (full control, no UI), and Dograh (visual builder plus self-hosting).
  • A Docker compose up spin-up is the right first signal that a tool is built for developers — it means local testing before any cloud commitment.
  • Live trace, call recording, and state change logs are not nice-to-haves — they are the minimum required to debug a production voice agent.
  • VAPI, Bland, and Retell are best when you need speed and clean dashboards; the tradeoff is pricing lock-in and limited deployment control.
  • A visual no-code canvas for voice agents is valuable not because it removes code but because it eliminates the orchestration glue code that ties everything together.
  • Open-source voice AI infrastructure with low GitHub stars is a buying signal for early adopters — the architecture is inspectable before the community validates it.
  • Writing code where code matters and using the builder where flow matters is the correct division of labor in an agentic voice pipeline.
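The insight about needing a trace rather than "the bot gave a bad answer" can be illustrated with a tiny per-call event log. This is a sketch under stated assumptions: `TraceEvent`, `CallTrace`, the stage names, and the sample error are hypothetical and do not reflect Dograh's actual trace format.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class TraceEvent:
    stage: str   # hypothetical stage labels: "stt", "llm", "tool_call", "tts", "transfer"
    ok: bool
    detail: dict = field(default_factory=dict)


@dataclass
class CallTrace:
    call_id: str
    events: list = field(default_factory=list)

    def record(self, stage: str, ok: bool = True, **detail: Any) -> None:
        """Append one step of the call to the trace, in the order it happened."""
        self.events.append(TraceEvent(stage, ok, detail))

    def first_failure(self) -> Optional[TraceEvent]:
        """Return the earliest failed step, or None if every step succeeded."""
        return next((e for e in self.events if not e.ok), None)


trace = CallTrace(call_id="demo-call-1")
trace.record("stt", text="We're looking for an AI phone agent")
trace.record("llm", reply="What is your company size and industry?")
trace.record("tool_call", ok=False, tool="create_crm_lead", error="timeout")
trace.record("tts", chars=41)

failure = trace.first_failure()
print(failure.stage, failure.detail["error"])  # tool_call timeout
```

With a record like this, the question "was it the prompt, the model, or the handoff?" becomes a lookup instead of a guess.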
Takeaway

Steal the format.

Better Stack playbook

Pain hook then problem depth then live demo with observability layer then landscape positioning — this is a repeatable formula for any dev tool reveal.

  • Open with the financial or control pain, not the product name — let the problem breathe for 20+ seconds before the solution appears.
  • Show Docker first if your audience is developers — it is a credibility signal, not a friction warning.
  • The demo must include failure-state tooling (trace, logs, recording) — showing only the happy path reads as marketing, not engineering.
  • Use a named three-tier landscape comparison to position against both over-controlled and under-controlled alternatives.
  • The title formula Developers Finally Got [category] [tool name] signals arrival and relief — test it for JoeFlow or any tool reveal.
  • Subscribe mid-roll at ~1:26 (after pain is established, before demo) is well-timed — viewers are engaged but not yet at peak value delivery.
Glossary

Terms worth knowing.

Voice AI agent
An automated phone or voice-chat system that listens to a caller, understands them, and responds in spoken language using AI models for speech recognition, reasoning, and speech synthesis.
Self-hosting
Running software on your own servers or infrastructure rather than using a vendor's cloud, giving you full control over data, configuration, and costs at the price of managing the system yourself.
VAPI
A hosted commercial platform for building voice AI agents that handles the phone, speech, and model plumbing in exchange for usage fees and platform lock-in.
Bland
A hosted voice AI platform similar to VAPI that lets developers spin up phone agents through APIs without managing the underlying infrastructure.
Retell
Another hosted voice AI provider that gives developers managed APIs and dashboards for building phone agents, competing with VAPI and Bland.
Dograh
An open-source, self-hostable voice AI platform that bundles a visual workflow builder, observability tools, and provider flexibility as an alternative to hosted services like VAPI.
Speech-to-text (STT)
Technology that converts spoken audio from a caller into written text the rest of the system can process.
Text-to-speech (TTS)
Technology that takes written text from a language model and synthesizes it into spoken audio the caller hears.
LLM
Large Language Model — an AI model like GPT or Claude that takes text in and produces text out, used here to decide what the agent should say next.
Tool call
When an AI agent invokes an external function or API mid-conversation, such as creating a CRM record or looking up data, instead of just generating text.
Trace
A step-by-step record of everything that happened inside a single call or AI run — prompts, model responses, tool calls, and state changes — used to debug why a system behaved a certain way.
Observability
The ability to see inside a running system through logs, traces, recordings, and metrics so you can diagnose failures and understand behavior rather than guessing.
Docker
A tool that packages software and its dependencies into containers so it runs the same way on any machine, making local setup as simple as one command.
Docker Compose
A tool for defining and starting multiple Docker containers together from a single config file, useful for spinning up an app that needs a database, backend, and frontend all at once.
Lead qualification agent
An automated assistant that asks an inbound caller standard questions — company, size, budget, intent — to decide whether they're a real sales prospect before routing them to a human.
Visual workflow builder
A drag-and-drop canvas where you design an agent's logic as connected nodes — prompts, branches, API calls, transfers — instead of hard-coding every step.
Branch
A decision point in a workflow where the agent takes one path or another based on the caller's answer or a tool's result.
Pipecat
An open-source Python framework for building real-time voice and multimodal AI agents by wiring together speech, language, and audio components in code.
Vocode
An open-source framework for building voice-based LLM applications, giving developers low-level control over the audio and conversation pipeline.
LiveKit
An open-source real-time audio and video infrastructure platform often used as the transport layer underneath voice AI agents.
Resources

Things they pointed at.

04:47 · product · VAPI
04:47 · product · Bland AI
04:47 · product · Retell AI
05:23 · product · Pipecat
05:27 · product · Vocode
05:29 · product · LiveKit
Quotables

Lines you could clip.

00:00
That's not even the worst part. The worst part, you still don't really even own the system.
Strong emotional escalation that sets up pain before the product is named · TikTok hook
01:08
A voice agent is not just ChatGPT with a phone number, it is a live system with a bunch of moving parts.
Debunks a naive assumption developers actually hold · IG reel cold open
02:48
The value is not no code. The value is not wasting code trying to tie everything together.
Tight reframe of what no-code means for developers · newsletter pull-quote
05:57
Write code where code matters, use the builder where your flow matters, inspect the runtime when things break, and swap providers when costs change.
Four-part maxim, a quotable thesis statement for the whole video · IG reel cold open
The Script

Word for word.

00:00You just built a voice AI agent, it works, then the bill shows up and you're paying for the LLM, the voice, the phone call, and then another platform fee on top of that. That's not even the worst part. The worst part, you still don't really even own the system.
00:14Today, I'll show you Dograh, an open source VAPI alternative you can self host, inspect, and control.
00:26Voice AI nowadays can look somewhat simple from the outside. Take a phone call, turn speech into text, send it to the LLM, turn the answer back into speech, it's done.
00:37That's easy. Right? Well, as any of us know who've tried this, not really because real calls are messy.
00:44People interrupt, people go silent, they're gonna change topics, they can ask really weird questions. Your agent needs to call APIs and when it breaks, you need to know why.
00:56That is where most voice AI projects become more of a pain. A voice agent is not just ChatGPT with a phone number, it is a live system with a bunch of moving parts, right? That's speech to text, LLM, text to speech, state, tool calls, a boatload of other things, you get it.
01:12There's a lot of moving parts that we don't actually see happening. And when the call fails, "the bot gave a bad answer" is not enough. Was it the prompt?
01:20Was it the model? What was it? Why did it fail?
01:23And this is where Dograh comes in. If you enjoy coding tools that speed up your workflow, be sure to subscribe. We have videos coming out all the time.
01:31Alright. Now, let's look at this in practice. I'm gonna start locally because if a tool says it's built for devs, I wanna see Docker before anything else, this was super easy to spin up.
01:40I'm gonna clone it from GitHub, I'm gonna cd into the folder and then I just have to run docker compose up. That's simple enough, easy enough for us. Once the containers are running, we can jump into the Dograh UI.
01:53Now I'll build a simple lead qualification agent. So what do I mean by that? Someone's gonna call in, the agent asks what they wanna build, then it asks about the company, the size, the budget, small things like this.
02:06It'll then call an API tool to create or update a CRM lead if we embed that. And maybe I could even say if the lead was qualified, it transfers to a human. So I add a prompt node, then a qualification step, then an API tool call and then I can add a branch and a transfer.
02:28There's no custom orchestration code yet and that's kind of the point here. This looks like a no code canvas but for devs and the value is not no code. The value is not wasting code trying to tie everything together.
02:41Now, let's try to run a test call here.
02:44Hi. This is Sarah from inbound calls. Are you still there?
02:49We're looking for an AI phone agent for inbound demo requests.
02:55That's great. I can definitely help you with that. To make sure I connect you with the right solution, could you tell me a little more about what you're looking to achieve with an AI phone agent for your inbound demo requests?
03:06Let's say around twenty thousand minutes. Thanks for sharing that. And what is your company size and industry?
03:12Now we can see the transcript here. We can see the trace. We can see the tool call that actually happened and we can see the state changes.
03:22Plus here's the recording which I wanted in the first place and that is what I want as a dev, not just the bot worked, I wanna know why it worked, when it fails, I want evidence of this actually happening. So what is Dograh? Dograh appears to give us three different things out of all this.
03:40We get a voice engine, a visual workflow builder and the platform layer you usually have to build yourself. The voice engine is the part that connects the caller, the phone provider, speech to text, the LLM, and text to speech. That is what makes the call actually happen.
03:57The workflow builder is where you design the logic of this whole system, so instead of hard coding every prompt, branch, API call and transfer, you can map out the flow visually. So huge win here, I like these kind of maps.
04:10Ask this question, wait for the answer. That's kind of what we're mapping out here.
04:14I can call this API branch here, transfer there, that kind of logic should be easy to change. Then to all this, there's the platform layer, testing, tracing, recordings, analytics, that is the boring stuff every serious voice project eventually needs.
04:30With all this, you can bring your own providers, your own LLM and your own TTS. Because Dograh is open source, you can inspect the code, change how it works and self host it. As of this recording, GitHub stars are low.
04:42So this is a super new find that I found but it's honestly a rather cool one. Now let's compare Dograh to other things we already have out here. You have three main ways to build voice agents.
04:53First is hosted platforms, VAPI, Bland, Retell. These are good when you wanna move fast and you don't wanna run infrastructure. You get clean dashboards, APIs, transcripts, testing tools, all that's really useful, but you start to lose control right there.
05:07If the platform changes pricings, you deal with it. If the platform changes limits, deal with it.
05:14Right? If you need custom deployment, anything like that, again, you might hit a wall.
05:19Hosted tools are fast though, so I guess that's a win. You have some of these raw frameworks like I came across Pipecat, Vocode, LiveKit I think is one of them.
05:30These give you a lot more control, you can build almost anything. But now, you're building everything around this framework, no UI, no workflow editor, so that's a big trade off using things like that.
05:42Now, Dograh is still way too new but it's here, so I think their bet is kinda simple. What if you could use a visual voice agent builder without giving up the self hosting, choosing a provider, tracing and control? That's what this appears to be.
05:58Write code where code matters, use the builder where your flow matters, inspect the runtime when things break, and swap providers when costs change. Self hosting gives us a lot of control which is huge. VAPI, Bland, and Retell are best for fast hosted deployment, but the trade off is cost lock in and less control.
06:19If you enjoy coding tools like this, be sure to subscribe to the BetterStack channel. We'll see you in another video.
The Hook

The bait, then the rug-pull.

You shipped a voice AI agent. It worked. Then the bill arrived: LLM, voice, telephony, and platform fee, stacked four layers deep. That is the problem Dograh is trying to solve, and Better Stack walks through the entire platform in under seven minutes, from Docker spin-up to live test call to a landscape comparison that names every major competitor.

Frameworks

Named ideas worth stealing.

04:47 · model

The Three-Tier Voice AI Landscape

  1. Hosted platforms (VAPI, Bland, Retell) — fast, locked in
  2. Raw frameworks (Pipecat, Vocode, LiveKit) — flexible, high glue
  3. Open-source platforms (Dograh) — builder UX plus self-hosting plus observability

A positioning triangle for any developer tool category: speed vs. control vs. ownership.

Steal for: Positioning JoeFlow against SaaS alternatives by replacing the three categories with your own tier labels
03:36 · list

The Three Things Product Explanation

  1. Voice Engine
  2. Visual Workflow Builder
  3. Platform Layer

Dograh reduces to three named components, each solving a distinct layer of the problem.

Steal for: Any product with multiple components. Lead with the three nouns, then expand each one
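The voice engine component and the bring-your-own-provider idea can be sketched as three swappable interfaces wired into one turn handler. This is a minimal illustration, not Dograh's implementation. The protocol names, method names, and stub providers are all hypothetical, and a real engine would also handle streaming audio, interruptions, and silence.

```python
from typing import Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    def reply(self, text: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


class VoiceEngine:
    """Runs one caller turn: audio in, STT, LLM, TTS, audio out."""

    def __init__(self, stt: SpeechToText, llm: LanguageModel, tts: TextToSpeech):
        self.stt = stt
        self.llm = llm
        self.tts = tts

    def handle_turn(self, audio: bytes) -> bytes:
        text = self.stt.transcribe(audio)
        answer = self.llm.reply(text)
        return self.tts.synthesize(answer)


# Stub providers so the sketch runs without any network calls or API keys.
class FakeSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")  # pretend the audio bytes are already text


class EchoLLM:
    def reply(self, text: str) -> str:
        return f"You said: {text}"


class ShoutLLM:
    def reply(self, text: str) -> str:
        return text.upper()


class FakeTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")  # pretend the text bytes are audio


engine = VoiceEngine(FakeSTT(), EchoLLM(), FakeTTS())
print(engine.handle_turn(b"hello"))  # b'You said: hello'

# Swapping the LLM provider does not touch the rest of the pipeline.
engine.llm = ShoutLLM()
print(engine.handle_turn(b"hello"))  # b'HELLO'
```

Because the engine depends only on the three interfaces, changing vendors when pricing shifts is a one-line swap rather than a rebuild of the workflow logic.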
CTA Breakdown

How they asked for the click.

VERBAL ASK
06:10 · subscribe
If you enjoy coding tools like this, be sure to subscribe to the BetterStack channel. We will see you in another video.

Clean verbal close with on-screen SUBSCRIBED animation. Mid-roll subscribe ask also appears at ~1:26. No product upsell or link CTA in closing.

MENTIONED ON CAMERA
04:47 · product · VAPI
04:47 · product · Bland AI
04:47 · product · Retell AI
05:23 · product · Pipecat
05:27 · product · Vocode
05:29 · product · LiveKit
Storyboard

Visual structure at a glance.

hook · hook · 00:00
promise · pipeline · 00:26
value · docker demo · 01:29
value · agent build · 01:57
value · what is it · 03:36
value · comparison · 04:47
cta · cta · 06:10