A 15-minute setup guide for Voicebox, the free open-source app that replaces both ElevenLabs and Wispr Flow without a subscription.
One free, offline app covers both ElevenLabs-style voice cloning and Wispr Flow-style dictation, eliminating up to $37 a month in subscriptions while keeping your voice data entirely on your own machine.
Voicebox is a free, open-source desktop app (MIT license, 29k GitHub stars) that bundles voice cloning and local dictation into one offline tool. It targets the combined $37/month cost of ElevenLabs and Wispr Flow. Part 1 covers the install path on Mac including the in-app updater bug and Gatekeeper warning, then walks through the models tab to help you pick between Qwen TTS 1.7B (best quality), Kokoro 82M (low-spec fallback), and Chatterbox Multilingual (emotion control, multilingual). The Captures tab does hotkey dictation locally via Whisper and never sends audio to any external API.
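The $37/month figure is the sum of the two subscription prices quoted in the chapter notes (ElevenLabs at $22/mo, Wispr Flow at $15/mo). A minimal sketch of that arithmetic follows; the function names are illustrative, and the prices are the ones stated in this summary, not checked against current vendor pricing.

```python
# Monthly prices in USD, as quoted in this summary (not verified against
# current vendor pricing pages).
SUBSCRIPTIONS = {"ElevenLabs": 22, "Wispr Flow": 15}


def monthly_total(prices: dict[str, int]) -> int:
    """Sum the monthly cost of every subscription in the mapping."""
    return sum(prices.values())


def annual_total(prices: dict[str, int]) -> int:
    """Twelve months of the combined monthly cost."""
    return 12 * monthly_total(prices)


if __name__ == "__main__":
    print(f"Monthly: ${monthly_total(SUBSCRIPTIONS)}")
    print(f"Annual:  ${annual_total(SUBSCRIPTIONS)}")
```

The summary says "up to" $37 because someone paying for only one of the two services would save less.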
Hook and series overview: free local voice cloning, mini-series structure covering install, cloning test, dictation, and MCP/agents.

Feature breakdown: clone voices, seven TTS engines, dictate into any app. Cost comparison: ElevenLabs $22/mo, Wispr Flow $15/mo, Voicebox free.

voicebox.sh, Jamie Pine as creator, 29k stars, MIT license, 4k forks, documentation at docs.voicebox.sh.

Download from voicebox.sh/download, Apple Silicon vs Intel vs Windows builds. In-app updater broken on Mac — download fresh copy instead. Gatekeeper workaround via System Settings > Privacy & Security.

First tab to check on fresh install. Two categories: voice generation models and transcription models. Shows what is already downloaded.

Qwen TTS 1.7B recommended (4GB, best quality). TTS 0.6B for mid-spec. Kokoro 82M for low-spec/low-disk fallback. Parameter count explained simply.

Chatterbox Multilingual (3GB, emotion tags, multi-language). Chatterbox Turbo (English-only, faster, distilled). Whisper base/small/medium for transcription — pick based on disk space.

Generate tab: voice generation UI with engine and language selection. Stories tab: multi-track timeline editor to combine voices and produce podcast-style content.

Hotkey-activated dictation into any app. Drop in audio files for transcription. Runs Whisper locally — data never hits OpenAI or Anthropic.

Voice library (cloned + licensed built-ins). Effects: robotic, radio, echo, deep voice, custom presets. Settings overview. MCP/API interface teased for part 4.
When a tool bundles voice cloning and local dictation under a single MIT license, the model selection decision — not the software itself — is where most people waste an afternoon.
“It never leaves your system — it does not hit any APIs from OpenAI, Anthropic, or anyone else because the models are running on your machine.”
“I cloned my own voice on my Mac last night completely for free with a tool that claims it can replace ElevenLabs and Wispr Flow completely.”
A free, open-source app called Voicebox quietly arrived on GitHub and promised to do the job of two paid subscriptions — voice cloning and local dictation — entirely on your own machine. Part 1 is the setup: installation pitfalls, model choices, and a tour of every tab before the real tests begin in parts 2 through 4.
A decision framework for picking which TTS model to download based on hardware constraints.
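One way to write that framework down, as a sketch rather than the video's exact rules: the model names and the 4GB and 3GB sizes come from the chapter notes, while the `spec` tiers ("high", "mid", "low"), the rule that a need for emotion tags or multiple languages overrides the tier, and the function name are assumptions made for illustration.

```python
# Download sizes quoted in the chapter notes (GB). Sizes for the other
# models are not given in the summary, so they are not checked here.
QWEN_LARGE_GB = 4.0
CHATTERBOX_MULTI_GB = 3.0


def pick_tts_model(
    spec: str,
    free_disk_gb: float,
    needs_emotion_or_multilingual: bool = False,
) -> str:
    """Suggest a TTS model from a self-assessed hardware tier and free disk.

    `spec` is a hypothetical tier: "high", "mid", or "low".
    """
    if spec not in ("high", "mid", "low"):
        raise ValueError(f"unknown spec tier: {spec!r}")

    # Assumption: a feature requirement outranks the hardware tier,
    # provided the 3GB download fits.
    if needs_emotion_or_multilingual and free_disk_gb >= CHATTERBOX_MULTI_GB:
        return "Chatterbox Multilingual"

    if spec == "high" and free_disk_gb >= QWEN_LARGE_GB:
        return "Qwen TTS 1.7B"  # recommended, best quality
    if spec == "mid":
        return "TTS 0.6B"  # mid-spec option as named in the summary
    return "Kokoro 82M"  # low-spec / low-disk fallback


if __name__ == "__main__":
    print(pick_tts_model("high", free_disk_gb=20))
    print(pick_tts_model("low", free_disk_gb=1))
```

Chatterbox Turbo is left out because the summary describes it only as a faster, English-only variant without a size or hardware tier to branch on.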
“So you do not miss the next part where we focus on the voice cloning, the generation, the captures, and also on the AI agent side of things.”
Repeated subscribe CTA at natural series break points — mid-video before voice cloning section and at the end. Honest framing: the real test is in future parts.