Codex Just Quietly Changed How People Work FOREVER (Role Specific Plugins)
A hands-on walkthrough of OpenAI Codex role-specific plugins and three live demos that show what it looks like when an AI runs your entire job function.
June 5th
A 19-minute build walkthrough: four prompts to a coding agent, and your Mac responds to your voice across every app -- browser, SaaS, Premiere Pro.
GPT Realtime 2 bridges conversational voice AI with real computer control through tool calls, and four plain-English prompts to a coding agent are all it takes to build a working voice-operated Mac.
GPT Realtime 2 differs from ChatGPT voice mode in one way: it fires tool calls mid-conversation, meaning it can take real actions on your computer while still talking back. The build takes four prompts to a coding agent: open a WebSocket to the model, add push-to-talk to stop always-on cloud streaming, add web search via three Chrome tools, then connect apps through MCP servers or the macOS accessibility tree plus agent-desktop. Honest caveats: latency, patchy accessibility support in some apps, and per-command API costs that compound fast without push-to-talk.
Cinematic CRT-to-studio open, then live demo: voice commands control Premiere, Spotify, Claude Desktop, and a script rewrite in real time.

Explains the tool-call distinction from standard ChatGPT voice mode. Shows OpenAI playground JSON function schema. Visualizes terminal logs of tool calls firing.
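
A function tool of the kind described here can be sketched as a JSON-schema definition plus a local handler. The tool name `open_app`, its single parameter, and the handler are hypothetical and chosen only for illustration; the schema layout follows the general OpenAI function-calling shape and should be checked against the current docs rather than taken as what the video shows.

```python
import json

# Hypothetical tool definition: the name, description and parameters are
# illustrative assumptions, not taken from the video.
OPEN_APP_TOOL = {
    "type": "function",
    "name": "open_app",
    "description": "Open a macOS application by name.",
    "parameters": {
        "type": "object",
        "properties": {
            "app_name": {"type": "string", "description": "e.g. Safari"},
        },
        "required": ["app_name"],
    },
}


def build_open_command(app_name: str) -> list[str]:
    """Return the argv for macOS `open -a <app>` without running it."""
    return ["open", "-a", app_name]


def handle_tool_call(name: str, arguments_json: str) -> list[str]:
    """Map a tool call (name + JSON-encoded arguments) to a local command."""
    args = json.loads(arguments_json)
    if name == "open_app":
        return build_open_command(args["app_name"])
    raise ValueError(f"unknown tool: {name}")


print(json.dumps(OPEN_APP_TOOL, indent=2))
print(handle_tool_call("open_app", '{"app_name": "Safari"}'))
```

The handler only builds the command; passing the result to `subprocess.run` would be the step that actually opens the app.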

Pastes GPT Realtime 2 API docs into Claude/Cursor. One prompt builds the WebSocket app. First test opens Safari by voice. Always-on streaming problem surfaces.
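
The message flow over that WebSocket can be sketched without a network connection. The event type and field names below (`session.update`, `response.function_call_arguments.done`, `conversation.item.create`, `response.create`) reflect one reading of the OpenAI Realtime API and are assumptions to verify against the docs; the `open_app` handler and the fake event are hypothetical.

```python
import json
from typing import Callable

# Event names and fields are assumptions based on the OpenAI Realtime API;
# verify them against the docs handed to the coding agent.


def session_update(tools: list[dict]) -> str:
    """First message after the socket opens: register the available tools."""
    return json.dumps({"type": "session.update", "session": {"tools": tools}})


def on_server_event(raw: str, handlers: dict[str, Callable[..., str]]) -> list[str]:
    """Return the messages to send back for one incoming server event."""
    event = json.loads(raw)
    if event.get("type") != "response.function_call_arguments.done":
        return []  # audio deltas, transcripts, etc. are ignored in this sketch
    result = handlers[event["name"]](**json.loads(event["arguments"]))
    return [
        json.dumps({
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": event["call_id"],
                "output": result,
            },
        }),
        # Ask the model to keep talking now that the tool has run.
        json.dumps({"type": "response.create"}),
    ]


# Usage with a fake event, so the sketch runs offline.
fake = json.dumps({
    "type": "response.function_call_arguments.done",
    "name": "open_app",
    "call_id": "call_1",
    "arguments": '{"app_name": "Safari"}',
})
replies = on_server_event(fake, {"open_app": lambda app_name: f"opened {app_name}"})
print(replies)
```

In a real app, each string in `replies` would be sent back over the open socket; keeping the handler pure like this makes the tool-call path testable without audio.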

One prompt adds a global hotkey, floating waveform HUD, stops streaming until the key is held. Resolves always-listening problem and cuts costs to pennies per command.
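
The push-to-talk idea reduces to a gate in front of the audio stream. This is a minimal sketch, assuming the hotkey listener calls `key_down`/`key_up` and the microphone callback calls `on_audio_chunk`; the class and method names are hypothetical, and the HUD is left out.

```python
class PushToTalkGate:
    """Forward microphone chunks only while the hotkey is held."""

    def __init__(self, send):
        self._send = send   # callable that would stream a chunk upstream
        self._held = False
        self.dropped = 0    # chunks discarded locally, never uploaded

    def key_down(self):
        self._held = True

    def key_up(self):
        self._held = False

    def on_audio_chunk(self, chunk: bytes):
        if self._held:
            self._send(chunk)
        else:
            self.dropped += 1


sent = []
gate = PushToTalkGate(sent.append)
gate.on_audio_chunk(b"ambient")          # key not held: dropped locally
gate.key_down()
gate.on_audio_chunk(b"pause premiere")   # key held: forwarded
gate.key_up()
gate.on_audio_chunk(b"ambient")          # dropped again
print(sent, gate.dropped)
```

Because nothing passes the gate until the key is held, ambient audio never reaches the API, which is where the cost saving described above comes from.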

Adds three Chrome tools (open tab, type query, click link via vision). Searches for local restaurants and clicks first result. Grants accessibility permissions once.
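
Two of the three Chrome tools can be approximated with plain URL building. This sketch swaps in a simpler technique than the video's: it folds "open tab" and "type query" into a single search URL opened with macOS `open`, and it leaves out the vision-based click entirely. The function names and the choice of Google as the search engine are assumptions.

```python
from urllib.parse import quote_plus


def search_url(query: str) -> str:
    """Build a search URL for a spoken query (search engine is an assumption)."""
    return "https://www.google.com/search?q=" + quote_plus(query)


def open_tab_command(url: str) -> list[str]:
    """Return the argv that opens a URL in Chrome on macOS, without running it."""
    return ["open", "-a", "Google Chrome", url]


cmd = open_tab_command(search_url("restaurants near me"))
print(cmd)
```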

Installs Obsidian Local REST API MCP plugin, hands URL to agent in one prompt. Creates note, fetches GPT Realtime 2 docs from web, pastes full page content into Obsidian by voice.
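
What a create-note call might look like underneath can be sketched as a plain HTTP request. Everything specific here is an assumption to check against the plugin's own documentation: the base URL and port, the `PUT /vault/<path>` route, and the bearer-token header. The note path and key are placeholders, and the request is only built, not sent.

```python
import urllib.request
from urllib.parse import quote


def build_note_request(base_url: str, api_key: str, path: str, markdown: str):
    """Build (but do not send) a request that would write a note to the vault."""
    url = f"{base_url}/vault/{quote(path)}"  # assumed route; verify in plugin docs
    return urllib.request.Request(
        url,
        data=markdown.encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "text/markdown",
        },
    )


req = build_note_request(
    "https://127.0.0.1:27124",        # assumed local address of the plugin
    "YOUR_KEY",                        # placeholder
    "Voice Notes/realtime docs.md",   # hypothetical note path
    "# GPT Realtime 2\n",
)
print(req.get_method(), req.full_url)
```

Exposed as a tool, a function like this is what the voice model would call when asked to create a note; going through an MCP server instead hides the HTTP details behind the server's own tool list.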

Introduces accessibility tree concept and agent-desktop (open-source). One prompt wires it up. Live demo: pause/play, nudge playhead by frames, cut, mark in/out, ripple delete by voice.
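
The accessibility-tree idea is that every window exposes a tree of controls with roles and titles, and a voice command resolves to one node in that tree. This is a conceptual sketch over a hand-built tree, not agent-desktop's actual API; the `AXNode` class, `find_control`, and the sample tree are hypothetical, though `AXWindow`, `AXGroup` and `AXButton` are real macOS role names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AXNode:
    role: str
    title: str = ""
    children: list["AXNode"] = field(default_factory=list)


def find_control(node: AXNode, role: str, title: str) -> Optional[AXNode]:
    """Depth-first search for the first node matching role and title."""
    if node.role == role and node.title.lower() == title.lower():
        return node
    for child in node.children:
        hit = find_control(child, role, title)
        if hit is not None:
            return hit
    return None


# Hand-made stand-in for what an editor window might expose.
window = AXNode("AXWindow", "Premiere Pro", [
    AXNode("AXGroup", "Transport", [
        AXNode("AXButton", "Play"),
        AXNode("AXButton", "Mark In"),
    ]),
])

print(find_control(window, "AXButton", "mark in"))
print(find_control(window, "AXButton", "Ripple Delete"))
```

A lookup that returns `None` corresponds to the caveat about apps that do not expose every control: if the button is not in the tree, there is nothing for the voice command to press.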

Three caveats: not every app exposes full controls, real API costs per command, noticeable latency. Frames this as a near-future preview, not a daily replacement.
The gap between speaking a command and a computer acting on it has collapsed to a few prompts and a WebSocket -- and the three-tier access model tells you exactly which approach works for any app.
“Voice mode will chat with you, but it can't reach out and touch anything else on your computer. GPT Realtime 2... can also call tools, as in fire off real actions while it's still in conversation with you.”
“This isn't just for Obsidian. You can apply this same setup for any application, any SaaS with an MCP server or an API.”
“It's cheap. It's a few pennies per command, but it's not free. And if you leave it streaming to the cloud all day... those costs are just gonna balloon really quick.”
The video opens with a dramatic rewind through computing history before cutting mid-sentence to a live edit suite where the creator simply says 'hey chat, pause Premiere' and it does. That 15-second sequence is the entire thesis made visceral before a single word of explanation.
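Every app falls into one of three tiers for voice control, each requiring a different implementation strategy: the browser through Chrome tools, apps and SaaS with an MCP server or API, and everything else through the macOS accessibility tree plus agent-desktop. Add each tier with a single prompt to a coding agent.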
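
The tier decision can be sketched as a small routing function. The tier names follow the walkthrough's three approaches (Chrome tools, MCP server or API, accessibility tree plus agent-desktop); the flag names and the order in which they are checked are assumptions made for illustration.

```python
from enum import Enum


class Tier(Enum):
    BROWSER = "chrome tools"
    MCP = "mcp server or api"
    ACCESSIBILITY = "accessibility tree + agent-desktop"


def pick_tier(app: dict) -> Tier:
    """Choose an integration strategy from two hypothetical capability flags."""
    if app.get("runs_in_browser"):
        return Tier.BROWSER
    if app.get("has_mcp_or_api"):
        return Tier.MCP
    return Tier.ACCESSIBILITY  # fallback for apps with no other hook


print(pick_tier({"name": "Obsidian", "has_mcp_or_api": True}))
print(pick_tier({"name": "Premiere Pro"}))
```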
Push-to-talk solves the always-on streaming cost and the ambient-pickup problem. Nothing streams to the cloud until the key is held, which keeps costs to pennies per deliberate command.
“Let me know if you're experimenting with GPT Realtime 2. If you have any questions, happy to help.”
Soft close, no hard pitch. GitHub repo and newsletter linked in description rather than pushed verbally.