Modern Creator
STARTUP HAKK · YouTube

Stop Renting Your AI. Here's How To Own It.

Spencer cites a reverse-engineering of Claude Code's source to argue that the gap between cloud AI and local AI is engineering, then ships a free, open-source terminal agent to close it.

Posted: 3 weeks ago
Format: Tutorial, educational
Views: 94.9K
Likes: 1.8K
Big Idea

The argument in one line.

The gap between cloud AI and local AI is engineering, not capability. OpenMonoAgent is offered as proof that developers can own a production-grade coding agent on standard hardware, at zero software cost, instead of renting cloud subscriptions.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…
  • A software developer or CTO currently paying for cloud AI coding assistants who wants to understand the engineering gap between rented and local models.
  • A founder or tech lead managing AI costs across a team and exploring whether self-hosted solutions can reduce subscription spend without sacrificing productivity.
  • An engineer comfortable with terminal workflows and C#/.NET who wants a free, open-source coding agent they can run locally with full control over model behavior.
SKIP IF…
  • You're not comfortable managing your own infrastructure or debugging local model setups — this video assumes hands-on technical capability, not turnkey solutions.
  • You rely on closed-source vendors' safety guardrails and compliance features as non-negotiable — local models shift responsibility for output quality and risk assessment to you.
TL;DR

The full version, fast.

Cloud AI coding tools charge subscription prices for what is mostly infrastructure: only about 1.6% of Claude Code's reverse-engineered codebase is actual AI decision logic, while the rest is context pipelines, memory, permissions, and scaffolding any competent team can build. The gap between cloud and local AI is therefore engineering, not magic, and it closes once someone ships the harness around open models like Qwen, DeepSeek, and Gemma. OpenMonoAgent is that harness: a free, open-source, C#/.NET terminal agent that installs in one command, runs Docker-sandboxed on a $1,000 gaming PC at 40-plus tokens per second, and uses typed playbook gates instead of skippable skill prompts. Own the stack, swap models freely, and keep proprietary code on your hardware.

Chapters

Where the time goes.

00:00–00:41

01 · The 1.6% Hook

Reverse-engineered stat: only 1.6% of Claude Code is AI logic. The rest is infrastructure. If that's true, why pay subscription prices? Cloud models also get quietly nerfed over time.

00:47–01:53

02 · Spencer + The Engineering Gap

Channel intro. Fractional CTO background. The gap between cloud AI and local AI is an engineering gap, not a magic gap. Startup Hakk has built something to close it.

01:53–02:23

03 · Model Drift Problem

Cloud model behavior shifts over time. Guardrails tighten. Output quality drifts. Prompts that worked last quarter produce different results now. Rate limits stifle innovation.

02:23–04:19

04 · OpenMonoAgent Launch

The product: a free, local-first terminal coding agent. Single install command. Run on gaming PCs ($1K with RTX 3090) or mini-PC bricks (~20 tok/s, ~25W). No metering, unlimited tokens.

04:19–05:23

05 · Feature Walkthrough

Embedded inference with zero setup. Docker-sandboxed by default. 20+ MCP tools built in. Built in C#/.NET. Blazing fast LSP for C# and TypeScript.

05:23–07:09

06 · Playbooks vs. Skills

Skills are prompts — the model can drift, skip, or misinterpret. Playbook gates are code — the executor calls them, the LLM is not in the loop and cannot hallucinate past them. Typed, composable, stateful workflow automation.

07:09–08:01

07 · Giveaway CTA

Free Ryzen mini-PC inference box giveaway. Sign up at openmonoagent.ai. Manifesto restatement: AI shouldn't be a subscription.

08:01–09:21

08 · Zero-Cost Architecture

C# choice explained — infrastructure-grade, not a weekend project. Model-agnostic (swap the engine without buying a new car). No telemetry, no tracking. Install command on the landing page.

09:21–10:14

09 · Privacy Argument

Every cloud AI prompt leaves your machine. For client code or NDA work, that's real exposure. OpenMonoAgent has no server to exfiltrate data to — everything runs on your hardware.

10:14–11:34

10 · Why C#/.NET

Production-grade, cross-platform, type-safe, long-term maintainability. Python is for experiments; C# is for things meant to run for years. Onboarding is a first-class concern — single command because bad DX kills open source projects.

11:34–13:38

11 · Linux/Git Historical Precedent

Linux was called a toy. Git was called a toy. The pattern repeats: incumbents dismiss → developers adopt → it becomes the default. Local AI agents are next. Spencer is taking that bet.

13:38–14:42

12 · Democratization

Real democratization: a developer in Nairobi has the same AI coding tools as Google engineers. No credit card. Free permanently — because free is the only price that's truly universal.

14:42–17:02

13 · Live Demo

Spencer fires up OpenMonoAgent on a snake game project. 41 tok/s on RTX 3090. Reviews the project, spots missing .gitignore, fixes code quality issue, initializes git repo. Comparable to Claude Code in real usage.

17:02–17:47

14 · Outro CTA

openmonoagent.ai install command. Star the GitHub repo. Like and subscribe. If you need custom software, starterpack.com.

Atomic Insights

Lines worth screenshotting.

  • Only 1.6% of Claude Code's source code is actual AI decision logic — the other 98% is infrastructure, context pipelines, memory systems, and safety scaffolding you could build yourself.
  • Cloud model behavior shifts over time silently — guardrails tighten, output quality drifts, and prompts that worked reliably last quarter stop working without any announcement.
  • The gap between cloud AI and local AI is not a magic gap — it is an engineering gap, and engineering gaps close when developers decide to close them.
  • Most people try a local model once, compare it to cloud, and conclude it isn't ready — what they're actually missing is the harness, not the model.
  • A $1,000 gaming PC running a local model at 41 tokens per second is a one-time capital expense versus an ongoing subscription — the economics are not close over 24 months.
  • OpenMonoAgent — a free, open-source, C#/.NET local coding agent — runs entirely on local hardware with a full playbooks system and zero data sent to external servers.
  • You adapt to cloud model quality degradation without realizing it — which means you're shipping worse output without a benchmark to notice the change.
  • Paying cloud subscription prices for infrastructure you could own is, in engineering terms, a tax — and the receipt is the 98% of Claude Code's codebase you're funding but not using.
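
The one-time-purchase versus subscription comparison in the list above can be checked with a little arithmetic. The sketch below is only an illustration, not figures from the video: the $1,000 hardware price is from the source, while the subscription price, power draw, daily usage, and electricity rate are hypothetical placeholders to swap for your own numbers.

```python
from math import ceil, inf


def monthly_power_cost(watts: float, hours_per_day: float, usd_per_kwh: float) -> float:
    """Electricity cost for a 30-day month of a machine drawing `watts` while in use."""
    kwh = watts / 1000 * hours_per_day * 30
    return kwh * usd_per_kwh


def break_even_months(hardware_usd: float, subscription_usd: float, power_usd: float) -> float:
    """Whole months until the hardware price is recovered from avoided subscription fees."""
    monthly_saving = subscription_usd - power_usd
    if monthly_saving <= 0:
        return inf  # at these rates the local box never pays for itself
    return ceil(hardware_usd / monthly_saving)


HARDWARE_USD = 1000.0      # from the video: used RTX 3090 gaming PC
SUBSCRIPTION_USD = 100.0   # ASSUMPTION: hypothetical monthly cloud plan
WATTS = 350.0              # ASSUMPTION: hypothetical draw under load
HOURS_PER_DAY = 8.0        # ASSUMPTION: hypothetical daily usage
USD_PER_KWH = 0.15         # ASSUMPTION: hypothetical electricity rate

power = monthly_power_cost(WATTS, HOURS_PER_DAY, USD_PER_KWH)
months = break_even_months(HARDWARE_USD, SUBSCRIPTION_USD, power)
print(f"electricity: ${power:.2f}/month, break-even after {months} months")
```

With these placeholder inputs the electricity comes to $12.60 a month and the hardware pays for itself in 12 months. A cheaper subscription or heavier power draw pushes that out, which is why the inputs matter more than the headline.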
Takeaway

The stat-hook + manifesto format.

Self-host revolution playbook

One precise, counterintuitive number does more work than five minutes of explanation — find yours and open with it.

  • Find JoeFlow's 1.6% equivalent: cost-per-hour of Whisper API vs. local Whisper over 12 months, or what percentage of a SaaS tool's code is actually the AI vs. the scaffolding around it.
  • Pair the stat with the manifesto line in the same breath — the number creates the opening, the manifesto closes it.
  • Use the Linux/Git toy pattern for the self-host revolution arc: every tool that's now default infrastructure was called a toy. Self-hosted Supabase, Nginx, PM2 were all toys. The $6 Stack is next in the sequence.
  • The playbooks-vs-skills framing (code vs. suggestion) is the right way to talk about agent reliability for JoeFlow sessions — JoeFlow skills could be described the same way.
  • Spencer's demo ran 41 tok/s on a $1K gaming PC. If JoeFlow ever does a local Whisper benchmark, lead with tokens-per-dollar or minutes-per-dollar vs. cloud.
Glossary

Terms worth knowing.

Claude Code
Anthropic's command-line coding agent that runs in the terminal and uses the Claude model to read, write, and modify code in a developer's project.
Local AI
Running large language models on hardware you own rather than calling a hosted cloud API, so prompts and data never leave your machine.
Inference
The act of running a trained AI model to produce output. "Inference machine" means the computer doing that work, separate from the one you're typing on.
Harness
The surrounding software layer that turns a raw language model into a useful agent — context management, memory, tool use, permissions, and orchestration.
Context pipeline
The system that gathers, filters, and feeds the right information into a model's prompt window on each turn so it can answer accurately about a specific codebase.
Rate limits
Caps that cloud AI vendors put on how many requests or tokens a user can send in a given time window, often interrupting long coding sessions.
Token
The basic unit of text a language model reads and writes — roughly a short word or fragment. "Tokens per second" measures generation speed.
Qwen
A family of open-weight large language models released by Alibaba that can be downloaded and run locally for chat and code tasks.
DeepSeek
An open-weight model family from a Chinese AI lab known for strong coding and reasoning performance at a fraction of the cost of closed frontier models.
Gemma
Google's family of open-weight small language models designed to be downloaded and run on consumer hardware.
H100
NVIDIA's data-center AI accelerator card, costing roughly $30,000, used by cloud providers to serve frontier models at scale.
RTX 3090 / 4090 / 5090
Consumer NVIDIA gaming GPUs that have enough video memory to run mid-sized local language models, making them a budget alternative to data-center cards.
NUC / mini PC
A compact, low-power desktop computer (originally Intel's "Next Unit of Computing" form factor) small enough to sit on a shelf yet capable of running a local AI model.
Ryzen AI 9 HX
An AMD laptop-class processor with a built-in neural accelerator, used in small form-factor PCs to run modest local AI workloads at low wattage.
Docker sandbox
Running code inside an isolated Docker container so the AI agent can read and edit files in a project without being able to touch the rest of the host machine.
MCP tools
Tools exposed to an AI agent through the Model Context Protocol, an open standard for letting language models call external functions like file editors, search, or databases.
TUI
Text-based user interface — an interactive app that runs inside the terminal with menus and panels, instead of a graphical window.
.NET / C#
Microsoft's cross-platform application framework and its primary language, known for strong typing, performance, and long-term maintainability in production systems.
LSP
Language Server Protocol — a standard interface that lets editors and agents get smart code features (autocomplete, go-to-definition, errors) for a given programming language.
Skills
Reusable instruction files (popularized by Claude Code) that tell an AI agent how to perform a recurring task. They're suggestions the model can still ignore or misinterpret.
Resources Mentioned

Things they pointed at.

02:35 · product · OpenMonoAgent
02:35 · tool · Qwen 3.6 model
02:35 · tool · DeepSeek
02:35 · tool · Gemma 4
Quotables

Lines you could clip.

01:33
The gap between Cloud and Local AI is not a magic gap. It's an engineering gap, and we've helped close that gap.
Clean thesis, no setup needed, provocative framing · TikTok hook
07:15
A skill is a prompt. The model can drift, skip, or misinterpret. A playbook gate is code. The executor calls this, and the LLM is not in the loop. It cannot skip it, hallucinate past it, or decide it knows better.
Technically precise, quotable for developer audience, no setup needed · IG reel cold open
03:55
AI shouldn't be a subscription that you rent. It should be infrastructure that you own sitting on your desk, serving your code, answering only to you.
The manifesto line, read off a slide on-screen, perfectly paced · newsletter pull-quote
12:40
The agent is the layer. The model is the engine. Changing engine should not require you to buy a new car.
Tight metaphor, self-contained, no context needed · TikTok hook
13:20
Free is the only price that actually is universal.
Nine words. Clean thesis. · IG reel cold open
The Script

Word for word.

00:00So researchers reverse engineered Claude Code's entire source and found something that should make every developer stop and think. Only about 1.6% of the codebase is actually AI decision logic.
00:091.6%. The other 98% is infrastructure, context pipelines, memory systems, permission layers, safety scaffolding. So here's the question.
00:17If the intelligence is 1.6% of the equation, why are we paying cloud subscription prices? That's like a 100%.
00:23And here's the part nobody's talking about. Cloud models get quietly nerfed over time, and you adapt to those changes without even realizing it. Today, I wanna tell you about a new release of something we've released here at Startup Hakk.
00:32Make sure you stay to the end because we're gonna give you two free things in this, and I mean free as in free as in free. And I'm gonna show you what we built to prove this. Let's dive into it today.
00:47Welcome to Startup Hakk. I'm Spencer, and here at Startup Hakk, we love to build custom software solutions for companies. With a decade of executive leadership as a fractional CTO and twenty-five years in software development, I helped transform tech teams and products, including building out custom AI solutions.
00:59Now look, the AI tooling market wants you to believe that renting access to a model like Claude Code or Codex is just the cost of doing business now, but it's not. The gap between Cloud AI and Local AI is way closer than you think. I've got a tool here today that's gonna help you, and it's absolutely free.
01:14I'm not selling you anything. Before we get into it, one of the biggest things you can do is drop a comment, and as always, make sure you follow what I'm gonna give you here. So researchers who reverse engineered Claude Code found that the actual AI decision logic is around 2% of it.
01:26The remaining 98% is context pipelines and a lot of other pieces. This matters because it reframes the entire conversation. The gap between Cloud and Local AI is not a magic gap.
01:35It's an engineering gap, and we've helped close that gap with a free service we're gonna give you guys today. Engineering gaps close when developers decide to close them. Most people try a local model once, compared it to one of the others and said this wasn't quite ready, but what they were missing was they were actually just missing the harness.
01:50The models are moving fast, and this is Qwen. This is DeepSeek. There's a lot of the other models, and there's a lot you can do.
01:56Gemma 4 was released. And here's something that doesn't get talked about enough. The cloud model behavior shifts over time, and most users absorb those changes.
02:03Guardrails tightened. Output quality drifts. We've seen Claude Code dramatically drop over the last few months.
02:08Prompts that worked reliably last quarter start producing different results this quarter. Now you adapt and you rewrite your prompts and you try to modify things with skills and you do this thinking you can beat it, but the biggest problem here is rate limits are absolutely stifling innovation. Price restructures, capability rollbacks.
02:24When a vendor controls your model, they control your whole workflow. You've gotta take that control back. That's why we launched OpenMonoAgent.
02:31It's an AI that you don't have to meter here. Right? Unlimited tokens forever.
02:35Now you're saying, what are you selling me, Spencer? Hear me out here because this is your machine, your agent, and you use it from anywhere. Now here's the kicker.
02:42So I know a lot of times you've probably tried to set up a local model in the past and thought, man, this is way too complex, way too difficult. Look, guys, it's one copy. You copy this, paste it in.
02:51It's gonna give you three options. You can either run the whole stack on your machine, which I'm gonna show you here in a minute, or you can install the inference on one machine. Now you think, oh, I've gotta have a really expensive H100 that's $30,000.
03:01Absolutely not. We are doing this on standard hardware. See this stuff back here behind me?
03:05These are normal gaming machines. Very low end by today's standards in a lot of cases. This one right behind me just has a 3090 in it, guys.
03:12This is about a thousand dollar gaming machine. You can find these on your local Craigslist, Facebook Marketplace, like wherever you go pick up your stuff. But even more importantly, we've built them out on these little NUC bricks here that actually work and give you about 20 tokens per second.
03:24That's very comparable to what you get with Claude Code or with Codex. So for about 20 tokens per second, you can own the whole hardware, and this is very reasonably priced. And I'm gonna show you something here at the end that you're definitely gonna wanna make sure you stay at the end.
03:36Now how do you get started on this? It's really easy to start. Copy this, paste it, run it.
03:39I wanna go through some of the features here with you because one of our manifesto here is that AI shouldn't be a subscription that you rent. It should be infrastructure that you own sitting on your desk, serving your code, answering only to you. Now this is local first always.
03:51That means you own everything. The model run, everything to the top, to the bottom of the stack. Nothing goes across the cloud.
03:57Unlimited tokens. You want this thing to run for four days? All you.
04:00Like I said, these things here run on about 25 watts. K? The other thing is we've built this so that they're sandboxed by default.
04:07You get a Docker native so your agent mounts your project in, and it doesn't escape. Permission gates are right inside that Docker, and it's fully 100% open source. Don't believe me?
04:17Well, here you go. Here's the whole project, open source, right here on GitHub, all for you.
04:22We have a massive amount of documentation that we've worked on, and the whole project is ready to go. This is not just proof of concept, guys. I'm gonna show you a working demo here in a minute.
04:30But this is the full thing. You can go in and read the documents. Each of these go into the different parts.
04:34And I'm gonna go through some of those parts with you here, but wanna talk through some of these. So first of all, it's embedded inference, zero setup. You literally run the script.
04:41If you decide to run it on two machines with the inference and your agent, then you can run the agent on your dev laptop and run your inference on the machine back home, and we connect them with a relay server. It's trivially simple. TUI is our interface that we use, and so it's built for long sessions, you can continue to run it indefinitely.
04:56It's Docker sandboxed. We have over 20 different MCP tools built in. It's built for .NET, focused on .NET.
05:02So we actually built it with C#/.NET, and you'll see all the code, and it is blazing fast. LSP for C# and TypeScript.
05:10Playbooks, this is our version of skills. Playbooks is dominant over skills. There's so much more you can do with a playbook.
05:16These are typed, composable, stateful workflow automation, step sequencing, gates, and templates, not just markdown recipes. This is not just one flat text file, folks, and it's very easy. The agent itself will actually help you write these.
05:28Now we also have our dual box mode that I talked about here, and one of the best parts is we're hosting a free Relay server where you can actually go and sign up and set up your Relay between the two boxes. Absolutely free, totally encrypted, 100% secure. We're not getting any data from you.
05:42We'll get your email so that that's your ID. Other than that, like and we're not doing anything those. Next is you have persistent sessions.
05:48We actually are saving your sessions in JSON. They stay on your machine. We're not saving them.
05:52I'm sorry. They're saving on your machine on the agent machine. This thing runs we've probably installed it about a thousand times, and I'm not exaggerating here.
05:58You wanna see some of these servers behind me? These are about half of the dev servers that we have. See the ones underneath the desk over here, the ones over.
06:04We have about 20 to 30 different types of workstations varying from boxes this size up to 5090s. We've not done anything larger than a 5090 on this. And with that, we have this incredible set here.
06:14So you can do these little bricks, right, which are Ryzen 9 7940HS, get about 20 tokens per second. The 3090 is about 50. The 4090 is, we got a little typo here we need to get fixed up.
06:24It's about 60 tokens per second, and the 5090s are running closer to 100 tokens per second. We actually have tested with five different developers all running against one 5090 at the same time using the dual box setup. So you can go through and compare how it stacks up against other things, but really, you're truly up and running in one single command, like two commands because you're install, and then you're up and running with MonoAgent.
06:45This is yours, guys. It's open. It's local.
06:47It's yours. It's forever. Incredibly fast.
06:49Right? We have absolutely worked to optimize this. 100% open source here, folks.
06:53Biggest favor that I ask is you just leave us a star on this, because as you know, one of the best ways you can get it is to help us to get the stars in there. Now we can go through more of these features, and I wanna dive into a couple more of these features with you here before I go on a little bit more. So with this, uh, I wanna talk a little bit about playbooks.
07:07Your agent needs guarantees, not just suggestions. So a skill is like a suggestion.
07:11It's like, hey, if you kinda wanna do this thing, go over here and act like this. A skill actually tells it exactly how to run. It's not just a prompt.
07:18A skill is a prompt. The model can drift, skip, or misinterpret. A playbook gate is code.
07:22The executor calls this, and the LLM is not in the loop. It cannot skip it, hallucinate past it, or decide it knows better. Now I'm telling you guys, this is way better.
07:30We have tons of documentation around this. We have really worked hard to make this work really, really well, and we're really incredibly proud of this. So if you're doing something with OpenClaw, this is gonna run circles around that.
07:41Now, last but not least here, one of the things that I've been talking about. Go and make sure you sign up. We're doing a free giveaway.
07:47Sign up here because we're gonna give away one of these Ryzen boxes where you can run your own inference box at less than, like, I think they're about 25 watts. So it's pretty incredible. These are amazing.
07:57You know, believe me, I'm not trying to sell these. This is just a link to Amazon. Right?
08:01You can go get one of these boxes yourself. But you can see that this is a great opportunity, and we are giving it away. I just want my goal here, as the manifesto states, because you can see the manifesto.
08:11My manifesto is that it shouldn't be a subscription, and this is what we're trying to do. I wanna give the opportunity for people to be able to set up and learn how to use AI locally. See, I have a lot of beefs with the big frontier models, and at this point, we all do.
08:23We have a lot of beefs with them. We have a lot of complaints. OpenMonoAgent is a terminal native coding agent that runs entirely on your machine powered by local LLMs at zero cost.
08:33It's written in C#/.NET, which was a deliberate choice, not a limitation, because AI tooling should be built like infrastructure, not like a weekend side project. It installs with a single command, and guess what?
08:43It runs on every platform. The agents will run on macOS, Linux, and on Windows. Right?
08:47So you can install the agent, run it locally, and then run the inference on some other box like one of these and run it and at a very low affordable cost, and you own the whole stack. Did I mention that you own the whole stack? It installs with a single command, and it's model agnostic.
09:02You can change out the model, but we have the models already all tweaked for you. So if you use the CPU, we have a very specific 3.6 model. We have another Qwen 3.6 model.
09:11These have been tested and work fantastically. There's no telemetry, no tracking, no free tier, like, the whole thing free.
09:18I want you guys to be able to use this because go to monoagent.ai right now and install the command. It's right there on the landing page, and it could be running in minutes.
09:25Now every prompt you send to a cloud AI tool leaves your machine, and that's not paranoia. That's just how it works. Right?
09:32For personal projects, that's probably fine. But for client code, proprietary algorithms, anything under an NDA, that's real exposure. OpenMonoAgent has no server to exfiltrate your data to because everything runs locally on your hardware.
09:44There's no, well, we may use interactions to improve our models. There's no terms and services buried there. You go pull down the code, you can see what we're doing.
09:50You can even modify it. Do what you want with it. Help us improve it.
09:53Do a pull request. Right? We're gonna continue to build on it.
09:55We have a lot of huge plans. Next week, we're rolling out our mobile apps, which will allow the mobile phone to then be able to be in control of it. We then are also rolling out a VS Code extension that's gonna continue to improve upon this.
10:06So when your AI stack is local, the compliance conversation simplifies dramatically because there's nothing leaving the building. Now we built OpenMonoAgent in C#/.NET.
10:15This was by choice. .NET is cross-platform, it's production grade, and has one of the most mature ecosystems for system level tooling in software development. After a long time in the industry, I've watched developers use NPM packages, use Python, and I've seen just a lot of soiled projects.
10:30C# gives us type-safe long-term maintainability and performance characteristics that matter when you're building something that's meant to run for a long time. Python's fantastic if you're doing experimentations, but C# is what you reach for when you need a real production thing to stand up.
10:43Now if you wanna contribute, fork it, extend it, open a pull request, help us build it out. We're continuing to add more things to it.
10:50You can even go see the road map in there, and I have a team that's dedicated to this. I have multiple senior developers. I have a full time PhD AI engineer.
10:58I have multiple other junior developers. I have a large team that's working to continue to build this faster. Why?
11:04Because we love to build custom software solutions for people, especially stuff that's built that you own. The fastest way to kill an open source project is to make it a three day configuration exercise before you get into anything in front of it. OpenMonoAgent installs with a single command, not because it's simple under the hood, but because we engineered the setup to be invisible.
11:20This took a lot of time. Developer time is expensive, but if the tool costs more in configuration than it saves you in the first week, you've already lost the argument.
11:27So we made onboarding a first class concern because that's where most developer tools go to die. So get it running, connect your local LLM, and go to openmonoagent.ai because the command's right there on the front page.
11:38Now every piece of foundational infrastructure in modern software starts as something that somebody gives away. Linux was dismissed by enterprise vendors as a toy that would never handle real workloads. Now you know how that ended.
11:48Right? Git replaced Subversion and ClearCase, not because it had a better sales team, but because developers adopted it and it was genuinely better. The pattern repeats across every technology generation.
11:57Incumbents call it a toy, developers use it anyways, and eventually becomes the default. The companies charging you for AI coding tools today are going to call local agents a toy, and they're gonna be wrong. They're betting on a pattern one more time, and I'm gonna take that bet every single time.
12:11Now, OpenMonoAgent doesn't care which LLM you run, but we've picked out some really good ones that we've tuned for specific setups. Vendor lock-in on AI models is just a version of a problem the software industry has been solving for decades with varying success. When a better model ships next month, and one will, you swap it out with one rebuild.
12:28You don't even rebuild. You just literally swap the model out. The inference then continues to run on the new model, and you're off and running.
12:33The agent is the layer. The model is the engine. Changing engine should not require you to buy a new car.
12:39So that flexibility compounds over time in a way that single model benchmark scores can't match. So this is what democratizing AI really looks like. And it shows up in a lot of marketing copy in other places, but real democratization means a developer in Nairobi has the same AI coding tools that developers have at Google.
12:55No credit card required. It means a student building their first serious project has access to the same category of tools as a funded startup. It means developers in countries with weaker purchasing power aren't priced out of the tools that they need to be able to compete in the market.
13:08OpenMonoAgent is free because free is the only price that actually is universal. There's no purchasing power here. So that's a thesis.
13:15Not free trial, not free tier, free, permanently, because the mission requires it. Now we may work into some larger things in the future, but mostly this is an opportunity for us to be able to work with folks to be able to show and demonstrate our understanding of how AI works. So OpenMonoAgent isn't just free to use.
13:30It's free to study, modify, fork, redistribute. And again, I'm putting my money so much where my mouth is that I'm even going to give one away for free. Right?
13:39We're doing this. We're gonna announce it on May 15. So go and get signed up because this is free as in free as in free.
13:45Now if you're a .NET developer or a C# practitioner or someone who wants to help us better the local AI tooling, dive in and do a pull request. We welcome it. But the biggest thing I can ask is star the repo.
13:55Open an issue. Tell us what's broken, what you wish it would do. The project's gonna grow as fast as the community decides that it should, and that's always the best open source project out there.
14:03And I'm willing to commit some resources to this. The companies charging you for AI coding tools are going to call OpenMonoAgent a toy. And I'm just gonna plan on that.
14:12But again, remember Linux was a toy. Git was a toy. The entire foundation was a toy.
14:16These tools got called toys by incumbents who had nothing, who had something to lose from developers owning their own stack. Does that sound familiar? Now, I wanna give you guys just a little quick demo.
14:25This is gonna be super fast because we've already gotten really long on this video, but I wanna be able to show you guys how well this works here, okay? So I'm firing this up here. So we can see here that I've got a local project, and this is actually a little small snake game project that the tool actually wrote itself, but I'm not gonna write that one for you here now.
14:41So all we do is type open mono agent. Okay?
14:44So kinda like writing Claude. Boom. There we go.
14:46Code review found, but no tool graph. So we'll talk about code review graph. This is a great powerful tool.
14:51We'll talk about that another time. So let's say, review the project.
14:55Give me feedback on what we need to improve. Okay, so firing this up, you can see it's already firing through tokens.
15:05Looks like we're burning about 41 tokens per second. This is on this machine that was behind me here, so if you look back behind me, this is on this machine here that's running the 3090. Alright, now let's check the build state.
15:14So again, I'm running this and so I'm gonna say yes, I want this to, you know, to run. Okay, we're gonna tell it to give it access. Okay, so it's a minimal ASP.NET Core static file server, give this kind of the outline.
15:26It says, hey, this should be you know, so it's telling me to do some canvas wrappers. K? Game state uses all let's, you know, let's do global, so it's gonna give us some suggestions there.
15:36Oh, hey, there's no gitignore bin, right? So we've got some problems here, right? So it's telling us it's already doing the you know, going to do some modifications on this.
15:43So it's saying, for code quality, game over doesn't return anything, but it's called return game over. Right? So it's giving us examples.
15:49So I can say please now, because you know Sam told us not to say please because, like, that burns out the tokens. Guess what? We don't care about tokens.
15:54I already burned 38 tokens, 38,000 tokens, nobody cares. Please fix number one and create a git repo for this project.
16:04Okay. Let's let it go to town here. It's probably gonna ask me to do a couple it's probably gonna, you know, ask me for some prompts, because things like creating the git, was probably, see, it's gonna take a file permissions.
16:13So we can also set these too. My head's in the way, but down here in the other corner. In fact, let me get my head out of the way.
16:18We already got a prompt here. You can actually change the different modes on it, just like you would expect from any of these models. So we can see that we've, you know, it's prompting us for some of these, but we have some of our different slash commands that we can change out.
16:29It's still asking for various different permissions, and I just clicked off of one of them. So let's say yes.
16:35Oh, let's do a instead. I keep saying yes. So the .gitignore isn't taking effect because this, right, and so no commit yet, so git reset HEAD doesn't work.
16:43Let's use this. So you can see it's working through this. This would be what you would expect from like a Claude Code, right?
16:48And this is an example of, you know, what you would expect from that. So there it goes. There's our Git repo, right, initiated the Git branch, did all this, committed it, boom.
16:56And all of this is running inside. So let's Control C out of this. Hey look, we have our Git, right?
17:01Generated the Git, added all this. So you can see that all of this is that it's working great.
17:06So I can go through and I can demo this for you a long time, but really the biggest thing is pull this down, try it out, sign up for the free giveaway. Go for this. This is one of the big things we're very excited about, openmonoagent.ai.
17:17Right? This is a great opportunity for you to be able to dive in, learn how AI works, look at things under the cover, and make sure you are running your own local AI instead of giving away all of your data to these large data providers regardless of what their terms and services say. So go check it out.
17:31Do us a big favor. Leave a star there. And as always, make sure you like and subscribe.
17:34I'm gonna be teaching about this over the next week and teaching about some of the different features of it. So make sure you follow along because we're gonna be building a lot of this. And as always, if we can help you with custom software solutions, go check out starterpack.com.
17:45And otherwise, we will catch you tomorrow.
The Hook

The bait, then the rug-pull.

One number changes the math: 1.6%. That's the share of Claude Code's codebase that's actual AI decision logic. Spencer from STARTUP HAKK leads with this reverse-engineered stat to force a question — if the intelligence is 1.6% of the equation, why are developers paying cloud-subscription prices for all of it? The answer, he argues, is that the gap isn't magic. It's engineering. And engineering gaps close when developers decide to close them.

Frameworks

Named ideas worth stealing.

00:00concept

The 1.6% Reframe

Open with a counterintuitive precision stat about what the competitor actually delivers vs. what you pay for. Forces the audience to question the value prop before the product is even named.

Steal for: JoeFlow equivalent (cost of cloud transcription subscriptions vs. local Whisper over 12 months, broken into a per-hour price that sounds absurd)
01:33concept

Engineering Gap vs. Magic Gap

Any perceived gap between cloud and local tools is engineering, not magic. Engineering gaps close when developers decide to close them. Removes the mystique from the incumbent.

Steal for: The $6 Stack positioning. The complexity of self-hosting is an engineering gap, not a complexity that only enterprises can afford to solve
07:10model

Playbooks vs. Skills (Typed Gates)

Skills = prompts (model can ignore, drift, misinterpret). Playbooks = code (executor calls them, LLM not in loop, cannot hallucinate past a gate). The distinction between suggestion and guarantee.

Steal for: Any agent reliability argument, e.g. Claude Code skills vs. structured workflow gates
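The skills-versus-playbooks distinction can be sketched in a few lines. The Python below is an illustration only, not Open Mono Agent's actual C# implementation: `Step`, `GateResult`, `GateFailed`, `run_playbook`, and the fake model function are all hypothetical names. The point it tries to show is that the gate is ordinary code called by the executor, so a model that drifts or returns nothing cannot talk its way past it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class GateResult:
    passed: bool
    reason: str


@dataclass(frozen=True)
class Step:
    name: str
    action: Callable[[Dict], Dict]      # may wrap a model call
    gate: Callable[[Dict], GateResult]  # plain code, no model in the loop


class GateFailed(Exception):
    """Raised by the executor when a step's gate rejects the state."""


def run_playbook(steps: List[Step], state: Dict) -> Dict:
    """Run steps in order; stop at the first gate that does not pass."""
    for step in steps:
        state = step.action(state)
        result = step.gate(state)
        if not result.passed:
            raise GateFailed(f"{step.name}: {result.reason}")
    return state


def fake_model_draft(state: Dict) -> Dict:
    # Hypothetical stand-in for an LLM call; returns an empty patch
    # to simulate a model that ignored its instructions.
    return {**state, "patch": ""}


def patch_is_nonempty(state: Dict) -> GateResult:
    ok = bool(state.get("patch", "").strip())
    return GateResult(ok, "ok" if ok else "model produced no patch")
```

With a skill, "always produce a patch" is a sentence in the prompt. Here it is a function the executor calls, and any step after a failed gate never runs.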
11:34model

Linux/Git Toy Pattern

  1. Incumbents call it a toy
  2. Developers adopt it anyway
  3. It becomes the default
  4. The pattern repeats

Every foundational infrastructure tool was dismissed by incumbents as a toy. Local AI is next in the sequence.

Steal for: Self-host revolution content arc: every piece of the $6 Stack was a toy before it was default infrastructure
12:40model

The Agent/Model Layer Separation

The agent is the layer. The model is the engine. Changing engines should not require buying a new car. Model-agnosticism as a core design principle.

Steal for: MCN platform architecture content, separating the orchestration layer from any single model vendor
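One way to picture the agent/model separation is an agent that depends only on a small interface, so the engine can be swapped without touching the agent. This is a minimal Python sketch under assumed names (`Model`, `EchoModel`, `UpperModel`, `Agent`, `swap_model`); it does not reflect Open Mono Agent's real API, and the two toy models stand in for real local backends.

```python
from typing import List, Protocol


class Model(Protocol):
    """The only thing the agent needs from an engine (hypothetical)."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    # Toy engine: echoes the prompt back.
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class UpperModel:
    # Second toy engine with different behavior.
    def complete(self, prompt: str) -> str:
        return prompt.upper()


class Agent:
    """Owns the history and the prompt assembly; the model is pluggable."""

    def __init__(self, model: Model) -> None:
        self.model = model
        self.history: List[str] = []

    def ask(self, task: str) -> str:
        self.history.append(task)
        prompt = "\n".join(self.history)
        return self.model.complete(prompt)

    def swap_model(self, model: Model) -> None:
        # Changing engines keeps the history, i.e. the "car".
        self.model = model
```

Because `Agent` never names a concrete model class, the conversation history survives a swap from one engine to another.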
CTA Breakdown

How they asked for the click.

17:02link
Go check it out. Do us a big favor. Leave a star there. And as always, make sure you like and subscribe.

Soft and multi-part: star the repo, like, subscribe, visit openmonoagent.ai, and go to starterpack.com for custom dev. No hard sell. The giveaway CTA (mid-video, around 07:09) was sharper and earlier.

Storyboard

Visual structure at a glance.

hook · hook · 00:00
promise · intro · 00:47
value · manifesto · 03:55
value · playbooks · 07:10
value · linux/git · 11:34
proof · live demo · 14:42
cta · CTA · 17:02
Frame Gallery

Visual moments.