Why Modern Creator?

David Ondrej · YouTube

I spent $50,000 self-hosting AI models. You should too.

A two-host deep dive on why self-hosting open-source AI is a freedom fight, not just a cost play — covering hardware tiers, model benchmarks, geopolitical risk, and the case for owning your inference stack.

Posted

June 24th

yesterday

Duration

1:35:52

Format

Interview

educational

Views

13.4K

447 likes

Big Idea

The argument in one line.

Running frontier-capable AI at home crossed from hobbyist experiment to economically rational decision in 2025, and the window to spread ownership before governments restrict access may be shorter than two generations of model releases.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…

A developer or technical founder currently spending $2,000–$10,000 a month on cloud inference who has never run the rent-vs-own math on local hardware.
Someone who has tried local models before (Llama 3, 7B era) and wrote them off as not work-usable — the landscape has changed materially.
Anyone curious about hardware tier guidance: what $2,000 / $9,000 / $20,000 / $50,000 / $100,000 in compute actually buys you in model quality and tokens-per-second.
A person who follows AI closely but has not thought through the geopolitical centralization risk — this conversation maps it clearly.
An entrepreneur who handles sensitive client data (healthcare, legal, finance) and needs private inference for compliance reasons.

SKIP IF…

You want a hands-on tutorial — this is a philosophy-plus-overview conversation, not a step-by-step setup guide.
You have no budget for hardware and no near-term path to change that — the actionable parts assume at least $1,000–$2,000 available.
You are looking for model benchmarks with citations — the numbers here are experiential estimates, not published evals.

TL;DR

The full version, fast.

Self-hosting AI crossed a viability threshold: a $50,000 rig of RTX Pro 6000s now runs GLM 5.2 at 60–80 tokens per second — frontier-capable inference that cost $100,000 to match a year ago. The hosts argue ownership is not just a cost play but a political one: governments are likely to restrict access to the next one or two model generations, and spreading open-weight models now, like seeding torrents, is the only hedge. Hardware advice is practical: start with LM Studio on any machine, rent cloud GPUs before buying, and think in increments — two RTX 3090s for under $2,000 run all Qwen and Gemma models; eight RTX Pro 6000s for $80,000 run everything at frontier speed.

Free for members

Chat with this breakdown — free.

Sign in and you get 23 free chat messages on us — ask for the hook, quote a framework, find the exact transcript moment, generate a markdown action plan. Bring your own key when you want unlimited.

Create a free account →

Voices

Who's talking.

00:00hostDavid Ondrej

00:15guest0xSero

Chapters

Where the time goes.

00:00 – 02:00

01 · Live demo: GLM 5.2 at home

0xSero demos his home inference rig running GLM 5.2 with custom compression, showing concurrent sessions and token throughput.

02:00 – 08:00

02 · GLM 5.2 review and Chinese model marketing problem

Discussion of GLM 5.2 capabilities — agent work, Docker, reverse engineering — and why Chinese models face a distribution and perception gap despite technical strength.

08:00 – 12:00

03 · Sponsor + Fable 5 personal experience

Oxylabs sponsor read, then both hosts share their experience with Fable 5 before the ban — described as 'an actual contributor, not a tool'.

12:00 – 20:00

04 · Fable 5 ban and the honeymoon period theory

The guest predicts the model will return lobotomized; discusses whether the ban was justified on cybersecurity grounds or a strategic narrative pivot.

20:00 – 30:00

05 · Government centralization risk and AI as freedom technology

Both hosts argue that within two model generations, governments will control access to the most capable models via sanctions-style lists. Bitcoin / crypto analogy introduced.

30:00 – 40:00

06 · Why top talent joins closed-source labs

Discussion of why open-source believers still go to Anthropic — access to the most intelligent systems, Moloch dynamics, and the Claude constitution culture.

40:00 – 53:20

07 · Economics of cloud inference vs. self-hosting

0xSero breaks down why Anthropic's subscription model is unprofitable per consumer but profitable at enterprise, and how self-hosting inverts that math at 2B+ tokens/month.

53:20 – 1:00:00

08 · Self-driving cars as the displacement template

Autonomous vehicles as the current case study for technology deployment being slowed deliberately. Societal effects on male employment and entry-level jobs.

1:00:00 – 1:13:20

09 · Hardware buying guide by budget tier

Practical breakdown: $2K (2x RTX 3090 → Qwen 35B), $9K (2x DGX Spark → Step 3.7 Flash), $20K (4x DGX Spark), $50K (6x RTX Pro 6000 → GLM 5.2), $100K (8x RTX Pro 6000 → everything).

1:13:20 – 1:20:00

10 · Architecture deep dive: MoE, prefill/decode, hybrid rigs

Technical explanation of why mixture-of-experts models have cheaper decode memory than headline parameter count suggests; experimental NVIDIA+Mac hybrid prefill/decode setups.

1:20:00 – 1:26:40

11 · Live demo: agents running locally, Droid harness

Screen share of multiple concurrent local agents — file reorganization, GPU research, Bloodborne-style game generation — all running on the home rig via Droid.

1:26:40 – 1:30:00

12 · Uncensored models, Hugging Face risk, download strategy

Practical case for downloading weights now as a hedge; uncensored Hermes 70B example (peyote cactus care); French government dataset takedown precedent.

1:30:00 – 1:35:52

13 · Why they went all-in, Poland vs. San Francisco, robots

0xSero's origin story (first tech revolution he could participate in at 24), San Francisco energy vs. cost, unitree robot demo with local Gemma inference, closing thoughts.

Atomic Insights

Lines worth screenshotting.

The cost of running frontier-capable open-source inference halved from $100,000 to $50,000 in one year — the trend is down, but the hardware price itself is not.
374 million tokens a month locally is 0xSero's current output — he says he uses even more on remote APIs, making local only a partial substitute today.
Qwen 3.6 27B runs on two RTX 3090s (under $2,000) and benchmarks above Claude Sonnet 4 on several coding tasks according to the guest.
A company spending $300,000 a year on API billing could likely fund a $100,000 local rig that pays for itself in under two years while keeping data private.
Many regulated industries — healthcare, legal, finance — legally cannot send data to Anthropic or OpenAI APIs, making self-hosting the only compliant path.
Anthropic's model terms require data sharing even at enterprise tier, per the guest — private inference eliminates that exposure entirely.
Power-capping GPUs at 40% capacity cuts inference speed 20–30% but makes a home rig practical — full-power eight-card rigs need multiple dedicated circuits.
The 'honeymoon period' effect means model bans deny users the inevitable come-down phase, locking in peak hype and making access restriction politically easier.
An 82% poll result in favor of 'nobody / open source' controlling AI's future reflects the audience's values but also how self-selected that audience is.
Downloading model weights now is analogous to seeding torrents — once weights are distributed, no single government action can remove them from circulation.
DGX Sparks daisy-chain up to four units at roughly 1.5x speed gain per node, giving 512GB of combined memory for about $17,600 — the best $20K configuration today.
Mixture-of-experts models like GLM 5.2 have only 40B active parameters despite 744B total, meaning decode memory requirements are far smaller than the headline size suggests.
The guest predicts government model restrictions will look like sanctions lists tied to company size or employee nationality — a mechanism that routes the best models to enterprise and government only.
Self-driving cars are the current template for how governments throttle transformative technology: they have the capability, but deployment is deliberately slowed to manage labor displacement.
Robots running entirely on local open-source inference — a $30,000 setup with a unitree robot, DGX Spark, and VR headset — are already functional in lab conditions today.

Takeaway

The rent-vs-own math on AI inference has flipped.

WHAT TO LEARN

Self-hosting frontier-capable AI crossed from expensive hobby to rational infrastructure investment in 2025, and the window to act before access gets politically restricted may be short.

01Live demo: GLM 5.2 at home

Running a 744B-parameter model at home requires custom quantization — 80% compression is achievable with tuning beyond the standard 75%.

02GLM 5.2 review and Chinese model marketing problem

Chinese models make deliberate capability trade-offs (e.g., dropping vision to improve coding); treating them as benchmark-optimized ignores these architectural choices.
Distribution is the gap — GLM 5.2 is not available in Cursor, Claude Code, or Codex without manual setup, creating a perception deficit relative to its actual capability.

03Sponsor + Fable 5 personal experience

The difference between current frontier models and Fable 5 was described as moving from 'a tool you must direct' to 'a contributor that understands intent' — a qualitative shift, not incremental.

04Fable 5 ban and the honeymoon period theory

Capability bans are most politically effective at the honeymoon peak, before users can calibrate their expectations against real limitations.
The cybersecurity risk argument — models scanning open-source PRs for zero-day exploits while sitting in a loop — is more technically grounded than most public debate acknowledges.

05Government centralization risk and AI as freedom technology

Governments slow technology by targeting distributors and manufacturers rather than end users — the defense is mass possession before those bottlenecks form.
Spreading open-weight model weights now is a hedge in the same way Bitcoin's distributed possession limits any single actor's ability to confiscate it.

06Why top talent joins closed-source labs

The 'Moloch' dynamic means individual rational actors join the lab with the most capable models even if they prefer open systems, because the alternative is irrelevance while someone else does it.
Anthropic's low churn is attributed partly to a flat titling structure and a shared moral framework embedded in the Claude constitution — culture as retention mechanism.

07Economics of cloud inference vs. self-hosting

Consumer subscriptions at $200/month generate $4,000–$8,000 in compute value — Anthropic is subsidizing consumer usage to build enterprise pipeline, not running a consumer business.
Enterprise contracts that switch from subscription to token billing are the actual profit center — consumer hype is the top-of-funnel, not the revenue stream.

08Self-driving cars as the displacement template

35% of US income derives from transportation jobs — the government's deliberate delay of full self-driving is economic labor-displacement management, and AI job displacement will follow the same pattern.
Entry-level programming is already experiencing the same compression factory jobs did when outsourced — upgrading ahead of automation is not optional.

09Hardware buying guide by budget tier

Start with whatever you have, rent before buying, and think in doublings — one, two, four, or eight cards, not odd numbers.
The RTX 3090 at $1,000 is the best value entry for Qwen/Gemma class models; the RTX Pro 6000 at $10,000 is the right anchor for scaling toward frontier models.
Two DGX Sparks running a MoE model with 10–20B active parameters can reach 20–40 TPS — adequate for most agentic workloads.

10Architecture deep dive: MoE, prefill/decode, hybrid rigs

Prefill (prompt processing) needs all model parameters loaded; decode (generation) only needs active parameters — a 40B-active MoE model's decode step is as cheap as a 40B dense model.
Combining a high-memory Mac or DGX Spark for prefill with a fast NVIDIA GPU for decode is experimentally viable and could halve effective hardware cost for production workloads.

11Live demo: agents running locally, Droid harness

Using a local inference endpoint with Droid gives access to a state-of-the-art agent harness at zero API cost — the only barrier is owning the GPU.
Concurrency matters: four simultaneous streams at 50 TPS each is more useful for most workloads than one stream at 200 TPS.

12Uncensored models, Hugging Face risk, download strategy

Downloading and storing weights now is low-cost insurance against future access restrictions — 20TB of NAS storage holds thousands of models.
Uncensored models solve mundane refusal problems (peyote cactus care, chemistry questions) that frontier models block due to overfitted safety training.

13Why they went all-in, Poland vs. San Francisco, robots

A $30,000 setup — unitree robot ($20K), DGX Spark ($4K), VR headset ($1K) — is sufficient to run a functional household robot on local open-source inference today.
Personalized local AI education for children is one of the most practical near-term use cases: tunable, private, and not dependent on any platform's content policies.

Glossary

Terms worth knowing.

MoE (Mixture of Experts): A model architecture where only a fraction of parameters activate per token. A 744B parameter model might have only 40B 'active' parameters, making decode memory far cheaper than the total size suggests.
Quantization: Compressing a model's weights to lower precision (e.g., 4-bit instead of 16-bit), trading some accuracy for drastically reduced memory requirements and faster inference.
Tokens per second (TPS): The speed at which a model generates output. Frontier API services typically deliver 50–200 TPS; local rigs vary widely by hardware, from 15 TPS on modest setups to 200+ on high-end multi-GPU configurations.
Prefill vs. decode: The two phases of inference. Prefill processes the input prompt and requires loading the full model into memory. Decode generates each output token and only needs the model's active parameters — enabling hybrid hardware setups that optimize each phase separately.
RTX Pro 6000 Blackwell: NVIDIA's professional-grade GPU with 96GB VRAM, priced around $10,000. Multiple units can be combined to run the largest open-source models at high speed — eight cards give 768GB total.
DGX Spark: NVIDIA's compact desktop AI computer with 128GB unified memory, priced around $4,400. Designed for training and prefill workloads; slower at token generation than discrete NVIDIA GPUs.
GLM 5.2: An open-weight model from Zhipu AI with 744B total parameters and 40B active. As of mid-2026, one of the highest-capability open-source models and a benchmark subject throughout this conversation.
Fable 5: Anthropic's frontier model, available briefly before being restricted from public access. Referred to throughout as a step-change in capability — 'felt like a contributor, not a tool'.
Droid (Factory AI): A software harness for running AI agents that accepts pluggable inference backends, allowing users to substitute local models for cloud APIs.
XO: A software layer that combines multiple Mac devices or DGX Sparks into a unified inference pool, allowing prefill on one device and decode on another.

Resources

Things they pointed at.

00:00productGLM 5.2

07:00productOxylabs ↗

10:00linkdavidondrej.com/sero-podcast (free assets) ↗

13:20productFactory AI / Droid

33:20bookMeditations on Moloch (Scott Alexander, 2014)

1:02:30toolLM Studio

1:03:00toolRunPod

1:03:10toolLambda

1:03:20toolPrime Intellect

1:06:40productDGX Spark

1:08:20productRTX Pro 6000 Blackwell

1:13:20toolXO (multi-device inference layer)

1:25:30toolDroid harness

1:31:30productHermes 70B (uncensored)

1:34:00productUnitree robots

1:34:20productMicro AGI (German robotics company)

1:34:40productGemma 4 (4B, robot control)

Quotables

Lines you could clip.

01:00

“I'm using 374,000,000 tokens a month locally.”

Concrete number that makes the 'self-hosting is viable' claim visceral.→ TikTok hook↗ Tweet quote

31:00

“If the government controls the future of intelligence once we have AGI and beyond — it's over. It cannot be centralized.”

Strong declarative statement with urgency, no context needed.→ IG reel cold open↗ Tweet quote

52:00

“A company spending $300,000 a year on Anthropic billing — it's not beyond the realm of possibility to purchase hardware and reduce costs long-term while having private inference.”

Reframes self-hosting from hobbyist to enterprise CFO conversation.→ newsletter pull-quote↗ Tweet quote

1:22:00

“I can give inference to maybe 24 people with the cards I have.”

Surprising concrete claim — home rig as mini cloud.→ TikTok hook↗ Tweet quote

1:33:00

“I was basically way too early to any large technological revolution — and this is the first one I can actually take part in.”

Personal origin story moment, emotionally resonant for creator audiences.→ IG reel cold open↗ Tweet quote

Topic Map

Where the conversation goes.

00:00 – 12:00denseModel reviews: GLM 5.2 and Fable 5

12:00 – 40:00denseAI geopolitics and centralization risk

40:00 – 1:00:00denseEconomics of self-hosting vs. cloud

1:00:00 – 1:20:00denseHardware buying guide by budget

1:20:00 – 1:26:40steadyLive demos and use cases

1:26:40 – 1:30:00steadyStrategy: weights, Hugging Face, uncensored models

1:30:00 – 1:35:52sparsePersonal stories, San Francisco, robotics

The Script

Word for word.

Read-along

Don't just watch it. Burn it in.

See every word as it's spoken — crank it to 2× and still catch all of it. The same dual-channel trick behind Amazon's Kindle + Audible.

metaphoranalogystory

00:00This is running at home. This is my, uh, GLM 5.2. It's like a custom compression that I did for it.

00:05But if we go here, like, this is the same model. Right? This is a compression.

00:09It's, uh, it's compressed 80%. And so, you know, this is three d, you know, the flappy Bird. Bird.

00:14And this was a single shot, uh, attempt. This is another one. So this is running DeepSeg v four Flash.

00:20Uh, there are there are going to be two sessions left and right that are running. So you can see it's running pretty fast. It's able to do, like, a concurrency.

00:28It's doing its tool calls reading. This is a, like, private inference that I'm running at home that is capable of doing everything that the, like, frontier models is capable of doing, at least for work. Right?

00:37So I'm using 374,000,000 tokens a month locally. Alright.

00:42So, Sero, you're one of the top voices when it comes to, like, local models, open source models on Twitter. What's your thoughts on GLM 5.2? So I've been using GLM 5.2 for a few days now.

00:52And, uh, well, actually, I got access to it sometime last week. They usually give me early access to these models. Uh, the entire GLM series lineup from 4.5 up until now has been phenomenal.

01:03They're very good at agent work. They're, uh, excellent at coding on back end systems. They're really good at Docker and, like, DevOps type of work.

01:12They're good at GPU programming, uh, and, like, anything related to AI ops, like ML ops. Like, it's the only model that will always just reverse engineer anything.

01:21So if you just ask it, like, here's a thing, reverse engineer it, it is more than happy to to work on it for, like, eight hours plus to get it to perfect, and I really love it. I think it's one of the first models where people don't feel like it's been optimized for benchmarks because that's been one of the main critiques of the Chinese models.

01:37Right? That, like, usually, they perform, like, really well well on benchmarks, But then when you use it, they don't feel as good. So do you agree with this statement, and do you think, like, there's something different about this model?

01:47I I don't agree with this statement. I have never really believed this statement. You could say the same about the OpenAI models or the Anthropic models that they're bench maxed.

01:56And, um, I've found that, like, the Chinese models tend to make trade offs. For example, z AI's trade off is that they don't have vision in their, like, coding models. Without vision, it gives it more, like, space to learn how to code.

02:12Because if you wanna train it on vision, you're going to have to, like, have, like, a vision section. You're gonna have to, like, add training data related to that, and, uh, it's going to require it to be bigger, means it's gonna be harder to run and self host and, uh, use, even deploy for them. So I don't really agree that they're benchmark like, benchmarks.

02:30You know, OpenAI and Anthropic tend to release, like, a full package. So you take the model and everything just works, and it's it's like really refined and and the marketing around it.

02:41Like, Americans are very good marketers. So we are you know? Yeah.

02:45Like, think of Hollywood. Think of, like we're very good at convincing people. You know?

02:49Our stuff is the best. So I would say that's the case. But it is better significantly than anything that's come before it.

02:55That's for sure. Yeah. I do think, like, the point about marketing is valid.

02:59The Chinese models, a lot of people just don't wanna use them because they come from China even if the the inference is, like, in Europe or in USA. So I do think there's a, like, a marketing issue when it comes to a lot of these open source models. Like, how how do you fix that?

03:11Is it just, better UI, better pack like, better tooling around them, better harnesses? Yeah. I think I think mainly it's, like, distribution.

03:18So right now, if if an American wanted to try the GLM models, it's not gonna be available on Cursor. It's not going to be available in, like, ClaudeCode or Codex without you having to go to this website that, like, is typically not named the same thing as the model.

03:33You know, get get a card connected. You know? Listen to people telling you it's benchmarks.

03:38Listen to people telling you the Chinese are trying to hack your computers. Like, it's I think I think it just needs better distribution.

03:44And so a company that I would like to call out that does this well is Factory AI or Droid. So they have, like, these Droid core models, and, uh, they use the Chinese open weight models.

03:55I think they host them themselves or they partner with some American company. It is hosted on American soil. But they've been they've been able to get these models over to more people, which is something that I really respect and appreciate.

04:08People look at OpenAI and Anthropic as, like, a research labs and all these are, like, technical tech companies and, you know, scientific efforts, whatever, but they are also excellent in marketing. Right? Because, like, Anthropic, they kinda

04:18hyped up their models that they're gonna replace 50% of their jobs. They're gonna be dangerous to society. And then when the US government slaps them, they're acting all surprised, but even though, like, they had it coming.

04:28So maybe maybe that's a pivot to, like, Fable five and what's your thoughts on the model and on the whole situation around it. The web is full of public data. The problem is actually getting it.

04:38The moment you try scraping at scale, the websites fight back with captchas, rate limits, and IP bands, but it doesn't have to be that way. With Oxylabs, you give Oxi Copilot a target URL and describe what you want in plain English.

04:52It builds the scraper, the parser, all of it. Oxylabs literally handles the proxy rotations and capture solving for you.

04:59And even on the toughest of toughest sites, it has over 99% success rate. If you build an n a 10 like I do, the Oxylabs AI Studio node plugs right in.

05:10You describe what you want, it scrapes the data, and your workflow handles the rest. Under the hood, Oxylabs has over 175,000,000 residential IP addresses from over 195 different countries, giving you real time data for nearly any website.

05:25Plus, they have a new fast search API that can drop organic search results straight into your AI pipeline. And by the way, Oksilabs gives you 2,000 scraped results for free. Just try it at oxylabs.io/david,

05:40and make sure to use code david to get 20% off any paid plan. Thank you to Oxylabs for sponsoring this video. So I have I used the model the entire time it was out.

05:50I think, like, in API credits, used about a thousand $500 in two days. And, uh, I've tried, like, various types of, uh, prompts on it. So I have this directory on my computer called slash person.

06:02It's just personal. And it has all of my medical records, all of my social media posts, all of my Google data.

06:08Like, I just download everything from all of the sites, which you can do pretty easily. Like, in x settings, you can go and download personal data. The Google Takeout lets you take all your Google, YouTube, Gmail, calendar stuff out.

06:19And so I download this stuff and I rag it and I put it on my computer. And it's, uh, super useful in my opinion because I get, like, good search and I can, uh, you know, have, like, conversations and extract insights. But, uh, I put it there and I just had it, like, go over things.

06:33And then I told it, just give me, like, some insight or, like, some topic to make my life better. It found essentially, like, 25 out of the 20 seven or years of my life. And then it said, like, it wants to talk to me about that missing two years.

06:46And that was the first model that's ever done anything like that. And then it just like it like it it was like acting like some kind of psychiatrist, psychologist.

06:53Like, it was very good at human communication and and portraying empathy, which is rare.

07:00Like, GPT models don't do that at all. They will shut down these types of conversations. The Chinese models tend to be, like, more tailored towards coding, and they're not necessarily that great at this kind of thing.

07:11So I really love that part of it. That that was, like, mind blowing to me. But there is this important thing that I've tried.

07:17So there is a game called Bloodborne where you just, go and it's like an a a Japanese action role playing game. And you're the the first level is, like, you start off at, uh, like, a square, and then you have to make it through the fireplace, uh, fight these enemies, then go on a bridge, fight the enemies, and then go on, uh, to fight, this massive werewolf type creature.

07:37So I give this prompt to all the models. I just say rebuilt Bloodborne, basically. The the, like, very short prompt.

07:42Um, starting from Gemma three thirty one b, they will basically get the components of it right. So, uh, but everything is just this like a block, like a square block.

07:53You know, the gun is a square block. The the weapon is a square block. Um, so there there is no detail in the outputs.

08:00And now as the models get larger and larger, you start seeing more and more detail detail in the same functional thing. Right? It's like the same thing that they're outputting, but there's just like so much more detail to it.

08:11And I found that the, like, the Fable model outputs for that specific, uh, game benchmark that I do to be really, like, true to the actual game.

08:21Like, the types of enemies that there were were the same. Uh, the types of, like, behaviors and game mechanics were the same. And so I think that the model is capable like, the size of the model makes it capable of extreme detail.

08:33And I I really like that about it. I think it's a shame that we don't have access to it right now. Yeah.

08:37I mean, I had the same experience for

08:39all the projects I wanted to build. I just like you can see on my GitHub those four days or whatever, just like way more contributions. I just got way more done.

08:47And, also, it like, the difference I would describe it as like a, you know, Opus four point eight and five point five Pro or or, like, x high are more like tools. They really feel like tools you need to ex exactly explain what you're doing. But with Fable, it felt like an actual contributor.

09:03Sometimes I didn't have to explain what I doing, but it understand why am I doing it, like, the the deeper intent behind it. And, yeah, it felt like like something else is working on this project with me. Not like I have to be constantly the director and, like, give all the details.

09:15Yeah. I understand exactly what you mean. So the

09:19the main thing about that model that I think we should keep in mind is that there's a honeymoon period to everything. So whenever a new model comes out, everybody's excited.

09:28Everybody's, like, you know, talking about it. It's in people's attention and people are hyping it up. Like any model that comes out.

09:35Like look at GLM 5.2. People are hyping it up. And so what this this situation has done is it denied people from the come down of the the honeymoon phase.

09:48Right? It denied people from taking the time to see the flaws in the model. And so whatever, like, conceptions that we have right now, I I would say, like, I'm making a bet or a prediction that when the model does return and it will, that it is everybody's gonna call it like, oh, it's not the same model.

10:05This is like this has been lobotomized, and it's going to be the same model. So this is a bet I'm making. I don't I don't know if it's gonna happen or not.

10:12Yeah. It always always happens even when no when no changes are made, but, you know, that's what goes viral. Real quick, if you want all of the materials from this video, it's in the second link below, including the model matcher skill, which analyzes your computer, sees what specs you have, and which models you can actually run locally.

10:29So, again, all of this is available completely for free in the second link below the video. But I guess what's your thoughts on, like, the situation? Like, the ban?

10:37Like, I would say this is the closest thing. Like, you know, I I don't know. We we can talk about, like, AGI and what's your thoughts on, like, superintelligence, but, like, you know, this is a step in that direction.

10:46And I think it's a huge disservice to all of humanity no matter if you're American or not that this model is not available. Right?

10:53Even if there's a honeymoon period where, like, we are hyped about it, it's new capability, you know, new step change. But, like, personally, I could feel the productivity increase, and I know that if everybody had access to it or even better if it was open source, that, like, the progress of society would be way faster.

11:08So what's your thoughts on this ban? Do you think it's justified? Do you think, like, this this even makes it more clear that the future of AGI needs to be open source?

11:16How would you think about this?

11:18So, like, we can we can take two paths here. Like, path one is that the model is truly capable of, like, extreme cybersecurity cybersecurity risks and biological terrorism risks.

11:31I don't think that is actually outside of the norm of reality. If the model is capable to get you closer from, like, let's say, rat DNA to some kind of, like, virus that is gonna hurt people, then it is actually worth taking a look at the model and seeing how much it's being used for this kind of stuff, and I guarantee you it is.

11:50Like, just like every model. Uh, people are asking how to make bombs. People like, this is the the reality of the situation, and we need to be able to confront that.

11:57There's also the risk of hacking. So now, like, imagine this. You have an open source repo and they have a pull request.

12:04They don't say what the pull request is about, and there's a lot of changes in it, but it's like critical software, like, let's say Linux. And they're probably hiding, like, some RC like CSV.

12:15I forgot the the code for it. But, like, there's they're probably hiding a fix to some kind of security bug, which they do in every release. So you can have the the models, like, that are really intelligent.

12:25You can have them just sit in loop on all of these open source repos and try to identify essentially fixes for security bugs. And now until that PR is merged, now you have a zero day exploit. So I'm just giving you a scenario that somebody has told to me, which I I think is is really true.

12:41I used to work in financial analysis and blockchain analysis. And one thing that these people do is that they'll have a model just sit on the blockchain and watch every transaction and then find something to exploit.

12:52Like, they they'll find, like, some opportunity for making money, and they'll just sit and try to exploit it. So now you're you're pouring fuel on that fire. Now I don't I'm not saying this is actually why they banned it.

13:03I don't think that's the case because you can do this with other models as well. Uh, I think what's what's happening, like, it could be that.

13:10It could be that they're just like they just, like, open their mouth too much to the government, uh, or it could be that this is, like, a way for them to shift the narrative from them being a SaaS company to them being a weapons manufacturer, which is much easier to raise a trillion dollars for on an open market. If you think about, like, American companies that are also weapons manufacturers, they can get close to those numbers.

13:32So I think it's an easier sale. Now, uh, last last point on that is regardless of what happens, I think we are going to a place where, like, maybe we get one more generation and that's it for, uh, improvements for civilian usage.

13:49I don't think we can keep pushing past this point without it being a big issue. So the Chinese companies, Moonshot, DeepSeek are either partially or fully owned by the government in some capacity, and they are, uh, like, did the so it's, like, not a choice anymore.

14:04It's like, this is what's happening, and we have to figure out how to accept it and then build an alternative so that when they do take it away, we can still do what we're doing. We can still, like, learn and, you know, contribute to science, and, uh, I think it's super important.

14:18Okay. So this is fascinating perspective. I definitely wanna go deeper there.

14:22So you think, basically, the governments, whether it's American government or Chinese government, which, you know, are the two countries really at the lead of Carnegie AI, you think they're just gonna shut it down in the next one or two generations?

14:33I don't think that they're going to shut it down. I think they're going to stop any type of, like, progress that is going to improve the models in intelligence. So the models may be like, there will be newer models, but they're going to be more like efficiency gains.

14:45Like, how can we do this with less tokens? How can we do this with less, uh, costs to the user as opposed to how can we make this model do 20% better on DeepSwee or whatever?

14:55Because it's getting to a point where if you just keep increasing the intelligence, it really does become like, um, like an existential risk of, like, cybersecurity. We're not gonna have, like, private, like, homes anymore.

15:06Like, your home network can be hacked. Anything can be hacked, and, uh, that's gonna cause a lot of problems. So this is what I'm thinking is gonna happen.

15:13Uh, I read this online, of course, and that's influenced my opinion a lot because it makes sense. But may maybe it's maybe it's not the case. Okay.

15:20But the counterargument would be that, you know, let's say the latest generation that is available,

15:25open source, you can have, like, any country, whether it's Iran, Russia, or just any random country, download these models and, like, put their best engineers on them and start fine tuning them and start the right reverse engineering them, start training better models based on them.

15:38So, like, how how do you think about that? Because, like, the progress technological progress, I I don't think it can be stopped, I don't think it can be stopped for sure. I just think it's, like, it's very easy to slow something down.

15:50So going back to the example of blockchain cryptocurrency,

15:54is it full of scams, full of people robbing each other, full of, like, corrupt behavior? 100%. And it's even more so because it's a financial thing.

16:02But it's still like open source financial services. Right? Like, if you are in a country where you cannot like, you I've known I live in Warsaw, Poland, and I have known many Ukrainians who have fled from from, you know, their country.

16:15And they had to live in the forest for weeks, and then they have to bribe people to get to the other side. And so the a lot of this is happening in crypto. Right?

16:22A lot of donations back to people are happening in crypto. Whether that's good or bad doesn't matter. The point is, like, this is a necessary service, but governments can still ban it.

16:31It's still happening under their hood. It's still happening within their organizations, within their government, but they can ban it for, like, the average person, right, like, the average consumer, um, which is what I think is, like, probably going to happen.

16:44Now I don't know. Again, like, there is a chance where what I'm saying is wrong, but, uh, this has happened with a lot of technology beforehand. Right?

16:52Yeah. I mean, I guess nuclear is the biggest example. I think it was a, like, a tragedy that we had nuclear reactors and, you know, this magical form of energy, and they got shut down.

17:02And now, like, you know, Germany and lot of other countries are going back to coal, basically, back to the stone age, like, literally reverse progress. But when it comes to AI, like, it it's a lot simpler. You know?

17:13It's runnable locally. Not every not everybody can build a nuclear reactor, but, like, people can set up their own rig and start fine tuning.

17:21Like, I think it's a lot hard to ban it. No?

17:25Yeah. So this is this is why I'm so excited about, like, local AI, open source AI. So partly is because I love this technology.

17:33I it it has truly made me a better person in every way possible. Like, I am smarter. I am more capable.

17:39Uh, I am better, like, as an adult. I can pay my taxes better. I can, like, figure out how to be, you know, normal.

17:45Right? Like, it helps me, like, schedule my calendar, helps me learn, uh, apply for visas. It's very hard stuff that I was not capable of doing before AI.

17:53I was just, too scatterbrained, and now I can do a lot of this. And so it's helped me. And, um, I you can run it at home.

17:59Right now, like, I think the Quen models, like the 27 b and the, um, uh, the other one, the 27 b and the 35 b are truly better than Sunnett four in every single way, all on all the benchmarks. So is it like, I I and I would also challenge people to because you can still use Sunnett four, uh, through the Cloud Code subscription.

18:18I challenge people to, like, try and see, will you get better performance on a MacBook or on Summit four from a year and a few months ago? I think you will get better performance with the Quen model. So, uh, but that requires people use it.

18:34That requires people, uh, wanna protect it. Every single right that we have, we have to kind of fight for. Personal computers, the Internet.

18:42Right? Like, the Internet freedom thing in 2016, 2014, like, they were trying to shut down, like, make it very hard.

18:49Like, so in America, they were basically trying to stop people from going on four chan and other websites from their mobile subscriptions. And and we have to fight for that, like, our own freedoms.

19:02So if people start installing these things, if people start running these things, there's nothing the government can do to stop us. But if people don't start doing this within the next two years or three years, what they can do is go to the manufacturers or go to the distributors.

19:15Right? Because Hugging Face is a western country. Can go to Hugging Face and you can say these models are deemed a security risk.

19:22They're on the sanctions list. You have to take them down. But if everybody has Quen on their machine, it's not possible for them to do that, and it's not worth the effort of going to Hugging Face.

19:31So I would the I would say that, like, this is kind of like a we have to fight for our freedom thing. You know?

19:38Obviously, peacefully and nicely and respectfully, but we we do have to fight for it.

19:43Yeah. And I think what a lot of people don't realize, the reason why it's a severe situation is that, you know, if people remove your access from AI, you're done.

19:51Right now people don't see it because, okay, you would go Fouche, GBD, Claude, blah blah blah, whatever the average consumer uses, you would probably survive in the world. But if we extrapolate two years, three years, five years in the future where AI is super intelligent, really removing your access from AI would be worse than if someone took away your Internet, took away your electricity, took away everything, like other technology, and, like, put you back in the stone age.

20:13It will be 100 times more crippling in five years to not have access to AI. I'm certain of that. And a lot of people need to, like, you know, go into long term thinking and see, okay.

20:23If the future if AI keeps improving and it is improving, how bad will it be if the government takes away your AI access? It'll be insanely bad.

20:32You'll just get unable to function in society, unable to compete in business, anything. So if people start to realize that, I think they'll realize the gravity of the situation and that there cannot be any centralized control.

20:42Whether it's Sam Altman or Dario Amodey, whether it's some, like, group of small companies or a few governments, it doesn't really matter. There cannot be centralized control. This technology is just, like, too important for the future of humanity.

20:54I love the example. This is only really relevant for Americans,

20:58but American health care system. So in in the American health care system, you have two options. Option one is you get your health care from the government.

21:06So you you get a massive subsidy from the government and you pay in still like 12 plus thousand dollars a year. Option two on top of your taxes. Option two is you get it from your employer.

21:17And when you get it from your employer, you get better health care. You get your you know, you get dentists. You get eye doctors.

21:22So what's gonna happen in the future, in my opinion, is you will have a subsidized national AI in countries that are more like socially leaning. So like if you could think of Europe, like they'll they'll probably have a deal with Mistral and they'll like distribute it for free for people. But then every single thing that you say is going straight to the government, everything.

21:42And they're going to be able to, like, mass analyze it, like, and connect all of these dots and, like, decide who's who's worth, you know, giving the smarter models to, who's worth, like, literally throttling on purpose or ruining their life on purpose. This is gonna happen. This is already happening.

21:56It's just gonna be scaled up. So and in The US, if you like, for example, you go to a company that's really strong, they're gonna give you the equivalent of opus. And if you're really good at your job and if you're really like kiss ass to the boss, maybe you'll get fabled.

22:11This is what I think might happen. Like, it's kind of like one of those nightmare sit situations for me personally.

22:17Does that make sense? Yeah. It's happening already.

22:19Like, not not necessarily like the government surveillance, which I think also might happen, but, like, the companies, it's happening. Right? Like, people led by company, they they can use basically unlimited stuff as long as they're billing something useful.

22:29Right? Like, when when it comes to, like, 200 a month of Corex subscription, ClothCloth subscription, OpenRouterKey, whatever, like, as long as they're they're doing something useful and they can show their output, like, I'm happy to cover it.

22:38Right? So it's already happening with, like, companies that understand the importance of it. They're gonna give you the inference, and you're gonna be able to run way more than you could afford personally.

22:47And, yeah, I think a lot of people are also gonna look at that the same way they look at, like you said, health insurance. Right? A lot of people don't wanna work with companies that don't provide any health insurance or whatever.

22:55It's gonna be the same thing with AI. If a company cannot even cover, you know, like, share g b d pro subscription or cloud code subscription or give you some, like, budget for open source models that or, you know, some local AI rig. If a company doesn't have a local self hosted supercomputer, then lot a of the top talent will be, I guess, dissuaded by that and go to companies that will give you, like, unlimited AI budget.

23:16Well, you can already see this with Anthropic. Like, many, many, many smart people are leaving from other companies or from being independents and going to Anthropic because they objectively right now have the best model that is available on the market that we know of.

23:30Right? They have the best model. There's not there's no doubt about that.

23:33Maybe OpenAI can compete in two releases. I don't think the next one is going to be that level. I think they might shoot for the one after that, uh, to be, like, you know, either that level or or above it.

23:43Uh, but Anthropic has proved that they have the best model. And so, uh, the the incentive to go work for them as opposed to work for yourself or do anything else is enormous.

23:54It's it's enormous. Even if you're like a local maxi or an open source maxi, you know, if you can do your life's work, like if you can truly be the best you can possibly be by going to Anthropic, why not?

24:07Why why shouldn't you do that? You know? It's it doesn't make sense.

24:10So I think we we we just need to start, like, figuring out how to spread the intelligence. Spread it out so there's not these, like, massive bumps where, uh, you know, things centralized and and things uh, like, you you have to basically surround yourself with with how Anthropic wants to do things to get access to intelligence, we should have it available everywhere.

24:30And and, um, I I I think, you know, it should be considered a human right, honestly.

24:35Yeah. Absolutely. I agree.

24:37Um, on the topic of Anthropic, I guess you mentioned, like, the top people prefer working for them even though they're, like, open source Maxis, but why? Like, why would people go to this closed source company and, you know, maximize like, these people who also have, like, tens of millions dollars in career earnings. Right?

24:52Like, why do they go work for Anthropic? Like, do they believe they have, like, recursive cell improvement? Why don't they, like, work on the open source side of things?

25:00Because if if you believe that Anthropic has the most intelligence in the world right now and you are somebody that is motivated by intelligence and by the understanding of intelligence, I I don't think that it makes any sense not to.

25:14Like, I think that that model, right, and we got like a kind of nerfed version of it. I think that that model is a complete step up, like a completely different thing than we but but, you know, again, I might be biased. But that's what I think.

25:26And I think that's what people like Andre believe is like, this is a huge step up. I'll go in there. I'll learn for like five, ten, twenty years and I'll come back and I'll spread spread the the the intelligence openly.

25:38Because, you know, if he was working for all these companies that weren't able to hit that level, then that means, you know, there's something he can learn there. I think I I at least this is like I'm I'm projecting my beliefs and my my my thoughts onto him.

25:50I, uh, you know, he he's a he's a leader in the space, and, like, a lot of people respect him. I don't think he's malicious per se. I just think he's motivated by learning and intelligence more than anything.

26:00I see. So, basically, it's like the need to understand, you know, the whether it's, like, scientific need or, like, more like sci fi, like, universe. These top people, the top talent, they just seek to have that knowledge and to, you know, interact with the closest thing to superintelligence that we have and to have basically unfettered, unlimited access to it.

26:17And, yeah, like you said, Anthropic is kinda that place. By the way, everything we discussed in this video, you can get absolutely for free by clicking the second link in the description, including a custom rig planner skill that allows you to plan your own hardware rig at home depending on your budget and depending on your requirements.

26:33So all of this is available in the second link in the description completed for free. So go grab it now. Do you think that's the main reason why they are able to keep all their cofounders, all their top employees?

26:42Or, like, why does Anthropic has such a low employee churn compared to all the other AI companies?

26:48So if you look at, like, the, like, the way that they call their employees, everybody is chi like, what is it? Like, member of technical staff. Right?

26:54So that that's one thing. Another thing is that they are building something that is very artistic. In my in my, like, in my opinion, what they're trying to do is, like, embed soul and character and persona and and, like, something that is deeper than just, like, an a task executor.

27:10They're trying to embed that into the model purposefully. So they write the, like, Claude constitution. So I think a lot of this is more like moral beliefs and, like, the types of people that also work at Anthropic are very, like, morally driven.

27:21They're very like, I've met and spoken to many of them. Um, they have, like, basically helped me run a ClaudeCode meetup in Warsaw.

27:30They they have been nice people to me in per like, in person. Right? Every single one of them.

27:35And I think that, you know, all of this stuff goes into creating an atmosphere. It's like you're you're developing really fast. You're innovating on a lot of things.

27:43Um, you are working with people that are, like, very morally and ethically driven, uh, and you there seems to be, a little bit of an equality within the company. Like, there isn't that much of a hierarchy where this person is more important than this person, you know, at least in the way that they're titling things to the public.

27:59And I think all of that kind of goes together to create an atmosphere of, like, a you know, people say cult, but I would say, like, uh, like, an imagine an open source project that is, like, really great and people love. I think, like, that's kind of the same mentality is that they're doing it because they like it as opposed to doing it because they wanna, uh, have a good job or

28:17but maybe maybe I'm wrong. I don't know. I'm just assuming.

28:19I mean, yeah, the the work environment for sure is probably on point, but I'm wondering, like, whether they don't see any of the risks. Right? Like, they're all trying to it's it's kinda funny, but they're all trying to, like, create the AI automatic researcher.

28:33Right? So they're all kinda trying to, like, replace them self out of the job. And, you know, whether that will happen or the company will keep hiring more people, there is other risks.

28:42Like, for example, getting nationalized. Right? Like, that's not out of the realm of possibility.

28:45So, like, are these, like, researchers just blind to these, like, practical geopolitical risks, Or, like, are they just, like, all about, like, I I I wanna have access to the best AI? I you know, I think everybody that's doing anything thinks that they're doing the right thing. Or, like, it's very easy for people to you know, the majority of people think that whatever they're doing is the right thing.

29:03There's this Yeah. Article called Meditations on Molok. It was written by, um, Scott Alexander in 2014,

29:11and it talks about how whenever there is like a a pocket of an incentive. So, like, imagine, uh, you have, you know, the the cells on your body, they're all independent creatures that come together to create a whole.

29:23And if there's one that figures out, well, they could just exploit the resources to keep multiplying and keep spreading their own genetic line. They're going to kill the host, but they do it anyway. Right?

29:33So there's this, like, incentive, uh, mechanism that is transcendent of, like, humans or, like, it's you are a creature that requires sustenance and stuff to survive.

29:43And so if something is gonna give you that, like, at full force more than anybody around you, you're going to follow that pocket. And if you don't do it, someone else will, and that's enough. Like, if if if I know that I don't do something and somebody else is going to do it, I mean, I'm I'm gonna have to do it.

29:58Like, why would I let myself, uh, be on the sidelines when I am capable of doing this and it's going to happen anyway? And, you know, the way that they market themselves, the way that they speak on these things is is this mindset.

30:09Like, they have this, like, mindset and this belief system is like, it's gonna happen anyway. It's better that we do it because we think we know better. And maybe they do, maybe they don't.

30:18Like, I don't know if time has, uh, proven out what they're saying. It's gonna take, like, five five years plus maybe to see what, you know, the consequences of whatever is happening right now. I see.

30:31I mean, yeah, that logic does make sense. But, like, how does that look like practically? Because like you said a few minutes ago, you think the government will stop giving, like, access to the best models.

30:41So at at that point, we'll just the the closed doors become even more closed, and it's, like, literally just a internal company and some, like, project like, Glasswing, you know, 200 enterprises, and that shrinks to even less and the government, and that shrinks to just the government. How how does that look like?

30:56Well, if you think about, like, the economics of it so I I I run my own rig at home. I have four six thousands. Before that, I I had eight thirty nineties, and I have, like, like, the sparks and stuff at my house, and I'm doing inference, like, twenty four hours a day.

31:11So the cost of the GPUs, they are using mostly b 2 hundreds. Like, they are paying, I think, a billion dollars a month to to Elon.

31:21Right? So a billion dollars a month to Elon, and they are selling mostly these subscriptions.

31:26And these subscriptions, like, I can go look right now how much I'm getting back, something like 4,000 to $8,000 a month every month from my $200 subscription. And I think that that is an accurate relative cost. Right?

31:38Like, it's probably two x, uh, more expensive.

31:42They're selling it for two x or maybe, like, uh, three x the price. That's possible. But then if you think about, like, the taxes, they have 2,500 plus employees.

31:51They have to deal with all the regulation. They have to also, like, give money to the government so that they can get their way. Like, if you start, like, thinking about all the costs, like, it gets lower and lower and lower and lower.

32:00And if you have somebody who like me, who in one day can spend 4,000,000,000 tokens, like, with with OpenAI, I spent 4,000,000,000 tokens. That's 8 b 2 hundreds running for 24 like, 16 b two hundreds running for twenty four hours just for me.

32:13And 16 b two hundreds costs about a $120 an hour, and I'm doing it for twenty four hours. So you could you could start to see, like, this is completely, completely unprofitable.

32:24The way at least we, the end consumers, are getting. What is profitable is enterprise contracts. So when they deal with enterprise, they remove subscriptions.

32:31So it's it's it's token usage only. So because of that, that's profitable.

32:37So their incentive is going to be to give money, give out this free money to the end users, to the people on the Internet, get them really excited, show them what's possible, then they go tell their companies, hey, let's get an anthropic subscription and, uh, then the company's onboard onto a subscription a year ago.

32:55It was a hundred twenty dollars for, like, even more usage. But then once they onboard and their entire company is built on Anthropic, then they switch off.

33:04Now it's it's token usage, and then they see, oh, wow. Well, this is pretty expensive. And so now you just, like, filter off all the companies that can't afford to pay you, and you have a profitable business, which is where they are right now and which is what I think OpenAI struggled to accomplish.

33:19Like, they had it in reverse. They had enterprise first, and then they went down and, like, you know, targeted the developers. I like I I like their message better.

33:26I like the fact that they're pushing intelligence for everybody. At least they're saying that. But yeah.

33:31So does that does like, the economics of it don't make sense unless you do it that way. But the question is, like, if the if the generation or two generations after that cannot be released,

33:41how does that work? Or or will it only be released to the top companies that the government approves? Or how do you see that?

33:47I think that's what's like, yeah, it's it's likely that the the government is going to have, like, a sanctions list of all of the types of entities that can't get access to Fable and, like, Fable or whatever Fable two. And then there are going to be companies that don't fit into that, like, sanctions list, and they can use it.

34:02So, uh, like, for example, sanction all people that have less than, whatever, a thousand, uh, employees that are or, like, that have more than 20% of their employees are are non American. Like, you could find a million different rules to do the same thing. Yeah.

34:16Uh, so and and that's gonna be very profitable for, uh, Anthropic, of course, because the companies already have money. You already have, like, the pre filtered best customers, including the government.

34:25So what do you think, like, will will happen as the technological progress will keep going? Like, you know, let's say, you know, AGI, then after AGI, closer to ASI, will that be really in control of the government?

34:36Or, like, how how do you see that going at the current trends if nothing changes? So the way that I feel about the models right now is that they're they don't have,

34:45like you know, we say agents, but they don't have personal agency. They need to be interacted with from the outside.

34:51Right? Like, they they need to be triggered. Uh, I don't know if anybody is building the infrastructure that is required to have a model just be like an actual free entity where it can go around.

35:02And if they are, they require a lot of b two hundreds or b three hundreds or whatever the the new, uh, the new Verirubin stack or, you know, a bunch of h two hundreds. So that like, imagine that as, um, as an organ.

35:14Right? Like, that is an organ of whatever it is the creature that we are trying to create. So it's very easy to target.

35:20So if if if they create this, like, RSI, ASI, like, independent entity that is running around the world, it's still running in data centers, at least for now. And so if it's running for data in data centers, people are going to be able to target those data centers. They're gonna be able to wreck them.

35:34So what we're already seeing this again right now. So in, um, Dubai, they were building star star something, like, Stargate.

35:43And, uh, Iran specifically bombed exactly that area, specifically so they cannot build the data center.

35:50Right? Uh, because they know that's where it is, and so we have, like, Stargate, uh, two or I'm not really sure the numbers. And, also, I could be wrong.

35:58So if if I am, I'm sorry, but this is what I heard on the news. And, like, we could kinda see, like, I don't know if it's ever going to progress to the point where it's controlling the entire world. I think it's more like this is a a situation of convenience.

36:14AI is very convenient. It's very helpful, and so we will give more and more of ourselves to these systems, um, and, you know, help others pilot these systems, like other human beings pilot, like, a huge data center farm and have access to all this intelligence, which is where the risk is.

36:31I don't think the risk is, like, something taking over the world. It's, like, people

36:34kind of mess messing up, which is very obvious. Yeah. So I guess the it's the same ideology as Bitcoin then.

36:40Right? Like, the decentralization of something highly important.

36:43This case, it's not, like, money and financial system. It is just intelligence,

36:48which is even more important. So, I guess, what what's your fault on Bitcoin? And, like, do you have the same ideology for, like, future of AI?

36:55I pretty much all of my money is in Bitcoin and ETH, honest like, almost all of my money. I think I have some But you lost it in a boating accident. Right?

37:03I lost in a boating accident. Yeah. Yeah.

37:05I don't have the private keys anymore. But the yeah. So but also, it's, like, it's actually in a bank, like, in in bank sinks.

37:13So the the the situation is, like, Bitcoin is just another evolution of free compute and, like, freedom through compute.

37:22Because we we first had these, like, timeshare systems. I read this in a book called Weep Programmers. Programmers used to have to go and, like like, put in a slip and they get access to a computer for two hours, and then they have to write their program and it has to compile the first time or your your access is, like, useless because you're not gonna get feedback for at least an hour of it processing.

37:41And and so then computers started to spread around the world and spread more and everybody had a computer in their home. And and then the financial system came up and, like, we can start to digitize it. And so there were many like, there was, like, these decentralized science, uh, in the nineties where you can connect, like, an I think it was, um, a device that, like, looked into the sky and mapped the stars, and you can share that data.

38:04Um, and this is, like, one of the first decentralized science experiments. Then there was Internet beans, which was, like, I think, a plug in, and you just, like, collect beans on different websites, and they didn't have any value. And and that evolved into Bitcoin, and and Bitcoin evolved into ETH.

38:17And and so I think this technology is just like an evolution of human freedom. We are trying to build systems that make us more free. And I I love Bitcoin.

38:25I think Bitcoin should should always be something that people consider holding even a little bit of money in, um, just for the fact that it is controlled by no one. Um, and I think, you know, AI also falls in into the same categories.

38:38Like, this is freedom technology, and, hopefully, we can spread it out as much as possible. Yeah.

38:42I I agree. I basically have the same opinion.

38:45And I I think, like, people really need to realize that with AI, and they need to, I guess, think more into the future and, you know, realize that, like, whatever it is right now, it's it's not the same as taking away your description. It really is the same as, like, taking away the older technology.

38:58And, you know, maybe it's more easier with, like, electric cars that, like, shutting off access where you can drive and, you know, it's easier for people to visualize that. It's kinda harder to visualize the not having access to intelligence because people cannot, again, you know, see the exponential. They cannot see into the future.

39:13Most people just, like, lift paycheck to paycheck, so they cannot even project long term. So I guess how can we, like, practically do something? Right?

39:20Like, obviously, we we can encourage our followers and viewers to, like, run AI locally, get into fine tuning, download the model weights. But, like, what we can do as, like, humanity society

39:29to really make sure, like, the future of AI is open? So let let let's make it actually very clear to visualize what's going on. So right now in The United States, we have Waymo, and we also have Teslas, and they're developing the Tesla trucks.

39:44And and it's been possible, I think, for the last five years at least to have mostly autonomous vehicles. Right? So 35% of people in The United States survive off of income from driving.

39:59They make their money in transportation, driving trucks, driving Ubers, driving cars, and that money then is going into feeding the families by buying groceries, by paying taxes, by going to, like, events.

40:12And then that so, like, the the amount that driving touches is crazy.

40:17And the reason we don't have full self driving, like, three, four years ago and all the companies haven't moved to it is because the government is purposefully trying to slow this process down because we are going to have like fifteen to twenty percent of the male population unable to make money.

40:35So if we if we just flip that switch on and we say it's fully legal now, go ahead, you can invest in it, you can you can work on it, It's gonna be very hard for a lot of these people to, like, find a job. What are they gonna do? I mean, the you know, it's cheaper to use a robot to just drive around.

40:50So then these people are not gonna be able to pay their their neighborhoods, and, like, you can start to see the cascading effect of, like, what will happen as soon as self driving. And it is it is officially in Europe now, and Europe has been really adamant on not letting this kind of thing happen.

41:04And now it's in The Czech Republic. Tesla's full self driving is active in The Czech Republic for the last four months, uh, and it's spreading over to Germany. And, uh, you know, obviously, it's it's already in The UK.

41:15And so we are seeing the ripples of this. Doesn't mean it's gonna happen in a month or a year or five years. I don't know.

41:20Um, but we are starting to see that this is developing. And so what do we do when people cannot get jobs?

41:26There are very little entry level jobs for men. Right? I I'm using men because I just I just remember my personal experience trying to get a job before I was doing programming.

41:36Like, it's it's very hard to get a job that pays you anything that makes life worth living. And so you're just being shoved out, shoved out, shoved out. Now programmers, like, it's it's very hard for a programmer to get a job as soon if they're if they're starting, you know, college and they think that the it's similar to what was happening ten, fifteen years ago where they can just put their CV up and and get a job.

41:55It's, like, more difficult. They have to be more like the average person now. And so we're already seeing this happening.

42:00It's just it's just like we have to think logically ahead based on the past. What has happened in the past whenever technology has been able to automate a bunch of jobs, warehouses, and and all that, and what is the effect on it? In America, we don't have much, uh, warehouse.

42:15Like, what are these called? Not warehouses. Factories.

42:17Like, all of the factory jobs have been outsourced to Mexico, China. Right?

42:22And then you hear this from men now twenty, thirty years later saying, well, we can't get a job. There's no union jobs.

42:28We can't afford a house because we outsourced it. Right? And there's nothing for you to do.

42:32So unless you upgrade yourself, there's nothing for you to do. So I think this is kind of how we should visualize it. Now if we take a step, like, towards positive, like, how do we help?

42:42And so I run three meetups. Uh, I have a meetup in Warsaw, uh, and I just educate people.

42:49I just literally go and, like, here's how you use codex. Here's how you use open models. Here's what this means.

42:55Um, and I do this, uh, on the Internet as well. Uh, I post these videos. Every single thing that I do, I literally post it online because when I was trying to start self hosting, there was nobody putting this information online.

43:08I couldn't figure out, like, how much tokens a second I would get on a certain model, uh, on a certain hardware. Now I can get that because I'm making that data and other people, of course, that are also convinced.

43:19So making data more accessible, educating people. And then last but not least is, like, what does an alternative economy where we help each other look like instead of, like, me, like, trying to trick you and you trying to trick me, which is, like, a lot of the economy right now.

43:34Everything's, like, people trying to, like, gamble against each other. You know? It's it's it's it's very corrosive.

43:39What does it look like if you and me can help each other make more money and obviously spread prosperity? Because I I think it would be very easy for us to figure that out together. It's just we, like, all of our our systems in this society is like sorry.

43:54I know I've been rambling for a minute. I go out here in San Francisco. I usually live in Warsaw, but I go out here in San Francisco.

44:03I have to get an Uber to go anywhere. I have to get a car. It's too far away.

44:06There's no public transport. Uh, then I have to pay, like, a stupid absurd amount of money to get food, and then I get, like, this thing turned around, and they're asking me for a tip. I go to a grocery store.

44:15They're asking me for a tip. Everything is so p two p. Like, uh, PVP, I mean.

44:19Everything is, like, so aggressive. Right? And, uh, I think I think that's really destroying our society completely.

44:25Sorry. I I I've been, like, so negative this this podcast. No.

44:28It's it's fine. It's fine. Also, I I wanna ask about, like,

44:32why why Poland? Why Warsaw? Because personally, I I moved to Katowice now because I used to live in in Dubai for two years, and then it started getting bombed.

44:40So I moved back to Europe. And, you know, Katowice is close to Czech Republic where I'm from. So I'm wondering, like, why Poland?

44:46Why Warsaw? So why Poland? Why Warsaw?

44:49So Warsaw

44:50well, one, I met my wife in The UK, I think, like, almost eight years ago now. She is Polish. Then my UK visa ran out after COVID.

45:01I had to go back to The US. And the process for getting her a green card and having her move would have taken three years.

45:10And if I didn't stay with them, like, I don't think a relationship could survive that long. Like, on average, I don't think it's it's very hard for a relationship to last long, like, um, you know, with people far away. So we started exploring other countries to live in.

45:23Like, we went yeah. We went to we went to Mexico. Uh, we went to Poland.

45:28We went to the Germany. And, uh, I think I loved Poland, like, when I went there. Uh, I really loved I love the people.

45:35I think the people are, like, very libertarian minded. At least the people in Warsaw that I know, like, they're very, like, uh, hardworking.

45:43Things don't close until 10PM. You don't see that in most of the world. Right?

45:47Now, like, leave like, whether this should be the case or not. Like, people work super hard. Um, people are very honest.

45:54Uh, I think that's also really rare for people to be honest to you and, like, tell you things in your face. Uh, and the government has been incredibly kind to me. Like, everything has been pretty easy.

46:05I just, uh, I just put an application.

46:08They asked me, like, okay. What do you do? Here's what I do.

46:11Here's your card, and I'm good to go. Like, it it's it's been really easy to interface with them. Uh, hopefully, that answers your question.

46:18Yeah. It does. I mean, I chose Poland as well for similar reasons and, you know, is the fastest growing economy since COVID.

46:24It's, like, one of the safest countries in the world, and I think it's on a great direction. I I just wanna hear your thoughts on differences between San Francisco and Poland. Because, personally, I've never been to San Francisco.

46:34I think I need to go this year to kinda experience it for myself. You know, you mentioned some of the negatives of it, but, like, what's what's your full opinion on San Francisco? Because that's, like, where most of AI is happening.

46:44Yeah. So I've been here for a month and a half. This has been, like, my second time to San Francisco.

46:50One thing about America and San Francisco especially is Americans, they love opening doors for like, open not opening, like, physical doors, but, like, they want you to succeed most of the time. So if you go somewhere, like, they will let you in.

47:03They will you it's very easy to get into Apple. It's very easy to get into NVIDIA. It's very easy to get, like, massive investments.

47:09Um, it's easy to get connected to smart people. Um, and, uh, it's because Americans in general, they like they like the underdog story, you know, like David versus Goliath, the underdog, the, you know, like, the Rocky, uh, like, that kind of, you know, somebody that is less likely to succeed succeeding because of, like, working hard.

47:27This is like an American, uh, fundamental, like, founding of the country story, and, uh, I I see that everywhere I go. The city is extremely expensive. I think a hotel the cheapest hotel you're gonna get is gonna be $6,500 a month.

47:42Forget about Airbnbs Crazy. Because I would not stay in an Airbnb with other people.

47:47If I and if you want to do it alone, it's 8 to $9,000. Food is extremely expensive. Like but the city is it's like a slice out of heaven unless you go to the like middle.

48:00So that, you know, you have the top heaven, the bottom heaven, the right, the left heaven. You go to the middle, and it's like a zombie town. And every time I come back here, it's shocking to me, like, how we can accept this being a reality in this country.

48:13Like, people just like, it's like a zombieland. I've never seen it anywhere else in the world, and I've been to so many places. It's it's a zombieland.

48:20Um, so that that's a bad thing. Uh, but there's no way you're gonna be able to succeed more than America. Every time I come to The US, my income, like, almost doubles.

48:29And seriously, it's like it's it's just such a good thing for business. Last thing is that the energy here is insane. Like, the people here, like, they're in a bubble for sure.

48:38Like, in in their you know, in in they're in a bubble. But the like, they are some of the smartest, most, like, driven, high talent, high agency people on the planet. I think I think you would have a wonderful time here.

48:50Just make sure, yeah, to be ready, like, for the financial situation and for food to, like, keep your body healthy. It's gonna it's gonna require, like, some attention. But I I I think it's one of the best places you can go, honestly.

49:03Yeah. I think I I think I need to do it just for the vibe and, you know, to see, like, really the difference. Maybe not for permanent living, like you said.

49:09You know, I think I'm happy in, like, Central Europe. I think it's actually one of the best places in the world. It's kinda crazy because I've been born here, and I only see it, like you said, after traveling to different places.

49:18Right? And, like, experiencing everything else that I realized how how blessed I was when I kinda spawned in this world. But, yeah, I guess a lot of people wanna get into local models, self hosting, all that stuff.

49:28Where would you recommend them get into? Would you recommend them spend a couple thousand dollars? Would you first tell them, like, try running something locally, whatever your computer can handle?

49:36How should someone start?

49:38So okay. Any computer right now is able to run a model that is capable of tool calling and doing stuff, every single one. So try that.

49:49Working like, I think LM Studio is an excellent start for most people, whether you're on a MacBook or on a Windows or LM Studio is going to help you get something running.

49:59Right? It'll tell you what can run. It'll help you get it running, And then you can see that.

50:05And then if you start thinking like, okay. I think I wanna invest in this. I think I wanna spend money on this.

50:10What I would recommend you do is go onto, uh, RunPod or, uh, Lambda, Llama not Llama, the one.

50:18Uh, prime intellect. Yeah. Lambda.

50:21And just go and find the device that you are interested in purchasing, rent it for, like, two to four hours, run a model on it, and then see what kind of performance you get out of it, see if it, like, if it feels right for you. Um, once you get, like, experience tinkering on your own machines, uh, and then you can make a decision from there.

50:41Uh, I think there are, like, some standards that I would say if you are already convinced, like, you wanna buy this. Like, the the standards are if you want a lot of memory and you don't mind speed, Mac is a good option, um, uh, or the DGX Spark.

50:56If yeah. But that's a little more expensive. If you want a lot of speed, like frontier level speed, you need to go NVIDIA.

51:03Unfortunately, that's the only option that makes sense. Maybe the AMD, there's a 7,900 x card.

51:09I hopefully, that's right. But maybe AMD will get you close, uh, on some models, but I would recommend NVIDIA.

51:17Uh, for the NVIDIA chips, the thirty nineties, if you can get them for a thousand dollars or less, are excellent cards. The six thousands are the next best option.

51:28I would not touch the 40, the 50. There is a card called the 5,000 Blackwell.

51:34So it's similar to this 6,000, but it's half memory or 72 gigabytes. So you get, like, increments that are the same price.

51:40It's very expensive. Um, the tiers, uh, basically, the because we have a web we have a a benchmark that we've been running.

51:50So we took the 23 most popular models, and we benchmarked them, uh, and all of their compressions, four bit, three bit, two bit, eight bit, uh, on five benchmarks, like TerminalBench, SuiteBench Pro, um, these, like, uh, agentic benchmarks, GDP eval, uh, the best ones in in our opinion.

52:09And what we found is that from zero, like, from, let's say, 250 to $9,000, the only model that will make sense is either Quen 3.627 b if you want really smart and slow or Quen 3.635 b.

52:26The Gemma models are good. Like, they have world knowledge, but they're not gonna be able to do anything like Claude code in my opinion. Um, then after that, when you have $9,000, you can get two sparks, and you can run a model called step 3.7 flash.

52:39I would say that that is, like, truly phenomenal. So not 4.5 level.

52:44Like, it's like a big step up from from the Quince, And that would run on $9,000 of hardware, uh, with full context and, like, maybe two to four concurrency. Uh, and then after that, like, it's gonna like, you're gonna spend another $10,000 to get the next best model, um, and, uh, that'll be something like MiniMax or DeepSeek v four flash.

53:06And then last but not least, like, where I'm at is, like, $50,000, and you can run GLM 5.2 with, like, compression.

53:14You can run MiniMax m three at, like, eight bits. So you start getting some good, uh, like, actually good models, uh, that are frontier capable, but that's the current price.

53:26It's 50,000. What's exciting is that last year, it was a 100,000 for the same, like, tier of intelligence.

53:34So it's one, getting cheaper, and then two, it there's more, like, tiers that you can jump through. So keep that in mind.

53:42It's like if you have less than $10,000, I would just go with the whatever can run coin 3.6 with full context, maybe two to four concurrency, and then you have a bunch of options on how you would do that.

53:55Okay. So 50,000 for, like,

53:58the best 6 thousands.

54:00Mhmm. Yeah. The best best open source models, but roughly what tokens per second?

54:04So with GLM 5.2, I'm getting between sixty and eighty for a single stream and up to 250 for, like, four no.

54:12Like, for yeah. For four, it's 200 tokens a second. For four, it's, like, concurrent.

54:18Yeah. That's really really recent. Okay.

54:21So how do you think about the setup that's, like, built in a way that's easily upgradeable? Like, like, personally, I would be, you know, interested in, like, maybe, like, the 20 k range. Like, what would you recommend me?

54:32First of all, the hardware and, like, how to think about it? How how do you think about it when you build a hardware? Like, are you, like, planning, okay, in the in the you know, in six months, I'm probably gonna have a 100 k worth of hardware, or how are you planning about it in a way that's, like, you you're not buying things that, like, get bad very fast?

54:49Just tell me in general how how you think about

54:52So can I share my screen? Is is that okay? Yeah.

54:55Of course. Of course. Okay.

54:56So, basically, like, there's a few things I would go through. One is, like, the type of hardware again that you're going to get.

55:04I'm gonna go to bookmarks where I have a lot of stuff. So this is like this is what we currently have for the benchmarks. Each one of these speed is like the actual so for the benchmarks, it's a sample, and each sample is, like, between 64,000 to 200,000, and it takes five minutes to solve a sample.

55:23And it solves 85% of the samples. So we're breaking it down like this.

55:28So next is, like, when you have a budget. So if you have, like, let's say, 20 k. I'm gonna go to hardware here.

55:35So when we have a budget of 20 k, what you can spend is, like, you can get four, uh, DGX Sparks. Uh, the reason I say this is, like, uh, each DGX Spark can link to another one, um, and you can hack it to link so you can daisy chain, uh, up to four. You get about 1.5 x speed up per node that you add in.

55:58So and with four DGX Spark 6,000 DGX Sparks, each one is a 128 gigabytes. So if you take a 128 times four, that's 512 gigabytes.

56:09Um, you take this, let's say, times 2.5, you get, like, 600, uh, gigabyte a second memory bandwidth give or take.

56:16And then you can see, like, what models are running and how good they are for this range, and it tells you, like so this is just essentially how I'm trying to walk through it myself. The price point is, like, 17,600.

56:29So I think about increments of some base component. Now the base components would be, like, a $30.90.

56:36That's a base component that you can upgrade in increments of doubles. Um, you don't wanna have three thirty nineties. You wanna have one, two, four, or eight or 16.

56:47So, uh, each one has 24 gigabytes. I know this is a little a little bit math heavy, but it's like there's not much of a better way to explain this.

56:55Um, so each one has 24 gigabytes and let's say costs a thousand dollars. So if you take that and you multiply it, um, by four, you get 96 gigabytes and you have to spend $4,000, which is a little less than half price for what you would have to pay for 96 gigabytes with the RTX Pro 6,000, and you get half the speed, of course.

57:18Also, so the increment of update is very important. Like, do you wanna have sixteen thirty nine days in your house?

57:26How are you gonna cool that? How are you gonna get power to that? So if you want models that are, like, 512 gigabytes, whatever, I would not touch the 30 nineties.

57:34This is gonna, like, limit you at, I think, 96 to a 192 gigabytes in your it's already a problem. So and please stop me if at any point you have No.

57:43Keep going. Keep going. It's great.

57:45Keep going. So this is like a little chart I made in the past. I'm gonna open it in a new tab.

57:52So this is kind of like what you now we just have to 1.5 x the cost because right now, all of this all of this is, like, 1.5 x the cost as as when I did it. Um, for a Mac Ultra, you get a lot of memory.

58:05You pay 10,000 or $15,000, you get 512 gigabytes, but it's slow.

58:10So any model that is large and has high active parameters is gonna be slow. So take a look at this this chart.

58:17I think, uh, it's if we can understand this, like, it's better for our decision making. Uh, one second. Okay.

58:24So we have, uh, Minimax m 2.7, which has 229,000,000,000 parameters, 10,000,000,000 active.

58:33We have Kimi, k 2.5, 2.6. It has 1,000,000,000,000 parameters and 30,000,000,000 active.

58:40Um, the GLN five has seven forty four billion parameters and 40,000,000,000 active. So the more active parameters, the more intelligent the mob, like the more intelligent, uh, intelligence you get per token that is generated.

58:55Um, with quen 3.627 b, it has 27,000,000,000 parameters. So it's actually almost as intelligent as a model like Kimi.

59:03It just has less range. Like, it knows less stuff. But per token is as intelligent as Kimi.

59:09Um, so thinking about it that way. So either you come in with hardware that you already own, and then you need to be matched to models and configurations that work for your hardware, Or you come in with a price point that you want to spend. Then if you come in with that price point, you need to look at the models that are available and see what can fit at what speeds.

59:31Then you can come in with a model that you wanna run, and then that dictates everything else. So if you wanna run GLM five and you wanna have more than, like, 15 tokens a second, you're gonna need to get NVIDIA hardware. And you're going to probably need to get the 6 thousands because anything bigger than that is way more expensive, and anything smaller than that, you would need to build out, like, a a 24, um, 24 card rig that is just not sustainable.

59:57Uh, does that kind of answer your question? I know it's not

1:00:00Words. For sure. No.

1:00:01It's amazing, man. Uh, I I have so many follow ups on that. I guess I would approach it in a different way.

1:00:06I would say, like, how would you advise me to go in the direction where, like, in six months, I would have strong enough of a rig to run the best open source model in six months at at at the best speed possible. So, like, I don't care how much it costs. Right?

1:00:20Obviously, not, like, millions of dollars, hopefully. But, like, how would you advise me to start building in that direction? What would you buy right now that, like, would be still useful right now?

1:00:29Like, hardware that I can run, you know, smaller models. But in a the the main strategy being that, like, in six months, I don't mind spending 100 k to get the best model in six months to run it locally at the fastest DPS.

1:00:41So I think the answer is the RTX Pro 6,000 for you.

1:00:46If you think in Okay. Six months, I'm willing to spend a $100,000. Each card is worth, like, let's say, $10,000.

1:00:52So in six months, you can have six of them. Let's add another two months. You have eight of them.

1:00:58If you have eight of them, you have 768 gigabytes, and you can run every single model, all of them.

1:01:05And you can run them at, like, high precisions with full context windows and at, like, a 100 plus tokens a second.

1:01:12This this applies to Kimi. This applies to GLM. So all of the models that are the current best models, you will be able to run if you spend a $100,000 and you have eight RTX Pro six thousands.

1:01:22Now if your budget is, like, let's say, uh, 20% of that, like $20,000, I would say buy the, um, the DGX Sparks because each DGX Spark has a 128 gigabytes, and, uh, it's really good at training.

1:01:40It's really good at prefill, fine tuning. Um, it's slower at the token generation speeds, but if you run mixture of expert models with, like, 10,000,000,000 to 20,000,000,000 active parameters, um, you could get maybe 20 to 40 tokens a second, um, closer to 40 if you really sit there and optimize it.

1:02:00Um, so based on however whatever your monthly income is, you you should take a slice of that, and you should think I'm going to invest this in compute. And then based on that, you can make the decision.

1:02:11So if your monthly budget is, like, you have $2,000, I would say maybe go with the with the the the what are these? The the DGX Sparks.

1:02:19If your monthly budget is, like, $5,000, maybe $10,000, um, you can go with the d, uh, the RTX Pro 6 thousands.

1:02:26If it's less, then it's best to get like, let's say you wanna start out and you have $2,000 over six months, get two thirty nineties, and you can run all the Quen models and all the Gemma models.

1:02:39Does that give you kind of a good idea? Yeah. Yeah.

1:02:41No. It's great. Like, I have a good idea.

1:02:43I'm gonna go with the RTX Pro 6,000. The reason I don't wanna start with the 39 is because I already have a MacBook with 128 gigabytes of memory. So, like, I feel like the two thirty nine is would probably be similar or even less less powerful than that.

1:02:55So and does that even change, like, your your question if I have the MacBook? Like, should I think about, like, maybe getting a Mac Studio to, like like, pull the VRAM somehow or not not really?

1:03:05So the currently, XO is allows you to take, like, multiple Mac devices or Macs and DGX Sparks, and you can combine them together, uh, and you get two things.

1:03:17Uh, if you combine a Mac and a DGX Spark, you can run prefill. So this is prompt processing. You can run it on the Spark, and you can run token generation on the Mac because this one is better here.

1:03:27This one is better there. So on average, you get double the memory, and you get something that's, like, maybe 1.5 to two x faster just by having these two. Um, if you are thinking about, like, stacking Macs, you also get a little bit of a speed up.

1:03:39I think Apple has been really good at, uh, going faster in their, uh, a, like, AI hardware development, uh, cycle.

1:03:49Uh, the upgrades with Mac are cheaper also. Like, you can you can get, like, 512 gigabytes more for 10,000, $15,000, which is, like, a like, you cannot get that with a $100,000.

1:04:00Uh, you need a $100,000 with NVIDIA to get close to that. Um, so keep that in mind. Uh, though, if you really want Frontier speeds, like, the only option, again, it it is NVIDIA.

1:04:11So if that's if you want something where you can actually replace Claude code and not know the difference or, like, codex and not know the difference, you need NVIDIA as of now. There's one thing. This is still experimental.

1:04:23This is in the research phase. A lot of people also knock this idea is so if you have a mixture of experts model, like, let's say, Kimi or, uh, we're we're gonna use GLM 5.2, the 40,000,000,000 of those parameters are active.

1:04:36Now the the the LLM process is two steps, prefill and then decode. So prefill requires that the entire memory be loaded into, like, VRAM or high speed memory.

1:04:49Um, decode is whatever the active parameters is plus whatever the specific math of that model, but it usually comes down to, like, one tenth of the requirements of memory for the decode step.

1:05:02So is if it's possible for us to connect an NVIDIA RTX Pro 6,000, for example, and, uh, let's say, DGX sparks together and efficiently and quickly move the bits of the attention, so this is the token generation part.

1:05:19If we can move it over to the RTX Pro 6,000, we can get NVIDIA speeds with, like, much cheaper costs. Um, so it's an accelerator.

1:05:29Now the labs, the laboratories, like the real giant gigantic labs are already doing this. So when they have, like, a a d AWS to Tyranium and they have NVIDIA GPUs, they can do the, like, stuff on the Tyranium and get the speed benefit of both.

1:05:45But we, as consumers, as end users, we still haven't gotten to the point where this is, like, a viable option. Does that make sense? Yeah.

1:05:51I see. So your you you would predict the trend continues like the hardware gets cheaper, like, in in six months or in twelve months, it's gonna be halved again? I think the cost of running, like, much more intelligent models is going to be halved again.

1:06:04I don't think the cost of NVIDIA hardware is going down anytime soon. They just don't have enough, like, capacity to to fit the demand. So it's like the price of hardware is not gonna get any cheaper from my opinion.

1:06:16The price of running, like, Frontier Intelligence is going to get cheaper. I see. I see.

1:06:21I yeah.

1:06:23So, basically, you if you are planning to buy hardware, you would do it sooner than later, basically?

1:06:29I don't I don't think I don't I don't think it's healthy to, like, say that to people because, like, maybe people can't afford No. No. Like, I I'm not sending the message.

1:06:36I'm saying, like, for myself, like, I should think about it. I think I think you should think about it. Like, if you have the budget and you use this stuff and depend on it, like, you should have a budget for it, and you should have that budget go into ownership.

1:06:47Like a mortgage versus rent. If you're gonna do something, like, your entire life, would you wanna rent it the entire time, or do you wanna just, like, purchase it, you know, over increments? And I think that's that's a good way for us to to approach it.

1:07:00Yeah. Actually, as you say that, like, I kinda realized, like, my strategy is is kinda renting strategy where, basically, you know, I'm spending, like, right now between 5 and 9,000 a month on OpenRouter subscriptions for myself and employees and everything.

1:07:17And, like, you know, a slice of that could be going towards building the rig where, like, you know, obviously, I I will not get the same quality and TPS right off the bat. But, like, if I do that consistently for six to twelve months, I'm gonna have enough hardware to run it locally, and then I can cut off, you know, basically all of those subscriptions or maybe just keep a few to, like, experiment with the best closest models, but, like, fully own my local setup.

1:07:40So, like, when you position it like that, it makes complete sense.

1:07:43Mhmm. Yeah. So companies companies, like, if we if we stop thinking about us as people, like, if a company is spending $300,000 a year on Anthropic subscriptions or, like, Anthropic billing, It is not beyond the realm of possibility.

1:07:58There are companies that are spending, like, tens to hundreds of millions of dollars on AI right now that are not in AI. So why wouldn't those companies think logically?

1:08:07Like, okay. Are we like, do all of the requests that go to to Claude Fable need to go there?

1:08:14Do do they? I don't think so. So can we then purchase hardware to, like, basically reduce the costs long term of what we're doing and also have, some some of our own private, you know, private inference, some of our own training data that because each company is just training data.

1:08:33Right? Like, everything that they do, how they interact with the browser, how they use Slack, that's all training data can be installed, and they can just train a model and make it more and more and more accurate on their own stack. So I would think of it as an like, as a long term investment for your own personal sovereignty is is essentially the best way to to consider it.

1:08:52Last thing is, like, if you actually buy hardware

1:08:55and you have to fit it together, you have to learn from the bottom of the stack to the top of the stack. And that knowledge right now is so expensive. That knowledge is, like, what people pay for.

1:09:05Right? This is a giant industry that's, like, being born, and they need people that do this. So, you know, you ultimately will just have an advantage over the rest of the people because you know computers better.

1:09:16Yeah. And, like, there's gonna be companies that, like, will happily pay you big money to transfer, you know, their crazy Anthropic bill or crazy OpenAI bill towards, like, some self hosted solution or maybe, like even, like, help help you set up spin up some, like, GPUs in the cloud so it's, like, fully self hosted. But, you know, like you said, if you do it locally for yourself, you're gonna have the knowledge that, like, companies are gonna pay for.

1:09:38And I think this is much better skill than, like, you know, just going to a company, like, vibe coding a front end for them and calling that, like, AI implementation. This is the serious AI implementation. And there is still many, many companies that cannot really use frontier models because of, you know, terms and service, private data, financial health care, you know, legal data that literally they legally cannot send to OpenAI API or Anthropic API, and they don't even want to probably that these companies don't wanna fall behind.

1:10:05Nobody wants to fall behind. Right? They wanna use great AI models, but they just don't know how, or they don't know it's it's it's much more possible than they think.

1:10:13And a lot of companies could afford to spend 100 k on a local hardware rig, but they just don't know how to spend that 100 k. They're even it's possible. Even if they spend a 100 k, they don't know which models to run.

1:10:23So, yeah, like you said, that expertise, I think it's gonna become a, like, like, a actual position that a lot of people go into these companies and implement these setups.

1:10:31I have a I have a good friend in Warsaw who created a company, and they basically helped, like, launch at least, like, four, you know, billion Polish zwaltz companies in, like, in Warsaw. Like, they've had their hands in a lot of the most successful companies in Poland, um, as developers.

1:10:48And they, um, they basically started buying compute a few months ago or, like, a year ago, uh, and now they have an entire data center. Like, it's got, like, multiple DGX sparks, multiple 50 nineties, six thousands.

1:11:02Like, uh, they have a few b two hundreds. They have they have everything that they might need, and they use it. Like, they're using it all the time.

1:11:08Uh, they they are, like, giving it to their employees. Um, I also met a company in Germany who was doing the same. Like, they are, like, a a software development firm, and they are just buying 6 thousands, buying b 2 hundreds, um, you know, incrementally over time, and they are creating a a a data center in their own company, and they service other companies.

1:11:26So now they can use their inference. Right? Like, this is German private inference.

1:11:30This is, you know, we are your partners. We will never take anything from you. And they can trust that.

1:11:37if you want to use Fable, you're not allowed to keep your data private. You have to share the data with Anthropic. Crazy.

1:11:42Not even as an enterprise. Yeah. So I don't know.

1:11:46You know? It's like, it makes more more sense. It's just so expensive to get into for the average person if people feel priced out.

1:11:52Yeah. But, again, like, people need to look into the future and, you know, see where it's headed and that, like, access to intelligence is gonna be essential. And, you know, people already do this with, like, solar panels.

1:12:03You know? People understand, I I wanna have my own energy, but that's because electricity has been around for a hundred plus years. AI, you know, also has been around for a long time, but, like, useful AI is only, like, you know, five years old or whatever.

1:12:13So people don't really see the need for it. They don't understand the implications of it. But I think the the issue is that it's advancing faster than any other technology.

1:12:21So I think they they do need to see the need for it. This is this is hopefully, it works and it it doesn't embarrass me, but I will launch this. So this is running at home.

1:12:29This is my GLM 5.2. It's like a custom compression that I did for it.

1:12:34Okay. But it'll run-in a minute. So then we have so it's running now.

1:12:40It's doing its thinking. Yeah. It's doing its exploration.

1:12:44It's fetching its skills, and and it's gonna make me a nice website. But if we go here, this is the same model. Right?

1:12:50This is a compression. It's, uh, it's compressed 80%, like, uh, on the original.

1:12:55Typically, you can get about 75%, but I've, like, learned how to push it a little more. And so, you know, this is three d, you know, three d what is it called?

1:13:03The the Flappy Bird. Like, is usable. Right?

1:13:06And this was a single shot, uh, attempt. Now I also have, like, um, I'm trying to this is another one. So this is running DeepSeg v four Flash.

1:13:16Uh, there are there are going to be two sessions left and right that are running. So you can see it's running pretty fast. It's able to do, like, a concurrency.

1:13:24It's doing its tool calls, reading. Um, this is a even smaller model that you can probably run on $9,000.

1:13:30Uh, I told this to, uh, reorganize my downloads folder, which was a mess. Just, like, reorganize it. I don't really care how you do it.

1:13:37Um, let's go over here. So, you know, it's reading things. It created a bunch of folders.

1:13:42It's figuring out how to categorize, and it's just gonna start moving things into the right folders in a second. You can see that it's, like, disappearing. Yeah.

1:13:49It's moving those files into the right places. And all of this is economically valuable, uh, work.

1:13:55Right? That would have taken me maybe thirty minutes to sixty minutes to do, and now I can just, like, hey. Organize my downloads.

1:14:01I could go do something else. I saved myself that time. Uh, this is a model that would run on, uh, two RTX Pro six thousands or maybe, like, uh, two, um, DGX sparks, and it's doing its job.

1:14:15Um, this is basically DeepSeg v four flash on both sides.

1:14:20One of them is researching, like, inference engines. One of them is researching GPUs. So the capabilities, again, are significantly better than they used to be, and now this is thinking.

1:14:30It's doing its job. And so I can also, like, launch another one if I wanted to. And, like, I'm gonna do pi here.

1:14:40Model, GLM 5.2. I can do this with, uh, actually, let's do it with Droid.

1:14:48So Droid typically would cost you a pretty significant amount of money because they only do API based billing. I think right now, they they're experimenting with usage pools. Sorry.

1:14:57By the way, just a second. There you go. Uh, Hi.

1:15:03What's in this repo? So this is a, um, like, private inference that I'm running at home that is capable of doing everything that the, like, Frontier models is capable of doing, at least for work.

1:15:15Right? So I'm I'm very happy with with what I got. I think it's worth the investment.

1:15:21And, yeah, it's gonna be slow because this is, like, the largest model that you can practically run at home, like, from a active parameter count size. Hopefully, this kinda gives an idea. I could also run, like, these smaller models.

1:15:34I can run, like I could give inference to maybe, like, 24 people, uh, with the cards that I have. Go back here, and you can see that the model is running. It's, uh, around 62 a second, GLM 5.2, and you can see how much VRAM it takes, like, uh, how much yeah.

1:15:50It's it's it's a pretty great model. You can see my usage per month is like so I'm using 374,000,000 tokens a month locally.

1:15:59Now remotely, I'm using much more, honestly. Uh, but, you know, it's it's it's viable.

1:16:05The does that make sense? Like, it's viable. It wasn't viable last year.

1:16:08Now it is.

1:16:09Absolutely. Yeah. So, like, you know, with the all the techniques, quantization, all all the research papers from China, it's basically becoming more affordable to run useful models locally.

1:16:19Maybe not the cutting edge ones because the hardware is, you know, becoming more expensive. But, yeah, people that they, you know, tried self hosting maybe two years ago, they should try again because they can get much better models, not like Llama three or, you know, seven b or whatever that, like, was kind of fun to play with, but not really work usable.

1:16:35So I guess the strategy do you see the strategy as, like, larger percentage of your total token spend? Because I think most people watching this belief in AI and, like, the tokens are gonna go up and, like, you know, each usage like, my token usage is growing every month. So how do you see like, do you see the strategy of, like, more percentage is gonna be self hosted?

1:16:52Like, you you'll still use, like, some, you know, cloud?

1:16:56Mhmm. Yeah. It's definitely more and more of it is going to be self hosted.

1:16:59Like, I'm gonna share my screen one more time and just show you kind of what I mean. It's like right now, we think of inference as, you know, coding, clock code, codex, but there is a million things that you can do with it.

1:17:09You can set it up in a robot. You can have it control your lights. You can do you know, you can have it be on your watch and, like, you know, take care of your health.

1:17:17Um, let's see. Where is this? So this is Mario, the creator of pi.

1:17:22Yep. This is Gemma running locally. So the like, what is controlling the computer is Gemma running locally.

1:17:29So I'm gonna see. It follows him around. You know, it does like, I don't know if it's gonna move in this video.

1:17:34Hopefully, it does.

1:17:35But this was a toy that he ripped up. He broke his children's toys and, like, put them together, and it can play him music. It can, like, run around.

1:17:44Like, is so this is what I envision is going to be the interesting, amazing future of, like, local and, uh, self hosting is, there's so much we haven't even touched yet use case wise that is going to be enabled by people just tinkering at home. Yeah. And even in in, like, you know, traffic lights.

1:18:02Like, it's it's kinda crazy that, like, traffic lights are just, like, zero intelligence. When you know, there is one way. There is zero cars going, and, like, the red light with, like, full lanes and, like, everybody's waiting.

1:18:12Nothing is happening. It's just dumb, and it's slowing down society because there is no intelligence there. I mean, you could probably solve that without without an agent, you know, with just some some notes and, like, a if else statement.

1:18:22But, you know, the point is that, like, everything can be improved by having some intelligence. Like, for example, you know, I'm I'm tracking my calories with OpenClaw or Hermes, like, whichever one I use. And, you know, it's it's kinda obvious, and some people make fun of me for losing, like, Opus 4.8 fast for that.

1:18:36But, like, you can ask it custom questions based on, like, okay. I'm going to the gym.

1:18:41What are some quick carbs I can eat? How much carolies I have left? Right?

1:18:44Like, I I have a podcast. I need to be sharp. I I need to fill out my protein for today.

1:18:48Like, custom questions. Right? You're never gonna get that with a basic meal planner app or whatever.

1:18:53So, yeah, intelligence is super valuable everywhere. And I'm also wondering, like, how you think about your own setup. Do you plan on, like, adding more 6 thousands?

1:19:01Or or yeah. How are thinking about it? Or, like, are we limited by electricity?

1:19:04What what becomes the limit? Let's say, like, money aside, obviously, money budget is the limit. What's the limit?

1:19:09Like, it's just a public grid?

1:19:12Yeah. There there's there's so I power cap my GPUs at 40% capacity, all of them.

1:19:19So they just work at 40% of what they're capable of, which is also slowing down, like, inference speed by, like, 20 to 30%. But the power gains for me are worth it. I already have, like I'm already, uh, so I already have, like, an upgraded house.

1:19:32Like, the electricity has, uh, is upgraded. I'm going to upgrade it again this this year, uh, in September. Uh, so I'm gonna get more power.

1:19:39I want more like, four more six thousands. Um, and I think after that, like, there is not much sense in upgrading. Uh, what I wanna do, though, is this friend that I told you, he has, um, he has, like, basically a data center now.

1:19:51I wanna move my rig into his place. So the the the current limits, it's loud. It's hot.

1:19:56It's it's expensive. Okay.

1:19:59Leave that aside. But, uh, the heat is a problem. Like, how do you get the heat out of the room?

1:20:03Uh, if you're running, like, real inference on it, it's gonna get hot. It's gonna get, like, loud. Um, it the the electricity, the power, you know, you're you're gonna need two circuits to power eight six thousands.

1:20:13Or if you're in America, need four circuits to power eight six thousands, um, which is, like, honestly, to me, a little bit ridiculous. Um, so those are the main limitations. But it's it's not that hard, and I think it's worth it.

1:20:27So, like, this is the report that the GLM model, um, spit out while the what we were asking it. One second browser.

1:20:37There you go. So it made this UI, and it's basically explaining to me what a what, like, the PI agent is.

1:20:44It's telling me it has 73 skills, etcetera. Like, it built this UI in something like two minutes maybe. Uh, and, yeah, it's like for me, like, this is more than enough.

1:20:56Uh, make me a, uh, make me a video game, uh, three d, maybe, like a car game.

1:21:06Maybe a car game. Okay. Let let it let it do that.

1:21:09And then we go back and see the response here. So this is in Droid, which, again, you would have to pay a lot of money for, um, because it it is probably the best harness, but I can use it for free because I plug my own inference into it. And this is this is this is essentially what I mean is, like, it's worth it.

1:21:25It's worth it. If you're spending more than, like, 2,000,000,000 tokens a month, it's worth it. So I am gonna upgrade.

1:21:31Yeah. While that's building, I guess, so what's your thoughts on, like, uncensored models and and fine tuning that? You think the government will, like, come for you if you have a uncensored model at home?

1:21:40Like, how how does that work in your mind? It depends on the scale, like, how much how much people are using it and, like, whether it's gonna get into the news. Like, at least in I'm I'm only operating with America in mind because that's where I'm from.

1:21:53Our politicians are, like, driven by news. So if there's news that something bad happened, doesn't matter if it's true or not, it's gonna get into politics. If somebody wants to give them money to, like, bring a topic into the limelight, it's gonna be there.

1:22:09So I think it would be useful to have, like, some kind of organization like the Human Rights Foundation that sets a budget to lobby politicians to keep them away because, essentially, they're just waiting for money. Right?

1:22:20Like, that's that's the truth of it. Like, they they wanna get paid, and they're gonna get paid from here or here.

1:22:25So you gotta bid for their attention here in this country. Now this this goes back to the conversation earlier is how big is the scale and when does it get into the government's mind that, you know, people are running this stuff at home and they're doing this, uh, this stuff.

1:22:42If that does happen before we get the chance to spread this, it's gonna be a problem. Uh, but I you know, I've used uncensored models. What the one that I liked was Hermes 70 b because they they train it from, like not from scratch, but they, like, do a, like, heavy post training on it to to remove all sensors.

1:23:00So you can ask it about pretty much anything and get an answer. Um, and sometimes I need that because, uh, I took a, um, I have a cactus that is a peyote, and I need to take care of it.

1:23:12Like, I'm not using it as a drug. It's just a beautiful cactus that I was gifted, but I don't know how to take care of it.

1:23:17So I'm trying to ask GPT. I take a picture. It's like, how do I take care of this?

1:23:21It won't answer. Claude won't answer. Like, they're like, oh, we can't we we can't talk to you about so I I try with Hermes.

1:23:26It's like, okay. This is this. This is that.

1:23:28You know, you you get the the these types of rocks, and you water it once every two weeks. So, uh, I could get that information off the web. I would just have to spend, like, thirty minutes, like, reading random blogs and looking at ads and, you know,

1:23:41corrupting my mind. Yeah. So it's just convenience.

1:23:44So I guess the strategy is basically spread the message without, like, you know, showing too much of a crazy use cases and get as many people you know?

1:23:53Like, what what would be the first steps? Do you think people should, like, download the weights? Do you think, like, hugging faces at and the and the risk of getting shut down, or or you think that's overblown?

1:24:02Like, it feels like that.

1:24:04I there is a chance that I am misremembering this, but I had spoke I had the guy a guy from Hugging Face on the podcast, Victor. He's the head of product, and he told me that they have gone, like, the again, I might be misremembering this, so I'm sorry if that's the case.

1:24:19But he said something about the government, like, the French government trying to take down datasets, like putting in requests to take down datasets. So, uh, if this happens, it's probably going to be, like, they start taking down models and datasets, not that that they take down the entire site.

1:24:35So I think it starts like that. Right? Like, they'll they'll go after whoever is launching the models, or they'll ask, uh, Hugging Face to basically remove the models from the index or the the the weights.

1:24:46I think it's worth installing the weights mainly because, you know, I used to use torrents when I was a child. Right? That's how I got everything, all my games, all my shows, and the the torrenting culture requires people to have the files.

1:24:59You just have to have the files. So it depends, like, how much memory do you have? Are you willing to is this something do you care for the one terabyte that it might take to to to download a model?

1:25:08Like, if you do, then, um, and you you don't care about this cause. Like, I I there's nothing I can do to convince you. But I download them.

1:25:15I store them. You know? I have six eight eight terabytes now.

1:25:18More, like, 12 terabytes, actually. And I have, like, a bunch of models stored, and that's not even nearly enough. I need more.

1:25:26And storage is also way cheaper than GPUs. Right? So, like, people can easily acquire 20 terabytes pretty cheaply and download, like, thousands of datasets and, you know, all of the best models, basically.

1:25:37Yeah. Yeah. And you can you can you can see the torrents if you want, like, know, in the future if that if that's the case, or you could just have them and you can you know, they're great research subjects.

1:25:45Maybe eventually you'll be able to afford to run GLM 5.2. And if you have it yeah.

1:25:50Because how much how much smarter are they gonna get? Like, this is the this is the the the big core question is are we close to a limit, or is it, like, is it possible for one terabyte of data to be that much more intelligent?

1:26:03I I can't tell. Like, I I I I don't have the knowledge and skill to tell.

1:26:08I mean, do you see it slowing down? Because I I don't feel that way. I feel like the the models like, the progress in the small models is even faster than than the big models.

1:26:17I don't see it slowing down, but I I'm in a I'm, like, in my own, you know, bubble of because I still ask people, like, okay. Why like, people still don't like AI, nor me people, or even artists, or, you know, they don't like it.

1:26:30And I just don't understand

1:26:32maybe maybe I understand, but, like, I just I I don't get, like, how you don't see what the maybe maybe I'm crazy. I I don't know. Yeah.

1:26:39Yeah. I mean, I don't know. I feel the same.

1:26:40Like, it's literally the greatest technology of all time. It's here right now. It's our time.

1:26:44You know, it's the biggest technological revolution in history. And, like, it's the most important resource of the future.

1:26:50It's intelligence. And, like, it feels like nobody's really paying attention. And then when a new model comes out, it's this crazy.

1:26:56I've gotta go three layers deep. Right? Nobody's paying attention to, like, AI percentage wise from from the people.

1:27:01Out of all of my, like, friends, I was shocked how many of them disregarded Fable five.

1:27:07Like, how many of them just didn't take it seriously, didn't spend, every waking hour of it. It's like, oh, yeah. New model.

1:27:12It's kinda expensive for most work clothes, whatever. I'm like, guys, this is an increase. This is a step increase.

1:27:17Right? Like, are you not true believers? And then when it got shut down, I was also shocked by the even smaller percentage of people who realize the implications of this.

1:27:26It's like, you know, if the government controls the future of intelligence, you know, once we have AGI and beyond, like, it's over. It cannot be centralized. And, each layer are just, like, less and less percentage of the people.

1:27:37So I agree with you. Sometimes I I do feel like I'm insane because, like, the percentage of people that really understand where AI is, what it can do, and also, like, the implications of the future, like, where it's gonna be in two years from now and the implications if it's fully centralized.

1:27:50Like, it's such a small percentage of a fraction of a people. It really is crazy.

1:27:55I wanna point like, I I have to leave at the top of the hour. So also tell me if we've been going on for too long. I'm happy.

1:28:02But I wanna ask you, like, a few questions. I mean, you know Sure. Your audience is watching this, but is that like, what got you into into going this hard?

1:28:09Because I've I've seen you for the last year on YouTube, and

1:28:12yeah, I'm aware of, like, what you do and, like, what made you go so hard on this, like, industry? By industry, you mean AI as a whole or, like, specifically, like, open source, local, sort of hosting, like, I think both. Maybe you can start with AI as a whole and then, like Yeah.

1:28:26Local. Okay. So I'm gonna go back a bit more.

1:28:28So when it comes to, like, some of my, like, more libertarian values, like, I always had them. Even when I was, like, 11 or 12, I was explaining to my, you know, classmates in a free lesson, like, centralized banking and how it all works. And, you know, they didn't understand me because we were, like, 12 years old.

1:28:44But when it comes to AI for me, because I'm 24, it's really the first technological revolution I can take part in. I was you know, a .com bubble happened, like, when I was getting born or even before.

1:28:56Then, like, social media wave, it's really, like, 2004, 2006. That was the best time to start social media. Mobile mobile apps, even, like, you know, $7.08, 2,000 tens.

1:29:05Bitcoin, the best time to buy was, like, you know, since it's launched until 2013, whatever. So I was basically, like, way too early. Like, I didn't have any money at, like, 11 years old.

1:29:14Right? So, like, I was way too early to any large technological revolution, and this is the first one that I can actually take part in.

1:29:21So that's why, like, when I understood this, I just dropped everything. Because before I was doing, like, gaming content on YouTube channel, and I was making, like, really good money, like, 20 k a month at, like, 20 years old. Right?

1:29:29Which is great money, but I I completely stopped doing that because I just realized this is it. Like, this is my chance. This is my first big technological revolution, and it might be the the greatest one of all time.

1:29:39Right? And I do think it's gonna be the the biggest one of all time. So that's, like, why I switched to AI.

1:29:43When it comes to open source and, you know, running AI locally and uncensored videos, like, you know, I have, like, a lot of videos that are, like, 200 k views on this topic. And to me, it's just I don't know. It is obvious that, like, why should Dario Amore decide what I can ask?

1:29:58Who decides? Is this some European bureaucrat? Is it like Ursula von der Leyen?

1:30:02Is it some, like, American politician that's, like, you know, lobbied by somebody? It's like, no. I will decide what I ask.

1:30:07You know? Like, I I always had, like, these libertarian values. I also a huge believer in Bitcoin like you.

1:30:12Like, to me, this is obvious. You know? Some things, like, it's kind of really explains, but it's just obvious.

1:30:17Like, I don't think the the future, the most important important technology of all time should be centralized by some politicians or, you know, a few groups of of company CEOs.

1:30:28And what about your audience? So, like, how have because you probably read their comments. You probably interact with them.

1:30:33Like, how do how do you feel like the average mindset around this is?

1:30:38Yeah. So, I mean, you can I can maybe screen share my channel because I'm running these polls, and I the last two were on open source? So let me just screen share that.

1:30:48Because the results are overwhelming in favor of, like, open sourcing and and self hosting.

1:30:55Let me screen share here. Boom. Yeah.

1:30:57So here, I I voted, like, who should be in control of the future of AI? And, you know, 82% of people said nobody should be open source.

1:31:05Like, 6%, private AI companies, 5% governments, 7% don't care. The latest poll I did, why do you want open source AI? Cheaper or free access, privacy and control, Don't track this AI big AI labs, big tech for all.

1:31:18So yeah. I mean, how would you, um, I guess, interpret this?

1:31:24Or does that answer your question? You know, that does answer. I mean, like, if 2,600 people voted and you have, like, a four like, 300,000 like, that's a good amount of people that are interacting.

1:31:33Yeah. You have 400,000. It's a good amount of 1.5% interaction rates, like, are pretty high even on Twitter.

1:31:41So it no. This is this is good. I mean, my my because I'm really interacting with a lot of people from the closed source world.

1:31:50I because I have a contracting business, and I've worked with, like, pretty much every single company or, like, had some kind of business dealing with every company. And I don't think people there see it the same way.

1:32:02Maybe it's because they're already inside, and it's, like, it's hard for them to imagine, like, not being inside. So that I think that's where it's, like, the the opinions of other people outside of our niche, outside of our, like, communities is is still so far away and, like, minching them closer.

1:32:22Because it sounds like a sort of scam pitch where it sounds like a you need to spend money on something, a pitch, you know, like Yeah. It's it's hard to sell it as like a freedom technology. Yeah.

1:32:32The meetups the meetups, man. I do the in person meetups and I think that's where I I see, like, the most impact on people. If you can meet people in real life and, like, explain to them, it makes a big difference.

1:32:43I also saw you, like, did some fine tuning. I saw you did some, like, local inference. You've tried it out.

1:32:48Right? What's been your experience there? I mean, only on my MacBook.

1:32:52Right? So, like, that's why I purchased a beefy MacBook. Like, if I could buy a MacBook with 256

1:32:56kbps of RAM, I would just do it instantly. But, unfortunately, they don't sell them.

1:33:00If somebody from Apple is watching this, guys, please, next generation of MacBooks, at least double the RAM. But, yeah, like, I don't know.

1:33:07Even when, like, llama freak came out, I was doing the fine tuning videos. Obviously, nothing as impressive as you. But to me, it just always seemed cool, you know, having your own model.

1:33:16Like, isn't that, like, cool? Just your own model, you can have it. The only person in the world it's just the idea of it seems good.

1:33:23I went to this office in Germany of a company called Micro AGI. So their business like, they have, like, a few businesses. One of them is Micro AGI, but their their core business is they pay people to do jobs and to, like, record what they are doing.

1:33:38So, like, if you're a, what, a, like, a mechanic, you can wear this device and they pay you. This is like a contracting thing, kinda like Uber. It's not like a company forcing its so it's more like an Uber deal, like, um, end to end, uh, b to c.

1:33:51So they they do that, and then they use that data. So I went to their lab.

1:33:57So they have the unitary robots. Are you familiar with the unitaries? Yep.

1:34:01So they have the unitary robots. They have a DGX Spark and the DGX Thor. Um, they have the the what are they called?

1:34:08The meta headsets, the VR headsets. And, uh, they are using the open source stack, like, released by NVIDIA. It's impressive how much open source, like, real open source models NVIDIA's released, the training data, the scripts to run it, the environments, uh, the actual end weights, the the base models.

1:34:24Um, but they they they showed me how Gemma four, like, the the the the four b, was able to control this robot to, like, help it, like, make decisions in the real world. And they're training it, like, how to pick boxes up, how to, like, flip things around, how to connect cables.

1:34:40It it it is so impressive what you can do because the the robot is $20,000. The DGX Spark is $4,000, and the VR headset is, $1,000.

1:34:50Uh, and then you need like, that's that's it. It's for $30,000, let's say.

1:34:54You can have a real robot doing real things at home for free, like nobody can stop you. You don't need to do anything special. And and that's so exciting to me.

1:35:03I can't wait to like have little robots running around the house doing like chores and education for your kids. Like, I have kids now and so like I think about like how do I want to educate them?

1:35:13And having their own personalized education that is, like, technical and detailed and and I can tune It's huge. You know, or I can it's huge. It's huge.

1:35:21Because they're gonna use it in in education. They're already, like, trying to use it in education. And I don't know if you've seen, but, like, the kind of education quality

1:35:31Yeah. I've seen some some schools go play the crush, like, that used AI. Yeah.

1:35:35I I don't remember name, but I've seen it on Twitter that, like, it was just insane difference.

1:35:39Yeah, man. Yeah. I I'm so grateful that you invited me on.

1:35:42It's been a wonderful conversation.

1:35:44Do you have any other questions before we wrap up? No. We we we can wrap up.

1:35:47It's almost 11PM here in Poland, so I also do need to wrap up. Appreciate it, man. Alright.

The Hook

The bait, then the rug-pull.

The video opens with a live demo: GLM 5.2, the largest practical open-source model, running at home on custom-compressed weights — 374 million tokens a month, locally, for the price of a mid-tier GPU cluster. The question the rest of the conversation answers is whether the math and the politics make that worth it for everyone else.

CTA Breakdown

How they asked for the click.

MENTIONED ON CAMERA

07:00productOxylabs ↗

10:00linkdavidondrej.com/sero-podcast (free assets) ↗

FROM THE DESCRIPTION

PRIMARY CTAWhere the creator wants you to go next.

AFFILIATECommission earned if you click.

The voice tool I use ↗

OTHER LINKSAlso linked in the description.

Frame Gallery

Visual moments.

Frame at 00:00 from I spent $50,000 self-hosting AI models. You should too.

Frame at 01:17 from I spent $50,000 self-hosting AI models. You should too.

Frame at 02:52 from I spent $50,000 self-hosting AI models. You should too.

Frame at 03:47 from I spent $50,000 self-hosting AI models. You should too.

Frame at 04:51 from I spent $50,000 self-hosting AI models. You should too.

Frame at 06:35 from I spent $50,000 self-hosting AI models. You should too.

Watch next

More from this channel + related breakdowns.

47:00

David Ondrej · Tutorial

100 hours of Hermes Agent lessons in 46 minutes

A 47-minute walkthrough of all seven levels of Hermes Agent — from bare VPS to full MCP back end.

May 6th

1:02:24

David Ondrej · Interview

Matt Pocock's Agentic Engineering Workflow (just copy him)

A senior developer's real AI-agent setup, and the argument that the harness — not the model — is where the leverage lives.

June 18th

22:26

David Ondrej · Tutorial

This MCP makes Hermes Agent 10x more powerful

A 22-minute live walkthrough of wiring Hermes Agent to Apify MCP connectors and Supabase to automate lead scraping, scoring, and outreach.

June 15th

32:29

David Ondrej · Tutorial

Don't use Fable 5 in Claude do this instead

A 32-minute live workflow session on why agentic harnesses are the right home for Fable 5, not the Claude app or raw API.

June 10th

20:17

David Ondrej · Tutorial

Hermes Agent is crazy… 180,000+ github stars

How MiniMax M3 sparse-attention architecture makes always-on autonomous agents 10–100x cheaper than running Opus or GPT-5.

June 8th

26:04

David Ondrej · Tutorial

Hermes /goal is insane

A 26-minute step-by-step tutorial on the agentic loop command that runs until your goal is actually done.

May 16th

Chat about this