Modern Creator
Nick Puru | AI Automation · YouTube

How I run Claude Code with Deepseek 100x cheaper

A documented, officially supported two-line swap that routes Claude Code through DeepSeek v4 — and cuts the bill by 100x for everyday coding work.

Posted
3 days ago
Duration
Format
Tutorial
educational
Views
2.3K
78 likes
Big Idea

The argument in one line.

You can route Claude Code through DeepSeek v4's API for a 100x cost reduction on routine coding work, but must switch back to Claude for MCP servers, vision tasks, prompt caching, and multi-file debugging.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…
  • A developer or technical founder running Claude Code regularly who spends $50+ monthly on API costs and wants to cut expenses without changing workflows.
  • Someone building internal tools, dashboards, or automation scripts who needs Claude Code's capabilities but operates on tight margins or bootstrap budgets.
  • A technical operator already familiar with API endpoints and environment variables who wants a documented, officially supported way to route Claude Code through DeepSeek v4.
SKIP IF…
  • You rely on Claude Code for vision tasks, multi-file debugging, prompt caching, or MCP integrations — the video explicitly identifies these as failure modes with DeepSeek v4.
  • You need production-grade reliability guarantees or SLAs — this setup routes through a third-party model and endpoint outside Anthropic's official support structure.
  • You're building complex reasoning systems or doing hard inference work where the 20% performance gap between DeepSeek v4 and Opus actually matters to your output quality.
TL;DR

The full version, fast.

Routing Claude Code through DeepSeek v4's officially documented API endpoint cuts everyday coding costs by roughly 100x while delivering near-identical output quality for routine work. The setup takes five minutes: grab a DeepSeek API key, then prompt Claude Code itself to configure the shell environment variables per DeepSeek's official documentation, after which the model picker exposes DeepSeek chat as a swappable default. The smart pattern is hybrid, not full replacement: keep DeepSeek as your daily driver for boilerplate and standard builds, then flip to Opus or Sonnet via /model for the four workloads where DeepSeek breaks down: MCP server integrations, vision and screenshot tasks, prompt-cached agent loops, and deep multi-file debugging across large codebases.
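The environment-variable setup described above can be sketched in a few lines of Python. This is a minimal illustration, not DeepSeek's official snippet: the variable names, base URL, and model id are assumptions that should be checked against DeepSeek's documentation, and the helper names `deepseek_env` and `powershell_profile_lines` are hypothetical.

```python
# Sketch: build the environment variables that point Claude Code at an
# Anthropic-compatible DeepSeek endpoint, then render them as PowerShell
# profile lines. ASSUMPTION: the names and values below are illustrative
# and must be verified against DeepSeek's official documentation.

ASSUMED_BASE_URL = "https://api.deepseek.com/anthropic"  # assumption
ASSUMED_MODEL = "deepseek-chat"  # assumption, matches the "DeepSeek chat" picker entry


def deepseek_env(api_key: str) -> dict:
    """Return the provider settings as a plain name -> value mapping."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    return {
        "ANTHROPIC_BASE_URL": ASSUMED_BASE_URL,
        "ANTHROPIC_AUTH_TOKEN": api_key,
        "ANTHROPIC_MODEL": ASSUMED_MODEL,
    }


def powershell_profile_lines(env: dict) -> list:
    """Render each variable as a `$env:NAME = "value"` line for a PowerShell profile."""
    return [f'$env:{name} = "{value}"' for name, value in env.items()]


def missing_vars(current_env: dict, required: dict) -> list:
    """List required variable names that a shell's environment does not define."""
    return [name for name in required if name not in current_env]
```

Calling `missing_vars(dict(os.environ), deepseek_env("sk-..."))` from a freshly opened terminal is one way to confirm the profile was actually picked up, since a shell that was already open will not see the new values.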

Chapters

Where the time goes.

00:00 - 00:45

01 · Cold open — claim and proof

Dashboard demo result shown first. Cost: 2 cents. Same quality as Opus output. Promise of honest limitations built into the setup.

00:45 - 01:13

02 · Is it allowed?

DeepSeek officially documented the integration. Not a hack. They published the endpoint and instructions themselves.

01:13 - 02:12

03 · Why the swap works now

DeepSeek v4 dropped in April under an MIT license. SWE-bench Verified score in the 80% range, the same neighborhood as Sonnet 4.6 and Opus 4.7. Open weights, self-hostable.

02:12 - 02:56

04 · Pricing breakdown

Opus 4.7: $5/$25 per M tokens (input/output). DeepSeek v4 Flash: $0.14/$0.28 per M tokens (input/output). Gap shows up fast on real workloads.

02:56 - 03:27

05 · Sponsor — Snapdragon

HP Omnibook X powered by Snapdragon Elite. All-day battery on heavy AI workloads, fans never spin up.

03:27 - 06:39

06 · Setup walkthrough

Navigate to deepseek.com, create API key, paste one prompt into Claude Code: set up DeepSeek as Claude Code provider using official DeepSeek docs method. Claude Code handles the PowerShell profile config automatically.

06:39 - 08:23

07 · Verify and fresh shell

Close terminal, open fresh project folder. Run Claude, type /model — DeepSeek chat already selected by default. /usage shows 13 cents total spend so far.

08:23 - 10:54

08 · Live demo — ROI calculator

One prompt: build a DeepSeek vs Claude Code ROI calculator as a single HTML file with three inputs, live calculations, and a bar chart. Output in 28 seconds. Total cost: 1 cent.

11:17 - 12:35

09 · Sponsor — Snapdragon chip story

7 hours into workday, 10 terminals open, multiple Claude Code sessions, video rendering — no fan spin-up. Dedicated AI engine in silicon keeps CPU free.

12:35 - 16:55

10 · Limitations — the honest breakdown

4 failure modes: (1) MCP servers silently dropped, (2) vision not supported, (3) prompt caching edge case flips savings, (4) multi-file debugging needs 2-3 re-prompts vs Sonnet one-shot.

16:55 - 18:13

11 · Hybrid workflow and CTA

Default to DeepSeek for everyday work. Hit /model for Opus/Sonnet on MCP, vision, or deep debugging. $200/month pilot becomes $20/month.
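The ROI calculator from chapter 08 boils down to a small amount of arithmetic. The generated HTML is not part of this breakdown, so the formula below is an assumption inferred from the three inputs and the figures read out in the demo; the names `RoiResult` and `roi` are hypothetical. Under that assumption it reproduces the on-screen numbers for a $200 spend at 60% routine work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoiResult:
    monthly_savings: float
    new_monthly_cost: float
    annual_savings: float
    five_year_savings: float


def roi(monthly_spend: float, routine_share: float, cost_ratio: float = 0.01) -> RoiResult:
    """Estimate savings from routing the routine share of spend to a cheaper model.

    ASSUMPTION: routine work costs `cost_ratio` times its current price after
    the switch (default 1/100, as in the demo); the rest stays at full price.
    """
    if not 0.0 <= routine_share <= 1.0:
        raise ValueError("routine_share must be between 0 and 1")
    if not 0.0 <= cost_ratio <= 1.0:
        raise ValueError("cost_ratio must be between 0 and 1")
    routine_spend = monthly_spend * routine_share
    monthly_savings = routine_spend * (1.0 - cost_ratio)
    return RoiResult(
        monthly_savings=monthly_savings,
        new_monthly_cost=monthly_spend - monthly_savings,
        annual_savings=monthly_savings * 12,
        five_year_savings=monthly_savings * 60,
    )
```

With `roi(200, 0.6)` the monthly savings round to $119 and the new monthly cost to $81, which matches what the demo shows.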

Atomic Insights

Lines worth screenshotting.

  • Running Claude Code with DeepSeek v4 instead of Anthropic's models costs 100x less — enabled by a two-line environment variable swap in under 5 minutes.
  • DeepSeek officially documented their endpoint and published exact instructions for connecting to Claude Code, OpenClaw, and other agent runtimes — this is a supported integration, not a hack.
  • DeepSeek v4 scored in the 80% range on SWE-bench Verified — the same neighborhood as Claude Sonnet 4.6 and Opus 4.7 on coding benchmarks.
  • A full polished ROI calculator dashboard built from one prompt in 28 seconds cost 1 cent — a comparable Opus run costs 10 times more.
  • Four specific failure modes exist: MCP tool calls, vision/image processing, prompt caching, and multi-file debugging across large codebases — and people who hit these burned out on the setup early.
  • The operators still paying full Anthropic prices are, in large part, the ones who got burned by these failure modes before DeepSeek v4 caught up with the frontier.
  • Open weights with an MIT license means anyone can self-host, fine-tune, or run DeepSeek locally — no single vendor dependency, no API rate limits at the model level.
  • DeepSeek v4 scored 'basically Sonnet' on coding tests — the performance gap that made the swap feel risky three months ago has closed.
  • The workflow that cuts a Claude bill in half is not about paying less per token — it is about routing routine coding work through the cheaper model and reserving the expensive model for the tasks where it actually matters.
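To see how the per-token prices translate into the cost of a single job, a rough calculation helps. This is a sketch using only the list prices quoted in the pricing chapter; the dictionary keys and the function names `job_cost` and `cost_ratio` are hypothetical labels, and caching discounts or re-prompts are deliberately left out.

```python
# Prices in US dollars per million tokens as (input, output), taken from the
# pricing chapter. The keys are illustrative labels, not official model ids.
PRICES_PER_MILLION = {
    "opus-4.7": (5.00, 25.00),
    "deepseek-v4-flash": (0.14, 0.28),
}


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars of one job, given its input and output token counts."""
    in_price, out_price = PRICES_PER_MILLION[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def cost_ratio(input_tokens: int, output_tokens: int,
               expensive: str = "opus-4.7",
               cheap: str = "deepseek-v4-flash") -> float:
    """How many times more the expensive model costs for the same token counts."""
    cheap_cost = job_cost(cheap, input_tokens, output_tokens)
    if cheap_cost == 0:
        raise ValueError("at least one token is needed to compare costs")
    return job_cost(expensive, input_tokens, output_tokens) / cheap_cost
```

The ratio depends on the mix of input and output tokens, so plugging in the token counts reported by /usage gives a more realistic multiplier for a given workload than a single headline number.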
Takeaway

Steal the hybrid model-picker workflow.

Operator playbook

Default Claude Code to DeepSeek for everything. Flip to Opus only when you hit one of four specific walls.

  • Sign up at deepseek.com, load $5, grab an API key.
  • Paste one prompt into Claude Code: 'set up DeepSeek as my Claude Code provider using the official DeepSeek docs method' — it configures itself.
  • Open a fresh terminal and verify via /model that DeepSeek chat is now the default.
  • Learn the four flip-to-Opus triggers: MCP integrations, vision/screenshot tasks, large-repo multi-file debugging, and any agent loop where Anthropic caching math flips.
  • Track spend with /usage. The video's own example is a $200/month workload dropping to about $20/month.
  • Never pitch this to your audience as a total switch. The hybrid framing is what makes it credible and safe.
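The four flip-to-Opus triggers in the playbook amount to a simple decision rule. The sketch below writes that rule down explicitly; it is illustrative only, since in practice the switch is made by hand with /model, and the `Task` fields and function names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Model(Enum):
    DEEPSEEK = "deepseek"  # everyday default
    CLAUDE = "claude"      # Opus or Sonnet, picked via /model


@dataclass(frozen=True)
class Task:
    uses_mcp: bool = False
    has_images: bool = False
    large_multi_file_debug: bool = False
    cached_agent_loop: bool = False


def flip_reasons(task: Task) -> list:
    """Return which of the four triggers apply to this task, in playbook order."""
    checks = [
        (task.uses_mcp, "MCP integration"),
        (task.has_images, "vision/screenshot task"),
        (task.large_multi_file_debug, "large-repo multi-file debugging"),
        (task.cached_agent_loop, "prompt-cached agent loop"),
    ]
    return [reason for applies, reason in checks if applies]


def pick_model(task: Task) -> Model:
    """Default to DeepSeek; flip to Claude if any trigger applies."""
    return Model.CLAUDE if flip_reasons(task) else Model.DEEPSEEK
```

Keeping the reasons as a list rather than a single boolean makes it easy to note why a session was flipped, which is useful when reviewing spend later.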
Glossary

Terms worth knowing.

Claude Code
Anthropic's command-line coding assistant that runs in a terminal or IDE, takes natural-language instructions, edits files, and runs builds inside a project.
DeepSeek V4
An open-source large language model released in April with MIT-licensed weights, competitive with top closed models on coding benchmarks at a fraction of the price.
API endpoint
The web address an app sends requests to in order to talk to a model provider. Swapping endpoints lets one tool route its work through a different company's models.
Environment variables
Named values stored in the operating system or shell that programs read at startup to configure themselves, commonly used to pass API keys and endpoint URLs without hard-coding them.
Shell config file
A script the terminal runs every time a new session opens (such as a PowerShell profile or .bashrc), used to set environment variables and aliases permanently.
Open weights
A model whose trained parameters are publicly downloadable, allowing anyone to run it locally, fine-tune it, or host it themselves instead of relying on a vendor's API.
MIT license
A permissive open-source license that allows commercial use, modification, and redistribution with minimal restrictions, making the licensed software safe to build products on.
SWE-bench Verified
A benchmark of real-world software-engineering tasks pulled from open-source repositories, used to compare how well models can fix bugs and implement features across multi-file codebases.
Input tokens / output tokens
The units language model providers bill by. Input tokens are the text sent to the model; output tokens are the text it generates back. Pricing is quoted per million tokens.
Sonnet 4.6 / Opus 4.7
Tiers of Anthropic's Claude model family. Sonnet is the mid-tier balance of speed and capability; Opus is the most powerful and most expensive option.
OpenCode / OpenClaw / Hermes
Alternative open-source coding agents and CLI front-ends that, like Claude Code, can be pointed at different model providers via environment variables.
WSL
Windows Subsystem for Linux, a feature that runs a real Linux environment inside Windows. Installers often ask whether to target the Windows host shell or the Linux subsystem.
PowerShell profile
A startup script Windows PowerShell runs each time a new session opens, used to permanently set environment variables, aliases, and functions.
/model command
An in-session Claude Code command that lists available models and lets the user switch which one handles the next prompts without restarting the tool.
/usage command
A Claude Code command that shows tokens consumed and dollars spent in the current session, broken down by model.
One-shot
Producing a complete, working result from a single prompt with no follow-up corrections needed.
MCP (Model Context Protocol)
An open protocol that lets a coding agent connect to external tools and data sources such as file systems, Linear, or Notion. Providers must implement it for the integrations to fire.
Vision (in LLMs)
A model's ability to accept images as input and reason about them, such as reading a screenshot, parsing a chart, or critiquing a UI design.
Prompt caching
A provider-side optimization that stores the prefix of a long prompt so repeated calls reuse it at a discount, valuable for agents that send the same large system prompt over and over.
Agent loop
A program that repeatedly calls a language model in a cycle — read state, decide, act, observe — to complete a task autonomously over many steps.
Resources Mentioned

Things they pointed at.

03:27 · product · DeepSeek.com
02:56 · product · HP Omnibook X (Snapdragon Elite)
Quotables

Lines you could clip.

00:28
I ran this for about 2 cents. It's the price of a stick of gum.
Punchline delivery after showing a polished dashboard · TikTok hook
02:50
The difference on any routine work, it's imperceptible — but the bill, however, is not.
Clean contrast setup, standalone · IG reel cold open
16:33
If you take one thing away from this video, do not switch entirely.
Counterintuitive advice from someone advocating the switch · TikTok hook
17:02
The whole thing that was just too expensive to leave running around the clock just got cheap enough to actually leave running around the clock — and that's the real shift.
Strong business unlock framing · newsletter pull-quote
The Script

Word for word.

00:00What if you can run Claude Code 100 times cheaper than whatever you're using right now? Now this is Claude Code, but the model under the hood, it actually isn't even Claude. It's DeepSeek v4, the newly released open source model that's essentially just trading blows with Claude on any coding benchmarks.
00:15And this is all with just a two-line swap, and it's a five minute setup. And it basically cuts the bill by 100x. So I just had it build me this entire dashboard from one prompt.
00:24Basically, it just one-shotted it, and it's a polished design. We have all the live calculations, and it took about three minutes from start to finish.
00:31This is the kind of output that you would be expecting from one of the top tier models like Opus. Now if we were to take this and do this on Opus, this would have cost me, you know, 10 times the amount. I ran this for about 2¢.
00:41It's the price of a stick of gum. Now in the next ten minutes, I'm gonna be showing you the one prompt that does the entire install, the workflow that cut my Claude bill in half, and most importantly, the specific kind of work where this whole setup actually does fall apart because nobody's being honest about that. So two questions you're probably already asking about this.
00:57Is this even allowed? And what's the catch? So to first answer that, yes.
01:00It is allowed. DeepSeek, they officially documented this themselves. They even published the endpoint.
01:04They published the exact environment variables, and they wrote the instructions for actually connecting their models into Claude Code, into OpenClaw, and really anything else.
01:15They just want people to actually start doing this. Now the catch is the whole reason this video actually exists. So there is one specific category of work where, you know, this entire thing is breaking.
01:24And the people who got burned trying it earlier this year, they are why most operators still pay full price, and I will get you there. Just stick with me through the entire build. So just real quick on why the swap actually works right now because, you know, just about three months ago, it didn't.
01:37So all these open source coding models, they were very close, but you would just honestly feel the gap on anything past boilerplate. But that recently changed in April when DeepSeek v4 dropped. You know, we have the open weights, MIT license, you know, that just means anyone can self host it.
01:52They can fine tune it or just run it locally instead of being, you know, behind a single vendor's closed API. And on the benchmarks that actually matter for coding, it finally caught up with all of this. So for example, on SWE-bench Verified, v4 Pro, it scored in the 80% range.
02:08You know, that's the same neighborhood as Sonnet 4.6 and Opus 4.7. Now on coding tests, v4 is basically Sonnet, and on any hard reasoning, it's about 80% of Opus.
02:18So for most of what Claude Code actually does day to day, that gap doesn't really show up whatsoever. Now when it comes to the pricing, Opus 4.7, it's $5 per million input tokens and $25 per million output. But DeepSeek's v4 Flash, it's 14¢ per million input, 28¢ per million output.
02:34And the gap, it really will be showing up quite fast for you, pretty much the moment that you actually start running real workloads for this. I quite literally run three AI companies, and I've been pointing Claude Code at this DeepSeek endpoint for the last couple of weeks across all three. And we're shipping the same code that I was shipping, you know, last quarter on Anthropic's API.
02:52Now the difference in actually comparing these two on any routine work, it's imperceptible, but the bill, however, is not. Now just a quick note on what I'm actually using to run this entire thing.
03:01My laptop, it's powered by Snapdragon, which is effectively what's running this entire stack. And this video, it's actually sponsored by Snapdragon. So link will be down below in the description to check them out, but I've had this over a couple of weeks now, and I don't throttle on any long coding sessions.
03:16My battery lasts all day, and the cost I'm about to walk through, it plays even better when the hardware is not burning power either. Now anyways, for step one, we're just going to navigate over into deepseek.com. If you don't have an account already, just make sure to go ahead and sign up.
03:31It's completely free to make, so you only have to actually pay for what you use. And I dropped in about $5 just a couple of weeks ago, and I've barely touched this. But anyways, once you have signed up for your account, go ahead and navigate over into your API keys, create a new key.
03:44We will need this in a second. Now once we actually grab this, it's gonna get a little more interesting because we're just gonna use Claude Code to actually install everything else, make it extremely simple for us to start using. Now once we do have our API key, we're just going to utilize Claude Code to actually install everything for us.
03:58Now quick thing that I do want to flag before we actually keep going is that DeepSeek, they officially launched their own documentation showing exactly how to do this. So again, just how to actually connect their models to Claude Code, and this is not a workaround. And this isn't some sort of like sketchy workaround or anything like that.
04:13This is not a hack. They wrote the instructions themselves. This is official.
04:17So you can see like this is their page. The official method is to copy six lines of code and paste them into a config file on your laptop. It's a little bit technical, but it does work.
04:24Link will be down in the description if you want to do it that way, but let me show you how to actually do it the easiest way. Oh, and one more thing worth knowing is that the same swap, it also works if you're using OpenCode or OpenClaw or Hermes, and DeepSeek officially supports all three of these. So we're staying on Claude Code today, but the same idea carries over if your tool choice is going to be different.
04:42Pasting environment variables into shell config files is the kind of thing that makes operators, you know, just bounce off tutorials. I'm gonna show you a friendlier path that uses a tool. Alright.
04:52So we have Claude Code running inside of our IDE. I'm using VS Code. You can use whatever you want to actually utilize this.
04:58You can use the terminal. You could also use the regular desktop application of Claude, but we're just going to paste this in.
05:05So what I'm saying specifically is set up DeepSeek as my Claude Code provider using the official DeepSeek documentation method. So I'm effectively just copying what is in the documentation of DeepSeek.
05:16And more importantly, we're going to leverage the power of Claude Code to make it as simple as possible for us to actually install this system and make it 100 times cheaper to actually be using Claude Code moving forward. Now one important thing before we actually do send this off is that key that we actually copied from DeepSeek earlier.
05:32This is where we need to replace it. So right here, we can just replace this with our API key. Now we're just going to press enter, run this off, and really, we're just telling Claude Code to do three different things in one prompt.
05:46First, to just go check my shell config file. So that's the file my computer is going to actually be reading every time that I open a brand new terminal.
05:55And from there, it's going to just clean up any leftover DeepSeek settings from any older experiments. So we're just going to be starting fresh. So it's asking me how do you want to configure DeepSeek as your Claude Code provider on Windows.
06:08I'm just going to say a PowerShell profile. Now it's asking, did you mean this for WSL or Linux machine instead of this Windows host? I'm gonna say no configure this Windows machine, and we'll run that from there.
06:19It'll ask for a few more permissions that you wanna proceed with this PowerShell. I'll just go ahead and say yes, but you'll also notice in the prompt that I was asking it to present me the results so that I can just confirm that everything actually worked. Alright.
06:30So everything just finished installing. We got it all wired in. Now let's actually get started using this.
06:35But I do wanna say that when Claude Code edits your shell config, the terminal that you're currently running this in, it's not going to be picking up the new settings. So whatever you just installed, only a brand new shell will be picking that up.
06:48So instead of trying to refresh this specific window that we have right here, we just have to just close completely, open a fresh new project folder. It's gonna be a lot cleaner that way. So I went ahead and created a brand new folder.
07:00We've got a fresh window. We got a fresh shell. Now what I'm going to do is I'm just going to open up the terminal, and we're just going to run Claude code.
07:08So I'm just going to type out Claude. From here, if we actually expand on this a little bit, you can see it's automatically going to start up Claude for us. So we could run this in the dark mode, and we'll just press enter.
07:19Press enter again and use the recommended settings. Go ahead and trust this folder. Now we can see Claude code is fully loaded up, and we could expand this as much as we want.
07:27And if I just type out model again, we'll be able to pull up all the different models that we will have listed out. So you could see we have the defaults, of course, Haiku, and then we have down at the bottom, the most important DeepSeek chat. Currently, it's already selected by default because that's what we set in, uh, config.
07:43So most of the day, I'm just gonna be on DeepSeek, and I don't really think about it. And the bill automatically stays very low for us. If I just back out and just say, what model are we running on?
07:54You'll see what we get. We're running on DeepSeek chat. And if I just type out slash usage, we'll actually be able to see all the different models that we are running specifically and what's actually getting the charge.
08:05So DeepSeek, 24,000 inputs, about 400 outputs.
08:09And as you see, I've only spent about 13¢ on this. Anyways, the setup is completely done. Now let's get into some real stuff.
08:15So some quick context before I actually show you the build because this is the actual workflow that I have been running where most of the day, I'm going to be on DeepSeek. Claude Code, it looks completely identical.
08:26The work's going to get done. Our bill is going to stay very, very low. And when I hit something that is going to be heavier where I do need more complex demands from Claude Code, that is just where I can actually, you know, have a complex multi file refactor.
08:41I can have a vision task, something that just needs deeper reasoning. I just hit slash model, and I can so easily just flip between utilizing the default Opus 4.7.
08:52So that's gonna be the whole rhythm within this. The default, it's gonna be DeepSeek chat. And then when we do need something a little bit more hands on, then we'll be utilizing the default Opus 4.7.
09:02Now to show you that DeepSeek is genuinely capable of very real work, not just, you know, toy examples. I'm going to give it just one prompt right now and have it build me a real working dashboard. I'm gonna have it build out a calculator that figures out how much you save by routing routine work to DeepSeek.
09:18So this is the most on theme demo that I could actually think of for this. I'm just gonna back out of this, and what we're going to paste in is this prompt right here. I'm just gonna run it off. Build me a single page interactive dashboard called DeepSeek versus Claude Code ROI calculator as just one index HTML file.
09:33So it should take three user inputs. So the monthly Claude Code spend in US dollars, the percentage of tasks that are routine slash boilerplate, and then the average DeepSeek cost ratio versus Claude, and it defaults to one out of 100.
09:46Now number two is the live calculate and display. So just give me the monthly savings if it is routine work routed to DeepSeek, the annual savings, the five year savings, and a simple bar chart comparing current versus the new monthly cost instead. Number three is just to use a clean modern design, and I'm just getting a little bit picky about how I want it to actually look.
10:06And then make sure that all values update live as inputs change. No submit button. So we scroll down just a little bit.
10:12We can see it already finished up generating this. I mean, that took how long exactly? That was twenty eight seconds.
10:18So let's open this up and we can before we do that, we could see what it does. So the monthly spend, the routine task slider, and the cost ratio. So I'm gonna open this up real quick inside of our folder.
10:30Cool. So here we are. The DeepSeek versus Claude Code ROI calculator.
10:33So our monthly Claude code spend, we're just on the $200 subscription. So the routine slash boilerplate tasks, we have about 60%, and we could also configure this as well.
10:42Then we have the DeepSeek cost ratio versus Claude. Now the monthly savings, so it's about a $119 versus $81. Annual savings, it's gonna be about $1,400, and the five year savings, $7,000.
10:54Let's scroll down a little bit further and we can actually see the bar chart. So this is going to compare, um, Claude plus DeepSeek.
11:01And then we scroll down, we can see the bar chart just comparing current, which is just all Claude versus a hybrid of Claude plus DeepSeek. Now to generate something like this even just a couple of months ago, I mean, we simply would not have been able to create this unless I was using some sort of supercomputer and honestly, like none of the open source models, they just were not there yet, but I genuinely think we are there right now.
11:23Now earlier in this video, I mentioned I was using a Snapdragon chip and that is quite literally what makes my workdays possible using something like this. It is what these chips do with AI workloads. So right now as I'm filming this, about seven hours into the workday, Claude Code, it has been running builds all in the background, terminals open all over the place, and all the fans, they haven't even spun up once.
11:44And the part that actually surprised me the most is it's not just Claude Code that's actually staying quiet, it's everything at once. I have 10 different terminals open. I have multiple Claude Code sessions running in parallel and a video rendering on top of that and browser windows just everywhere.
11:57And the chip, it literally absorbs all of this like a bulletproof vest. And the reason for that is because Snapdragon, it has a dedicated AI engine just built directly into the silicon. So heavy AI inference work, it doesn't fight the CPU for any power.
12:10The CPU, it's instead staying free for the rest of your work. And that matters for the cost story in this video specifically because the API bill is only half of it. And the other half, it's just the hardware actually running underneath that.
12:22So if your laptop is just overheating and burning battery every time you run a build like this, you stop running them and then you batch your work, you wait until you're plugged in and you just avoid the long sessions and you're just not leveraging and capitalizing, um, you know, all the tools at our disposal such as AI.
12:38So overall, that's lost productivity, lost time and lost money for you in your pockets. But anyways, going back to this specific build, I mean, we can change this monthly Claude Code spend, maybe it's gonna be about $300.
12:49I usually don't spend more than $300 a month and you can see the savings. About a $170 versus a 122, the annual savings, and of course, the ten year savings as well.
12:59Now if we go back inside of DeepSeek, you could see I literally only used 1¢ to build that dashboard. Now this is why I actually default to this for most of my work because it's not just adequate. It's actually very, very good.
13:10Now anyways, I do wanna get honest and cover some of the limitations when it comes to actually going with this approach. So from my experience, I have been just routing my Claude Code work through DeepSeek for about three weeks now. I love it for what it actually does well, but I'm also gonna tell you exactly what it doesn't do because the operators, the builders, who are actually getting burned trying this early in the year, they got burned on these specific things, and I would rather you know right now.
13:33So four things to be flagging. First one I could actually show you on the screen right now, it's the MCP servers.
13:39So if you use Claude Code seriously, you've probably been wiring up MCP integrations. You have, like, file system access, Linear, Notion, whatever it may be. MCP, it's going to be the protocol that lets Claude Code actually reach out and talk to all of your tools.
13:53So it's a big part of why Claude Code is, you know, powerful for real work. Watch what happens when I actually check MCP while routed through DeepSeek.
14:00I'm gonna type out slash MCP, and it's going to say no MCP servers configured. Please run slash doctor if this is unexpected. So we're getting nothing, and the reason isn't that I forgot to configure any.
14:11It's just that DeepSeek's endpoint does not support the MCP protocol whatsoever. So DeepSeek, they actually officially documented this in their API compatibility table where MCP calls, they're flagged as just being ignored. So they're not broken.
14:24They're not throwing errors. It just quietly dropped it on the floor and they didn't really like regard it whatsoever. So even if you had 10 MCP servers wired up and working perfectly through Anthropic, the moment you start using DeepSeek, you know, it's not gonna fire any of those.
14:37That's why I would be switching back to Opus the second that I am in MCP territory, or you can just go to do something else like utilizing Sonnet or Haiku. You don't need to kill every single process with, you know, high premium model like Opus. Now number two, it's going to be vision.
14:52So if you've ever pasted a screenshot into Claude Code just expecting the model to read what's on the screen, you know, like debug a UI bug from a screenshot, pull data out of a chart or, like, look at a design, that capability, it lives on Anthropic's side, and DeepSeek's coding endpoint, it doesn't process images whatsoever.
15:11So anytime my work involves looking at something visual, I have to flip to a different model from Anthropic like Sonnet or Opus. Number three is the prompt caching. So this is like the quietest I would say, of all the four.
15:23So Anthropic, it gives you a discount when you actually reuse long system prompts across sessions because they cache the prefix on their end. Right? So for most of what you and I do day to day, that doesn't really matter where each session, it's gonna be completely fresh.
15:37But if you run agent loops with massive system prompts that are just going to be repeating all day every day, like maybe an SDR agent or a background worker hitting the API every single minute, the Anthropic cache discount can quietly become bigger than the DeepSeek savings on those exact workloads. So it's definitely worth modeling all of this out before you actually commit a heavy production workload.
16:00Now number four, and this is the one that I've actually felt the most in my actual work, is the multi file debugging across a large code base. So when you are three layers deep trying to just figure out why a request is going to so for example, if you're three layers deep trying to figure out why a request is failing in a multi service repo for you, DeepSeek, it usually needs two or three follow-up prompts where Sonnet would often just one shot it for you.
16:22So the token savings, it does vanish on tasks like that because you are having to re prompt, you know, four times instead of just the one shot. So whenever I do hit deep multi file debugging, I flip to Opus, I do that part, I flip right back.
16:36If you take one thing away from this video, do not switch entirely. Don't just go all in on DeepSeek; I showed you the reasons why. You want to be using the model picker the way that I have been using it.
16:45So your default points Claude Code at DeepSeek. That's your everyday setup, and most of your work runs right through it. Your bill is going to be dropping significantly, and when something hard actually breaks, like a multi-file bug, a vision task, or an MCP-heavy workflow, you just hit /model, pick Opus or Sonnet, whatever it may be, flip to that, and flip back later.
17:07So with that being said, this is only going to be applicable if you are going to be utilizing the API. So some of you do utilize a Claude subscription. So in that case, it's not going to matter as much unless you're constantly hitting your limits.
17:19So with that being said, just stop paying full price for Claude when most of your work does not need it. So that's the real unlock. So if you're an operator running automations or any internal tooling in Claude Code, this literally turns a $200 monthly pilot into a $20 pilot.
17:33So the whole thing that was just too expensive to leave running around the clock just got cheap enough to actually leave running around the clock, and that's the real shift in what is actually worth building right now. But anyways, just a quick thanks again to Snapdragon for sponsoring this video and for the laptop that has been powering the entire build that you just watched.
17:51So the link will be down below in the description if you wanna check it out. Highly recommend them. But in any case, thank you guys for watching.
17:56If you are interested in getting a more hands-on approach, learning stuff like this right when it drops, then make sure to check out our free school community and also our weekly AI newsletter. And if you're a business owner looking to implement AI into your business in 2026, then make sure to book in a call with our team.
18:09Link will be down below in the description. But, again, thank you guys for watching. I'll see you in the next video.
The Hook

The bait, then the rug-pull.

The claim lands before the intro music fades: run Claude Code a hundred times cheaper, with a two-line swap and a five-minute setup. Nick Puru then immediately cuts to a polished multi-input dashboard he just built for two cents — the kind of output you'd expect from Opus — and the cost case is made before he even explains how.

Frameworks

Named ideas worth stealing.

16:55 model

DeepSeek as Claude Code default — hybrid model picker

  1. Default: DeepSeek chat for routine work
  2. Flip via /model to Opus or Sonnet for: MCP tasks, vision, prompt-cached agent loops, multi-file debugging
  3. Never switch entirely

Set DeepSeek as your global Claude Code default. Use /model to jump to Opus or Sonnet only when the task hits one of the four failure modes. Jump back after.
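
The hybrid rule can be written down as a tiny routing function. This is only a sketch of the decision, not anything Claude Code exposes: the tag names and the labels "deepseek", "opus", and "sonnet" are hypothetical placeholders rather than real model IDs, and in practice the switch is done by hand with /model.

```python
# Hypothetical task tags mirroring the four failure modes named in the video.
SWITCH_TO_ANTHROPIC = {"mcp", "vision", "prompt_cached_loop", "multi_file_debug"}


def pick_model(task_tags, default="deepseek", fallback="opus"):
    """Return the fallback label if any tag is a known failure mode, else the default.

    The labels are illustrative strings, not actual model identifiers.
    """
    if SWITCH_TO_ANTHROPIC & set(task_tags):
        return fallback
    return default


# Routine work stays on the cheap default; a screenshot task flips to the fallback.
routine = pick_model(["boilerplate"])
visual = pick_model(["vision", "ui_bug"], fallback="sonnet")
```

Keeping the failure modes in one set makes the "jump back after" part automatic: any task without one of those tags falls through to the default.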

Steal for: Any tutorial on AI cost optimization or operator workflow
13:40 list

Four DeepSeek Failure Modes

  1. MCP servers (silently ignored)
  2. Vision / image processing (not supported)
  3. Prompt caching (Anthropic discount can exceed DeepSeek savings on agent loops)
  4. Multi-file debugging across large repos (needs re-prompts)

Specific conditions where routing through DeepSeek costs more or breaks functionality vs staying on Anthropic.
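
Items 3 and 4 are arithmetic claims, so a rough model helps when deciding where a workload belongs. The sketch below is a simplification under stated assumptions: it counts input tokens only, ignores output tokens and any cache-write charges, and every price, discount, and token count is a made-up placeholder rather than a real provider rate. The function name `input_cost` and its parameters are hypothetical.

```python
def input_cost(calls, prompt_tokens, price_per_mtok,
               cached_fraction=0.0, cache_discount=0.0, attempts=1):
    """Estimated input-token spend in dollars.

    cached_fraction: share of each prompt served from a cache (0..1).
    cache_discount:  price reduction applied to cached tokens (0..1).
    attempts:        average prompts needed per task, re-prompts included.
    """
    effective_price = price_per_mtok * (1 - cached_fraction * cache_discount)
    return calls * attempts * prompt_tokens / 1_000_000 * effective_price


# Placeholder scenario: an agent loop firing once a minute for a day (1440 calls)
# with a 50k-token prompt. None of these numbers are real prices.
premium = input_cost(calls=1440, prompt_tokens=50_000, price_per_mtok=10.0,
                     cached_fraction=0.95, cache_discount=0.9)
cheap = input_cost(calls=1440, prompt_tokens=50_000, price_per_mtok=1.0,
                   attempts=3)

# With these placeholder inputs the cached premium run comes out cheaper.
cheaper_option = "premium" if premium < cheap else "cheap"
```

Plugging in current published prices and your own measured re-prompt rate is the "modeling it out" step the video recommends before committing a heavy production workload.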

Steal for: Honest limitations section in any AI tool review or comparison video
CTA Breakdown

How they asked for the click.

17:53 next-video
Check out our free School community and our weekly AI newsletter. If you're a business owner looking to implement AI in 2026, book a call with our team.

Tacked on post-conclusion with three separate asks — community, newsletter, consultation call. Feels rushed after a strong payoff line.

Storyboard

Visual structure at a glance.

00:00 hook (hook)
00:45 credibility (promise)
02:12 benchmarks (value)
03:27 setup (value)
08:23 live demo (value)
13:40 limitations (value)
16:55 hybrid workflow (cta)
Frame Gallery

Visual moments.