Big Idea

The argument in one line.

DeepSeek V4's Anthropic-compatible endpoint turns Claude Code into a frontend that can route to a 10-30x cheaper model with two environment variables, and pairing both models in separate terminals is the setup that actually captures the savings without sacrificing output quality on complex tasks.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…

You use Claude Code and your API bills or Max plan cost is eating into your margin.
You run an AI service business and want to lower tooling costs without changing client-facing output quality.
You want a practical routing framework for deciding when to use Claude versus a cheaper model, not just benchmarks.

SKIP IF…

You handle sensitive client data or proprietary source code — the video itself warns that DeepSeek servers are China-based and should not receive confidential credentials.
You want a rigorous technical benchmark comparison rather than a setup walkthrough.

TL;DR

The full version, fast.

Claude Code usage can reach the equivalent of $5,000/month at API rates for heavy users, which prices many people out of using it daily. DeepSeek V4 ships with an Anthropic-compatible API endpoint, so Claude Code can be pointed at DeepSeek's servers by setting two environment variables, with no workflow change. The real insight is task routing: complex reasoning and polished output stay on Claude, while unit tests, boilerplate, and refactoring route to DeepSeek at a fraction of the cost. For AI service businesses, the margin impact of this stack is the primary value.

Chapters

Where the time goes.

00:00 – 00:52

01 · Hook + channel intro

Cost pain point established, AI avatar introduced, free masterclass CTA

00:53 – 01:47

02 · Claude Code cost reality

$200/mo plan, $5,000/mo API worst case for heavy users, session limits draining in 20 minutes

01:48 – 02:50

03 · DeepSeek V4 introduced

1.6T parameters, 1M context window, MIT license, $0.14-$0.43/M tokens vs $5/M Claude Opus

02:51 – 03:53

04 · The Anthropic-compatible hack

DeepSeek's official Anthropic-compatible endpoint; DeepCloud GitHub project claims 17x cheaper; real-world 95x reports

03:54 – 05:06

05 · Setup walkthrough

Two environment variables, settings file, terminal alias, DevTK guide link, China data-privacy warning

05:07 – 05:53

06 · Mid-video CTA

AI Cashflow Masterclass pitch introducing the AI guilt concept

05:54 – 07:10

07 · Task routing framework

Claude for complex/polished output; DeepSeek for volume/boilerplate; context caching advantage per session

07:11 – 08:13

08 · Income opportunity

Upwork 109% YoY AI freelance growth, 178% AI integration; margin math for service businesses

08:14 – 10:05

09 · Mindset + outro

Build stacks not subscriptions; $5 test before over-researching; personal leverage framing; final CTA

Atomic Insights

Lines worth screenshotting.

Claude Code can be pointed at DeepSeek V4 with two environment variables — ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN — no code or workflow change required.
DeepSeek V4 Flash costs $0.14 per million input tokens versus Claude Opus at $5.00, roughly a 36x price difference on input tokens (V4 Pro, at $0.43, is roughly 12x cheaper).
Real-world developers have reported 95x savings using the Claude Code plus DeepSeek stack, not just the 17x figure from the DeepCloud documentation.
DeepSeek automatically caches context from previous requests in a session, making extended work on the same codebase progressively cheaper.
The winning setup is both models running simultaneously: Claude in one terminal for complex work, DeepSeek in another for volume tasks.
Session limits on the $200/month Max plan were draining in under 20 minutes in early 2026, making cost-saving alternatives more urgent.
Demand for AI integration services on Upwork grew 178% year over year in 2026 — the tooling cost reduction directly expands the margin for service providers.
A $5 DeepSeek test run will tell you whether it fits your workflow in 30 minutes — over-researching before testing is the most common mistake.
The mechanic's toolbox mental model applies to AI stacks: the right model for the right job, not one model for everything.
DeepSeek V4 trails Claude on the most complex, agentic, reasoning-heavy work; for standard implementation tasks, it holds up well at a fraction of the cost.
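
The price multiples above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the input-token prices quoted in the video; the dictionary keys are illustrative labels rather than official model identifiers, and output-token pricing is not covered because the video does not quote it.

```python
# Input-token prices in USD per million tokens, as quoted in the video.
# The keys are illustrative labels, not official API model names.
PRICE_PER_MILLION = {
    "claude-opus": 5.00,
    "deepseek-v4-pro": 0.43,
    "deepseek-v4-flash": 0.14,
}


def price_ratio(baseline: str, alternative: str) -> float:
    """Return how many times cheaper `alternative` is than `baseline` on input tokens."""
    return PRICE_PER_MILLION[baseline] / PRICE_PER_MILLION[alternative]


def input_cost(model: str, tokens: int) -> float:
    """Return the USD cost of sending `tokens` input tokens to `model`."""
    return PRICE_PER_MILLION[model] * tokens / 1_000_000


if __name__ == "__main__":
    for model in ("deepseek-v4-pro", "deepseek-v4-flash"):
        ratio = price_ratio("claude-opus", model)
        print(f"{model}: {ratio:.1f}x cheaper than claude-opus on input tokens")
    # prints 11.6x for deepseek-v4-pro and 35.7x for deepseek-v4-flash
```

The list-price gap alone gives roughly 12x to 36x. The 17x and 95x figures quoted above are reported workflow-level savings, which this sketch does not attempt to reproduce.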

Takeaway

Route AI tasks by cost, not by habit.

WHAT TO LEARN

Defaulting to the most capable model for every task is an expensive habit — and two environment variables are all it takes to stop doing it.

Claude Code reads ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN at startup, so redirecting it to any Anthropic-compatible API requires zero workflow change.
DeepSeek V4 Flash costs $0.14 per million input tokens versus Claude Opus at $5.00 — the gap is large enough to matter even for modest usage volumes.
DeepSeek trails Claude on complex multi-step reasoning and agentic tasks but holds up well on unit tests, boilerplate, and repetitive refactoring where output just needs to be correct.
Context caching in DeepSeek makes extended sessions on the same codebase progressively cheaper — the longer you work on one project in a session, the lower your per-request cost becomes.
Running both models in separate terminals simultaneously is the practical move: route by task type, not by which window is already open.
A $5 test run through your actual workflow is worth more than any benchmark table — you will know in 30 minutes whether the quality holds for your specific use case.
For anyone billing AI services to clients, tooling cost is the primary margin lever — client pricing stays the same, quality stays the same, and the cost difference goes straight to margin.
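
The two-variable redirect can be sketched as a small launcher. This is a hedged illustration rather than the video's exact setup: the endpoint URL below is an assumption to confirm against DeepSeek's API documentation, `DEEPSEEK_API_KEY` is a hypothetical variable name chosen for this example, and the sketch assumes the Claude Code CLI is installed on PATH as `claude`.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional

# ASSUMPTION: DeepSeek's Anthropic-compatible endpoint. Verify in their API docs.
DEEPSEEK_ANTHROPIC_URL = "https://api.deepseek.com/anthropic"


def deepseek_env(api_key: str, base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the two overrides applied.

    The caller's environment is never mutated, so other terminals and
    processes keep pointing at Anthropic.
    """
    if not api_key:
        raise ValueError("a DeepSeek API key is required")
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_BASE_URL"] = DEEPSEEK_ANTHROPIC_URL
    env["ANTHROPIC_AUTH_TOKEN"] = api_key
    return env


def launch_claude_code(env: Mapping[str, str]) -> int:
    """Start Claude Code with the given environment and return its exit code.

    Assumes the CLI is available on PATH as `claude`.
    """
    return subprocess.call(["claude"], env=dict(env))


def main() -> int:
    # DEEPSEEK_API_KEY is a hypothetical name used only by this sketch.
    key = os.environ.get("DEEPSEEK_API_KEY", "")
    return launch_claude_code(deepseek_env(key))
```

Calling `main()` from one terminal starts the DeepSeek-backed session. Because the overrides live only in the child process's environment, a second terminal running plain `claude` is unaffected, which is what makes the two-terminal setup workable.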

Glossary

Terms worth knowing.

Anthropic-compatible API endpoint: An API that accepts requests in the same format as Anthropic's Claude API, allowing tools built for Claude to route to a different model provider without code changes.
Context caching: A technique where repeated input tokens from previous requests in a session are stored and reused at lower cost, reducing the price of extended work on the same codebase.
DeepCloud: An open-source GitHub project that packages Claude Code's agent loop to run on DeepSeek V4 Pro or any Anthropic-compatible backend, documenting a claimed 17x cost reduction.
Claude Code Max plan: Anthropic's top-tier Claude Code subscription at $200/month, which includes session-based usage limits that can exhaust in under 20 minutes for heavy coding workloads.
ANTHROPIC_BASE_URL: An environment variable that Claude Code reads to determine which API server to send requests to, allowing redirection from Anthropic's servers to any compatible endpoint.
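
To see why context caching makes long sessions cheaper, a simple blended-price model helps. This is a sketch under stated assumptions: the fresh price is the V4 Flash figure quoted in the video, but the cached price below is a placeholder invented for illustration, not a quoted DeepSeek rate, and real billing may differ from this linear model.

```python
FRESH_PRICE = 0.14  # USD per million input tokens (V4 Flash, as quoted in the video)
HYPOTHETICAL_CACHED_PRICE = 0.014  # PLACEHOLDER for illustration, not a quoted rate


def blended_input_price(fresh_price: float, cached_price: float, cache_hit_rate: float) -> float:
    """Return the effective price per million input tokens.

    `cache_hit_rate` is the fraction of input tokens served from cache,
    between 0.0 (nothing cached) and 1.0 (everything cached).
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0.0 and 1.0")
    return cache_hit_rate * cached_price + (1.0 - cache_hit_rate) * fresh_price


if __name__ == "__main__":
    # As a session goes on, more of each request repeats earlier context.
    for hit_rate in (0.0, 0.5, 0.9):
        price = blended_input_price(FRESH_PRICE, HYPOTHETICAL_CACHED_PRICE, hit_rate)
        print(f"cache hit rate {hit_rate:.0%}: ${price:.4f} per million input tokens")
```

The effective price falls as the hit rate rises, which is the mechanism behind the claim that staying on one codebase in one session gets progressively cheaper.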

Resources

Things they pointed at.

04:04 · tool · platform.deepseek.com

05:10 · tool · DeepCloud (GitHub)

04:42 · link · DevTK.AI setup guide

01:16 · link · CloudZero 2026 Claude Code pricing breakdown

06:59 · link · Verdant technical comparison of DeepSeek vs Claude in Claude Code context

07:29 · link · Upwork 2026 In-Demand Skills Report

Quotables

Lines you could clip.

07:13

“The people actually building things that work aren't using one tool for everything. They're building stacks.”

Standalone punchline, contrarian framing, no setup needed → TikTok hook

08:37

“It's like a mechanic's toolbox. You don't grab the same wrench for every bolt. You match the tool to the task.”

Clean analogy, immediately applicable, works without context → IG reel cold open

08:40

“Numbers on paper don't pay anything. Spend five dollars. Run your actual workflow through it. You'll know in thirty minutes whether it works for you.”

Anti-over-research closer, punchy and universally applicable → newsletter pull-quote

The Script

Word for word.

What if I told you there's a way to use one of the most powerful AI coding tools in the world and slash the cost by up to 100 times? Same tool, same workflow, same results, just a fraction of the price.

Most people using this tool right now have no idea this is even possible, and the people who figure this out first are going to have a serious edge over everyone still paying full price. Stick around because this one completely changes the map. Hey there.

So if you're new here, I'm Nick Ponti's AI avatar. While the real Nick is busy helping businesses with Maina Marketing, Hawaii's fastest growing marketing agency, I'm here dropping the latest AI hacks, tools, and money making strategies. The real Nick reads every single comment on these videos, so make sure you comment below.

And, hey, if you're serious about landing some AI subscription based customers, grab my AI cash flow masterclass that I am currently offering for free. The link's in the description. Alright.

Now let's get into it. Let's talk about Claude Code. If you haven't heard of it, it's an AI assistant that runs right in your terminal.

It can write code, edit files, run commands, and work through entire projects almost on its own. It's one of the best AI coding tools available right now, but here's the catch. The top-tier Max plan costs $200 a month.

And if you're hitting the API directly, some heavy users report bills well over $1,000 a month. According to CloudZero's 2026 breakdown of Claude Code pricing, some power users found their heaviest months would have cost over $5,000 at API rates if they weren't on the subscription plan.

And here's what makes it even more frustrating. Back in early 2026, developers on the $200 plan were watching their session limits drain in under twenty minutes. Not hours, twenty minutes.

If you've looked at Claude Code and thought the price was just too steep, stay with me because there's a version of this that costs almost nothing.

You've probably heard of DeepSeek. They're the Chinese AI lab that shocked the industry when they dropped their R1 model in January 2025. That model matched the performance of top American AI systems at a fraction of the training cost.

The whole AI world went a little crazy over it. Fast forward to April 2026, and they just released DeepSeek V4. MIT Technology Review called it the most significant release since the original R1 launch.

We're talking 1.6 trillion parameters, a 1 million token context window, and it's released open source under the MIT license.

Built specifically for coding, reasoning, and long running agentic tasks. Now here's the pricing. DeepSeek V4 Flash, the faster, lighter model, costs just 14¢ per million input tokens.

DeepSeek V4 Pro, the flagship, costs 43¢ per million input tokens. Compare that to Claude Opus at $5 per million input tokens.

We're talking 10 to 30 times cheaper just on the API side. And DeepSeek officially announced V4 as directly integrated with Claude Code right in their own release notes. Here's where it gets really interesting.

DeepSeek has an Anthropic-compatible API endpoint. In plain English, that means Claude Code, the exact terminal tool you already know, can be pointed at DeepSeek servers instead of Anthropic's. Claude Code doesn't know or care.

Same commands, same workflow, same interface. The brain just switched to a much cheaper model. There's an open source project on GitHub called DeepCloud that breaks this down clearly.

They describe it as the Claude Code framework running on DeepSeek V4 Pro as the engine. According to their documentation, this configuration can run up to 17 times cheaper than standard Claude Code billing.

Other real world developers have reported savings of 95 times or more depending on workflow and usage. This isn't some sketchy hack. DeepSeek built this compatibility intentionally.

It's in their official API documentation. And here's the move most people haven't tried yet. You can actually run both Claude and DeepSeek in separate terminal windows at the same time.

Claude in one window for the hard stuff. DeepSeek in another for everything else. Let me show you how to set it up.

Getting this running is simpler than you'd think. Here's the core of it. Head to platform.deepseek.com and create a free account.

You get starter credit just for signing up. Once you're in, go to API keys on the left side, create a key, and copy it. In your terminal, you set two environment variables.

First is ANTHROPIC_BASE_URL. Point it to DeepSeek's Anthropic-compatible endpoint. Second is ANTHROPIC_AUTH_TOKEN.

Set that to your DeepSeek API key. Then you launch Claude Code, and it routes through DeepSeek instead of Anthropic. For a more reliable permanent setup, you can create a settings file that Claude Code loads on command.

Then make a simple terminal alias. One shortcut launches the DeepSeek version, another launches regular Claude, both available simultaneously. DevTK put together a full step by step guide with the exact variable names and model mappings.

Link is below. If you already have Claude Code set up with an OAuth or Vertex configuration, there's a flag in the documentation you'll need to use to make sure the override sticks.

One important thing, DeepSeek servers are based in China. Don't put confidential passwords, API keys, or sensitive client data into DeepSeek prompts.

Use it for technical work. Keep anything private on the Claude side. Alright.

Pause here for a second. If you've been watching this thinking, okay. AI tools are getting cheaper.

The barrier is dropping, but I still don't have a clear path to actually making money with any of this, that's exactly what the free AI cash flow masterclass is built for. It's not a surface level overview. It's a step by step breakdown of the exact AI services businesses are already paying for, the simple workflows you can set up without a technical background, and how to find the businesses that have what I call AI guilt.

They know they're behind, they need help, and they're ready to pay someone. And it shows you how to turn that into a real, repeatable income stream. The link is in the description and pinned in the top comment.

Here's the part most people skip, and it might be the most valuable thing in this whole video. You can't just route every task to DeepSeek and call it a day. Think of this as a two person team.

Claude is your creative partner. Anything that has to look polished, communicate clearly, or handle complex multi step logic across a large codebase, that's Claude's lane. It's stronger on long horizon reasoning and at producing code a human would actually wanna read.

DeepSeek is your engine. Unit tests, boilerplate, repetitive refactoring, data processing, high volume tasks where the output just needs to be correct. For these tasks, DeepSeek holds up well at a fraction of the cost.

Here's one more advantage worth knowing. DeepSeek automatically caches context from previous requests. So the more you work on the same project in a session, the cheaper each additional request gets.

Cached input tokens cost dramatically less than fresh ones. That makes extended sessions on the same codebase surprisingly affordable over time. Verdant's technical comparison of DeepSeek and Claude in a Claude Code context confirms the Flash model performs well on most standard implementation tasks. Where it trails is on the most complex, agentic, and reasoning heavy work.

For that, keep Claude in play. The winning setup is both running at the same time. Claude for the hard thinking.

DeepSeek for the volume. Quality stays high. Monthly costs come down dramatically.

Let's talk about the actual opportunity for anyone who wants to use AI to build real income online. And I want you to hear this because you're exactly the kind of person who shows up early for things like this. According to Upwork's 2026 In-Demand Skills Report, demand for AI related freelance work grew 109% year over year.

AI integration services specifically jumped 178%. Businesses are hungry for help with this stuff, and they're actively paying for it. Here's the angle most people miss.

If you offer AI powered services to businesses, automation builds, internal tools, workflow systems, your biggest ongoing cost is your tooling.

With a hybrid Claude plus DeepSeek stack, you just cut that cost significantly. Your quality stays the same. Your client pricing stays the same.

Your margin gets a lot better. Think about what lower costs actually unlock. You can take on projects that wouldn't have made financial sense before.

You can test more ideas without burning through a budget. You can move faster because you're not rationing your usage. For a service business, that kind of operational freedom compounds over time.

Here's the mindset mistake I see constantly. People treat Claude Code as a subscription product. You either pay the full price or you don't use it.

Because of that, a lot of talented people are leaving real leverage on the table. The people actually building things that work aren't using one tool for everything. They're building stacks.

Claude for the output that matters. DeepSeek for the volume. The right model for the right job.

It's like a mechanic's toolbox. You don't grab the same wrench for every bolt. You match the tool to the task.

The other mistake is over researching before testing. DeepSeek V4 has impressive numbers, but numbers on paper don't pay anything. Spend $5.

Run your actual workflow through it. You'll know in thirty minutes whether it works for you. That's it.

I didn't grow up with money. There was no safety net. If something went wrong, I had to deal with it.

That's why I've always paid close attention to leverage. Doing more with less has been the only game that made sense to me. And this setup, it's one of the best examples of that I've seen in a while.

So here's where you are right now. One of the most powerful AI coding frameworks ever built, and you can now run it at a fraction of the cost. Same workflow, same results, dramatically different price.

And if you're building any kind of AI powered service business, that difference goes straight into your margin. If you want a clear step by step path to turning AI tools like this into real recurring income, not just watching videos, but actually executing, the free AI cash flow masterclass is where to start. Inside, you'll find the exact service types businesses are already paying for, the systems to deliver them, and you'll also get a free thirty day trial of the all in one software I personally use to manage clients, automate outreach, and run my entire operation.

It handles a lot of what used to take an entire team. The link is in the description and pinned in the top comment. Register for free.

And drop a comment below and tell me, are you already using Claude Code, or is this your first time looking at it? I read every single one. See you in the next

The Hook

The bait, then the rug-pull.

The title promises 95% off an expensive tool and the video actually delivers a working setup, not a workaround. By redirecting Claude Code at DeepSeek V4's Anthropic-compatible endpoint, the same terminal workflow runs on a model that costs a fraction of Claude Opus — and the real value comes from knowing which tasks belong where.

Frameworks

Named ideas worth stealing.

03:41 · model

Two-Terminal Stack

Claude: complex reasoning, polished output, multi-step logic across large codebases
DeepSeek: unit tests, boilerplate, repetitive refactoring, high-volume correct-output tasks

Run both models simultaneously in separate terminals and route tasks by output requirement rather than defaulting to one model for everything.

Steal for: Any AI service business workflow where cost per task matters

06:05 · concept

Task Routing Heuristic

Does this output need to look polished or handle complex multi-step logic? Claude. Is it high-volume work that just needs to be correct? DeepSeek.

Steal for: Prompt engineering briefs, multi-model agent design
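
The heuristic can be written down as a tiny routing function. This is one possible encoding, not the creator's implementation: the `Task` fields and the "claude" / "deepseek" labels are hypothetical names introduced here, and the sensitive-data rule reflects the video's advice to keep anything private on the Claude side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """A unit of work to route. All field names are illustrative."""
    description: str
    needs_polish: bool = False             # output must read well or face a client
    complex_multistep: bool = False        # long-horizon logic across a large codebase
    contains_sensitive_data: bool = False  # credentials, client data, private code


def route(task: Task) -> str:
    """Return "claude" or "deepseek" for the given task."""
    # Privacy overrides cost: confidential material never goes to DeepSeek.
    if task.contains_sensitive_data:
        return "claude"
    # Polished or reasoning-heavy work stays on the stronger model.
    if task.needs_polish or task.complex_multistep:
        return "claude"
    # Everything else only needs to be correct, so take the cheaper route.
    return "deepseek"


if __name__ == "__main__":
    jobs = [
        Task("generate unit tests for the parser"),
        Task("redesign the billing module", complex_multistep=True),
        Task("rotate production credentials", contains_sensitive_data=True),
    ]
    for job in jobs:
        print(f"{route(job):8s} <- {job.description}")
```

Checking the sensitive-data flag first is a deliberate choice: a cheap, high-volume task that happens to include a secret still goes to Claude.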

CTA Breakdown

How they asked for the click.

VERBAL ASK

09:25 · product

“If you want a clear step by step path to turning AI tools like this into real recurring income, not just watching videos, but actually executing, the free AI cash flow masterclass is where to start.”

Two CTAs total (at 5:10 and 9:20). The mid-video CTA uses the AI guilt concept to pre-qualify viewers. The outro adds a GoHighLevel 30-day trial as a sweetener. Both point to the same link in the description and the pinned comment.

MENTIONED ON CAMERA

04:04 · tool · platform.deepseek.com

04:42 · link · DevTK.AI setup guide
