Modern Creator
WorldofAI · YouTube

Kimi K2.7 Code: Is It Really the Best Cheap Coding Model?

A 12-minute head-to-head that pits a trillion-parameter open-weight model against Opus 4.8 — and finds a 17-cent win, a 262K-token disappointment, and a model that earns its hype on cost, not on polish.

Posted
yesterday
Duration
Format
Tutorial
educational
Views
13.4K
331 likes
Big Idea

The argument in one line.

Kimi K2.7 Code completes coding tasks for 88% less than Claude Opus 4.8, but the high-speed mode doubles the price and the 262K context window falls far short of what a trillion-parameter model should offer in 2026.
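The 88% figure can be checked against the two per-run costs quoted in the video. The sketch below is only that arithmetic; the function name `percent_savings` is an illustrative choice, not something from the video.

```python
def percent_savings(cheaper_cost: float, baseline_cost: float) -> float:
    """Return how much cheaper `cheaper_cost` is than `baseline_cost`, in percent."""
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return (1 - cheaper_cost / baseline_cost) * 100


KIMI_RUN_USD = 0.17  # Kimi K2.7 Code on the strange-attractor run (quoted in the video)
OPUS_RUN_USD = 1.45  # Opus 4.8 on the same benchmark (quoted in the video)

savings = percent_savings(KIMI_RUN_USD, OPUS_RUN_USD)
print(f"{savings:.1f}% cheaper")  # 88.3% cheaper
```

This covers one benchmark run only, so the percentage may not carry over to other workloads.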

Who This Is For

Read if. Skip if.

READ IF YOU ARE…
  • You run agentic coding pipelines and are shopping for a cheaper model that handles multi-step tool calls without collapsing.
  • You want a concrete, numbers-on-screen cost comparison between Kimi K2.7 Code and Claude Opus 4.8 on real tasks.
  • You are evaluating open-weight models for API access without vendor lock-in and need current pricing data.
SKIP IF…
  • You need the highest output polish on UI-heavy coding tasks — Opus still wins that comparison cleanly.
  • You run long-context workloads like large repo reviews — 262K tokens is a real constraint at this tier.
  • You are not doing code generation; this model is narrowly optimized for coding, not general reasoning.
TL;DR

The full version, fast.

Kimi K2.7 Code is a trillion-parameter open-weight MoE model that claims a 10% agentic improvement over K2.6, stronger instruction following, and 30% less overthinking. In a head-to-head strange-attractor coding benchmark, it completed comparable tasks in 6 minutes for 17 cents versus Opus 4.8's 5 minutes for $1.45. Output completeness was similar; Opus produced more polished UI. Pricing of $0.19 per million cached input tokens makes it compelling for volume workflows, but the 262K context window and a high-speed mode that doubles costs undercut the efficiency pitch for long-context agentic runs.
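To see how the listed rates turn into a per-request bill, here is a minimal sketch using the three prices quoted in the video ($0.19 cache hit, $0.95 cache miss, $4.00 output, each per million tokens). The `Pricing` class, the `request_cost` helper, and the token counts in the example are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per one million tokens, split by billing bucket."""
    cached_input: float
    uncached_input: float
    output: float


# Listed rates as quoted in the video.
KIMI_K27_CODE = Pricing(cached_input=0.19, uncached_input=0.95, output=4.00)


def request_cost(p: Pricing, cached_in: int, uncached_in: int, out: int) -> float:
    """Cost in USD of one request, given token counts per billing bucket."""
    total = (
        cached_in * p.cached_input
        + uncached_in * p.uncached_input
        + out * p.output
    )
    return total / 1_000_000


# Hypothetical agent turn: 150K cached prompt tokens, 10K new input, 5K output.
cost = request_cost(KIMI_K27_CODE, cached_in=150_000, uncached_in=10_000, out=5_000)
print(f"${cost:.4f}")  # $0.0580
```

In this made-up turn the 5K output tokens cost more than twice as much as the 10K uncached input tokens, which is why reasoning-heavy generations dominate the bill.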

Chapters

Where the time goes.

00:00–01:00

01 · Model intro

K2.7 specs, MoE scale, claimed improvements over K2.6 — better instruction following, 30% less overthinking

01:00–02:12

02 · Independent benchmarks

ErdosBench smoke test places K2.7 second after Fable 5; reviewer gives honest caveat about cherry-picked evals

02:12–03:26

03 · Sponsor — Docker Sandbox

Docker Sandbox pitched as isolated YOLO execution layer for Claude Code / Codex / Copilot agents

03:26–04:24

04 · Agentic capability claims

World of AI benchmark results, 10% agentic improvement claim, multi-step tool calling improvements

04:24–05:07

05 · Pricing and context window

On-screen pricing table: $0.19/$0.95/$4.00; context window only 262K — reviewer calls it underwhelming

05:07–06:04

06 · Access and high-speed mode

KimiCode harness, API, quantized 325GB download; high-speed mode: 6x faster but 2x more expensive

06:04–07:09

07 · Token efficiency + front-end demo

Token usage analysis; SaaS landing page with GSAP scroll triggers generated

07:09–08:10

08 · macOS clone demo

Full macOS UI clone with SVG dock icons, dark/light theme, dock visibility toggle, Minesweeper, Safari

08:10–09:02

09 · Head-to-head vs Opus 4.8

Strange-attractor benchmark: Kimi 17c/6min vs Opus $1.45/5min — both pass, Opus more polished

09:02–09:58

10 · SVG generative art

Lava lamp physics, attractor visualizations — Kimi K2.7 scores high on SVG creativity tasks

09:58–10:20

11 · Channel CTA

Discord invite, super thanks, channel subscribe request

10:20–11:52

12 · Conclusion

Final verdict, Kimi K3 anticipation, GLM-5.2 multimodal comparison, closing

Atomic Insights

Lines worth screenshotting.

  • Kimi K2.7 Code costs 17 cents for a task that costs Claude Opus 4.8 $1.45 — an 88% cost reduction with comparable completeness but less polished UI.
  • The high-speed mode doubles Kimi K2.7's price, which cuts into the core cost advantage and makes the standard mode the better choice unless throughput is the priority.
  • A 262K context window on a trillion-parameter model released in 2026 is a real product weakness — the jump from K2.6's 256K is marginal.
  • Benchmark rankings that put Kimi K2.7 above frontier models often use evaluations weighted toward Kimi's specific strengths. In real-world tasks its quality is close to the frontier models but still below them.
  • The quantized version of Kimi K2.7 runs at around 325 gigabytes — local self-hosting is theoretically possible but out of reach for almost all individual practitioners.
  • Kimi K2.7 Code's 30% overthinking reduction over K2.6 matters in agentic loops where unnecessary reasoning tokens compound across multi-step tasks.
  • Open-weight models with multimodal capabilities — as Kimi K2.7 has, unlike GLM-5.2 — have a structural edge in agentic workflows that require vision inputs.
  • The macOS clone demo shows Kimi K2.7 generating SVG dock icons, dark/light theme toggle, and dock visibility controls — features most models fail to implement without explicit prompting.
  • Coding model performance-per-dollar is becoming the primary competitive axis: raw benchmark rankings are losing signal as every lab cherry-picks favorable evaluations.
  • Docker Sandbox's pitch — give the agent YOLO mode in an ephemeral container instead of babysitting every tool call — directly addresses the real friction in Claude Code and Codex workflows.
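The 325 GB figure for the quantized download implies a rough average precision per weight. The sketch below derives it from the two rounded numbers in the video (about 1 trillion parameters, about 325 GB), assuming 1 GB means 10^9 bytes. The actual quantization scheme is not stated in the video, so treat the result as a back-of-the-envelope estimate.

```python
def bits_per_parameter(size_gb: float, total_params: float) -> float:
    """Average storage bits per parameter, treating 1 GB as 10**9 bytes."""
    return size_gb * 1e9 * 8 / total_params


def size_gb_at(bits: float, total_params: float) -> float:
    """Approximate file size in GB if every parameter used `bits` bits."""
    return total_params * bits / 8 / 1e9


TOTAL_PARAMS = 1e12  # "around 1 trillion" (rounded figure from the video)

avg_bits = bits_per_parameter(size_gb=325, total_params=TOTAL_PARAMS)
print(f"{avg_bits:.1f} bits per parameter")  # 2.6 bits per parameter

# For comparison: the same parameter count stored at 16 bits per weight.
print(f"{size_gb_at(16, TOTAL_PARAMS):.0f} GB at 16-bit")  # 2000 GB at 16-bit
```

Even at under 3 bits per weight on average, the download is far beyond typical consumer memory, which matches the point that local hosting is out of reach for most individuals.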
Takeaway

The cost advantage is real — but comes with real limits.

WHAT TO LEARN

An open-weight model that costs 88% less than frontier alternatives is only a win if you understand exactly where the savings break down.

  • Completing a coding task for 17 cents versus $1.45 is a real difference, but the output polish gap means Kimi K2.7 is better suited for scaffolding and iteration than final-quality delivery.
  • The high-speed mode doubles costs, eliminating most of the efficiency argument — defaulting to standard mode is almost always the right call unless throughput is the only variable that matters.
  • A 262K context window is a practical ceiling for large codebase work; knowing this limit before committing to a model for agentic tasks saves architectural rework later.
  • Benchmarks that a model's own lab promotes are worth less as signal than independent runs — the gap between self-reported and independent results is consistently wider for newer models.
  • Open-weight models with multimodal capability have a structural advantage in agentic workflows over text-only models, even when raw coding scores are comparable, because vision inputs matter for UI work and debugging.
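One way to act on the context-window point before committing to a model is a rough pre-flight check of whether a set of files fits. The sketch below assumes about 4 characters per token, which is a common rule of thumb and not Kimi's actual tokenizer. The 262,000 limit is the rounded "262K" from the video, and the output reserve is an arbitrary illustrative value.

```python
CONTEXT_WINDOW_TOKENS = 262_000  # rounded "262K" from the video; exact limit may differ
CHARS_PER_TOKEN = 4              # rough heuristic, NOT the model's real tokenizer


def estimate_tokens(text: str) -> int:
    """Estimate token count from character length, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(texts, reserved_for_output: int = 16_000):
    """Return (fits, estimated_prompt_tokens) for a list of file contents."""
    prompt_tokens = sum(estimate_tokens(t) for t in texts)
    fits = prompt_tokens + reserved_for_output <= CONTEXT_WINDOW_TOKENS
    return fits, prompt_tokens


# Stand-ins for roughly 1.1 MB of source text across two files.
files = ["x" * 400_000, "y" * 700_000]
fits, tokens = fits_in_context(files)
print(fits, tokens)  # False 275000
```

For a real decision, swap the heuristic for the provider's own token counter, since code often tokenizes less efficiently than prose.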
Glossary

Terms worth knowing.

Mixture of Experts (MoE)
A neural network architecture that routes each token through a small subset of specialized sub-networks, allowing very large total parameter counts while keeping inference costs relatively low.
Cache hit / cache miss pricing
Many API providers charge less for input tokens already stored in a prompt cache. A cache hit means the tokens were found and reused cheaply; a cache miss means the full token computation cost applies.
ErdosBench
An independent coding benchmark smoke test used to rank models on multi-problem coding challenges; not affiliated with any model lab, making it a more neutral signal than provider-run evaluations.
Strange attractor
A mathematical structure from chaos theory (Lorenz, Rössler, etc.) used here as a coding benchmark because generating interactive, physics-accurate SVG attractors requires both mathematical reasoning and front-end coding skill.
Agentic coding loop
A workflow where an AI model operates autonomously across multiple steps: reading files, running tools, editing code, catching errors, and tracking goals across a long session without human approval on each action.
Quantized model
A compressed version of a neural network where weights are stored at lower numerical precision, reducing file size and memory requirements at some cost to quality. Kimi K2.7's quantized version is approximately 325 GB.
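The Mixture of Experts entry above can be made concrete with a toy router. The sketch below shows generic top-k gating: score every expert, keep the k best, normalize their weights with a softmax, and run only those experts. It is not Kimi K2.7's actual router. The expert functions and gate scores are made up, and real models route vectors through learned networks rather than scalars through lambdas.

```python
import math


def top_k_route(gate_logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    if not 0 < k <= len(gate_logits):
        raise ValueError("k must be between 1 and the number of experts")
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    top = ranked[:k]
    peak = max(gate_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, gate_logits, k=2):
    """Weighted sum of the outputs of only the selected experts."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))


# Toy experts: each "sub-network" is just a scalar function here.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x ** 2, lambda x: -x]
gate_logits = [0.1, 2.0, 2.0, -1.0]  # made-up router scores for one token

routes = top_k_route(gate_logits, k=2)
y = moe_forward(3.0, experts, gate_logits, k=2)
print(routes, y)  # [(1, 0.5), (2, 0.5)] 7.5
```

Only two of the four experts execute for this token, which is the property that lets total parameter count grow without a matching growth in per-token compute.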
Resources

Things they pointed at.

02:32 · product · Docker Sandbox
03:26 · tool · World of AI Benchmark Tool
05:07 · tool · KimiCode
01:06 · tool · ErdosBench / ulam.ai
Quotables

Lines you could clip.

08:32
Kimi finished it in about six minutes and cost only 17 cents.
Concrete number with implicit comparison (Opus was $1.45) · TikTok hook
10:43
This may not be the best coding model in the world, but it might be one of the most important open weight coding models right now.
Balanced verdict that resonates with open-source advocates · IG reel cold open
04:14
The context window hasn't drastically improved, which is honestly underwhelming for a trillion parameter coding model in 2026.
Direct criticism, rare in benchmark review content and more trustworthy for it · newsletter pull-quote
The Script

Word for word.

00:00 Just a few days ago, the Moonshot AI team out of China dropped an incredible model called the Kimi K2.7 Code. This is their latest open-weight, coding-focused model that's built for code generation, codebase understanding, agentic programming, developer tool integration, and it still keeps multimodal capabilities as well.
00:21 And what makes this really interesting is the scale. This is a massive mixture-of-experts model with around 1 trillion total parameters, and it's specifically tuned to be much stronger as a long-horizon coding task model. Compared to Kimi K2.6, Moonshot says that the Kimi K2.7 is something that significantly improves instruction compliance.
00:43 It performs better in long-context coding workflows and reduces overthinking tendencies by around 30% on average. So in simple terms, this is an open-weight model, better coding, stronger instruction following, and smarter agentic coding loops.
01:00 But what is wild is that the Kimi K2.7 Code is already showing up extremely high on some independent coding evaluations. For example, on the ErdosBench smoke test, the Kimi K2.7 reportedly ranks second right behind Fable 5 and even ahead of GPT-5 on x-high in a specific run. Now I do wanna be fair here.
01:22 Benchmarks are useful, but, personally, I don't think the Kimi K2.7 Code is actually close to the true state-of-the-art closed-source frontier models like Fable or GPT or even Opus in real-world usage. A lot of the benchmarks Moonshot highlights, like MCP Atlas or MLE-Bench Lite, seem to favor Kimi's strengths quite a bit.
01:43 So, yes, the numbers do look impressive, but I wouldn't blindly say that this model is beating frontier models overall. But then again, when you compare it on web development tasks in comparison to the Opus 4.8 and GPT-5.5 and even in comparison to Kimi K2.6, the previous model, you can clearly see that the Kimi K2.7 Code is definitely in that category, meaning that it is able to be comparable with these proprietary giants.
02:12 Before we get into the video, I wanna quickly showcase something that actually fits where AI coding agents are heading. A lot of us are using tools like Claude Code, Cursor, as well as Codex.
02:24 We also have MCP tools and all these new autonomous coding workflows. But the problem is the moment you give the agent more power, you have to actually babysit every single tool call.
02:35 And eventually, let's be real, you're just gonna start clicking approve over and over, which defeats the whole process of having an agent. But that's where Docker Sandbox comes in. Docker Sandbox gives your AI agent an isolated ephemeral environment where they can actually run in YOLO mode safely.
02:53 They can explore, test, write code, use tools, and iterate without touching your real system or creating chaos in your workflow. The best part is it works with whatever you're already using, any model, any agent, any MCP tool.
03:09 So whether your stack is Claude Code, Copilot, Codex, or any open-source agent, Docker Sandbox gives you a neutral execution layer for running them safely.
03:20 You can start locally then scale the same setup to the cloud when you need more agents running in parallel. Docker Sandbox is available for macOS and Windows, and you can try it out today with the link in the description below. Thanks to them for sponsoring today's video. On the World of AI benchmark tool, which you can access completely for free (I'll leave a link to that in the description below),
03:41 you can clearly see that the new Kimi K2.7 Code does exceptionally well in terms of its overall performance in open-weight models. When you compare it to other open-source models, it does exceptional in many of these different categories.
03:55 Moonshot is also pushing Kimi K2.7 Code as a stronger agentic coding model. They claim around 10% improvement in agentic performance over the K2.6 with better multistep tool calling.
04:08 You have better reasoning, code editing, and long coding workflows. That matters because coding models today are not just running functions anymore. They need to understand the full project, edit multiple files, use tools, recover from mistakes, and then keep track of the goal across long sessions.
04:24 That is all great, but something that doesn't align with that is the context window, which I will be talking about. Pricing wise, the model is listed at 19¢ per 1,000,000 input tokens with cache hit, but with cache miss, it is 95¢.
04:38 Now for output tokens, it's listed at $4 per 1,000,000 output tokens. Now the only thing that you will notice right now that will kill your mood is the context window. This is where the context window hasn't drastically improved from the previous models.
04:52 It has gained a marginal context window increase from the previous Kimi model, from 256K all the way to 262K, which is still useful at the end of the day, but honestly underwhelming for a 1-trillion-parameter coding model in 2026.
05:07 Now if you're looking to get started with this model, you can easily do so through their harness, which is KimiCode. You can start testing it out through the World of AI benchmark. If you wanna access it for free, you can do so through their chatbot where you can test it out right now.
05:21 Now something to also note is the API is also available and the open weight for this model is also available, but it is definitely gonna be not accessible on most of our devices. But you can get a shrinked quantization version, which shrinks the model down to around 325 gigabytes, and I can leave a link to that in the description below.
05:42 Something to also note is that just yesterday, they announced another mode for the Kimi K2.7 Code, which is high speed. And this is basically a faster mode for the same multimodal coding model. They claim that it can run six times faster, hitting around 180 tokens per second on coding tasks and speeds up to 260 tokens per second on shorter context tasks.
06:05 Now before we get into testing, what I wanna talk about is token efficiency of this model. It's not as efficient as the Kimi K2.6. It's not as efficient when it comes to token expenditure.
06:17 It uses a lot more tokens up in all of these different tasks, which is why it is gonna be reasoning a lot more with every generation. So you may see it spending a lot more time with these generations. And then if you switch over to the high-speed mode, you're gonna get the same performance faster, but you're paying a lot more.
06:35 So then it defeats the purpose of just using this model for its token efficiency. But regardless, when it comes to front end, the model does pretty good in all of its outputs. You can see this with this front-end prompt where I'd requested it to create a SaaS landing page; you can see all of the dynamic movements have been thoroughly generated.
06:54 You have all the different triggers like the scroll trigger that has been added, the GSAP, as well as all the different hero sections. So when it comes to front end, it does pretty good in this particular domain.
07:09 Now let's start off with the macOS demo. This is where we had requested it to clone the macOS operating system, and the Kimi K2.7 Code did a pretty good job in getting the main interface right. You have the startup boot.
07:22 You also have the bottom toolbar, which looks quite intuitive. You have the Finder app. And one thing you can notice right away, each of the icons have been generated with SVG, so the logos represent what the application is.
07:35 You have a calculator, you have a terminal, but you can clearly see that it's not able to mimic it exactly at the same sort of quality as the other models, which is the only downside you can say.
07:47 You can change the accents, which is nice. Most models can't actually do that. You can change the theme to dark or light.
07:53 You can change the dock visibility, which is something that I haven't seen with any models, so that is also good. You also have a Minesweeper game and Safari. So it's a pretty basic macOS clone that it was able to generate and got the main structure right with all the components.
08:10 Now when you are to compare the Kimi K2.7 Code thinking mode against the Opus 4.8 Max on a strange attractors coding benchmark, this is where the Opus was able to finish a couple of these different tests in about five minutes, and it costed $1.45.
08:27 Whereas Kimi finished it in about six minutes and cost only 17¢. Both implemented all of the different sorts of benchmarks, and you can see from the results, both of them are quite similar, and you can clearly see that Kimi did a cheaper job, meaning it was more efficient with the generations.
08:46 But Opus was clearly better engineered. You can clearly see that it was better structured with all the outputs compared to the Kimi model where it was not able to actually perform the same polished UI as the Opus.
09:02 Now in terms of SVG, this is where the Kimi K2.7 did exceptional with this output. This is where our first test where we had generated the lava lamp.
09:11 You can see that it did exceptionally well in terms of generating the physics of all the blobs. It also added in the constant generation of the blobs, and you can even change the flow speed as well. When it comes to web development, I personally believe the Kimi K2.7 Code does exceptionally well.
09:29 It's not obviously at the same level as Opus and GPT, but it is getting there. And you can see that with a lot of these different generations in comparison with Codex as well as Claude Code.
09:41 And you can see that Kimi K2.7 within KimiCode does quite well with these generations in generating all the different parts and components like the investor page or where it was able to create the weather app page or the 2048 game. Or you can consider joining our private Discord where you can access multiple subscriptions to different AI tools for free on a monthly basis, plus daily AI news and exclusive content, plus a lot more.
10:10 If you like this video and would love to support the channel, you can consider donating to my channel through the Super Thanks option below.
10:20 In conclusion, the Kimi K2.7 Code is definitely an exceptional open-source model. I am definitely looking forward to the Kimi K3.0, which should be coming later this year, maybe within the next two months. That's just my 2¢.
10:35 But if we're seeing this type of capability with their code model, imagine what the Kimi K3 is capable of doing. Now this may not be the best coding model in the world, but it might be one of the most important open-weight coding models right now. It's cheap, it's capable, agentic, multimodal, and surprisingly competitive for its price.
10:57 And if you compare it to GLM-5.2, 5.2 doesn't even have multimodal capabilities, whereas the Kimi model actually does.
11:05 So that is a clear benefit when you're comparing these two open-source giants. Regardless, this is something that you would like to check out on your own to see if it fits your own preference because all these models are perfect in different domains, and it could be used in different ways. And the best way is by just simply combining all of them.
11:23 But I'll leave a link to all of these in the description below, all the links that I use. Make sure you go ahead and take a look at the World of AI benchmark tool. Make sure you go ahead and subscribe to the YouTube channel, join the newsletter, join the Discord, follow me on Twitter, and last thing, make sure you guys subscribe, turn on notification bell, like this video, and please take a look at our previous videos so that you can stay up to date with the latest AI news.
11:43 But with that thought, guys, thank you guys so much for watching. Have an amazing day. Spread positivity, and I'll see you guys really shortly.
11:48 Peace out, fellas.
The Hook

The bait, then the rug-pull.

A trillion-parameter open-weight coding model from China drops with benchmark numbers that put it in the same category as Opus 4.8. The reviewer does not take the bait. He runs the tests himself and comes back with a number that reframes the whole conversation: 17 cents.

Frameworks

Named ideas worth stealing.

08:10 · concept

Performance-per-dollar evaluation

The reviewer's primary lens: not which model scores highest on benchmarks, but which model completes equivalent tasks at the lowest cost with acceptable quality.

Steal for: Any model selection decision where you run high-volume agentic tasks
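The performance-per-dollar lens can be written as a simple selection rule: filter to the runs that meet the quality bar, then take the cheapest. The sketch below uses the two runs from the video, where both models passed the strange-attractor benchmark. The `Run` record and `cheapest_passing` helper are illustrative names, and treating quality as a pass/fail flag is a simplifying assumption that ignores the polish gap the reviewer noted.

```python
from typing import NamedTuple, Optional


class Run(NamedTuple):
    model: str
    cost_usd: float
    minutes: float
    passed: bool  # simplification: quality reduced to pass/fail


def cheapest_passing(runs: list[Run]) -> Optional[Run]:
    """Return the lowest-cost run that met the quality bar, or None if none did."""
    passing = [r for r in runs if r.passed]
    return min(passing, key=lambda r: r.cost_usd) if passing else None


# Figures quoted in the video for the strange-attractor benchmark.
runs = [
    Run("Kimi K2.7 Code", cost_usd=0.17, minutes=6, passed=True),
    Run("Opus 4.8", cost_usd=1.45, minutes=5, passed=True),
]

best = cheapest_passing(runs)
print(best.model)  # Kimi K2.7 Code
```

If polish matters for the deliverable, replace the boolean with a graded score and a minimum threshold, and the rule may pick differently.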
06:04 · concept

High-speed mode trade-off

High-speed mode gives 6x throughput at 2x cost — which eliminates most of the efficiency advantage that makes the standard model attractive in the first place.

Steal for: Evaluating premium tier options on any API service
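A quick way to weigh a premium tier is to express it as extra dollars per minute saved. The sketch below applies the 6x speed and 2x price multipliers to the 6-minute, 17-cent run from the video. It assumes both multipliers apply uniformly to the whole run, which likely overstates the time saved because tool calls and waiting are not token generation. The helper names are hypothetical.

```python
def apply_mode(cost_usd: float, minutes: float,
               speedup: float = 1.0, price_multiplier: float = 1.0):
    """Scale a baseline run by a mode's speed and price multipliers."""
    return cost_usd * price_multiplier, minutes / speedup


def extra_cost_per_minute_saved(base, premium) -> float:
    """Extra USD paid for each minute the premium mode saves."""
    (base_cost, base_min), (prem_cost, prem_min) = base, premium
    saved = base_min - prem_min
    if saved <= 0:
        raise ValueError("premium mode saves no time")
    return (prem_cost - base_cost) / saved


# Baseline: the standard-mode run quoted in the video (17 cents, about 6 minutes).
standard = apply_mode(0.17, 6)
high_speed = apply_mode(0.17, 6, speedup=6, price_multiplier=2)

print(f"${high_speed[0]:.2f} in {high_speed[1]:.0f} min")  # $0.34 in 1 min
print(f"${extra_cost_per_minute_saved(standard, high_speed):.3f} per minute saved")  # $0.034 per minute saved
```

Whether about 3 cents per saved minute is worth it depends on whether anyone is waiting on the result; for unattended batch runs the standard mode keeps the full price advantage.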
CTA Breakdown

How they asked for the click.

VERBAL ASK
09:58 · next-video
Join the Discord for free AI tool subscriptions, daily news, and exclusive content.

Mid-outro Discord pitch followed by channel subscribe and super thanks ask — two CTAs back to back

Storyboard

Visual structure at a glance.

00:00 · hook · open
01:06 · value · benchmarks
04:24 · value · pricing
07:09 · value · macOS demo
08:10 · value · head-to-head
09:02 · value · SVG demo
10:20 · cta · verdict