Big Idea

The argument in one line.

Choosing between Claude Code and Codex is not a feature comparison but a workflow-shape question, and the benchmark data shows each tool wins decisively on different task types.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…

You are on a Claude Pro or Max plan, hitting session limits faster than expected, and want to understand the token math behind it.
You are evaluating whether a second coding-agent subscription is worth adding alongside the one you already pay for.
You ship front-end UIs regularly and want data-backed guidance on which agent produces better design quality by default.
You produce research documents, PDFs, or structured reports and care about speed and token cost per run.

SKIP IF…

You want a clean one-tool-wins-forever verdict -- the video explicitly argues against that framing.
You are not doing serious daily coding-agent work; the pricing discussion assumes heavy use.

TL;DR

The full version, fast.

Three real builds, two tools, full telemetry. Claude Code took 14:51 total across three tasks using 5.8M tokens at $11.05; Codex took 25:52 using 6.2M tokens at $7.11. Claude built the marketing dashboard in 1:57 using 283K tokens while Codex took 7:50 and burned 1.64M. Codex won the research PDF -- faster and leaner. The underlying cause is output-token discipline: Codex consistently writes 2-5x fewer output tokens than Claude, which is why it burns through subscription limits more slowly. The practical decision rule: Claude for front-end, planning, and custom workflow automation; Codex for research-heavy tasks, structured documents, and longer-running objectives.
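The headline numbers above can be sanity-checked with a few lines of arithmetic. This is a minimal sketch that uses only the totals quoted in this breakdown; the helper names (`to_seconds`, `usd_per_million`, `ratio`) are illustrative and not part of either tool.

```python
# Totals as quoted in this breakdown: wall-clock time (mm:ss), tokens, cost in USD.
TOTALS = {
    "Claude Code": {"time": "14:51", "tokens": 5_800_000, "cost": 11.05},
    "Codex": {"time": "25:52", "tokens": 6_200_000, "cost": 7.11},
}

# Dashboard task only, same prompt on both tools.
DASHBOARD = {
    "Claude Code": {"time": "1:57", "tokens": 283_000},
    "Codex": {"time": "7:50", "tokens": 1_640_000},
}


def to_seconds(mmss: str) -> int:
    """Convert an 'mm:ss' string into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def usd_per_million(run: dict) -> float:
    """Blended cost per million tokens for one run, rounded to cents."""
    return round(run["cost"] / (run["tokens"] / 1_000_000), 2)


def ratio(slow: float, fast: float) -> float:
    """How many times larger `slow` is than `fast`, rounded to one decimal."""
    return round(slow / fast, 1)


for name, run in TOTALS.items():
    print(name, to_seconds(run["time"]), "s,", usd_per_million(run), "USD per 1M tokens")

speed_gap = ratio(to_seconds(DASHBOARD["Codex"]["time"]),
                  to_seconds(DASHBOARD["Claude Code"]["time"]))
token_gap = ratio(DASHBOARD["Codex"]["tokens"], DASHBOARD["Claude Code"]["tokens"])
print("Dashboard speed gap:", speed_gap, "x; token gap:", token_gap, "x")
```

The blended cost per million tokens comes out higher for Claude Code even though its total token count is lower, which is consistent with the output-token explanation given in the breakdown.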

Chapters

Where the time goes.

00:00 – 01:50

01 · Hook and thesis

OpenAI comeback framing, promise of honest head-to-head across features, price, and three specific use cases.

01:50 – 04:00

02 · Claude Code overview

Task delegation, file editing, customization via hooks/skills/sub-agents. Desktop, terminal, web versions. Opus/Sonnet/Haiku models.

04:00 – 08:00

03 · Codex overview

GPT family models, gpt-codex-spark in preview. WorkTrees as the defining architectural choice. Included in every ChatGPT paid plan.

08:00 – 11:19

04 · Shared features

Both tools: local code editing, desktop app, VS Code extension, CLI, MCP, skills format, plugin marketplace, cloud delegation, hooks, sub-agents.

11:19 – 15:00

05 · Claude Code advantages

30 hook events vs 6. Auto-delegating sub-agents. /ultra-plan, /ultra-review, /loop. Channels integration. Agent SDK. Enterprise auth (Bedrock, Vertex, Foundry).

15:00 – 18:00

06 · Codex advantages

Native WorkTrees per thread. In-app browser. Computer-use QA. @Codex GitHub PR integration. /goal. GPT image generation. OpenClaw/Hermes compatibility.

18:00 – 22:00

07 · Pricing and context windows

Claude: Pro $20, Max 5x $100, Max 20x $200. Codex: included in ChatGPT free through Pro $200. 1M token context (Claude) vs 256K (Codex).

22:00 – 25:12

08 · Live benchmark intro and results

Three identical prompts: research report PDF, landing page (Glaido), marketing analytics dashboard. Claude wins landing page and dashboard design; Codex wins PDF efficiency.

25:12 – 28:34

09 · Benchmark metrics deep-dive

Raw numbers from JSONL logs. Codex: 25:52, 6.19M tokens, $7.11. Claude: 14:51, 5.8M tokens, $11.05. Output tokens always higher for Claude. Efficiency scatter plot.

28:34 – 31:40

10 · Analysis and decision framework

Use Claude for front-end, deep planning, custom workflows, enterprise auth. Use Codex for research tasks, structured documents, /goal, GitHub PRs, image generation. Split workflow is valid.

31:40 – end

11 · Portability and closing

Projects are files in folders -- not locked to either tool. CLAUDE.md becomes AGENTS.md. Closing thesis: which tool is best for this specific task.
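As a rough sketch of the portability point, the instructions file can be carried over with a few lines of Python. The function name `port_instructions` is hypothetical, and copying the file instead of renaming it is a choice made here (so the folder keeps working in both tools), not something the video prescribes.

```python
import shutil
from pathlib import Path


def port_instructions(project_dir: str) -> Path:
    """Copy CLAUDE.md to AGENTS.md inside project_dir without overwriting anything."""
    root = Path(project_dir)
    source = root / "CLAUDE.md"
    target = root / "AGENTS.md"
    if not source.is_file():
        raise FileNotFoundError(f"No CLAUDE.md found in {root}")
    if target.exists():
        # Refuse to clobber an AGENTS.md that someone already wrote by hand.
        raise FileExistsError(f"{target} already exists")
    shutil.copyfile(source, target)
    return target
```

Calling `port_instructions("path/to/project")` returns the path of the new AGENTS.md, or raises if there is nothing to copy or a target file is already present.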

Atomic Insights

Lines worth screenshotting.

Claude Code built a marketing analytics dashboard in 1:57 using 283K tokens; Codex took 7:50 and used 1.64M tokens for the exact same prompt.
Claude Code consistently writes 2-5x more output tokens than Codex -- and output tokens cost more than input tokens, which is why Claude users hit session limits faster.
Codex spent $7.11 total across three builds; Claude spent $11.05 -- even though Claude finished faster overall (14:51 vs 25:52).
Claude Code has 30 hook events for automated workflow triggers; Codex has about 6 -- a 5x gap in automation granularity.
Codex sub-agents require explicit prompting; Claude Code spawns them on its own when task complexity warrants it.
Both tools now have /goal for multi-hour long-running objectives -- Claude shipped the feature the same week this video was recorded.
Routing Codex through OpenClaw with a ChatGPT subscription is publicly endorsed by OpenAI; doing the same with a Claude subscription violates Anthropic terms without prior approval.
Claude context window is 1M tokens; Codex runs at 256K -- a 4x difference that matters for large codebases.
For the research PDF, Codex used 2.8M tokens in 7:59; Claude used 4.7M tokens in 8:15 -- Codex was both faster and cheaper on the most document-heavy task.
The gut-feel observation -- Claude feels more creative and pushes back; Codex obeys more reliably -- held up in the benchmarks: Claude planned tightly first, Codex iterated through more steps.
Projects are portable across tools: skills, hooks, and JSONL logs all transfer; the main swap is renaming CLAUDE.md to AGENTS.md when moving a project into Codex.
The efficiency scatter plot showed Claude Code dashboard and landing-page runs in the fast-and-lean quadrant; all three Codex runs bunched in the middle.

Takeaway

Which coding agent to reach for, and when.

WHAT TO LEARN

The benchmark data splits cleanly: Claude Code wins on front-end quality and planning depth; Codex wins on token efficiency and research-heavy output -- and both tools are portable enough that you do not have to commit to just one.

Output tokens are priced higher than input tokens, and Claude Code consistently writes 2-5x more output tokens per task than Codex -- which is the direct cause of hitting Claude session limits faster, not a platform throttle.
Claude Code finished a marketing analytics dashboard in under 2 minutes using 283K tokens; Codex took 8 minutes and burned 1.64M tokens on the same prompt -- a 4x speed gap and 6x token gap for front-end work.
Codex won the research report task, finishing slightly faster and using 1.9M fewer tokens than Claude, which suggests Codex is more efficient when the task is document generation rather than UI construction.
Claude Code has 30 hook events for automated workflow triggers; Codex has about 6 -- if you need fine-grained automation that fires on specific agent behaviors, Claude Code is the only current option at that scale.
Claude Code auto-spawns sub-agents when task complexity warrants it; Codex only does so when explicitly asked -- which means complex multi-step tasks route differently through each tool even on identical prompts.
Projects built in either tool are portable: skills, hooks, and JSONL logs all transfer; the main swap is renaming CLAUDE.md to AGENTS.md when moving a project into Codex.
A practical split workflow -- use Claude Code for planning and brainstorming, then hand the plan to Codex for execution -- is validated by how each tool's token behavior maps to planning-heavy vs execution-heavy phases.
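To see why the output-token share matters more than the raw token count, here is a minimal sketch of the cost arithmetic. The two per-million prices are hypothetical placeholders chosen only to show the asymmetry; they are not the real rates of Anthropic or OpenAI, and the two example runs are invented for illustration.

```python
from dataclasses import dataclass

# HYPOTHETICAL prices in USD per million tokens. Only the asymmetry matters here:
# output is assumed to cost several times more than input.
INPUT_USD_PER_M = 1.0
OUTPUT_USD_PER_M = 5.0


@dataclass
class Run:
    input_tokens: int
    output_tokens: int

    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

    def cost(self) -> float:
        """Cost in USD under the assumed prices above."""
        weighted = (self.input_tokens * INPUT_USD_PER_M
                    + self.output_tokens * OUTPUT_USD_PER_M)
        return weighted / 1_000_000


# Two invented runs with the same total token count but a different output share.
lean = Run(input_tokens=950_000, output_tokens=50_000)
chatty = Run(input_tokens=800_000, output_tokens=200_000)

print(lean.total_tokens(), lean.cost())
print(chatty.total_tokens(), chatty.cost())
```

Under these assumed prices, the run that writes four times as many output tokens costs noticeably more at the same total, which is the same shape as the benchmark result where the tool with fewer total tokens still had the higher bill.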

Glossary

Terms worth knowing.

WorkTree: A separate working copy of a git repository that lets multiple agent tasks run in parallel without overwriting each other. Codex uses these natively for every thread.
Hook events: Automated triggers that fire when specific actions happen inside a coding-agent session -- such as when a prompt is submitted or a tool runs -- used to inject automated behaviors into the workflow.
Sub-agents: Specialist agent instances spun up to handle a specific portion of a complex task, such as a planner, an explorer, and a code-reviewer running within a single session.
MCP (Model Context Protocol): An open protocol for connecting external tools and data sources to an AI agent. Both Claude Code and Codex support it, enabling integrations with services like Discord, GitHub, and databases.
Output tokens: The tokens generated by the model in its response, as opposed to tokens it reads as input. Output tokens are priced higher and are the primary driver of how quickly subscription session limits are reached.
JSONL session log: A line-delimited JSON file that coding agents write during a session, recording every tool call, token count, cache read, and cost. Used in this video to pull the benchmark metrics directly from the agent.
/ultra-plan: A Claude Code slash command in research preview that ships the planning phase to a cloud browser session where you can annotate the plan before sending it back to the terminal for execution.
/ultra-review: A Claude Code slash command in research preview that spins up multiple reviewer agents for a deep code review with reproduced findings. Priced per run after three free uses on Pro/Max.
OpenClaw: A third-party open-source coding-agent harness that routes Codex via a ChatGPT subscription instead of a separate API key. OpenAI publicly endorses this use; Anthropic has not approved the equivalent for Claude.
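For the JSONL session log entry, a small sketch shows how token totals could be pulled out of such a file. The record layout (a `usage` object with `input_tokens` and `output_tokens`) is an assumption made for illustration; the actual field names in Claude Code or Codex logs may differ, so check a real log before relying on it.

```python
import json
from collections import Counter

# ASSUMED field names; real session logs may use a different schema.
TOKEN_KEYS = ("input_tokens", "output_tokens")


def sum_usage(lines) -> dict:
    """Sum token counts across JSONL records, skipping blank lines and
    records that carry no usage information."""
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        usage = record.get("usage", {})
        for key in TOKEN_KEYS:
            totals[key] += usage.get(key, 0)
    return dict(totals)


# Invented sample records standing in for a real log file.
SAMPLE_LINES = [
    '{"type": "tool_call", "usage": {"input_tokens": 1200, "output_tokens": 300}}',
    '{"type": "message", "usage": {"input_tokens": 800, "output_tokens": 150}}',
    '{"type": "note"}',
]

totals = sum_usage(SAMPLE_LINES)
print(totals)
```

For a real file, passing an open file handle works the same way, for example `sum_usage(open(path, encoding="utf-8"))`, because the function only iterates over lines.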

Quotables

Lines you could clip.

31:55

“It is not a matter of which tool is best, it is a matter of which tool is best for the specific use case in front of you.”

Clean thesis statement, standalone quotable → TikTok hook

11:19

“Claude Code right now has 30 different hook events. Codex right now has about six. If you want to fire automated behavior into every part of the workflow, Claude Code gives you about 5x the granularity.”

Concrete number comparison, instantly shareable → IG reel cold open

28:35

“Claude has this way of planning the task tightly before it executes. And Codex tends to just grind through more iterations, which is why the input tokens stack up on its side.”

Explains the data in plain English -- no setup needed → newsletter pull-quote

The Script

Word for word.

This could be one of the biggest comebacks in the AI space. Over the past years, OpenAI went from being the biggest AI company to becoming something kinda mid, and people who used AI to code basically forgot OpenAI existed, thanks to tools like Claude Code. But over the past few weeks, I've seen a lot of videos saying that OpenAI Codex is actually better than Claude Code.

So I've been trying Codex for the past month, and honestly, the results have been really impressive. But is Codex actually better than Claude Code? Today, we're gonna answer that question by comparing them on features, price, and three specific use cases to see which one is better.

And at the end, I'm gonna give you my honest opinion on which tool you should be using right now. So let's get into it. So real quick.

If you've never used Claude Code before, here's the gist. Claude Code is Anthropic's coding agent, Anthropic being the company behind Claude. The way it works is pretty simple.

You give it a task, like fix this bug or build me a new feature or review this pull request. Claude Code goes off. It plans the work.

It opens up your project. It edits your files, runs the commands, and it asks you for permission along the way based on your settings. And you can use it pretty much anywhere.

There's a terminal version. There's a VS Code extension, and there's a full desktop app for Mac and Windows. And they've also got a web version in research preview where you can just run sessions from any browser or even your phone.

Under the hood, it's running Opus, which is Anthropic's currently smartest model, or it can run Sonnet or Haiku as well. Opus and Sonnet are top tier for coding work. Now the part I really like about Claude Code is how customizable it is.

It's less of a tool, and it's more of a workflow system that you shape into your own engineering rituals and automations. You've got skills that you can drop in. There's hooks, which are basically automated triggers that fire whenever something happens in your session.

And then you've got things like sub agents, which are specialist agents that Claude can spin up on its own to handle specific kinds of work. And we're gonna dive deeper on all of this in just a sec. And now Codex is OpenAI's coding agent.

And quick clarification, this is not the old Codex model from 2021 that retired. The new Codex is a full agentic system, very similar shape to Claude Code, but with a few different opinions on how the work should flow. You can use Codex in, once again, terminal, desktop app for Mac and Windows, and a VS Code extension that also works with other IDEs like Cursor, and in their cloud version at chatgpt.com/codex.

The models behind Codex are the GPT family of models, with a GPT Codex model for coding-specific work and a faster, smaller one called gpt-codex-spark. And that one is still in research preview for Pro users at the moment. And the thing that stands out about Codex is sort of like the unified shipping vibe. Claude Code feels like a workflow system that you're building out.

Codex feels more like an opinionated machine designed to take you from "agent is done" all the way to "the code is shipped to production." A good example here is the built-in git worktrees. Those are basically just like separate working copies of your project so that multiple tasks can run in parallel without overwriting each other.

So the whole shape is tighter and more end to end out of the box. So we'll get into the specifics of what each tool actually does best in just a minute. And by the way, Codex is also included in every paid and free ChatGPT plan right now.

So Free, Plus, Pro, Business, Enterprise. If you're using ChatGPT, you've also got access to Codex. Whereas Claude Code, you wouldn't be able to use for free.

Now before we get into where they're different, I want to plant the thesis of this video early because it's very important. It's which tool is best for the specific use case that is currently sitting in front of you. So that's what I'm going to be discussing today.

And one more thing I wanna plant on top of that, after spending a lot of time with both of these tools, I've noticed that they each have kind of a different feel. So Claude Code to me feels more creative. It feels like it's better at brainstorming.

It's better at like pushing back when I'm going down the wrong path. Whereas Codex feels really good at just, like, following my instructions and doing what I want. And honestly, it's also been sharper at, like, reviewing code and reviewing my plan and, like, finding bugs or gaps.

So none of that is backed by, like, hard specific metrics or KPIs. It's just the gut feeling that I get after spending lots of hours in both tools. But I do think it matters, and I'm gonna come back to that at the end.

With that out of the way, let's talk about how much these two have in common. Because honestly, after using them both heavily, the overlap is way bigger than most comparison videos admit. Both of them edit code on your local machine.

Both have desktop apps. Both have VS Code extensions. They both run in the command line. They both support MCP, which is the open protocol for hooking up external tools to your AI.

They both support CLIs as well. They've got the same skills format where you drop a markdown file with YAML front matter into a folder, and agents can read through those, pick them up, and invoke them. Both tools have a plugin marketplace where you can browse and install community tools.

They both have a cloud delegation option where you can fire off a task and walk away, and they also both have hooks and sub agents. So the question stops being, does my tool have feature x? The real question becomes, which one gives me the better workflow for the way that I actually want to work?

And that's where they start to diverge, which is what we're going to break down next. Let's talk about what each of these tools is uniquely better at. We'll start with Claude Code.

The thing that sets it apart, in my opinion, is the depth of the customization. Claude Code right now has 30 different hook events. Hooks, again, are automated triggers that fire when something happens, like when you submit a prompt or when a tool runs or when a session starts or when a task gets created.

Codex right now has about six hook events. So if you wanna fire automated behavior into every part of the agent's workflow, Claude Code gives you about 5x the granularity there. The next one is auto-delegating sub agents.

Both tools have sub agents, but Claude Code can spawn them on its own when a task needs it. Codex's docs specifically say that Codex won't spawn sub agents unless you explicitly ask. So with Claude, you can just give it a complex task, and it'll decide on its own to spin up a planner agent and maybe an explorer agent and a code reviewer agent, whatever's needed for that task.

And that's really powerful by default. And then there's two of my favorite slash commands, both still in research preview: /ultra-plan and /ultra-review. /ultra-plan takes the planning phase, and it ships it to a cloud Claude Code session, and it lets you review the plan in your browser with inline comments, and then you can send it back to your terminal for the actual execution.

/ultra-review spins up, once again, kind of like a cloud instance with multiple reviewer agents, and it gives you a deep multi-agent code review with reproduced findings. You get three free runs of that on Pro and Max, and then after that, it's billed per run. And they're both insanely powerful for higher stakes work.

/loop is another big one that I love. You can give Claude Code a recurring prompt that runs on a schedule, or you can run it without a prompt, and Claude will go into maintenance mode and just keep your project tidy. So you could set up a loop to run a certain skill every, like, twenty minutes, and it will just loop through.

It handles unfinished tasks, addresses comments on your PRs, fixes merge conflicts, stuff like that. It's super, super useful. A couple more that don't get talked about enough.

The first one is channels. That's an MCP server that pushes external events from Telegram or Discord or even iMessage into a running Claude Code session. So you can literally text your agent from your phone.

And then you've also got, like, dispatch or remote control. Then we have the Claude Agent SDK, which is the same engine that powers Claude Code exposed as a Python and TypeScript SDK so you can build your own agents on top of it. And we have enterprise auth, which probably doesn't matter to you if you're solo, but it is a big deal for teams.

Claude Code supports Bedrock, Vertex AI, and Microsoft Foundry, which are the enterprise cloud platforms that big companies use to host their AI. Codex just doesn't have that level of auth flexibility at the moment. So if you want a customizable coding system that you can shape into your own workflow, Claude Code is in a class of its own right now.

Okay. So flipping the script, what does Codex actually do better than Claude Code? The first thing is the whole unified workflow shape.

Codex is built around WorkTrees from the ground up. Every thread you spin up can run in its own WorkTree without bumping into the main version of your project. Combine that with the fact that you can review, stage, commit, and push from the same desktop app, and you've basically got a full shipping pipeline in one tool.

Obviously, Claude Code allows you to work with WorkTrees as well. Codex just does a really good job of making that feel more native. The second thing is the in-app browser.

So Codex inside the desktop app has a built-in browser that you can use to actually, like, look at the work that your agent just shipped. You can leave visual comments right on the page. If you've ever finished a feature and then had to switch over to Chrome to check it out, this is just a much cleaner, universal experience.

Now to be fair, Claude Code also has a feature called Claude in Chrome that gives you another type of functionality, but it just works differently. Claude in Chrome is a browser extension that runs inside of Chrome itself, whereas Codex put the browser right inside the desktop app. So the capability is there on both sides.

Codex just keeps everything in one clean window. And it just does it a little bit better when you use the desktop app than the way that Claude Code does it. But I think both of these platforms are improving their desktop app experience every day.

Now the other one that's pretty big is computer use, which both tools once again have, but Codex is really sharp. They've got this whole product QA use case where you tell Codex, QA the app I just built, and Codex will open it up in the app. It will click around.

It will find bugs, and it will log them with, like, you know, severity ratings, expected versus actual behavior, the steps to reproduce, and a triage summary. And that's a really polished way to use computer use, and it's something that I haven't seen Claude Code build out as a first party flow yet. But especially when you realize that you can connect Codex and Claude Code to any of these, like, external tools, you can do a lot of the same functionality with both tools.

Codex also has a GitHub integration, which is pretty interesting. I mean, obviously, both tools can review pull requests and stuff, but Codex has, like, an @Codex mention model, and it's pretty smooth. You tag @Codex in a PR comment or an issue, and Codex spins up a cloud sandbox to handle that.

There's basically zero setup involved. You just tag it, and it runs. Now this fifth thing in Codex is called /goal, which is experimental and gated behind a feature flag, but anyone can actually go turn on that flag and use /goal.

This is for the work that's too big for a single prompt, but smaller than an open ended backlog. You define a goal with a verifiable stopping condition, and codex will just grind away until it's actually finished. And this could be, like, multiple, multiple hours.

And, of course, as with pretty much all these features I'm talking about, you can do the same thing in Claude Code. You could maybe use /loop or you could use something like the Ralph Wiggum loop or maybe like Karpathy's auto research. So the capability is there on both sides, but Codex has just packaged this into one clean native slash command where in Claude Code, you're stitching together a few different tools.

Alright. So you literally can't make this stuff up. As soon as I finished recording that video, Claude Code just released /goal.

So now we have /goal natively within Codex and Claude Code. So just wanna give you guys a quick update. Back to the video.

And then the last one, because Codex is built by OpenAI, you get access right inside of Codex to GPT Image 2. And GPT Image 2 is one of the strongest image generation models out there right now. So if you're building a project that needs image generation, whether that's a game or a product mockup or maybe even a website, Codex can actually just generate those images for you right inside the app, whereas Anthropic doesn't actually have an image generation model at all.

You would have to hook it up into some sort of third party tool. Okay. This next one is interesting because it's where the two companies really diverge philosophically.

So a lot of you have probably seen third party tools popping up like OpenClaw or Hermes agent, which is the open source agent that lets you wrap coding agents. They kinda blew up because they felt proactive. They have native crons.

They have heartbeats. They can still use skills and stuff like that. The cool thing about OpenClaw is that you can actually sign in with your ChatGPT subscription and just route your Codex usage through it.

So you don't have to pay separately for an OpenAI API key, which would be way more expensive. You can also do this with a Hermes agent. Sam Altman himself put out a tweet on May 2 saying that you can now sign in to OpenClaw with your ChatGPT account and use your subscription there.

So OpenAI's CEO is publicly endorsing this, and that's a really permissive stance from OpenAI, and I bet they saw a massive spike in ChatGPT subscriptions after that announcement. Anthropic's stance is basically the opposite. The Claude Agent SDK page on their docs literally says, unless previously approved, Anthropic does not allow third party developers to offer claude.ai login or rate limits for their products, including agents built on top of the Agent SDK.

So in plain English, using your Claude subscription inside of a third party tool like OpenClaw or Hermes isn't allowed unless Anthropic specifically approves you. And that's one important thing to keep in mind because it changes the economics of your decision. So if you live inside of these third party agent tools a lot, then you're probably gonna wanna go with ChatGPT Codex.

Alright. So let's talk about pricing real quick because this is actually a big part of the decision. Both tools are included with their parent subscription, which means you don't need to mess with a separate API key to start using either one.

So for Claude, you've got Claude Pro at $20 a month, which includes Claude Code and the rest of Claude. Then you've got Claude Max 5x at $100 a month, which gives you 5x the Pro usage, and then Claude Max 20x at $200 a month for 20x usage. Pro is definitely enough to play around with Claude Code, but if you're using it seriously every day, you're going to want at least one of the Max plans.

For Codex, it's included with ChatGPT Free and then also Plus at $20 a month all the way up to ChatGPT Pro at $200 a month for basically unlimited use. Not really, but it feels like it. But right now, OpenAI has a promo running where the $100 tier on OpenAI's side gets you 2x Codex usage through May 31.

If you're going to test out Codex heavily, that $100 tier is one of the best values in the AI coding agent market right now. Now on context windows. Opus and Sonnet can run in Claude Code with 1,000,000 tokens of context window.

The latest GPT model in Codex runs at about 256,000 tokens of context window. Now the part that I wanna flag that's more important than, like, just the raw price of your subscription is that a lot of people right now are complaining that they're hitting their Claude Code limits, whether that be session or weekly, way faster than they used to.

And I've been hearing this from my community for weeks and on X for weeks. So one of the things I tracked in the live test coming up is the actual token usage on each side. And honestly, the results didn't surprise me because as I've been playing around with these two tools, I have noticed that it seems like I'm able to do a lot more work in Codex before I'm hitting that limit compared to Claude Code.

So we're gonna go through those numbers together live after we run some of those experiments. So the takeaway is if you're already paying for one of them, you've already got a top tier coding agent. But I do think there's a lot of value in subscribing to both, playing around with them, and seeing which one you like better or if you like having both subscriptions for different types of work.

To quickly recap what we've covered, Claude Code is a more customizable shape. Deeper hooks, auto-delegating sub agents, /ultra-plan, /ultra-review, /loop, Agent SDK. Codex is a more unified shipping shape.

WorkTrees, in-app browser, it seems to follow directions better, sharper computer use, GPT Image 2 access. Both tools have subscriptions. Both tools have kind of different context windows, and third party harnesses currently favor OpenAI and ChatGPT.

But this is where most comparison videos stop, just listing features and calling it a day. So here's what we're gonna do. I'm gonna give Claude Code and Codex the exact same three prompts.

A research report PDF with branding, a full landing page, and an interactive dashboard with real feeling data. Same prompt. I'm gonna put both tools side by side, so let's see what happens.

Alright. So here are the final results of Codex versus Claude, and we're gonna come back to this and look at all of the actual breakdown in just a sec here. So let's actually look at the outputs of all of these three different prompts.

So in this experiment, I used Claude Code and Codex in their respective desktop apps. The first thing that we did was the research report. This was something that we could turn into a skill, and it would give us an automation report for SMBs on, like, different automation tools.

So this is the prompt that I shot off to both Codex and Claude Code. As you can see, this is the prompt inside of Codex with the logo, and this was the one inside of Claude Code. So let's take a look at the outputs.

If I scroll down a little bit here, we should be able to see the PDF. And if I click on that, we get to open this up in Claude Code's desktop app, sort of like a browser viewer. So I'll just do it in here for now.

You can see right off the bat, you know, the logo's up top, but this is a major issue. Like, that is hard to read, and then the spacing right here is not great either. But this one's 15 pages, and as you scroll down, it gets better.

I think the header looks really clean. The table of contents looks nice. I'm not gonna read and verify all of these facts.

I just don't really feel like doing that right now. They're both pretty solid when it comes to doing research. And by the way, I didn't give it any API keys.

So they're doing research using their native, like, web fetch and web search tools, whatever those are. So it goes through executive summary. It goes through market overview, and you can see that this one is very, like, wordy.

It's structured almost like it's trying to sort of, like, tell a story, and it's going over these different tools. We have a side by side comparison here, top three picks, Zapier, Lindy, make.com, and then at the end, we have where the market is heading in the next twelve months with all the sources at the bottom here. And all of these are clickable links that I could go to, but not when I'm in the localhost here.

If I was to open this up in my browser, like you could see right here in my browser, I could actually then go ahead and click on these links, and it would take me to that actual source. Now here is Codex in the desktop app. Interestingly enough, you can't actually open PDFs right in here in the preview.

So we have to open this up on our browser. And this is Codex's version. So right off the bat, it already looks better because we don't have some weird spacing on the title.

The logo's there, but it kind of has this weird, like, you can tell it's a square image. So the header, nice. I thought the header was better with Claude Code.

Table of contents looks perfectly fine. We've got an executive snapshot, and some of this spacing feels a little bit almost rushed, like it feels a bit squished together. Market overview.

And then as we go into the platforms here, we basically just get a table for each tool. Also, the footer on this version isn't as cool either. So Claude Code went for more of like a, I'm gonna tell you a story, and I'm gonna break it down with bullets.

OpenAI Codex went for more of like a, I'm just gonna give you a table, like a consistent table breakdown for each of these different tools. You'll also notice that this research report is nine pages, whereas the other one was 15. We get our side by side comparison.

We have our top three picks, which are Zapier, Lindy, and Relay. And Claude Code's top three picks were Zapier, Lindy, and Make.com, so kinda similar. And then we have where the market is heading over the next twelve months and a practical buying guide.

And then we have all of our sources at the bottom, which once again, these are clickable links that work. Okay. So number two was our website.

And we gave it the same exact prompts here with the Glido logo, and we told it to build us a landing page. We gave it the actual Glido site so it could go and look at it and maybe get some, like, you know, inspiration, and then it comes back with an actual landing page here. And then, of course, in Codex, we gave it the exact same prompt with the same logo.

Now, here are the actual two landing pages. Which one do you think was which? This one on the left was Claude Code.

So right off the bat, they have similar feels. Right? They have similar colors.

You'll notice that OpenAI was able to put the logo up here, whereas for some reason, I don't know why Claude Code didn't. That would be a very easy fix. But as we scroll down, we can see that we've got sort of like an animation right here, which we have like the kind of like dictation looking thing.

I like how this is a microphone that's pulsing rather than just this being like a g. I also like how this kind of like text cursor thing is blinking as well. And overall, as we start to scroll down here, I generally like Claude Code's version better.

Like, even the font, it just feels a little bit less vibe coded. These logos are obviously wrong except for GitHub looks correct. Gmail looks sort of correct, but not really.

That would be an easy fix. But I like the sliding banner compared to just having these six boxes here. This next section, I once again think that this looks better.

We have some glow. We have some icons rather than just like these random letters. So overall, I am liking Claude Code's version here pretty much a lot better.

Here's the pricing page, the differences here. So I think that Claude takes the cake here. The logo thing would be a very, very easy fix.

And as far as like a base, I think Claude Code wins here. Okay. And the final one was a marketing analytics dashboard.

I told it to make up all the data, but I pretty much gave it the same, like, required elements. So let me pull up both of these side by side, and just for proof, here is the same exact prompt inside of Codex. Alright.

So here are the two dashboards. Once again, I put Claude Code on the left, and right off the bat, I already think that the Claude Code version just looks a lot better from a design perspective. Both of them are still functional.

If I click on the different buttons, you can see the data will shift. And as the data moves and the numbers move and the charts move, we can still use our mouse to see the actual, like, numbers. So that is all working well.

You'll notice here that we have orders and average order value, but here we just have revenue. We can come down here to channel breakdown, and we can hover over the different elements, and we get the data there. And even here, like, the conversion funnel, right, the purchase funnel, this just looks way more generic and bland, but this one has almost, like, sort of a gradient that goes across.

And I just think in general, the fonts and the vibe, everything about Claude Code's version just looks better, even though from like a functional perspective, I think that they're the exact same. Alright. And now the part that you guys probably care more about, which is like the actual metrics of cost, speed, tokens, stuff like that.

So we were using Codex with GPT-5.5 on high, and we're using Claude Code with Opus 4.7 on high. So, yes, this was like a Codex versus Claude Code video, but keep in mind that a lot of the actual performance is going to be determined by the underlying model that is powering the harness. So when Opus 4.8 or 5 drops and GPT-6 drops, these numbers would obviously look a little bit different.

So let's look at some of the totals and the numbers. Kinda surprising. So Codex, total time across three runs was almost 26 minutes, and Claude, total time across three runs was about fifteen minutes.

Total tokens were very similar. We had about 6,000,000, and you can see the breakdown. We're gonna break it down by experiment in just a sec, but about 6,000,000 tokens.

What was interesting is that it cost more with Claude Code, and we'll break down why once we look at the experiment level breakdown, but keep that in mind. And then the average run, once again, Claude Code was faster here. And keep in mind, we had one Claude Code experiment that was like two minutes, and the Codex one was like eight, so that was like an outlier which kinda skewed the data.

But typically, I will say that I found that Codex is actually faster. And keep in mind with the token thing here, if you look at these two models side by side, GPT-5.5 and Opus 4.7, they have similar input pricing, $5 for a million input tokens, but for output tokens, GPT-5.5 is $5 more expensive.

But GPT-5.5 seems to be super efficient with output tokens, which is why in this experiment, Claude Code cost us more. Now this is API billing. I'm on a subscription for both of these, so I'm not actually getting charged $11 and $7, but this would actually factor in basically to, like, how fast your session limit is hit.
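The pricing point above can be sketched as simple arithmetic. This is a rough illustration, not the video's actual billing math: the $5 input price and the $5 output-price gap come from the transcript, while the base output price, the equal input-token count, and the `run_cost` helper are hypothetical placeholders. Cache discounts are ignored.

```python
# Sketch: API-style cost from token counts and per-million-token prices.
# Only the $5 input price and the "$5 more for output" gap come from the
# transcript. Everything marked HYPOTHETICAL is an assumption.

def run_cost(input_tokens: int, output_tokens: int,
             input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one run, ignoring cache discounts."""
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)


INPUT_PRICE = 5.0                  # from the transcript, same for both models
HYPOTHETICAL_OUTPUT_PRICE = 25.0   # HYPOTHETICAL base output price
OUTPUT_GAP = 5.0                   # transcript: GPT-5.5 output costs $5 more
HYPOTHETICAL_INPUT_TOKENS = 1_000_000  # HYPOTHETICAL, held equal for both

# Output token counts quoted for one run in the video: 83k vs 18k.
claude_cost = run_cost(HYPOTHETICAL_INPUT_TOKENS, 83_000,
                       INPUT_PRICE, HYPOTHETICAL_OUTPUT_PRICE)
codex_cost = run_cost(HYPOTHETICAL_INPUT_TOKENS, 18_000,
                      INPUT_PRICE, HYPOTHETICAL_OUTPUT_PRICE + OUTPUT_GAP)

print(f"Claude: ${claude_cost:.2f}  Codex: ${codex_cost:.2f}")
```

Under these placeholder prices, the tool with the pricier output tokens still comes out cheaper, because it writes far fewer of them.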

So let's keep scrolling down here. With the speed thing, we can obviously see that this was the main outlier where Claude Code finished really quick, almost, you know, two minutes, and then this one took Codex eight minutes. But I guess I stand corrected.

I mean, in all of the results here, Claude Code was pretty much faster in all of them. For the input versus output tokens, we can see these charts might be a little bit hard to read because we have, like, input, we have cache, we have all this kind of stuff. But basically, what happened was Claude Code was spending more output tokens than Codex in all of them, which is like the little highlighted sliver at the top.

You can see Claude Code's output here was 83k, almost 84k, and Codex's output was 18k. Over here, Codex's output was 20k, and Claude's output was 80k. And over here, Codex's output was 16k, and Claude's output was 41k.

So Claude's output tokens are always higher than Codex's, at least in these three examples and based on other testing I've done. That's not like a definitive every single time rule, but it is a consistent pattern. So I think that, you know, we could look at the cost, obviously, but I think that this one chart is very interesting, if I can somehow make this one, like, full screen.
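As a quick check on the output-token figures quoted above, the per-run ratios can be worked out directly. The three pairs are the numbers read out in the video, in the order quoted; which experiment each pair belongs to is not stated, so the list order is the only label used here.

```python
# Output tokens in thousands, as quoted in the video: (claude, codex).
# The video does not say which experiment each pair belongs to.
output_tokens_k = [(83, 18), (80, 20), (41, 16)]

# How many times more output tokens Claude wrote than Codex, per run.
ratios = [round(claude / codex, 1) for claude, codex in output_tokens_k]

for (claude, codex), ratio in zip(output_tokens_k, ratios):
    print(f"Claude {claude}k vs Codex {codex}k -> {ratio}x")
```

All three ratios land between roughly 2.5x and 5x.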

This chart. This is efficiency and time. So the best place to be here would be bottom left.

That means that you're very fast and you're very lean, and the worst place to be would be top right because you're slow and heavy. So on the x axis, we have total tokens, so more expensive as you go this way. And on the y axis, we have seconds, so slower as you go up.

And it's really interesting because you can see here that we have two really great data points from Claude Code, which were experiments two and three, and then we also have this one, which is a clear outlier in the good direction, which was experiment one, which is our research report from Claude Code. And then we have kind of this little bundle of Codex points, which is pretty consistent.

Like, they're all kind of in this general area. They're all kind of in the middle of this scatter plot. So I thought that this was an interesting one to look at, and I would love to see what would happen if we had run, like, 100 experiments, where we would see, like, sort of the standard deviation and where we'd see the lines start to form for each of these tools.

And I'm not gonna read these out because I think that it would be boring, but here are the raw numbers. If you wanna pause and take a look, you can certainly take a look through that. So the way that we were actually able to get this data is we just ask either Claude Code or Codex to read its JSONL, which is like a session log, and it can pull the time, the tokens, the cache reads, all that kind of stuff.

So that's how I pulled the data. If you guys are ever curious about a session, just ask it to read the JSONL and pull that data for you. Alright.
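For anyone who would rather script this than ask the agent, here is a minimal sketch of summarizing a JSONL session log. The record layout is an assumption: it guesses that token counters sit in a `usage` object (either under `message` or at the top level) and that each record carries an ISO 8601 `timestamp`. Neither tool's actual log schema is described in the video, so check a real log and adjust the field names. The `summarize_session` helper and the sample records are hypothetical.

```python
import json
from datetime import datetime


def summarize_session(lines):
    """Sum token-usage counters and measure the time span over JSONL records."""
    totals = {}
    stamps = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if not isinstance(record, dict):
            continue
        # ASSUMED layout: counters under record["message"]["usage"],
        # falling back to record["usage"]. Adjust to your actual log.
        message = record.get("message")
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            usage = record.get("usage")
        if isinstance(usage, dict):
            for key, value in usage.items():
                if isinstance(value, int) and not isinstance(value, bool):
                    totals[key] = totals.get(key, 0) + value
        # ASSUMED: an ISO 8601 "timestamp" string on each record.
        stamp = record.get("timestamp")
        if isinstance(stamp, str):
            stamps.append(datetime.fromisoformat(stamp.replace("Z", "+00:00")))
    elapsed = (max(stamps) - min(stamps)).total_seconds() if stamps else 0.0
    return totals, elapsed


# Made-up sample records, only to show the shape of the result.
sample = [
    '{"timestamp": "2026-05-01T10:00:00Z", "message": {"usage": {"input_tokens": 100, "output_tokens": 20}}}',
    '{"timestamp": "2026-05-01T10:01:57Z", "message": {"usage": {"input_tokens": 50, "output_tokens": 5}}}',
]
totals, elapsed = summarize_session(sample)
print(totals, elapsed)
```

With a real log, pass an open file handle instead of the sample list, since the function only needs an iterable of lines.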

So we just ran Claude Code and Codex through these three live builds. Same prompt, both tools, three completely different kinds of work. And the honest takeaway before I dig into specifics is that this was not a clean sweep in either direction.

I feel like Codex won at certain things and Claude won at others. So starting with Claude Code, the biggest standout for me was the dashboard test. Claude finished that build in just under two minutes.

Codex took almost eight minutes for the same exact prompt. So Claude was roughly four times faster on the most complex of the three tasks. The token side was even more surprising.

On that same dashboard build, Claude used about 282,000 tokens total, where Codex used about 1,640,000. So almost six times more tokens on the Codex side for one build.
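The "roughly four times faster" and "almost six times more tokens" claims can be checked with a few lines of arithmetic. The token counts are the ones quoted above; the exact 1:57 and 7:50 timings are taken from this breakdown's summary rather than from the spoken "just under two minutes" and "almost eight minutes". The `to_seconds` helper is a hypothetical name.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'M:SS' duration string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


claude_seconds = to_seconds("1:57")   # dashboard build, Claude Code
codex_seconds = to_seconds("7:50")    # dashboard build, Codex

speed_ratio = codex_seconds / claude_seconds   # how many times longer Codex took
token_ratio = 1_640_000 / 282_000              # how many times more tokens Codex used

print(f"time: {speed_ratio:.2f}x  tokens: {token_ratio:.2f}x")
```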

On the visual side, Claude also won the dashboard in my opinion and the landing page. The dashboard came back in dark mode and all the date filters worked, the hover states, the revenue chart just felt cleaner and more polished. Whereas Codex's dashboard was functionally the same, but it just felt cheaper to look at.

And the landing page was the same story. Yes. Claude actually did forget to drop in the logo on that landing page, and the scrolling banner had, like, wrong logos and icons, but those are just mistakes that we could fix with one prompt.

But the underlying design, the base that I wanted to start from, I think I liked Claude Code's better. The pattern I noticed is that Claude has this way of planning the task tightly before it executes. And Codex tends to just grind through more iterations, which is why the input tokens stack up on its side for the more, you know, complex builds.

So for front end work, especially anything with real interactivity and design polish, I think that Claude was the clear winner in that category. Now, flipping over to Codex, the research report that it built was kind of a standout in my opinion. So Codex finished in about eight minutes and Claude took eight minutes and fifteen seconds, and Codex used about 2,800,000 tokens versus Claude's 4,700,000.

So on the most research heavy task of the three, Codex was both faster and more efficient on tokens. Codex was also significantly faster on the landing page build, three minutes flat versus Claude's four minutes and thirty nine seconds. So if you're looking at pure speed, Codex typically tends to be faster.

The other thing I noticed across all three tests is that Codex's output tokens are way leaner. Output tokens cost more than input tokens, so that is something important to keep in mind. And that's probably why on Codex, I'm not hitting my session limit as quick as with Claude Code.

On every single build, Codex wrote about two to five x fewer output tokens than Claude. So Codex tends to just be more concise in what it writes back. It seems to be more efficient.

On the visual side for the PDF, I liked Codex's a little bit better. It felt like it had better spacing even though I thought that Claude Code had a better header and a footer. And it's honestly just a toss-up, but if I had to send one to a client, I probably would have gone with Codex's version by a small margin here.

And, obviously, I didn't read through every single sentence of the actual data in the research report, but that was my quick analysis. Alright. So given all that, let me give you my honest take on when to use each.

I would say reach for Claude Code when you're working on complex front end, when visual design quality matters, when the task requires deep planning, when you want auto delegation, when you're building custom workflows with hooks and skills and channels, when you need the Claude Agent SDK to embed agents in your own product, or when you're in an enterprise environment that needs Bedrock or Vertex auth.

Then I'd say to reach for Codex when the task is research heavy and pulling from the web, when you're, you know, producing structured documents like PDFs or reports, when you want a single desktop app that handles worktrees and review and shipping, when you need to use /goal for, like, long running objectives, when you want to use @codex on GitHub PRs, or when your project needs image generation built into the workflow.

On top of those buckets, I wanna come back to my observation from earlier because this is where it actually shapes my decision in practice. Like I said at the beginning of the video, Claude Code in my experience just feels more creative. It pushes back.

I prefer it as my brainstorming partner. It catches things that I might not have thought of. So when I'm in a planning phase or wrestling with a hard problem, that's usually when I will reach for Claude Code.

But Codex now just feels really good at executing. It just feels like it obeys me better. It follows instructions, especially as you're working on a project that starts to run a little bit longer.

You tell it what to do, and it feels like it just does it. And, of course, it's been sharper on, like, catching things in the code and reviewing it and plugging holes. And that's why I say it's never like which tool is better.

It's a matter of which tool is better for this specific task. A lot of people have been finding a ton of success with doing planning and brainstorming and strategy with Claude Code and then bringing in Codex to actually, like, just review the code or maybe even execute on that plan. And one more mindset piece I wanna leave you with on top of all of this.

Because you're working with coding agents, all you're really doing is you're making files that live inside of folders that live inside of more folders, you know, markdown files or JSON files or Python scripts or whatever it is, which means you're gonna be pushing all of this stuff to GitHub. You can pull that exact same project into Claude Code or Codex or OpenClaw or Hermes or whatever the next new tool is.

You know, you're not locked into one environment just because you've been building on Claude Code for the past six months. And if you ever wanna move between tools, it's really not that hard. You know, you open the project in another agent and you say, hey, I built this project in Claude Code and you are Codex.

Just walk through it, understand it, and then just update anything that needs to change. Or, you know, you could clone it and then have like a Claude Code version of your project and a Codex version of your project or whatever it is. There's just a few small things that you're gonna have to swap, like the CLAUDE.md will now be an AGENTS.md.

But the agent will figure out pretty much all of that for you.

So the real mindset is just keep an open mind. You're building portable skills inside portable folders. Whatever tool gives you the best workflow right now, just use that one.

And that brings me back to the thesis I started this video with, which is it's not a matter of which tool is best, it's a matter of which tool is best for the specific use case in front of you. And some people also might disagree with that. It's just kind of like how do you like to work and what features do you need.

And one last thing before I wrap, everything that I just walked through is accurate as of right now, May 2026. Both of these tools have been shipping at really incredible speeds. You know, new models will drop.

Pricing tiers will shift. Features that are in research preview will graduate or they will be, you know, redacted. So if you're watching this video three months from now, just double check some of the specifics on the actual docs that I mentioned today.

You know, the architectural differences that I walked through are likely to hold up, but some of those exact numbers or stats might not. And I know that we just covered a ton of information in this video, so I broke all of this down into a resource guide that you can access for completely free, and you can find that in my free school community.

The link for that is down in the description. But that is gonna do it for today. So if you enjoyed the video or you learned something new, please give it a like.

It helps me out a ton. And as always, I appreciate you guys making it to the end of the video, and I'll see you on the next one. Thanks, everyone.

The Hook

The bait, then the rug-pull.

For months, Claude Code was the only coding agent worth talking about. Then OpenAI shipped Codex -- and the comparison videos started. This one actually runs the tests.

Frameworks

Named ideas worth stealing.

31:40 · list

Task-Fit Decision Matrix

Claude Code: complex front-end, visual design, deep planning, auto-delegation, hooks/skills/channels, Agent SDK, enterprise auth
Codex: research-heavy tasks, structured PDFs/reports, WorkTree-native shipping, /goal for long-running work, GitHub PR integration, image generation

A task-type decision rule rather than a blanket preference for one tool.

Steal for: Any framework for choosing between AI coding tools on a per-task basis
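One way to make the matrix above reusable is to encode it as a plain lookup table. This is only a sketch: the task labels are copied from the two lists in this card, while the `TASK_FIT` name and the `pick_tool` helper are hypothetical, and exact-string matching is a deliberate simplification.

```python
from typing import Optional

# Task labels copied from the decision matrix above.
TASK_FIT = {
    "Claude Code": [
        "complex front-end", "visual design", "deep planning",
        "auto-delegation", "hooks/skills/channels", "Agent SDK",
        "enterprise auth",
    ],
    "Codex": [
        "research-heavy tasks", "structured PDFs/reports",
        "WorkTree-native shipping", "/goal for long-running work",
        "GitHub PR integration", "image generation",
    ],
}


def pick_tool(task: str) -> Optional[str]:
    """Return the tool whose list contains the task label, or None if unlisted."""
    for tool, tasks in TASK_FIT.items():
        if task in tasks:
            return tool
    return None


print(pick_tool("deep planning"))
print(pick_tool("image generation"))
```

Returning `None` for unlisted tasks keeps the rule honest: the matrix covers the cases tested or named in the video, not every kind of work.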

18:20 · concept

Output Token Efficiency as Session-Longevity Proxy

Output tokens cost more and burn session limits faster. Codex writes 2-5x fewer output tokens than Claude per equivalent task. This explains why Claude users report hitting limits faster -- and it is measurable from JSONL logs.

Steal for: Any explanation of why AI session limits feel inconsistent across tools

CTA Breakdown