Modern Creator
Riley Brown · YouTube

The Latest Codex Updates and The Truth about Opus 4.8

A 27-minute breakdown of why Opus 4.8 barely moved the needle and why Codex platform updates mattered far more.

Posted: 3 days ago
Duration: 27 min
Format: Tutorial (educational)
Views: 33.6K
Likes: 1K
Big Idea

The argument in one line.

AI model releases have entered a diminishing-returns plateau, and practitioners who build with these tools daily are shifting attention from model benchmarks to super-app UX — the layer where the real productivity delta is happening now.

Who This Is For

Read if. Skip if.

READ IF…
  • You use Claude or GPT models for agentic coding and want an honest benchmark comparison, not marketing copy.
  • You are a Codex or Claude Code daily user who wants to know which new platform features are worth your time.
  • You have been using Replit or Lovable and are wondering whether to switch to a raw agent workflow.
  • You are curious about the next wave of AI-native software beyond chat interfaces.
SKIP IF…
  • You need deep technical implementation details — this is commentary and live demo, not a step-by-step tutorial.
  • You are only interested in raw model benchmarks without product context or practitioner opinion.
TL;DR

The full version, fast.

Opus 4.8 benchmarks better on paper but is functionally indistinguishable from 4.7 in real use: the host spent three hours testing it, and the other practitioners he cites see no meaningful delta either. GPT 5.5 outperforms it on long-horizon engineering tasks at lower cost. The real story of the week was OpenAI Codex: persistent browser login, multi-tab support, and agents that can spawn sub-agents are features the host says changed his daily workflow. The closing concept, agent mini apps (generative UI panels that pop into your agent workspace and inherit your authenticated integrations), may define the next wave of AI-native product design.

Chapters

Where the time goes.

00:00–02:51

01 · Intro / Opus 4.8 overview

Anthropic announcement, model card benchmarks, 3-hour personal test. Host and cited practitioners cannot distinguish 4.8 from 4.7.

02:52–04:55

02 · GPT 5.5 vs Opus 4.8

DeepSWE data: GPT 5.5 scores higher at lower cost and fewer tokens. Trust for long agentic tasks goes to GPT; Opus wins on design.

04:56–05:52

03 · Model updates vs super-app updates

Framing shift: two categories for lab announcements. Super-app innovation is where the real delta is now.

05:53–07:39

04 · Codex: Windows computer use + mobile

@computer-use lands on Windows. QR code pairs ChatGPT on iPhone with desktop Codex session in real time.

07:40–10:37

05 · Codex browser upgrade

Persistent login across sessions, multi-tab via cmd+open. Demo: Twitter and Notion without re-auth. Host's most-used new feature.

10:38–12:40

06 · Codex spinning up sub-agents

One super prompt spawns 6 parallel chat sessions. AI auto-names and self-prompts each thread.

12:41–13:48

07 · Other Codex updates

Cmd+G full-text search across all agent chats. GitHub-style activity streak (43 days, 4B tokens).

13:49–18:15

08 · People leaving Replit and Lovable

Single Codex prompt with Neon + Vercel + AI Gateway replicates Replit's full value prop. BYOT/BYOA plugin prediction.

18:16–25:41

09 · Agent mini apps

Agents generate ephemeral UI panels that inherit plugin auth, handling the final 10% human decisions directly. Tinder-for-email demo. Teases chorus.com.

25:42–26:57

10 · Outro

Company moved from SF to NYC. Series rebrands to AI Native. Producer vs. consumer manifesto.

Atomic Insights

Lines worth screenshotting.

  • Opus 4.8 scores better on paper but multiple practitioners who tested it for hours found it indistinguishable from 4.7 in real use.
  • GPT 5.5 at medium/high/extra-high gets a better DeepSWE engineering score at lower cost and fewer tokens than Opus 4.8.
  • One practitioner still runs production agents on Opus 4.6 and cannot detect a performance difference across three model generations.
  • Anthropic models hold a specific advantage on design and visual output — slide decks, landing pages, presentations — not on deep agentic coding.
  • The super-app layer is now moving faster than the underlying model layer — an inversion of what was true nine months ago.
  • Codex persistent browser login sounds minor but the host used it every hour for 72 consecutive hours — small UX changes compound into workflow shifts.
  • A single Codex prompt with Neon, Vercel, Google Auth, and AI Gateway replicates the entire stated value proposition of Replit or Lovable.
  • A BYOT/BYOA Replit clone as a Codex plugin would not need to build an agent or pay for tokens — the margin structure is fundamentally better than existing vibe-coding platforms.
  • Plugin OAuth tokens in Codex are locked to the agent and cannot be passed to apps the agent generates — that gap is the design constraint the mini-app concept is trying to solve.
  • Agent mini apps are generative UI panels that inherit plugin authentication and handle the final 10 percent of human decisions directly — they bridge autonomous agents and human oversight.
  • Every archive-vs-send decision inside an AI-drafted email interface is a training signal that tightens future suggestions toward high-confidence sends.
  • The producer/consumer split that defined social media is the same split defining the AI revolution — learning the surfaces agents live on is the producer-side move.
Takeaway

Model hype vs. platform reality in the agent era.

WHAT TO LEARN

When practitioners who build with these tools daily cannot distinguish one model generation from the next, the benchmark press releases stop being the signal — the platform changes are.

  • Benchmark improvements on model cards do not automatically translate to detectable differences in real agentic workflows — test your specific use case before upgrading.
  • GPT 5.5 outperforms Opus 4.8 on long-horizon coding and deep agentic tasks by the metrics that matter to builders: score per dollar and score per token.
  • Anthropic models retain a real advantage in design-heavy outputs — presentations, landing pages, visual documents — where aesthetic judgment matters more than raw task completion.
  • Persistent authentication in an AI browser changes daily workflow more than a 5-point benchmark improvement; the quality of the integration layer is becoming the differentiator.
  • A single well-crafted agent prompt with the right plugin stack (database, hosting, auth, AI gateway) can replicate the full value proposition of purpose-built vibe-coding platforms.
  • The economics of a BYOT/BYOA product are structurally stronger than a bundled AI platform: no agent compute costs, no token subsidies required, higher margin on the interface layer alone.
  • The unsolved problem at the frontier of agent UX is not conversation quality but authentication passthrough — getting generated apps to inherit the user's existing plugin credentials.
  • Generative UI (an agent that creates the right interface for the task at hand) is a more useful frame for the next wave of AI-native products than 'better chat' or 'more autonomous agents.'
  • Every human decision made inside an agent-generated interface is a labeled training signal; the apps that capture those micro-decisions will compound into personalization that static SaaS cannot match.
  • The producer/consumer split from social media is repeating in AI: the people who understand the surfaces agents live on will build leverage; the rest will be optimized against by systems they do not control.
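The training-signal point can be made concrete. The video does not say how a mini app would learn from decisions (the host says he is not sure how it would technically work), so the following is only a minimal sketch under assumed design choices: it tallies send and archive decisions per sender and uses a smoothed send rate to decide whether to surface a draft at all. The class `DraftFeedback`, its methods, and the 0.5 threshold are hypothetical; a real system would presumably also learn from edits and message content.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DraftFeedback:
    """Hypothetical store of the human's send/archive decisions, keyed by sender."""

    sends: defaultdict = field(default_factory=lambda: defaultdict(int))
    archives: defaultdict = field(default_factory=lambda: defaultdict(int))

    def record(self, sender: str, action: str) -> None:
        """Log one decision made inside the mini app ("send" or "archive")."""
        if action == "send":
            self.sends[sender] += 1
        elif action == "archive":
            self.archives[sender] += 1
        else:
            raise ValueError(f"unknown action: {action!r}")

    def send_probability(self, sender: str, prior: float = 1.0) -> float:
        """Laplace-smoothed estimate of how often drafts for this sender get sent."""
        s = self.sends.get(sender, 0)
        a = self.archives.get(sender, 0)
        return (s + prior) / (s + a + 2 * prior)

    def should_suggest(self, sender: str, threshold: float = 0.5) -> bool:
        """Only surface a draft when the estimated send rate clears the threshold."""
        return self.send_probability(sender) >= threshold
```

With three archives and no sends for one sender, the smoothed rate is (0 + 1) / (3 + 2) = 0.2, so drafts for that sender stop being suggested; a sender the app has never seen sits at exactly 0.5 and still gets a draft. The smoothing prior keeps a single early decision from permanently silencing or promoting a sender.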
Glossary

Terms worth knowing.

Super app
An AI agent platform (Codex, Claude Desktop) that combines chat, task management, browser, and plugin integrations into one workspace — contrasted with accessing a model through an API or terminal.
DeepSWE
A benchmark company that measures frontier coding agents on original long-horizon software engineering tasks, scoring on cost, time, and output tokens against task completion quality.
Agent mini app
A generative UI panel an AI agent creates on demand inside its workspace, inheriting the user's authenticated integrations so the human can take final actions without leaving the agent environment.
BYOT / BYOA
Bring Your Own Tokens / Bring Your Own Agent — a product model where the platform provides the interface but the user supplies API access and the underlying AI, reducing platform costs and giving users model choice.
Computer use
An AI capability where the model visually interprets and interacts with a computer screen — clicking, typing, and navigating applications rather than operating through code or API calls.
Agent native app
An application designed to be controlled by an AI agent rather than (or in addition to) a human at a keyboard — the agent can create, edit, and retrieve content through the app programmatically.
Resources

Things they pointed at.

01:44 · channel · Greg Isenberg (AI commentator)
02:44 · channel · Matt Wolfe (AI commentator)
02:58 · tool · DeepSWE benchmarks
13:00 · tool · Vercel AI Gateway
19:10 · tool · Proof (agent-native document editor by Dan Shipper)
24:04 · product · chorus.com
Quotables

Lines you could clip.

01:34
I literally couldn't tell the difference between the two models.
Punchy, standalone, directly contrasts the Anthropic marketing headline · TikTok hook
02:01
We are entering the era where model releases start to feel like iPhone releases. Remember when every new iPhone had a genuine leap? Now it's a slightly better camera and you can't really tell the difference.
Vivid analogy, immediately understandable, high shareability · IG reel cold open
24:15
Why would I want to use someone else's external platform if my AI agent can generate a UI for me right when I need it.
Clean one-liner thesis for the mini-app concept, no setup needed · newsletter pull-quote
26:02
You need to become agent native or agents will just start to use you.
Tight manifesto line with a reversal, producer/consumer framing in one sentence · TikTok hook
The Script

Word for word.

00:00 This week, Anthropic released Opus 4.8, which they say is the most advanced AI model in the world. However, others are saying we've entered the iPhone era of AI models, where you can't even tell the difference between each model upgrade.
00:14 We're gonna discuss this today. We're also gonna talk about Codex. This week, OpenAI released some insane updates to their super app, Codex, and some of the updates they didn't even publicly announce.
00:27 You're watching AI Native, where we cover the most important news and updates on the best AI agent platforms and models. My name is Riley Brown. Let's not waste any more time.
00:38 Let's dive in. So here we are. This was the Thursday announcement by Anthropic introducing Claude Opus 4.8.
00:46 It builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors. And here is the model card.
00:58 So on the model card, Opus is apparently better at coding. This is agentic coding, SWE-bench Pro. It's not as good as GPT 5.5 at terminal coding, but it's better than all of the other models, including Opus 4.7 and GPT 5.5, at reasoning, controlling your computer, doing knowledge work like docs, sheets, and presentations, and other finance tasks.
01:23 And guys, it was genuinely my plan to make a full video on Opus 4.8, but I spent three hours comparing Opus 4.8 and Opus 4.7, the previous model they released.
01:35 And guess what? I literally couldn't tell the difference between the two models. And I'm not the only one who thinks this.
01:42 Uh, Greg Isenberg, friend of the show, said, I didn't cover Claude Opus 4.8 on my pod because I don't think it's meaningfully better than GPT 5.5. And I'll add that it's not meaningfully better than 4.7 either.
01:56 And he goes, we are entering the era where model releases start to feel like iPhone releases. Remember when every new iPhone had a genuine leap? Now it's a slightly better camera and you can't really tell the difference.
02:08 That's where models are heading: 4.6 to 4.7 to 4.8.
02:13 Each one is slightly different, but you can't really tell which one is best. In fact, I'll tell you from personal experience: I'm still running AI agents in iMessage, an AI agent very similar to OpenClaw, and I'm using Opus 4.6. I think it is the best for general agent work, at least based on how I use it.
02:30 And I literally can't tell the difference between these three models. And it wasn't just Greg. Here's Matt Wolfe agreeing with him.
02:36 He said, so much this. I spent over one minute talking about Opus 4.8 in my recent news breakdown and there really wasn't much to say, honestly. And when there's a big update, Matt will spend five, sometimes ten minutes talking about it, and he only talked about this one for one minute.
02:52 And now we're gonna compare GPT 5.5 to Opus 4.8. DeepSWE, a company that measures frontier coding agents on original long-horizon software engineering tasks, posted some data that was really interesting.
03:09 And so DeepSWE looks at three things, right? They look at cost, time, and output tokens, and then they plot each of those against their score.
03:18 So you can see here, these are the GPT models right here, and here are the Opus models right here. And so the higher up you are on this chart, the better your score.
03:27 OpenAI got a better score. And notice that cost increases in this direction, so the further this way you are, the more expensive your model is.
03:38 So GPT 5.5 medium, high, and extra high are scoring higher for less cost than Anthropic's Opus 4.8.
03:49 OpenAI is getting a better score for a lower cost. Here we can see they're getting a better score with fewer tokens per task, which is better, and we're also seeing that the average cost per task is just lower.
04:04 Right? You can see that this model is clearly the most efficient: it takes less time and it gets a higher score.
04:13 And as of late, I've also noticed a lot of people talking about trust and depth of tasks. This guy said, I can trust GPT 5.5 with things I would never trust Opus 4.8 to handle. Yeah.
04:24 Opus 4.8 feels good and can be quite addictive to use, especially when vibing, but that's mostly surface level. I'll also add that the Opus models in general are better at design.
04:34 They're better at presentations. You're gonna get a better slide deck, a better landing page. It looks more appealing.
04:40 They put a lot of effort into Claude design. However, when you wanna do really long agentic tasks, if you wanna do deep coding work, or have it control your computer and even control your text messaging directly from the app, I highly recommend using GPT 5.5.
04:55 And so now I divide the large labs' announcements into two categories, right?
05:01 There are model updates, and then there are super app updates. About nine months ago, I was way more excited for model updates, because every single model update felt like a big step change and everything was done in the terminal.
05:14 So there wasn't really that much innovation happening at the app level, you know, the app where you use these AI agent tools. And if you've been watching my content for the last four months, I've been obsessed with the super app, and super apps like Claude desktop or Codex are these apps where you can very easily talk to AI agents.
05:34 Right? You can speak to AI agents where you have your tasks on the left panel, you have your agent, and then whatever your agent is working on.
05:42 And there's so much innovation that needs to be done to make this a very seamless process so that you can interact with agents for all of your work. And this is exactly what OpenAI did this week. They announced a bunch of different things for their Codex application, new updates to their platform.
06:00 And so the first update they announced is Windows computer use. So if you go to the Codex app on Windows, you can now officially type @computer-use and have GPT 5.5 inside Codex.
06:17 It can control your computer fully. You can say, control Canva to do a task, and you can do this on Windows now.
06:26 Another one for those Windows users out there: they now have Windows Codex remote. Inside Codex, if you go down to this phone icon right here, it will give you a QR code. If you have ChatGPT downloaded on your phone, you can now type prompts directly through ChatGPT, and it will control Codex, which can control your computer.
06:48 If you have an iPhone and a Windows computer, you can connect ChatGPT, right? This is just the ChatGPT app. I'm going to the Codex section, and now I can press chat, message Codex, and even use computer use inside the iPhone app, and I can say, please, uh, check my desktop and tell me what's there.
07:13 And you can see here, right, it's showing up right here. You can do this on Mac, and now you can even do this on Windows, and these are perfectly synced. You can see this is the same exact chat thread, and it shows up on the desktop app and the phone.
07:26 You can literally control Codex from your phone, Mac, or Windows. And since Codex can control your computer, you can basically control your computer through ChatGPT, which is a really, really underrated and cool feature.
07:40 Okay. The second set of updates that OpenAI released for Codex: this is the one that I'm gonna use the most, and I think it is just the most useful.
07:51 And so when you're inside Codex, you can open up a browser. Now, as of two days ago, these stay signed in.
08:01 So I can go to twitter.com, and you notice I'm already signed into my profile. I don't know why these tweets aren't loading.
08:08 There we go. This is my Twitter feed, and I'm automatically signed in. I can also say something like, please get my, uh, latest video, Agent Native, on Notion.
08:25 Summarize it, and give me a link here. So since my Codex is set up to connect to Notion through the Notion plugin, it can find the exact video I'm talking about.
08:38 It's gonna give me a link to that video, and then I can just open it directly inside the Codex browser. This is becoming a full browser, and take a look at that. So it responded.
08:49 It thought for two minutes and found the Notion document that I'm working on, which is for this video. And all I need to do is right-click on this and click open in browser, and take a look at this. We are automatically signed in to Notion.
09:04 Close the sidebar, and here's the app open inside Codex, and I'm signed into Notion, so any document it creates inside Notion for me, I can just open up.
09:15 So now I'm using Codex. I can ask Codex to change anything inside Notion, and it will edit the page and I can see it live.
09:23 I can add things to it just like I'm using Notion, except I don't need to leave the AI-powered super app, which is Codex. Now, I do anticipate that Claude Code or the Claude desktop app will have this feature.
09:38 It just feels like they're really far behind and they're not prioritizing it. This right here is something that I've been using every single hour for the past seventy-two hours since they released this feature. Now that you stay signed in, it's really useful, because before, you actually had to sign in every time you opened up a web browser.
09:56 And another thing I realized: you can open up many browser tabs. You can't hit plus and open a browser tab, but if you're in your browser and you press command open, look at this. It's opening all of these as new browser tabs.
10:10 So I can go from the main tab to this tab to this tab to this tab.
10:16 And so we're starting to see this become a full browser that you can use next to your AI agent. So that is number two: browser tabs stay signed in when you're using the browser inside Codex.
10:29 And we also have multiple browser tabs per task, and we're starting to see it become as if you had Google Chrome inside Codex. And this third one is a lot of people's favorite: now when you use Codex, agents can spin up other agents.
10:47 On top of this, you can ask Codex about any chat you have open.
10:55 Let me show you how this works. So if we go to Codex, I can now type something like this directly inside Codex.
11:01 I call this a super prompt. I want you to spin up new chat sessions inside Codex. So right now I'm about to fire off a chat session, and this chat session will actually create six more chat sessions.
11:14 So check this out. I'm gonna run this. And now you can see here it says, I'll set this up as six separate Codex threads with concrete task prompts.
11:24 So it's basically going to write prompts in new chat sessions, and then they'll show up right here. Okay.
11:30 So it's activating some memory. It's basically trying to figure out how it wants to prompt the agent, and it says, I'm creating six background threads now, each with a narrow brief and completion criteria.
11:44 And here it goes. It's created one, two, three, four, five, and six.
11:51 We're gonna see AI rename them. Watch this. So triage, boom, boom, and boom.
11:58 So AI created these new chats, and you can see here the AI basically prompted this. It's sent by Codex from another thread.
12:07 That's how you know Codex prompted it, which is really cool. So you can ask Codex to create new threads.
12:14 So you can start up 10 threads directly inside Codex, and here they are, all going to work. That's really cool, and I haven't even fully discovered all of the use cases I wanna use this for.
12:25 I might do a full video on that specific feature, using one master agent to spin up sub-agents, and then creating an automation that checks in on how those other agent chats went. I think there's a lot of exploration to do there, but that's out of scope for this video. I do wanna cover a few other little updates they announced, which is that there's now better search.
12:45 So if we go to Codex, and now if you press command G, I believe, I can now search way better.
12:56 Right? You can press command G and search for a key term like OpenAI, and it finds everywhere OpenAI is mentioned. I can now search not just through the titles but through all of the chats in general.
13:08 Right? So it's much easier to search through all the chats. Let's see where I mentioned, uh, command G, where I mentioned Chorus.
13:16 These are all of the scripts, or all of the chat sessions, where I mentioned Chorus. It makes it a lot easier to search through all of the agent chats that I create. Another small thing they announced was this new GitHub-style activity page.
13:29 So, again, if we go to Codex and you go to settings, profile, here we can see all of the days where I used Codex.
13:38 I basically started using Codex forty-three days ago, and I've used it every day since: a forty-three-day streak. My longest task was three hours and seven minutes, and I've used 4,000,000,000 tokens.
13:47 Pretty fun new update to the app. Okay. So now I wanna move to another trend that I've noticed.
13:52 A lot of people have been DMing me about the vibe coding platform they use, whether it's Lovable, Replit, Bolt, etcetera. Many people are moving from these dedicated vibe coding platforms to Codex or Claude Code because, you know, I think we're about one or two months away from these being full vibe coding platforms.
14:10 And many people who use Replit say that it's just significantly easier to vibe code an app, get it on the Internet, and use it internally or sell it as a SaaS. Many people love these vibe coding platforms because they make everything easy. Because after all, Codex just generates the code and then lets you see your app in the browser.
14:28 Whereas something like Replit generates the code and makes the app visible while you're building, right? Just like the in-app browser inside Codex.
14:36 It also sets up authentication. It sets up a database, and it also does one other thing, which is, like, it has some security things, but mostly that's just an AI prompt. And then it also hosts the app on the Internet.
14:49 Well, what people are realizing now is that all of these are just a single prompt inside Codex, right? In Codex, you can run a prompt like this.
14:57 You can say, please build an internal tool for my company to track whatever it is that you wanna track. For this example, I'm just using video stats. And you could say, make this a web app.
15:05 Use Neon Postgres, which is a database service, for the database. As long as you have an account on Neon and you set up the plugin, this just works in one shot. And then you can say, use Google for sign-in, um, and for auth.
15:16 And then you could say, use Vercel for hosting, right? This puts the app on the Internet.
15:21 And then you could say, use AI Gateway for AI features. This is another Vercel product, where all you need is to sign in to Vercel and get one single API key, and once you set that up, you can use any AI model. You can also use something called Genmedia, which is all of the image and video models, and this is by FAL.
15:40 And so I've already set this up and made it a skill, so I can build any app with any AI feature or AI video model directly inside the app, and then I can just say, like, make sure to run many security checks. GPT 5.5 extra high is incredible for checking for vulnerabilities.
15:58 So you can just fire off this whole prompt, and this basically solves for the entire value prop of tools like Replit and Lovable.
16:09 And soon, I believe someone is going to build a fully AI-native version of Replit and Lovable.
16:19 And this is a product that our team and I considered building, um, but we kind of fell out of love with building static apps. Agents are just way more fun to work with.
16:29 But someone could very easily build an AI-native Replit and Lovable that acts as a plugin. You could create a skill that handles all of this stuff right here for the user and build it directly inside Codex.
16:44 So that's one of my big predictions for the rest of 2026. Someone's going to build a Replit and Lovable that makes it as easy as it is to use Lovable, but inside Codex. Because with Replit and Lovable, you use their tokens and you use their agent.
17:02 And the Replit agent is actually worse than just using Codex out of the box, and it's more expensive, because OpenAI heavily subsidizes users to use GPT 5.5 directly in the app.
17:14 And so someone could build an AI-native version of Replit and Lovable that's just BYOT and BYOA, which is bring your own tokens and bring your own agent.
17:26 So you can imagine a world where I go to Codex and I could say, build an app and use, uh, @Lovaplit.
17:37 And this is my fictional app that someone could build, where it just handles all of that, except it acts as a plugin and you use it directly inside Codex. Maybe it only costs $10 a month, because the company that creates it doesn't have to build an agent. They don't have to pay for tokens, so it's a bigger margin, and they just host the user's web app somewhere, and maybe that could cost a little bit more money.
17:59 But I genuinely believe that many people who love to vibe code are just gonna end up switching over to Codex and the Claude desktop app over time as they become full platforms, and vibe coding will just be a skill that any AI agent can do.
18:16 To conclude today's video, I wanna talk about my biggest obsession for the past two months. It has to do with something called an agent mini app, and it stems kind of from the in-app browser inside Codex and, eventually, all agent platforms. Okay.
18:32 So in my previous video, I covered a topic called an agent native app, and I used the example of Dan Shipper, who created this app called Proof. Proof is an open-source document editor that he made to be an agent native app, or an app that you use with your agent.
18:51 So you could say, hi agent, I wanna create a document. And the agent can create the document, and then you can edit the document yourself.
18:58 You can have the agent edit the document, and he basically made the connection between the document and the agent incredibly easy. It's very seamless to create a document with this agentic application.
19:10 And I've been fascinated by this, because we're gonna have agents that have browsers connected, and so many people are gonna make a ton of money building apps that are just agent native. They're not meant for you to go to the app and type a document on their platform. They're made for you to ask your agent to create a document, and it just uses this technology and renders it right here.
19:31 So this is really, really cool and really interesting, and it's possible right now to create and use these agent native apps. In fact, Google Docs counts now, because your agent can fully control Google Docs, and it can fully control Notion.
19:45 This is an example of an AI native app, right? It is an app that's meant to be used by humans, but they added an agent native feature.
19:53 Right? This is just an AI agent native feature of an app that's meant to be used by going to the platform.
20:01 So this is all possible, but there's one thing that's not possible. On Codex, they have these things called plugins.
20:08 Within the plugins, right, you can actually sign in to all of your apps. And so I have like 30 different plugins, like Gmail, like Slack, like, uh, Typefully, which allows me to schedule Twitter posts for the future, which I use a lot for our company account.
20:25 Um, and you know, the list goes on: GitHub, uh, Vercel, etcetera.
20:30 All of these different tools. What is not possible right now inside Codex, and what I wish was possible: you cannot create an AI native app that connects to these specific integrations.
20:45Right? When I go to plugins and I sign into my Gmail, I'm authenticating.
20:50Right? I'm authenticating to my email.
20:53What I can't do inside Codex is use this authentication to create an app that connects to Gmail. Let me explain what I mean by that.
21:03So if you think of the way we were describing vibe coding earlier where you have your different agent task, you're chatting with your agent and you can get it to create basically any app you want and I'm able to add Neon's, uh, database to it by at mentioning Neon.
21:19Right? This is just a database provider and then it can create an app that has a built in database created by Neon.
21:26But what if what if your agent could generate apps here on the side which I call a mini app which could actually integrate with all of your plugins.
21:39And so you could generate a email mini app or you wouldn't even need to consciously generate an email mini app. Your agent would generate it for you.
21:48So imagine you're using Codex and you say something like, I need to do my email help.
21:56 One thing the agent could do is just send you a bunch of drafts for all of your emails. Right?
22:02 It can go through your email and come up with drafts to send, but it's really hard to give you that information in a way where you could easily edit those drafts. What if it created a mini app, and the mini app was like a Tinder for email?
22:21 It would have a nice input message, which is the message from the person who sent it, and then your response right below it. Then you could either archive it, if you don't actually want to send the email, or send it as is. Since the agent has context over all of your different tools, it'll be really good at understanding your goals and everything.
22:46 It'll be able to draft a really good email, or there would be an edit button. Let's say you just want to edit a few parts of it: you could very quickly edit it and, within the app, just press send.
22:58 So imagine it created an app where you could easily press send. And as you use these mini apps, the agent would actually learn. Right?
23:08 Because every time you press archive, this data would be stored somewhere. I'm not sure how this would technically work, but it would be stored somewhere, and over time the agent would stop suggesting the types of emails that you would normally archive, and it would learn from every single message that you send.
23:25 It would learn from all the edits that you make, so that every email it suggests is one you will very likely send. So these can be thought of as generative UIs that connect with your integrations. Right now you could ask the agent to do this, but then you'd have to go back to it and say: send the first one, don't send the second one, send the third one, make an edit to the fourth one, please say this.
23:49 What if the agent could just give you the best possible interface, connected to your tools, that lets you make the final 10% of edits and send directly from this little mini app? And users would actually be able to create their own interfaces.
24:03 Right? You could create your own mini apps and maybe even share them with your team, because every person is unique, every company is unique, and maybe you want your own little mini apps that are integrated with everything you've already signed in with.
24:18 Why would I want to use someone else's external platform if my AI agent can generate a UI for me right when I need it? I think this is next, and it's something we've been playing around with at my company. I moved my company to New York, and we're actually trying to figure this out through iMessage.
24:38 I'm not going to go into detail because I'll be making a big announcement soon, but you can already use our product. It's chorus.com, and you can create an AI agent and add an agent like Claude Code or Codex directly inside iMessage.
24:51 We're trying to figure out how the agent can send you a little link that turns into a mini app. These mini apps will act as the operating system for the agent. I genuinely believe all of the major platforms are going to circle around this idea, and this is what's going to bring out Jarvis.
25:08 Right? How can the AI agent give you the best possible interface for any given task, where the app actually connects to the integration? You can actually send an email.
25:18 You can actually post to social media. You can actually send the Slack message. Right?
25:23 It can suggest things for you, and you can properly edit them directly in the interface. I think Codex has a perfect browser for this. The problem is that if you try to do this, you can't connect your plugins to the apps you create.
25:38 It's just not possible with the way they built Codex. Anyway, that's it for the update today.
25:45 So I'm here in my Airbnb in New York City. We just moved our company from SF to New York. There's great energy out here, but unfortunately I don't have a studio.
25:54 So we're going to rebuild our office and our studio, and I'm going to 10x my content effort.
26:02 My main goal is to educate people so that you become agent native, which is the new name of this series. I think people need to become agent native, or agents will just start to use you. Think of social media.
26:14 If you look at the social media trend over the last ten years, there are content creators who take advantage of social media.
26:22 And then there are the content consumers who get taken advantage of by the algorithm. It addicts you to the platform. It sells you ads.
26:31 So you're either a producer or a consumer. I would much rather be on the producer side of this AI revolution. I think it's really important to learn the different concepts.
26:41 You should learn the surfaces that these AI agents will exist on, which is why I started this series. Every week, I cover the most important agent news, and I'm loving it right now.
26:52 And I'll continue to do it every single week. So thank you for watching. I'll see you in the next video.
The Hook

The bait, then the rug-pull.

Anthropic called it the most advanced model in the world. The practitioners who actually tested it called it a camera bump. In the same week, OpenAI quietly shipped half a dozen Codex updates that changed how the host works hour to hour, and nobody sent a press release. This breakdown sorts out which story mattered.

Frameworks

Named ideas worth stealing.

04:56 · model

Two Categories of Lab Announcements

  1. Model updates
  2. Super-app updates

Host's lens for deciding how much attention to give any AI lab announcement — model increments vs. platform/UX changes that affect daily workflow.

Steal for: Any content creator covering AI news who wants a consistent editorial frame
18:16 · concept

Agent Mini App Architecture

Generative UI panels spawned by an agent inside its workspace, inheriting the user's plugin authentication, allowing the human to make final-10% decisions without leaving the agent environment.

Steal for: Product designers or developers thinking about what AI-native applications should look like
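The host admits he is not sure how the mini app's learning loop would technically work, so what follows is only a minimal Python sketch under stated assumptions, not how Codex or chorus.com implements anything. Every name in it (`EmailPlugin`, `Draft`, `EmailMiniApp`, `FakeGmail`, and the `min_seen` and `archive_threshold` values) is hypothetical and does not reflect any real plugin API. It assumes the mini app receives an already-authenticated plugin handle from the agent workspace instead of asking the user to log in again, and it swaps in a deliberately simple heuristic for "learning": a sender whose drafts you archive most of the time stops being suggested.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional, Protocol


class EmailPlugin(Protocol):
    """Hypothetical stand-in for an already-authenticated email plugin."""

    def send(self, to: str, body: str) -> None: ...


@dataclass
class Draft:
    sender: str    # who wrote the incoming message
    incoming: str  # message shown at the top of the card
    reply: str     # agent-suggested response shown below it


@dataclass
class EmailMiniApp:
    plugin: EmailPlugin             # inherited from the agent workspace, no second login
    min_seen: int = 3               # decisions needed before judging a sender
    archive_threshold: float = 0.8  # archive rate at which a sender is suppressed
    sent: dict = field(default_factory=lambda: defaultdict(int))
    archived: dict = field(default_factory=lambda: defaultdict(int))
    edits: list = field(default_factory=list)  # (suggested, final) pairs kept as feedback

    def should_suggest(self, sender: str) -> bool:
        seen = self.sent[sender] + self.archived[sender]
        if seen < self.min_seen:
            return True  # not enough history yet, so keep suggesting
        return self.archived[sender] / seen < self.archive_threshold

    def queue(self, drafts: list[Draft]) -> list[Draft]:
        """Cards to show, skipping senders the user habitually archives."""
        return [d for d in drafts if self.should_suggest(d.sender)]

    def send(self, draft: Draft, edited_reply: Optional[str] = None) -> None:
        body = draft.reply if edited_reply is None else edited_reply
        if edited_reply is not None:
            self.edits.append((draft.reply, edited_reply))
        self.plugin.send(draft.sender, body)
        self.sent[draft.sender] += 1

    def archive(self, draft: Draft) -> None:
        self.archived[draft.sender] += 1


class FakeGmail:
    """Test double that records outgoing mail instead of sending it."""

    def __init__(self) -> None:
        self.outbox: list[tuple[str, str]] = []

    def send(self, to: str, body: str) -> None:
        self.outbox.append((to, body))


app = EmailMiniApp(plugin=FakeGmail())
newsletter = Draft("news@example.com", "Weekly digest", "Thanks!")
colleague = Draft("sam@example.com", "Can we meet Friday?", "Friday works.")
for _ in range(3):
    app.archive(newsletter)  # user keeps archiving the newsletter
app.send(colleague, edited_reply="Friday at 2pm works.")
print([d.sender for d in app.queue([newsletter, colleague])])  # ['sam@example.com']
```

Per-sender archive counting is the crudest possible version of the idea; a real system would more plausibly feed the recorded archives and edits (kept here in `edits`) back to the model. The point of the sketch is the shape: the agent supplies the authenticated integration, and the human only makes the final-10% send, edit, or archive decision.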
17:18 · concept

BYOT / BYOA Product Model

Bring Your Own Tokens + Bring Your Own Agent: a SaaS pricing model where the platform charges only for interface/hosting, not AI compute, giving users model choice and reducing operational costs.

Steal for: Founders considering a vibe-coding or AI-tools product that competes with Replit or Lovable
CTA Breakdown

How they asked for the click.

VERBAL ASK
24:04 · product
you can actually already use our product. It's chorus.com, and you can create an AI agent and add an agent like Claude Code or Codex directly inside iMessage.

Soft product mention embedded naturally inside the conceptual section rather than a hard sell. Subscribe CTA only in the final seconds.

Storyboard

Visual structure at a glance.

hook · 00:00 · open: Opus 4.8 announcement tweet
promise · 04:56 · model vs super-app two-category framework
value · 07:40 · Codex browser: Twitter signed in
value · 10:38 · sub-agent spawning: 6 threads created
value · 18:16 · mini app concept: browser vs mini app panel
value · 22:40 · tinder-for-email whiteboard
cta · 25:00 · chorus.com mini app reveal