Modern Creator
Jack Roberts · YouTube

Hermes Agent + DeepSeek V4 = 100X Cheaper

A 21-minute walkthrough on running a three-model AI triad overnight — Opus plans, DeepSeek grinds, GPT-5.5 critiques — for 1% of the cost of going all-in on frontier models.

Posted
1 weeks ago
Duration
Format
Tutorial
educational
Views
25.9K
645 likes
Big Idea

The argument in one line.

By combining Hermes Agent with a three-model triad—Opus for planning, DeepSeek V4 for execution, and GPT-5.5 for critique—you can run overnight AI workflows that deliver 95% of frontier-model quality at 1/100th the cost.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…
  • An existing Hermes Agent user who is paying full frontier-model rates and wants to cut costs by 90%+ by routing heavy execution tasks to DeepSeek V4 via OpenRouter.
  • A solo builder or indie hacker who wants background AI jobs running overnight — planning, building, and critiquing — without burning through an expensive Claude Opus budget.
  • Someone comfortable with OpenRouter model selection who wants a concrete three-model triad (planner, executor, critic) they can wire up in an afternoon.
  • A technical non-developer who uses Hermes for life automation and wants to understand which tasks should go to cheap models versus which need frontier intelligence.
SKIP IF…
  • You have not yet set up Hermes Agent — this video assumes a working Hermes install and skips initial setup entirely.
  • You are looking for a code-level deep dive into the triad architecture; the walkthrough stays at the configuration and prompt level, not the implementation layer.
TL;DR

The full version, fast.

Frontier-quality AI work no longer requires frontier-only pricing. By wiring Hermes Agent to OpenRouter and assembling a three-model triad, you can run Claude Opus 4.7 as the planner, DeepSeek V4 as the overnight workhorse at roughly one-hundredth the cost, and GPT-5.5 as the brutal critic that tears each draft apart until it ships. Each model handles the job it does best, and OpenRouter modifiers like nitro, exacto, auto, and bring-your-own-keys route work to the fastest or most tool-accurate provider while preventing rate limits. The practical result is a persistent agent that runs while you sleep, captures 95% of frontier output for 1% of the spend, and improves through critique loops rather than single-model agreement.

Members feature

Chat with this breakdown.

Modern Creator members can chat with any breakdown — ask for the hook, quote a framework, find the exact transcript moment. Unlocks at T2: refer 3 friends + add your own API key.

Create a free account →
Chapters

Where the time goes.

00:0001:16

01 · Software costs less than minimum wage

Cost reframe: AI is now cheaper than a junior dev. Hermes vs Claude Code distinction.

01:1603:12

02 · DeepSeek V4 pricing advantage

100x cheaper than frontier models. $75/M tokens vs $0.87/M. 95% of performance. Benchmark comparison.

03:1204:27

03 · OpenRouter as the single key

One API key unlocks all models with usage tracking. Introduces multi-brain model system concept.

04:2706:33

04 · Multi-brain model system

ChatGPT $20 sub for GPT-5.5, Gemini CLI free. Live demo of Gemini analyzing a YouTube channel visually.

06:3308:22

05 · Six OpenRouter features most people miss

Nitro, Exacto, openrouter/auto, BYOK, Fallbacks, Zero-completion.

08:2210:25

06 · The Triad framework

Plan (Opus 4.7) + Execute (DeepSeek V4 overnight) + Critique (GPT-5.5). Three models, one verdict, no brain isolation.

10:2512:35

07 · The Pantheon and Orpheus persona

Hermes dashboard for visually building specialist personas. Creates Orpheus: the deep-work triad persona.

12:3514:42

08 · Connecting OpenRouter to Hermes

Terminal: hermes setup model, select OpenRouter, enter API key. BYOK setup for DeepSeek.

14:4216:24

09 · Soul.md — feed Hermes who you are

Identity, mission, goals, key metrics, communication style. The more context Hermes has, the smarter every task.

16:2420:00

10 · Live Orpheus demo — niche analysis

Which Texas local service niche for AI/web services? Triad surfaces fire/water/mold restoration as top pick.

20:0021:08

11 · Wrap and CTA

Hermes + DeepSeek = agent that grows with you. Next video teased on maximizing Hermes potential.

Atomic Insights

Lines worth screenshotting.

  • DeepSeek V4 costs $0.87 per million tokens versus $75 for frontier models — running it overnight for grinding tasks costs 100x less at 95% of the output quality.
  • The Plan-Execute-Critique triad assigns the right model to each role: Opus plans and orchestrates, DeepSeek executes the bulk work, and GPT-5.5 critiques and validates.
  • OpenRouter is the key infrastructure layer — one API key unlocks access to every major model, provides a unified billing dashboard, and enables dynamic model switching.
  • Hermes lives across your entire life and is persistent; Claude Code lives inside repos and is session-bound — they are designed for different jobs, not competitors.
  • A learning loop where every task teaches Hermes more about who you are means the system gets more accurate and more useful without any manual training sessions.
  • Software now costs less than minimum wage — the strategic question is not whether to use AI workers but how many hours per day you have them actively running.
  • Gemini CLI with a free Google account gives Hermes multimodal video analysis capability — useful for reviewing YouTube videos, analyzing visual content, and processing media assets.
  • ChatGPT at $20 per month via OAuth gives Hermes access to GPT-5.5 — a cost-effective way to add a premium critic model to the multi-model triad without separate billing.
  • Overnight autonomous runs using DeepSeek as the execution model produce deliverables by morning without consuming Opus tokens for every step.
  • Claude Code and Hermes are complementary rather than competitive — use Code when you are at your desk focused on a codebase, use Hermes when you want things to happen while you sleep.
  • Running Hermes on bare metal rather than Docker is a common setup that trades isolation for simplicity — the right choice depends on how much you trust the tasks you are giving it.
  • Every model has specific strengths: Opus for design and planning, DeepSeek for high-volume execution, Gemini for multimodal tasks — routing each job to the right model is the skill.
Takeaway

Steal the triad.

Builder playbook

Let the cheap model do the overnight grinding — Opus sets the strategy, DeepSeek does the work, a critic closes the loop.

  • Set up OpenRouter as your single API key — one key, every model, usage dashboard included.
  • Wire DeepSeek V4 as the worker model for any task that can run overnight: research, analysis, code review, content outlines.
  • Always add a critic pass before shipping — single-model sycophancy is real, multi-model critique breaks it.
  • Build a Soul.md or equivalent context file so every agent task starts with full business context.
  • Use :exacto suffix on any model doing tool calls — not all models are certified, and agentic systems break on bad tool calls.
  • The triad scales: swap any model in any slot depending on cost vs quality tradeoffs.
Glossary

Terms worth knowing.

Hermes Agent
A persistent personal AI agent platform that runs across a user's whole computing life, learning from each task, scheduling background jobs, and orchestrating other AI models on the user's behalf.
DeepSeek V4
A large open-weights language model from DeepSeek that delivers near-frontier performance on reasoning and coding tasks at roughly one one-hundredth the API price of top closed models.
Claude Code
Anthropic's command-line coding agent that operates inside a specific code repository with a tight tool loop and a bounded session, built for working on codebases rather than general life tasks.
Frontier model
A top-tier, state-of-the-art large language model — typically the newest flagship from OpenAI, Anthropic, or Google — that sets the current ceiling for reasoning and capability.
Claude Opus 4.7
Anthropic's highest-capability Claude model in this setup, used as the planning and orchestration brain because of its reasoning strength.
GPT-5.5
An OpenAI flagship chat model used here as the critic that reviews and tears apart the worker model's output before it ships.
Gemini CLI
Google's command-line tool for calling the Gemini family of models from a terminal, free to use with a Google account and especially strong at multimodal tasks like video analysis.
CLI (command-line interface)
A text-based way to control a program or service by typing commands in a terminal instead of clicking a graphical app.
OpenRouter
A unified API gateway that lets you call hundreds of AI models through a single key and dashboard, with built-in usage tracking, routing, and fallbacks.
Multimodal model
An AI model that can natively process more than one type of input — for example text plus images, audio, or video — instead of text alone.
Tool calling
An LLM's ability to invoke external functions or APIs mid-conversation — querying a database, hitting a web service, running code — so it can take real actions rather than just generate text.
Rate limit
A cap a provider puts on how many requests or tokens you can send in a given window, which throttles or blocks further calls once exceeded.
BYOK (bring your own key)
A setup where a platform routes your requests using API keys you supply for the underlying providers, so usage and billing flow through your own provider accounts.
:nitro suffix
An OpenRouter model modifier that auto-routes a request to whichever provider is currently fastest for that model.
:exacto suffix
An OpenRouter model modifier that restricts routing to providers certified for high tool-calling accuracy, useful when an agent needs reliable function calls.
openrouter/auto
An OpenRouter routing option that picks the best-fit model for a given prompt automatically, with no surcharge over standard pricing.
Zero-completion billing
OpenRouter's policy of not charging for empty or errored model responses, so failed generations don't appear on the bill.
Plan-Execute-Critique triad
A multi-agent pattern where one model plans the task, a second model does the heavy execution work, and a third model critiques the output, with the cycle repeating until the result is good enough to ship.
Persona (in Hermes)
A named, reusable agent configuration inside Hermes with its own role, instructions, and assigned model — invoked by name when you want that specific behavior.
Pantheon (Hermes dashboard)
A visual dashboard inside the Hermes setup for creating, organizing, and editing the user's collection of named agent personas.
Resources Mentioned

Things they pointed at.

Quotables

Lines you could clip.

03:12
Would you pay 1% of the price for 95% of the value?
Entire video thesis in one line, no setup neededTikTok hook↗ Tweet quote
01:11
Software now costs less than minimum wage.
Provocative reframe, works standaloneTikTok hook↗ Tweet quote
17:57
WD-40 was the fortieth version that actually worked, hence the name.
Memorable analogy for the improvement loopnewsletter pull-quote↗ Tweet quote
17:57
If you just ask Claude directly, I have found it just agrees with you for no reason.
Relatable pain point every AI power user has feltIG reel cold open↗ Tweet quote
The Script

Word for word.

metaphoranalogy
00:00When you combine Hermes Agent with tools like DeepSeek, you unlock capabilities that 99% of people don't even realize exist. And I'm gonna show you exactly how to connect Hermes with the world's most powerful models, which means that you can use Hermes from $0, build as much as you want to with no limitations or rate limits, a my new system for using Hermes' most powerful feature that will make you 10 times more productive, even if you're a complete beginner.
00:25And if you're new, I'm Jack. I built this on my life tech startup with the Gizling customers. Now I'm building my own AI businesses, and I just share the stuff that actually works.
00:33So if you haven't already, grab that beautiful coffee, and let's dive straight in. Beautiful.
00:38So let's talk about Hermes then and exactly how we leverage this with DeepSeek. As I'm sure you know, if this Hermes thing is new to you and you wanna have to set it up, I'll put a link on screen somewhere so you can get it all rocking and rolling. And now officially, can actually jump in and have a look at the Hermes plus DeepSeek.
00:53And I'm not just talking about DeepSeek, although DeepSeek is gonna be doing some heavy lifting for us, but you've gotta see why and why I haven't seen many people talk about the strategy I'm gonna cover here. So first thing we have to understand though, before we even take any step, is that software now costs less than minimum wage.
01:09So essentially, the balance has shifted such that we can employ AI that are as a good, if not better than junior developers, to do things for us over time. So the question simply becomes for us, how many minutes of the day do we have basically AI systems working for us, building, improving things whilst we're going about our day to day things using Hermes.
01:30Now we talk a lot about building no code software with Claude code. We talk about Hermes. Remember, they're two very different things.
01:36Claude Code lives inside repos. It's got a tight tool loop, session bound, and it's built for code bases. Hermes lives across our entire life.
01:45It's persistent. So it learns from every task that we give. It is self evolving in that sense.
01:50The more stuff we say, the better it gets. It schedules background jobs and it builds essentially the idea of Hermes is a deep model understanding of who you are.
01:59And the better it knows you, the better it can help you with your life. And the whole point of systems like this connect Claude Co. To Hermes quite nicely.
02:06So let's talk about the reflection of pair. Okay? So here's a learning loop.
02:09Every task teaches Hermes essentially who you are, and that's how it actually gets better and understands more things like this.
02:16Hermes itself works whilst you sleep, so we're gonna give it something in this video that shows you how you can leverage deep seek overnight to do some pretty exceptional stuff for when it's valuable. So the idea with this is that effectively we pick our brain.
02:28We pick a frontier model, the most intelligent and effective model to be what we'd call the conductor, the organizer. I have personally found in my experience, Claude Opus 4.7 is the best model for them.
02:41And the idea here then is that what Hermes can hot swap between every single frontier model and use the right model for the right job. Specifically, I'm gonna show why DeepSeg v four is so powerful in this setup.
02:53I've seen a lot of interesting ideas about running it on your computer, and that is really fun. Sometimes that you'll find that your literal, like, MacBook will melt through the table because, like, the amount of effort it requires. So there's some different trade offs to be aware of here.
03:06And we can even see here just how powerful DeepSeek v four is. Now the message from this graph, because we don't wanna date a benchmark Like I always say, we've seen it in the profile. We've gotta take her on a date first, see what's up.
03:16It's not trying to say that DeepSeek v four is better than OPUS 4.7. Of course, it isn't. But the question is, would you pay 1% of the price for 95% of the value?
03:26And if we can truly get, like, a 100 x output and we don't need that full max level redlining brainpower, what could we realistically accomplish?
03:35That's kind of the question here. And you can see just how comparable they are to each other and how powerful and how well DeepSeek v four has actually physically done here. And this is the benchmark just for your reference.
03:44And so the idea is it's a 100 times cheaper, and we get the same job done overnight, which is freaking fantastic. And you can see here, $75 per million tokens out versus immediately 87¢.
03:55I know what you can buy with 87¢. You can't even buy a hamburger these days without the amount of money. And so the idea here is that we're gonna tag in OpenRooter.
04:02OpenRooter is overpowered because once we give it access to OpenRooter, we can see our usage, we can track our usage in a beautiful dashboard. And on top of that, we can access all of the models and switch it dynamically whenever we want to without the need to have and control a thousand different keys and track usage over here and track usage over there.
04:20In other words, very powerful. It's one key. It unlocks everything.
04:23And this brings us nicely onto the idea of the multi brain model system, being that essentially every model has its own strengths and weaknesses. And we can build systems that bring in the best model for the particular job, meaning it runs twenty four seven overnight.
04:37And so first of all, two models that you absolutely should have in your Hermes agent. Number one is going to be OpenAI's ChatGPT because for your $20 subscription, you effectively get to use your ChatGPT, which is gonna be insane amounts of value and gives you access to ChatGPT 5.5.
04:53The second is to Gemini CLI, and with just simply an email address and a Google account, we can use Gemini. So this is an example of leveraging a model for a specific thing.
05:03I'm Then gonna show you how Deepsea comes to this and makes it incredible. So check this out. I can say, for example, hey, though.
05:09I would like you to use the Gemini CLI to go ahead and look at Jack Roberts's last video and give me a breakdown analysis visually of what you see in the first ten seconds and send that one off. And CLI just stands for command line interface.
05:24It's just a very quick and the easiest way to connect to any service. If you don't have the Gemini CLI installed, all I'm going to do is come over to this GitHub repo right here, click on code, click on copy of that code, and then head over to your language model of choice, could be the called codec, codecs, or anti gravity.
05:41And so for example, if I'm here, I'm just gonna get this command which is, hey there, I'd like you to install the Gemini CLI onto my computer. Okay?
05:48And all it's gonna do is go ahead and grab that GitHub repo. And as you can see, I've already gotten mine installed. And this CLI works exactly the same way with GitHub, with OpenAI, with Vercel.
05:57It's incredible. Now let's come back over and see how it's gotten on. So check this out.
06:00It's user CLI. Fantastic. It's got my YouTube channel, and it's literally broken down visually all the stuff it can do because Gemini is so powerful at breaking down video.
06:09It is the multimodal model, and we can now basically tag in. And this one of cool thing about home is it can bring in any model that wanted to, and this is running on my local computer. Right?
06:19And so essentially, I have the Gemini CLI on that. So if I say, hey, user, it can literally use it for us right there directly. And if it was on hosted somewhere, we'd need to basically install it.
06:27But because it's it's fantastic. And look, it's breaking down the way that I move in my first ten seconds. But this is just the beginning.
06:34It actually gets way crazier than this. Now I've been playing with loads of different models. Now one of the techniques I found is called the council or the triad.
06:41And the idea is we have this super intelligent model, and we bring DeepSeek v four to do an insane amount of heavy lifting, then we have a super intelligent model that reviews and delegates. But we need to build it in the proper system. Now there's a couple of things that you need to know about OpenRooter to get the most out of this model.
06:57Some might actually surprise you. These here are expressions that we can add to the end of any model that we're doing and effectively does some really interesting and useful things that are gonna help us with Hermes.
07:08First of all is Nitro. So this can append to any model and it auto routes to the fastest provider at that moment.
07:15For example, Anthropic forward slash Claude Opus Nitra. Fantastic. You've got Exacto, which is a little Italian, but it's very fantastic, or Asatom.
07:24Only providers rigorously certified for tool calling accuracy. Freaking really awesome.
07:29Right? Because if we're having these like systems and models doing agentic things for us, in other words, they need to tool call. They need to check databases.
07:37You know, not every model is great at tool calling. So we only wanna grab the ones that are great at doing that thing. Another call we have on the smart routing side is open route to auto.
07:45So this picks the best model for your prompt. They're not diamond, no extra fee. That's pretty cool.
07:51Right? That's pretty fantastic. It can do that.
07:52Then you got basically bringing your own keys, which is fantastic. So for example, if we're using something that's getting rate limited like a DeepSeek v four, we can actually bring our DeepSeek key into OpenRouter just to save us that time and make the whole process easier.
08:06We've got fallbacks. And the last one is zero completion. So you're never charged for blank or error responses across their customer base that saves almost $20,000 a week, which is pretty handy, we might say.
08:16So let's talk about the triad and how DeepSeek actually physically fits into this. The idea is three different models. One verdict, no single brain.
08:24No brain isolation. Okay. So if you think of it like this, this is the general strategy.
08:29This is very well reflected in research. I think the triad sounds a lot cooler. I think it needs a little bit of a rebrand.
08:34Triad sounds cool to me. The idea here is we have plan, we have execute, and critique.
08:39And I've often found, genuinely speaking, none. I will never ship anything. I will never ship anything unless it is severely and brutally critiqued.
08:45I actually use the word brutally critique because I wanted to be as critical as possible. Trying to criticize is a skill set in of itself. The idea here is that we have Claude Opus 4.7, which at the moment is the king, ruthlessly, okay, planning.
08:59Now when it's create a plan, the deep sea, the giant whale, okay, that is like one one hundredth of the cost for 95% of that performance doing all the heavy work. Some say it's deep seek labor.
09:09I don't know. Call it what you want to, but it's working hard for us day and night. And this is the deep seek that can churn in the background for twenty four hours while we're sipping our lattes and enjoying our beautiful days and spending time with our families.
09:20And then we have a a critic model, which is just going to essentially pick that apart and make sure that it's correct. And again, then we have a planning model and it works in a beautiful circle like that. So for example, we have called Opus that can decompose a task, write the brief, and execute the workflow.
09:33DeepSeek v four is gonna grind and see the plan overnight. It's cheap enough to retry often if it doesn't get it right. And then we can bring in a different model for the critic.
09:41It doesn't have to be Gemini three. It could be any model that you want to. I typically like to not have it being Opus, I like to just get a slightly different flavor, a different scoop of ice cream, if you will, from a different very capable model.
09:53Most likely, ChatGPT 5.5, but you can tag in Gemini if you want to. And to do this, I'm gonna be using what I call the Pantheon. So last video, I showed you how you pull up this entire beautiful Hermes operating system that effectively is a beautiful dashboard that basically allows you to connect Hermes to your Chord code operating system because we do coding on our computer.
10:11Right? This gives you an overview of your spend, your costs. This dreams for you overnight.
10:16So based on your entire chat history with Claude, your usage, how you're using a chat GPT and Claude in every model in your computer, this will give you dynamic feedback and suggestions. It is auto dreaming.
10:27You can mark this off as done, go through these things, and it's incredible. What's really powerful here is we can connect this to Hermes, which basically means that Hermes has access to all the data and everything you're doing with coding. It shows you all your skills and all of your fantastic memory systems.
10:40Now in Hermes itself, one of the really interesting things that we can do with the Hermes agent here is actually connect it to our memory systems, and we can connect it to everything we're doing. I'll put a link on screen for that full guide breakdown. If you wanna check that one out, you might find that one super helpful.
10:55So this will be a link in the description. I I referenced that video earlier so you get it. Obviously, can chat to Hermes in the chat if you want to.
11:01It's just helpful to have all the one dashboard. But what I wanna look at here is the panther. Now, you can do this just in Hermes chat.
11:06I like to do this because I like to have a visual look at everything that I'm particularly designing. I just find this way easier to do.
11:12So I'm gonna add in a persona. Let's call this one something like Orpheus. That sounds fantastic.
11:17I'm gonna give it a job. Right? Deeply reasons on any topic.
11:21Okay? So this is gonna be a very powerful one. And I pulled together for you a template for this triad system that effectively breaks down the flow so that actually Hermes understands how it works.
11:30And it breaks down into three separate prompts. We have Opus the conductor, who's the conductor of the Hermes triad. We have DeepSeek the worker.
11:36You're the worker. Here's the loop. You read the proof.
11:38You identify three to five angles, listed by the conductor, and then we have the GPT 5.5 who is the critic that looks down and it kind of assesses it dynamically. All you can do is literally come down here, grab the flow like so, copy all the stuff. Obviously, if you're using the Hermes dashboard, awesome.
11:52If not, you could just paste this into Hermes. And so I've just pasted mine in here. And here, I'm gonna add a little bit of description, which basically explains a few lines on what this percent is for and when to summon them.
12:01This is for when I want to go very deep on a topic. We're gonna leverage Opus as a conductor.
12:08DeepSeek is an extremely deep workhorse that can work for hours on the topic, and then we're gonna have ChatGPT to review everything. This is for extremely powerful deep work.
12:17Awesome. And then we're happy with that. Again, you can just give this information that you want to to Hermes.
12:21I personally find the system really helpful. I like visually seeing everything. I think it's fantastic.
12:25Then pick the model that you want orchestrating it. So for us, it's open to 4.7, and then we just click on create Orpheus. And then when that's done, your dashboard will reflect this.
12:32Of course, I'll just give it to your guy in Hermes, and I can click on Orpheus. And let's have a look at what they're doing, and we can see we've got the flow. So if at any point, I just wanna come in and amend it, I can do that, which is really cool and just makes things a lot easier, I think, which is fantastic.
12:44But we're good. So now we just wanna sync these. So I'm just gonna come down here.
12:47I'm gonna cap this syncing prompt and shoot over to Hermes. And I'm just gonna come up to the top of the screen and drop this bad boy in here so we can have a little bit of a conversation. Now, obviously, it's great.
12:56Again, I just like to visually see this. And once we've done that, what we need to do then is connect OpenRooter. And to do that, we've got to use the terminal.
13:03So for example, we can do that on our computer or can do anti gravity. If I do control space bar and I just type in deeper terminal, we have this guy pop up. Now how we actually solve this?
13:11Many different ways. Easiest way to do this, if you haven't already, I'm gonna grab the Hermes setup button real quick. Actually, when the model setup button, you're gonna do command space, type in terminal and it will appear.
13:20And we just enter Hermes space setup space model. And then from here, you can basically select all the ones that you've got. And what you're gonna find on here is open router.
13:28So you're gonna press the space bar to select it like so. And then I've already got mine in, so I'm just gonna keep that one. But here, you'd be prompted for an API key, and we just grab that from open router, which is this website here.
13:37And again, shows you all the different performance, which is really interesting to actually see where the central gravity goes. But then you can move to open router here, click on get an API key, then you simply come over to new key, and then you can create it. You also have this b y o k on the left.
13:49If I click on bring basically, bring your own keys, I can search all the providers. So I can come down here, and I'm gonna type in DeepSeek. I can configure this real quick.
13:57Beautiful. And I can add in any API keys I want to. So we're gonna add a priority API key.
14:01And I recommend this for DeepSeek just so you never hit any rate limits. Then you just head over to DeepSeek API key into Google or Brave, and you're looking for the DeepSeek platform. Again, you just sign in, drop like $510 as much as you want to into DeepSeek.
14:14You are gonna be using it a lot, so I'd throw some good money. You have to get crazy, obviously, but put some money in there, and then you literally copy and paste that key here, and then you have essentially your own keys for DeepSeek if you want to, so you never get rate limited just in case. And then essentially, at this point, we can do anything we want to.
14:28Again, we could chat in Telegram, but I could come up to Hermes and be like, hey, the dude, I'd like to use Orpheus to solve a problem for me. Could you just confirm you know what Orpheus is and let me know if there's anything that you need from me? Send that one off like that.
14:40Obviously, what I'm using with speech to text is Glidel. It enables me to yap and get a lot more yapping doing in the time, which is fantastic. And just like that, we have response.
14:47So I know Orpheus. Orpheus is your deep work tried persona. Opus sets as acts as a coordinator, first interrogating the problem five to 10 clarifying questions, then it writes one tight one page brief for worker slash critic loop.
14:59The intended flow goal is you give it a goal. Orpheus asks clarifying questions, then write a proof, worker attacks it from multiple angles, the critic tears it apart until it's shippable, and then Orpheus validates final artifact. And what I need from you is the problem you want solved, your desired output format, how deep do you want me to go, deep dive or just work for hours mode, and any constraints, budget, tools, deadline, audience tone, things to avoid, and what success looks like.
15:22Little hack for you as well guys. You'll know for example that Hermes itself saves information in its agent memory and its soul. What I would love you to do for your soul, and I'll publish for you as well, and I'll put a link for it down below so you can grab it.
15:33And so this is the soul dot md. Effectively come down and just replace it. This is exactly how to run it and how to use it.
15:39It's stuff on your identity, your mission, and goals. What's your goal this year? I wanna hit $20,000 a month in my SaaS.
15:45I wanna double my investment portfolio. I want to acquire seven companies. Whatever it is, whether you maybe you're billionaire, you wanna be a trillionaire.
15:51Whatever the thing is, make sure Hermes knows what it is. Explain your business, your revenue, your runway, bank payments, any key information you wanted to know, key metrics that Hermes needs to check and be aware of, voice and communication details, how exactly you would like to have it speak to you, show by default, one question at a time, how it needs to write to you, the rhythm, all this detail.
16:10Take a look at this. If there's anything else you want, you can add it in there. But honestly, these are just some really great questions.
16:14You just feed it this information about yourself, then Hermes is gonna have that fantastic context, and then that will appear in your soul.md. And you can even say, hey there, I want you to add this to my soul.md. Go hands free mode, and then just yap to your heart's content.
16:27And literally, that will cover everything. Hey there, I'd like to use my Orpheus skill. Could you just confirm to me anything that you need to know about that persona before I give you a task, please?
16:35This is cool. Orpheus persona, what I need to ask for a task. Opus is a conductor.
16:39DeepSeek is the worker. GPT 5.5 is the critic. When you know desired outcome, success criteria, so let's just tell it.
16:45Hey there. My desired outcome is to know I would love to sell websites and AI services to businesses. My outcome is to know which niche should I personally choose.
16:55Success criteria is a list of the top three niches that have effectively high margin, maybe typically unsexy and untargeted, but have a high need need for AI and automation services.
17:06Could you do a quick one for me? Maybe like ten minutes. I think that's a pretty sharpish on this.
17:11The audience is for me. Constraint is just keep it nice and snappy and short. Not too short, obviously, like a good good amount of detail.
17:17I want emojis per thing. And then the known assumptions, what do I already believe? I well, I have some beliefs that businesses like roofers and pool cleaning companies would be an excellent start based on my experience.
17:30An output format, yeah, just give it me in emojis and breakdowns. I know you're set to interrogate me first, but yeah, you can ask me a couple of clarifying questions if you like to. So you can already see this process and how valuable this is.
17:40I send this off. Now imagine actually setting any important decision through this. Like, if you just ask Claude directly, I've actually found it myself.
17:48It just agrees with you. It just sometimes just agrees with you for no reason. Like, dude, you're just agreeing with everything I'm saying.
17:53This isn't good. We need to you have to back build interrogators. And the beauty of this triad strategy here, guys, is the fact that we've got different models.
18:02We have critique. So think of the number of loops. The idea of progress.
18:05Right? WD 40 is like effectively solves like 90% of household elements. Right?
18:10The only reason it got that is they had a very quick improvement loop. Like, WD 40 was the fortieth version that actually worked, hence the name. And think about how many incremental improvements that you get when you have a critic and review agent running around like this.
18:24And then we have obviously OPUS 4.7 that's setting the strategy. It's just gonna make you unbelievably effective.
18:30Like, it's insane. So when I scroll down, I'm effectively hearing what it's understanding. It's picking the top three niches, which is great.
18:35Success criteria. Click quick clarifying questions. Cool.
18:38So geography is gonna be worldwide. Actually, let's do let's do Texas, please. Office shape.
18:44I'll let you tell me what you think is gonna be most effective for that. Price point, I'm gonna guess again that could be, you know, 1 to 15 k. That's absolutely fine.
18:53Sales motion, I'll let you lead on what you think the best one is gonna be. And to basically guide my decisions, guide my thinking on it, I think that'd be fantastic. And then, yeah, go ahead and let me know what the output is.
19:03Now I've just used this as a random example, but you get the idea of how the system could actually work. One little tip that might save you as well is add in fallback. So for example, if tokens ever run out, default to x y z instead.
19:15So it can physically do that for you. Beautiful. So now I come down and at this response here.
19:18I've got fire, water, mold restoration. It's saying white wins. Emergency leads can be insanely time sensitive.
19:23One job can be three to 50,000 thoughts, guys. If you're not working with these companies, after this video, we need to go ahead and do that right now. Emergency lead capture is really cool.
19:31It's giving you pricing ideology. It's giving you, you know, foundation repair, drainage, waterproofing, and all the reasons why. And what I could do is basically come down and say, this is awesome.
19:40Could you just explain to me your thinking behind this? What each model did and how you arrived at this conclusion, please, so I can best understand that. And so let's see what it did here.
19:47And I just do this to show it's working. I feel like a math tutor now, and Hermes is my little student telling me everything it's done. We've got office and but Opus was a conductor, set the frame, deep sea works.
19:57Jeep two was a critic, and then this gave the final synthesis. So the point is to avoid one model just vibing the answers. So what is good niche?
20:04Instead of asking that, you ask which Texas local service niche is most likely to buy a high margin website, blah blah blah. So judging this on this, which is called the deep sea worker score, basically, scored the market brutally, went through everything, just kind of explains its thought process, which is cool.
20:18But you can set it up to do these loops overnight whilst you're sleeping. You can even use free models if you want to, But I would generally just advise against that just because free is not free as we say in The USA. The point here is that like for a fraction of the cost that you get with DeepSeek, your quality differential is insane.
20:34At the end of the day, if it runs for a million years and the answer is still garbage, what use is that to use? So I like to go for that ratio of what I personally use Opus 4.7 or what. But for massive stuff, using DeepSeek is insane because you get, like, I'd say 95% the value for like 1% of the cost.
20:50And so then we're building out this pantheon of special skills with this particular triad skill. Now the idea here is that Hermes plus DeepSeq with the whole infrastructure we've got is an agent that grows with you.
21:00But it does lead us on to one final question. And that's how to get Hermes to its maximum potential, which we're gonna learn in this video right here.
The Hook

The bait, then the rug-pull.

Jack Roberts opens with a blunt provocation: 99% of people do not know what they are leaving on the table. Then he spends 21 minutes proving it — showing how a three-model triad running overnight through OpenRouter delivers near-frontier AI work at a price so low you can afford to retry it a hundred times.

Frameworks

Named ideas worth stealing.

08:22model

The Triad

  1. Plan (Opus 4.7)
  2. Execute (DeepSeek V4)
  3. Critique (GPT-5.5)

Three-model AI loop: conductor plans, cheap worker grinds overnight, critic tears apart until shippable.

Steal forAny overnight batch task — research, niche analysis, content strategy, code review
10:25concept

The Pantheon

Named specialist personas in Hermes each wired to a specific model mix.

Steal forOrganizing different agent roles in JoeFlow Sessions
14:42concept

Soul.md

Context document feeding Hermes identity, goals, business details, metrics, communication style.

Steal forChef context in JoeFlow — same idea, different name
06:33list

OpenRouter modifiers

  1. :nitro
  2. :exacto
  3. openrouter/auto
  4. BYOK
  5. Fallbacks
  6. Zero-completion

Six string modifiers appended to any model name to change routing/reliability/cost behavior.

Steal forAny system using OpenRouter for model dispatch
CTA Breakdown

How they asked for the click.

20:00next-video
how to get Hermes to its maximum potential, which we are gonna learn in this video right here

Soft next-video CTA only — no subscribe ask, no product pitch. Clean and low-friction.

Storyboard

Visual structure at a glance.

open
hookopen00:00
cost reframe
hookcost reframe01:11
benchmark
valuebenchmark03:12
OpenRouter
valueOpenRouter06:33
triad
valuetriad08:22
Pantheon
valuePantheon10:25
live demo
prooflive demo16:24
CTA
ctaCTA20:00
Frame Gallery

Visual moments.