Modern Creator
Simon Scrapes · YouTube

I Rebuilt Hermes in Claude Code (It's Ridiculously Good)

A 13-minute teardown of why rebuilding an agentic OS from scratch beats installing someone else's assumptions.

Posted: yesterday
Format: Tutorial, educational
Views: 5K
Likes: 262
Chapters

Where the time goes.

00:00-01:05

01 · Cold open + promise

Hermes velocity stat → 'I read through the issues' → thesis: rebuild don't install → what this video covers

01:05-03:08

02 · Cost #1 — Inherited assumptions

The self-learning loop grades its own homework. No external validation. Can silently overwrite your good work with no audit log.

03:08-03:47

03 · Cost #2 — Can't fix what you don't own

OpenClaw: 200+ CVEs filed since February, 386 malicious packages from one threat actor. You're debugging someone else's code.

03:47-05:08

04 · Cost #3 — Doesn't scale across clients

Paul Baier (nontechnical CEO) spent 100+ hours and $1,000+ testing OpenClaw. Hermes is single-tenant by design — separate install per client.

05:08-06:25

05 · What he rebuilt: Identity layer

Keeps user.md + memory.md from Hermes but adds per-client brand context folders — voice, ICP, positioning, visual identity — that share procedures across clients.

06:25-08:23

06 · Memory system

Keeps Hermes's capped injection (~1,300 char memory.md) but replaces keyword long-term search with MemSearch (semantic/meaning-based recall).

08:23-11:00

07 · Self-learning loop critique + skill systems

Hermes auto-generates new skills but ends up with 15 near-duplicate LinkedIn skills with no deduplication or version control. Solution: modular skill components that chain together.

11:00-12:56

08 · Build vs. buy trade-off + CTA

Honest framing: faster to start with Hermes, faster to scale with your own. Neither is right for everyone. CTA to AgenTek Academy.

Takeaway

The modular OS beats the installed one.

Own your stack — the AI edition

Hermes is faster to start; your own setup is faster to scale — and the hidden costs of someone else's architecture only surface once you're already committed.

  • Use Simon's three-hidden-costs structure verbatim for any 'why I stopped using X SaaS' video — it works for any AI tool critique.
  • The self-validation problem ('grading your own homework') is a clean, quotable metaphor for any content about AI blind spots.
  • The modular skill system idea directly maps to Joe's own setup: voice.md, ICP.md, format.md as separate source-of-truth files that compose into skill systems.
  • Simon's multi-client identity layer (per-client brand context folders sharing procedures) is worth shipping inside JoeFlow's Sessions panel as a named feature.
  • The MemSearch upgrade (semantic vs. keyword recall) is a concrete next step for any memory system — worth researching for the JoeFlow stack.
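The per-client identity layer in the bullets above can be sketched in a few lines. This is a minimal illustration, not Simon's actual layout: the folder names (`clients/`, `skills/`), the file names, and both function names are assumptions made up for the example. What it shows is one install where brand context is kept per client while procedures are stored once and shared.

```python
from pathlib import Path

# Hypothetical layout (an assumption, not confirmed by the video):
#   root/user.md, root/memory.md                 shared identity and memory
#   root/clients/<name>/voice.md, icp.md, ...    per-client brand context
#   root/skills/*.md                             procedures shared by every client
CLIENT_FILES = ("voice.md", "icp.md", "positioning.md", "visual.md")


def load_context(root: Path, client: str) -> str:
    """Concatenate the shared identity files with one client's brand context."""
    parts = []
    for path in (root / "user.md", root / "memory.md"):
        if path.exists():
            parts.append(f"## {path.name}\n{path.read_text().strip()}")
    client_dir = root / "clients" / client
    for name in CLIENT_FILES:
        path = client_dir / name
        if path.exists():
            parts.append(f"## {client}/{name}\n{path.read_text().strip()}")
    return "\n\n".join(parts)


def list_skills(root: Path) -> list[str]:
    """Skills live once at the root, so every client sees the same set."""
    return sorted(p.stem for p in (root / "skills").glob("*.md"))
```

Switching clients is then a matter of calling `load_context(root, "other-client")`; nothing under `skills/` has to be copied or maintained twice.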
Quotables

Lines you could clip.

00:22
You inherit somebody else's architecture, their assumptions, and therefore their problems too. You can't fix what you don't understand underneath.
Clean 2-sentence thesis, no setup needed · TikTok hook
01:34
The same model that writes the skill is also the sole judge of its correctness.
Self-validation problem framed in one sentence: memorable, shareable · IG reel cold open
03:08
Hermes may be faster to start, but your own setup is actually gonna be faster to scale.
The core trade-off in one line · TikTok hook
10:39
A skill is a modular component that feeds into a skill system. Each one does one job. It lives in one place.
Clean architecture principle, developer-friendly · newsletter pull-quote
11:39
When your brand voice does shift, you just have one file to update and then every skill system that uses that is gonna pull from that single file. So it's infinitely maintainable and scalable.
Concrete payoff of the modular approach · IG reel cold open
The Script

Word for word.

00:00 Hermes went from zero to 40,000 GitHub stars in forty-six days, and to compare, OpenClaw did it in sixty-one. So for agentic systems this is the fastest adoption ever seen on GitHub, and when you look at what they do, the memory systems, the identity layers, and the self-learning loops, you can understand why. But before I installed it, I did something most people don't do.
00:20 I went and read through the issues, and pretty quickly I realized something. The off-the-shelf systems are fast to begin with.
00:27 They're fast to start. But you inherit somebody else's architecture, their assumptions, and therefore their problems too.
00:34 You can't fix what you don't understand underneath. So instead of replacing Claude Code, I rebuilt the parts I actually wanted inside my own setup. And honestly, it turned out ridiculously good.
00:44 Not because it's better than Hermes, but because I actually understand every single layer now. And I built it in a modular way so I can swap pieces in and out, reuse workflows across projects, and evolve the system as the space changes.
00:56 So in this video, I'm gonna show you the exact Hermes features I rebuilt inside Claude Code, the parts I deliberately skipped, and why understanding the architecture underneath gives you way more leverage long term than just installing something like Hermes blindly. So let's get into it.
01:10 But before I show you what I built, let me show you the three hidden costs of installing something like Hermes off the shelf, to save you some time and pain later. So cost number one is that you inherit assumptions that you didn't even know existed in the first place. So as an example, the infamous self-learning loop on Hermes, the bit that everyone celebrates, has no external guardrails.
01:30 So you're effectively telling it to build its own skills automatically and then grade its own homework. So we've got the self-validation problem.
01:37 The same model that writes the skill is also the sole judge of its correctness. So without that external validation step, it basically can't see its own blind spots. It thinks everything is good.
01:47 And what that means in practice is it can quietly overwrite the changes that you've made to make your skills better with worse versions, and it has no version control or audit log. So you can say goodbye to your good hard work.
01:59 So cost number two is that you can't fix what you don't understand. So OpenClaw is one cycle ahead of Hermes. So the first version came out in November.
02:07 The first version of Hermes came out in February, but it's the same category of product. But when you look at OpenClaw, we've got over 200 vulnerabilities identified and filed since February.
02:17 You can see that we've got a ton of critical and high vulnerabilities that exist for OpenClaw. And a security researcher even found 386 malicious packages on the skills marketplace from a single threat actor.
02:28 So when something breaks at this scale, when something is critical to security, you're left debugging somebody else's code because you don't understand the assumptions underneath or the choices they made when they were building it. So cost number three then is it doesn't scale across your business.
02:42 So we've got Paul here, who's a nontechnical CEO. He spent over a hundred hours and over a thousand dollars testing OpenClaw over two months.
02:51 He wanted to understand if the hype was real, if it could do things that personal AI systems promised they could do, but basically later found that the bugs and security gaps that he identified disqualified it from being any sort of usable. He's now moved on to Claude and has replicated a bunch of the functionality, 30% of OpenClaw's features, in the last couple of months.
03:10 So Hermes may be faster to start, but your own setup is actually gonna be faster to scale. And the hidden costs of off-the-shelf software like OpenClaw or Hermes only show up once you're already committed and in the process of building with them. So let's get into what I actually built and what parts I lifted from Hermes.
03:26 So the first thing that Hermes actually nails, and the first thing I therefore rebuilt, is the identity layer. So that agent needs to know who you are, who your business is, and what you stand for. Otherwise, every AI output is gonna sound like an AI output.
03:40 So in Hermes, this represents itself as a memory.md file and a user.md file. It's a super simple setup and designed for one individual client or a single business. But that's also where its limitations come in, because it's assuming that you're one person working on one set of stuff, and there's no concept of switching brand context, client context, or business context inside a single setup.
04:01 So if you wanted to run Hermes for multiple clients, you'd effectively have to install for each individual client its own Hermes installation with its own memory.md and user.md files. So if you run an agency or multiple clients or even just two distinct brands of your own, you either bake it into one identity and one system in one install and live with that, or you spin up entirely separate Hermes installs, and each one of those has its own memory, its own skills, and its own learning loop.
04:27 So I'm sure you can see how that embeds a maintenance problem, because the skills aren't shared between the clients even though some of the procedures might be repeatable. And it's not a direct knock on Hermes, it's just what they built it for, but it's not fit for purpose for a business owner running multiple clients or multiple brands.
04:43 So the way that we've built this is to effectively inject context in the same way. So we have it for our own identity inside a user.md file,
04:50 have memories inside a memory.md file, but we also inject shared brand context like voice and ICP. So each individual client has its own set of shared context: their brand voice, their ICP, their positioning, and their visual identity.
05:14 But they're still able to actually access and share the procedures or the skills across those client folders. So we've effectively built the folder structure so you can handle multiple clients or multiple brands but still share the relevant shared context, so you don't have to maintain it in multiple places.
05:28 It's just one single install, versus Hermes, which for multiple clients would be individual installs that each have their own memory and learnings. Now what Hermes actually does is inject the memory.md and user.md into the start of every single conversation, which drastically improves the short-term recall of important information.
05:45 So let's go on now to talk about memory, which is probably the most important feature after this shared brand context for getting better results. And I've got to give it to Hermes. They've actually really thought through the way you store, inject, and recall information at various points in the life cycle.
06:00 Now before we move on to that, if you're enjoying the content so far then drop down below, hit the subscribe button, hit the like on the video, it's massively helpful to me. So let's get back into the memory system that Hermes uses, that's actually very, very powerful. So when you consider memory, we've basically got three levels here.
06:14 We've got storage of context, then we've got how does that context actually get injected into every conversation, and then more long term, how do we recall memories that aren't recent but are still important? The ones that we have to go back and search for.
06:27 So simply put, Hermes auto-saves and summarizes conversations on every single conversation turn. It then injects important memories back into every conversation through the memory.md, the user.md, and soul.md files.
06:42 And that is capped at, I think, 1,300 tokens, which means we're only loading in a limited snapshot of recent important information for every session.
06:51 But its biggest limitation is when you go back to actually recall the information that has not been injected into that recent memory, and that's because it's searching by keyword and not meaning. So we might be able to recall exact long-term memories if we remember the words we used when we were talking to Claude, but it's much harder if we can't exactly remember what words we used when we talked to Claude about it, which is pretty likely.
07:13 Right? And kind of rendering long-term recall in this case a bit useless. Who remembers the exact words they used with a client six months ago in that conversation they were having with Claude?
07:23 And this is where it gets really powerful when you're building a custom setup, because we can take the stuff that we like about Hermes, or the stuff in green, like the fact we are capping a memory.md file at 2,500 characters or 1,300 characters and injecting that as a recent memory into the conversation as a memory.md file.
07:41 Then where there were limitations, like in the recall where we only had keyword search, we can take other memory systems like MemSearch in this example and make recall much more powerful, and that's exactly what we've done with our own agentic operating system. So we're still using some patterns of the recall from Hermes, where we effectively check that injected context first, but then when the information is not found in that local memory we go deeper and actually search by meaning and not by keywords. And that's part of the MemSearch architecture, not the Hermes architecture.
08:10 So you can plug and play the bits that you like when you build your own custom system and make it bespoke for your context. Say you needed verbatim recall, you might implement MemPalace instead of MemSearch, for example. Now here's the bit where Hermes gets controversial, which is that self-learning loop we talked about earlier.
08:27 So one of Hermes's biggest selling points is the self-learning loop. So when an agent finishes a task, it's gonna write itself effectively a new skill every time and use it the next time, which sounds brilliant in practice. And the first time it happens, it's probably pretty special.
08:42 But what happens by the tenth skill or the twentieth skill, when you've made tiny iterations on effectively the same process? So effectively what we're doing is we are starting on day one. We are telling it to do a specific task.
08:53 And then a couple of weeks later when we come back to do a similar task, it's gonna create two skills that are fairly similar, have a similar description, but are kept as separate skills, maintained as separate skills, because it's not gonna capture the nuance in our process. And we also have poor visibility of all the skills that we have existing already, so it's just gonna continue to create more skills.
09:13 And each one is gonna capture that approach at the moment in time with that context for that specific situation. So over time you risk ending up with 15 skills that all do roughly the same thing, like LinkedIn post v one, v two, LinkedIn post for this client, for that client instead, post writer one and two, all with slightly different context and slightly different bits of logic baked in.
09:35 They've all got similar descriptions, so it doesn't know which one to use at any given time. Then when your brand voice shifts or when a client's positioning changes, you've got like 15 places to go and update and maintain it.
09:46 So yes, it's absolutely faster to build this way initially, but it's a hell of a commitment to actually maintain properly and basically therefore impossible to scale across multiple clients without the whole thing turning into a bit of a mess. Now we've created personally, in house, in our own agentic OS, a whole logic around how to tackle this, and we call this skill systems.
10:05 So a skill shouldn't be just a one-off task. A skill is a modular component that feeds into a skill system. So each one does one job.
10:12 It lives in one place. It has a consistent named format and gets updated in one place, and all the updates propagate to the rest of the system. So when you want to do something complex like write a LinkedIn post in your brand voice for a specific audience in a specific format, you don't create a "write a LinkedIn post" skill that bakes in all of these things.
10:31 You actually have the voice, the ICP, the formatting already maintained as separate skills, and then the LinkedIn post system just grabs the correct context, the up-to-date context, from one single file for the voice, for the ICP, and the formatting. And then this skill or skill system prompt is effectively chaining those together in the right order.
10:49 So when your brand voice does shift, you just have one file to update and then every skill system that uses that is gonna pull from that single file. So it's infinitely maintainable and scalable.
11:00 So Hermes is faster to build the first skill, but building your own approach is gonna be faster to build the tenth, the hundredth skill system that depends on the actual skill, and infinitely easier to maintain. So it begs the question, should you build this for yourself or grab something off the shelf? Well, if you install someone else's stack, you've basically inherited their assumptions about identity, memory, about how their learning loop should work, about whether you'll need multi-client context.
11:25 And some of those assumptions will work for you, and they might work for you. And Hermes is great as an off-the-shelf comparison to something like OpenClaw, which was a lot more buggy. But some of those assumptions might not work for you, and then you're left actually trying to maintain or fix the broken parts versus actually just building it more slowly for yourself and understanding the assumptions and making it more scalable.
11:45 So if you are building it for yourself, you're making those choices on purpose. Yes, you will move slower. You'll get some of it wrong, but every layer is something you can see, you can edit, and actually reuse.
11:54 You can build it in that modular way. And when something does break, you'll have better knowledge of how to actually find the part that's broken and fix that so it's maintainable in the future. So that's effectively the trade-off.
12:04 It's gonna be faster to start with Hermes but faster to scale with your own built setup. And neither is gonna be the right answer for everyone. Right?
12:12 It's just a personal choice. Now I'm definitely not saying my version of the agentic operating system or every custom version is better than Hermes in every way. Absolutely not.
12:20 But I understand exactly what assumptions have been made under the hood, and I can build on it in a modular way, in a slower way that's gonna end up being completely custom to my own setup. So if you want my exact Agentic OS, it's inside the AgenTek Academy in the description below. And it's basically installed in one line, get it up and running today.
12:38 And we run through exactly what's inside the OS and all the logic, so you're not just left installing something again without understanding the assumptions. You can plug and play the parts you like and leave out the stuff that doesn't work for you. Now if you want to see more around what we've got inside our agentic operating system, watch the next video.
12:55 Thanks for watching.
The Hook

The bait, then the rug-pull.

Forty thousand GitHub stars in forty-six days. Before Simon Scrapes installed a single line of Hermes, he did something most people skip: he read through the issues. What he found convinced him to rebuild the parts he wanted instead — and the result turned out ridiculously good, not because it beats Hermes, but because he owns every layer of it.

Frameworks

Named ideas worth stealing.

01:05 · list

Three Hidden Costs of Off-the-Shelf Agentic OS

  1. Inherit assumptions you didn't know existed (self-validation problem)
  2. Can't fix what you don't understand (debugging someone else's code)
  3. Doesn't scale across your business (single-tenant architecture)

Structured argument for why OpenClaw/Hermes have fundamental architectural issues that only surface once you're committed.

Steal for: Any build-vs-buy pitch, any 'why I left SaaS' content, any tool critique video
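Cost #1 comes down to two missing pieces: an external check before a skill is overwritten, and a record of what changed. A rough sketch of how both could be added in your own setup follows. The function name `guarded_update`, the `validate` hook, and the JSON-lines log format are all assumptions for illustration; none of this is Hermes's API or Simon's code.

```python
import hashlib
import json
import time
from pathlib import Path
from typing import Callable


def guarded_update(skill_path: Path, new_text: str,
                   validate: Callable[[str, str], bool],
                   log_path: Path) -> bool:
    """Overwrite a skill only if an external check approves; log either way.

    `validate(old, new)` is a hypothetical hook: a human review step, a test
    suite, or a different model. The point is that it is not the author.
    """
    old_text = skill_path.read_text() if skill_path.exists() else ""
    approved = validate(old_text, new_text)
    entry = {
        "time": time.time(),
        "skill": skill_path.name,
        "old_sha": hashlib.sha256(old_text.encode()).hexdigest()[:12],
        "new_sha": hashlib.sha256(new_text.encode()).hexdigest()[:12],
        "approved": approved,
    }
    with log_path.open("a") as log:  # append-only audit trail
        log.write(json.dumps(entry) + "\n")
    if approved:
        skill_path.write_text(new_text)
    return approved
```

Keeping the skills folder in git would cover the version-control half with less custom code; the log here only records that a change was proposed and whether it was accepted.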
10:00 · model

Skill Systems (modular composition)

  1. Voice lives in one file
  2. ICP lives in one file
  3. Formatting lives in one file
  4. Skill system chains them together in the right order

Each skill is a modular component that feeds into a skill system. One update propagates everywhere. Contrasts with Hermes's auto-generated skills that accumulate as near-duplicates.

Steal for: Claude Code skills architecture, JoeFlow orchestration, any reusable AI workflow design
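As a concrete picture of the skill-system idea, here is a minimal sketch in which each component lives in exactly one file and a "system" is just an ordered list of component names. The file names (`voice.md`, `icp.md`, `format.md`), the `build_prompt` helper, and the `LINKEDIN_POST` list are hypothetical; the video describes the principle, not this code.

```python
from pathlib import Path


def build_prompt(context_dir: Path, components: list[str], task: str) -> str:
    """Chain single-source component files, in order, ahead of the task."""
    sections = []
    for name in components:
        # Each component is read fresh from its one file on every call,
        # so an edit to that file reaches every system that lists it.
        text = (context_dir / f"{name}.md").read_text().strip()
        sections.append(f"# {name}\n{text}")
    sections.append(f"# task\n{task}")
    return "\n\n".join(sections)


# A skill system is only an ordering of shared components (assumed names).
LINKEDIN_POST = ["voice", "icp", "format"]
```

A second system, say a newsletter, would be another list that reuses `voice` and `icp`. Changing the brand voice means editing `voice.md` once rather than touching 15 near-duplicate skills.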
06:25 · model

Memory Hierarchy (Hermes-compatible)

  1. Storage: auto-save + summarize every conversation
  2. Injection: memory.md capped at ~1,300-2,500 chars per session
  3. Short-term recall: injected context checked first
  4. Long-term recall: MemSearch (semantic) not keyword search

Keep what Hermes gets right (capped injection) and replace what it gets wrong (keyword-only long-term recall).

Steal for: Custom Claude memory architecture, any persistent context system
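The injection and recall steps of this hierarchy can be sketched as follows. It assumes memory entries are stored oldest-first as plain strings, and it takes the meaning-based search as a pluggable `semantic_search` callable rather than guessing at MemSearch's real interface. The 1,300 figure is the cap quoted in the video, and the function names are made up for the example.

```python
from typing import Callable, Optional

MEMORY_CAP = 1300  # characters; figure quoted in the video, treat as approximate


def build_snapshot(entries: list[str], cap: int = MEMORY_CAP) -> str:
    """Keep the most recent entries that fit under the cap (input is oldest-first)."""
    kept: list[str] = []
    used = 0
    for entry in reversed(entries):
        cost = len(entry) + (1 if kept else 0)  # +1 for the newline separator
        if used + cost > cap:
            break
        kept.append(entry)
        used += cost
    return "\n".join(reversed(kept))


def recall(query: str, snapshot: str,
           semantic_search: Callable[[str], Optional[str]]) -> Optional[str]:
    """Check the injected snapshot first, then fall back to a deeper search."""
    needle = query.lower()
    for line in snapshot.splitlines():
        if needle in line.lower():
            return line
    # Not in recent memory: hand off to the (assumed) meaning-based backend.
    return semantic_search(query)
```

Because the long-term backend is just a callable, swapping one memory system for another is a one-argument change, which is the plug-and-play property the video argues for.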
CTA Breakdown

How they asked for the click.

12:09 · product
if you want my exact Agentic OS, it's inside the AgenTek Academy in the description below. And it's basically installed in one line, get it up and running today.

Soft sell, earns the right with a full teardown before pitching. No hard close. Immediately pivots to 'watch the next video' as a secondary CTA.

Storyboard

Visual structure at a glance.

hook · 00:00 · Hermes stars
value · 01:05 · cost 1: assumptions
value · 03:08 · cost 2: CVEs
value · 03:47 · cost 3: scaling
value · 05:08 · identity layer
value · 06:25 · memory system
value · 08:23 · skill systems
cta · 12:09 · AgenTek Academy