Modern Creator
Simon Scrapes · YouTube

I Built The Best Claude Memory System (Beats Hermes)

A 13-minute teardown of Claude Code memory and a fix assembled from the best pieces of Hermes, MemSearch, and GBrain.

Posted: today
Format: Tutorial · educational
Views: 1.3K · 106 likes
Big Idea

The argument in one line.

Claude Code's built-in memory is leaky on storage, unbounded on injection, and blind on recall. The fix is to cherry-pick one mechanism from each of three open-source frameworks rather than adopt any single one wholesale.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…
  • You are building on Claude Code and have lost context mid-project or had the agent silently discard a decision.
  • You manage multiple clients or projects in a single Claude environment and need recall that finds facts by meaning, not just exact keywords.
  • You want to understand what is actually happening under the hood of agentic memory before trusting a black-box solution.
  • You are comparing Hermes, MemSearch, or GBrain and want an honest per-job breakdown of what each does well.
SKIP IF…
  • You need a production-grade, multi-tenant team memory system today: the shared-Supabase path described is still in development.
  • You are not using Claude Code as your coding harness: the walkthrough is Claude Code-specific, even though the framework is portable.
TL;DR

The full version, fast.

Claude Code ships with memory that stores selectively (an agent decides what to keep), injects without any size cap, and has zero search capability for recall. MemSearch adds an automatic post-turn hook that summarizes every turn and appends it to a daily log. Hermes contributes a frozen snapshot, capped at around 1,300 tokens, that loads once per session and gets cached. GBrain supplies a re-ranker and a citation-backed written answer instead of raw chunks. The result stores everything automatically, injects only what matters, and retrieves semantically with sources.

Chapters

Where the time goes.

00:00-01:02

01 · Intro: what perfect AI memory looks like

Four properties of an ideal system; problem statement that no single existing framework has all four.

01:03-04:38

02 · The 3 Jobs of AI Memory

Storage, injection, and recall defined; decision variables within each job; Claude Code baseline graded decent/basic/weak.

04:39-06:13

03 · Storage fix: MemSearch

Keep summarize step; replace agent-decided trigger with automatic post-turn hook; Haiku writes condensed bullets to a daily log.

06:14-07:16

04 · Injection fix: Hermes

Frozen snapshot of identity + profile + recent memories, capped at around 1,300 tokens, cached per session.

07:17-09:51

05 · Recall fix: MemSearch + GBrain

Vectors indexed locally at zero API cost; 3-tier cascade checks injected snapshot first; GBrain re-ranker plus written answer with citations on top.

09:52-11:01

06 · Assembling the blocks

Full architecture diagram connecting all three layers; one-line install from their agentic OS.

11:02-13:26

07 · Scaling for a team

Two paths: isolated-per-person vs shared Postgres/Supabase with row-level security.
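The shared-store path can be sketched roughly as follows. This is an application-level emulation using Python's built-in sqlite3, not real Postgres row-level security (which is declared on the table with policies). The table names, column names, and the `query_memories` helper are all hypothetical; the only idea taken from the video is that every memory row is tagged by client and every query is filtered by who is asking.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    """Create an in-memory store with tagged memories and a per-user access list."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE memories ("
        "id INTEGER PRIMARY KEY, client TEXT NOT NULL, body TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE access (user_id TEXT NOT NULL, client TEXT NOT NULL)"
    )
    return conn


def query_memories(conn: sqlite3.Connection, user_id: str, needle: str) -> list:
    """Return (client, body) rows matching `needle`, limited to clients
    that `user_id` is allowed to see."""
    return conn.execute(
        "SELECT m.client, m.body FROM memories m "
        "JOIN access a ON a.client = m.client "
        "WHERE a.user_id = ? AND m.body LIKE ? "
        "ORDER BY m.id",
        (user_id, f"%{needle}%"),
    ).fetchall()
```

In a real Postgres or Supabase deployment the filter would live in the database rather than in the calling code, so a query that forgets the join still cannot read another user's rows.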

Atomic Insights

Lines worth screenshotting.

  • Claude Code memory storage misses anything the agent does not flag, so anything that seemed routine at the time is silently lost.
  • Claude Code injection loads CLAUDE.md and the memory index on every session with no character cap, bloating the context window without your input.
  • Claude Code has no search at all for recall: if a fact was not saved to a memory file, the only fallback is resuming the exact session that contained it.
  • MemSearch auto-captures every turn with a cheap fast model, appends condensed bullets to a daily log, and has zero opinion about what matters.
  • Hermes caps its session snapshot at around 1,300 tokens and caches it, so you pay for those tokens once per session, not on every message.
  • MemSearch has no injection layer: it stores and searches well but never decides how context actually reaches the agent.
  • Hermes recall is keyword-only and cannot find a memory if you describe it differently from how it was originally stored.
  • A 3-tier recall cascade is cost-gated: most queries stop at tier 0 (already in context) or tier 1 (local file grep) and never touch the vector index.
  • GBrain re-ranker does a second pass over collected chunks to surface the most relevant ones first.
  • A written answer with citations is strictly more useful than a list of chunks: it tells you where the information came from and says when it does not know.
  • Building a custom framework from component parts means you understand every layer and can swap pieces out as better options appear.
  • The isolated-per-person team memory path is simple to build but produces no shared brain: each agent is ignorant of everyone else's work.
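The auto-capture idea from the insights above (summarize every turn, append to a daily log, hold no opinion about what matters) can be sketched in a few lines. This is a minimal illustration, not MemSearch's actual code: `capture_turn` and `naive_summarize` are hypothetical names, and the summarizer is a trivial stand-in for the cheap model call described in the video.

```python
from datetime import date
from pathlib import Path
from typing import Callable, List, Optional


def naive_summarize(turn_text: str, max_bullets: int = 3) -> List[str]:
    """Hypothetical stand-in for a cheap-model summarizer:
    keeps the first few non-empty lines of the turn."""
    lines = [ln.strip() for ln in turn_text.splitlines() if ln.strip()]
    return lines[:max_bullets]


def capture_turn(
    turn_text: str,
    log_dir: Path,
    summarize: Callable[[str], List[str]] = naive_summarize,
    today: Optional[date] = None,
) -> Path:
    """Append condensed bullets for one turn to that day's log file.

    Runs unconditionally: nothing here decides whether the turn is
    'worth keeping', which is the point of the automatic trigger.
    """
    day = today or date.today()
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / f"{day.isoformat()}.md"
    with log_path.open("a", encoding="utf-8") as fh:
        for bullet in summarize(turn_text):
            fh.write(f"- {bullet}\n")
    return log_path
```

In practice this function would be wired to whatever post-turn hook the harness exposes, with `summarize` swapped for a real model call.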
Takeaway

Three-layer framework for AI memory that actually works.

WHAT TO LEARN

Claude Code ships with memory that leaks on storage, bloats on injection, and is blind on recall — and each weakness has a targeted fix from a different open-source framework.

  • Automatic storage beats agent-decided storage: anything the agent does not flag is silently lost, so an always-on post-turn hook that summarizes and appends to a daily log is strictly safer.
  • Inject a capped, cached snapshot rather than an unbounded file dump: around 1,300 tokens of frozen identity, profile, and recent memories keeps context lean and charges you once per session.
  • Keyword search alone fails in multi-client or multi-project environments where you remember what happened but not the exact words: hybrid search finds memories by meaning.
  • Raw chunk retrieval is a noise problem: a re-ranker that does a second ordering pass before synthesis surfaces the most relevant results first.
  • Citing sources at retrieval is not a nice-to-have: a confident answer with no source is worse than no answer when the work is real and client-facing.
  • Building from components you understand beats adopting an opaque framework: when a better piece appears you can swap it out rather than waiting on an upstream maintainer.
  • The simple team memory path (one index per person) has zero shared brain: if your team needs agents to share knowledge across clients or projects, you need a shared store with row-level security from the start.
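The capped snapshot from the takeaways above can be approximated as below. This is a sketch under stated assumptions, not Hermes's implementation: the four-characters-per-token estimate is a rough heuristic, `build_snapshot` and `SessionMemory` are hypothetical names, and the in-process cache only illustrates "build once per session" (the video's caching most likely refers to the provider caching the prompt prefix).

```python
from typing import List, Optional, Tuple

TOKEN_BUDGET = 1300  # the cap quoted in the video


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token, rounded up."""
    return (len(text) + 3) // 4


def build_snapshot(sections: List[Tuple[str, str]], budget: int = TOKEN_BUDGET) -> str:
    """Concatenate (title, body) sections in priority order,
    skipping any section that would push the total over the budget."""
    parts = []
    used = 0
    for title, body in sections:
        block = f"## {title}\n{body.strip()}\n"
        cost = estimate_tokens(block)
        if used + cost > budget:
            continue  # does not fit; a smaller, later section may still fit
        parts.append(block)
        used += cost
    return "".join(parts)


class SessionMemory:
    """Builds the snapshot on first use and returns the same string afterwards."""

    def __init__(self, sections: List[Tuple[str, str]], budget: int = TOKEN_BUDGET):
        self._sections = sections
        self._budget = budget
        self._snapshot: Optional[str] = None

    def snapshot(self) -> str:
        if self._snapshot is None:
            self._snapshot = build_snapshot(self._sections, self._budget)
        return self._snapshot
```

Ordering the sections by priority (identity first, recent memories last) means the cap trims the least essential material first.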
Glossary

Terms worth knowing.

MemSearch
Open-source Python library that auto-captures conversation turns after every message, summarizes them with a cheap model, and indexes the results as local vectors for hybrid search. It has no injection layer.
Hermes
Open-source memory framework that provides a capped (~1,300-token), cached frozen-snapshot injection at session start. Its recall is keyword-only.
GBrain
Memory framework built by Garry Tan (YC) that adds a re-ranker to reorder retrieved chunks by relevance and returns a written answer with file-level citations rather than raw chunks.
Frozen snapshot
A fixed bundle of memory files (identity, profile, recent memories) loaded once at session start, capped to a token budget, and cached so the cost is paid only once per session.
Hybrid search
A retrieval method combining keyword search (exact token matching) with semantic vector search (meaning-based similarity) so queries find memories regardless of the exact words used.
Re-ranker
A second model pass that reorders a list of retrieved chunks by relevance before synthesis, reducing the noise the final model has to reason through.
Row-level security (RLS)
A Postgres/Supabase feature that filters every query by authenticated user identity, allowing agents to share a single store without accessing each other's scoped data.
Vector index
An index storing text chunks as numeric embeddings so semantic similarity search can be run locally at zero API cost.
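To make the hybrid search entry concrete, here is a small sketch that blends a keyword-overlap score with cosine similarity over embedding vectors. The weighted sum and the `alpha` parameter are assumptions chosen for clarity (the source does not say how MemSearch combines the two signals), and the vectors are supplied by the caller rather than produced by a real embedding model.

```python
import math
from typing import List, Sequence, Tuple


def keyword_score(query: str, text: str) -> float:
    """Fraction of query words that appear verbatim in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_search(
    query: str,
    query_vec: Sequence[float],
    docs: List[Tuple[str, Sequence[float]]],
    alpha: float = 0.5,
    top_k: int = 3,
) -> List[Tuple[float, str]]:
    """Rank (text, vector) docs by alpha * keyword + (1 - alpha) * cosine."""
    scored = [
        (alpha * keyword_score(query, text) + (1 - alpha) * cosine(query_vec, vec), text)
        for text, vec in docs
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

With this blend, a query that shares no words with a stored memory can still rank it first when the vectors are close, which is the "find it by meaning" behavior the glossary describes.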
Resources

Things they pointed at.

00:12 · tool · Hermes
00:12 · tool · GBrain
00:12 · tool · MemSearch
11:02 · tool · Supabase
Quotables

Lines you could clip.

09:14
A confident answer with no source is actually worse than useless when you are running real work for a client.
Standalone, punchy, applies universally to AI trust · TikTok hook
04:14
Storage is decent. The injection is pretty basic, and the recall is quite frankly quite weak.
Clean verdict, no setup needed · IG reel cold open
10:17
Storing everything, injecting just what matters, and able to recall semantically by meaning with the sources too.
Perfect summary line at the pivot point · Newsletter pull-quote
The Script

Word for word.


00:00Let me show you what perfect memory looks like for an agentic system. It remembers a decision you made six months ago and finds it even when you can't remember the exact words you use. It loads the right context automatically so you never start from zero.
00:13And when it gives you an answer, it can tell you exactly where that answer came from. And even when it doesn't know, it says so instead of confidently making something up. Now imagine that scaling across a team of people using it.
00:24Now the problem today is that no single system does all of that really well. After digging through Hermes, GBrain, MemSearch, and a bunch of other memory architectures, I started to realize something.
00:35They each solve a different part of the problem. So instead of picking one, I pulled the best ideas from each and rebuilt them in Claude Code. And honestly, it's better than any one system on its own.
00:46And more importantly, I understand every layer underneath it, which means I can swap pieces out, extend it when I need it, and keep it portable as things change.
00:56So in this video, I'll show you exactly how it works, the ideas I borrowed from each system and why, and how to copy this up for yourself. So let's get straight into it. And, firstly, I wanted to explain why not just use Claude Code out of the box because Claude Code does already have a memory system, and it does three jobs.
01:13So the three jobs are storage, injection, and recall.
01:17So storing a fact when you want it to or when you ask it to, injecting the right context when you open a session, and recalling something old when you ask about it. And for each of those, there's more than one way that you could actually do it. And I worked this out by going through around 20 different memory frameworks, and you can see parts of that in my previous videos.
01:36But the point is the same variables kept coming up. So take storage for example.
01:41You've got two separate decisions to make. Firstly, who triggers the save? Is it a hook firing automatically, or is it gonna be the agent deciding what's worth keeping from what you told it?
01:52Secondly, what form does it save in? Is it gonna be verbatim, so word for word, or summarized by an agent first, and which is better for your use case?
02:00So verbatim keeps everything, but it's bulky in terms of storage. And summarize is obviously more lean, but the agent decides what survives, so you might lose important context. Then separately, injection has its own variables.
02:11So you can either have a hook that loads the right files every single session, so it's guaranteed to push into the memory, or you can add that context to something that the agent decides to pull like the CLAUDE.md. And you're obviously waiting on a judgment call from that agent to decide that too.
02:25And underneath that also, do we have a cap on the number of characters that are injected? So cap's gonna keep the context not bloated, but we might lose some context with that cap. And then recall comes down to how you search.
02:36So keyword search finds exact words, but semantic search is gonna help us actually find words by meaning. And then you've got hybrid, which does both. So those are the jobs and the choices you've got to make when picking out a memory framework for your use case.
02:49And here's what Claude Code actually picks for each of those. So for storage, it's agent decided and summarized.
02:56So the agent quietly notices things worth keeping and writes a condensed version to the disk. And the problem with that is it's super selective.
03:04If the agent doesn't flag something, then it never gets saved at all. So this storage is okay for this, but it is a bit leaky.
03:12Now injection on the other hand is gonna load in the CLAUDE.md by default and the memory index at every session start. So the memory index is a file that points to specific memory files that, again, an agent has decided to save over your conversation history in that repository. So it's loaded in using a hook, which means it always happens no matter what, but it's completely unbounded.
03:34There's no cap on characters, and therefore, there's nothing keeping it lean. So the injection measures inside Claude Code out of the box are pretty basic.
03:42And then for recall, I'd say this is the worst of the three out the box. There's basically no search at all. So there's no keyword search, no semantic search.
03:51And if a fact wasn't saved as a memory in a memory file that's referenced by the MEMORY.md index file, then it's completely gone. And your only fallback is to basically go and find the session that you're in and resume that session specifically to basically take the existing context of that conversation. So if that context has already been compressed, you might have even lost that context already.
04:10So storage is decent. The injection is pretty basic, and the recall is quite frankly quite weak.
04:17But when you zoom out holistically at why we need these, like, all three of these storage, injection, and recall are required to improve the quality of your outputs. And that's why open source frameworks have focused so heavily on memory and injection of that context. So the plan is actually super simple.
04:31For each of these jobs, we keep the part that actually works, and we layer on add-ons to enhance it. So let's start with storage. So if you remember, Claude Code's storage choice was agent decided and then summarized.
04:46And the summarized part is actually fine. It's gonna keep things clean in our memory storage. But the agent decided part is where it's arguably the weakest.
04:53So the fix is to keep the summarizing but add an automatic hook so nothing depends on the agent actually noticing and therefore deciding what to keep.
05:02And that's exactly why we've chosen MemSearch for this part of the framework. So after every single turn, a hook fires. It's gonna summarize what was said with a cheap fast model like Haiku and append it to a daily log.
05:15So it doesn't have an opinion about what matters. It's basically just gonna capture all of the information and then turn it into a condensed format and store in a daily memory file. So why did we choose MemSearch for this and not the others?
05:27Well, Hermes also auto captures, but it actually captures the raw transcript, which is bulkier than we probably need. But it's just a preference at the end of the day.
05:35And GBrain, too, goes further again. It builds out a whole knowledge graph, pulling out people and companies and linking them, which sounds really clever, but it's probably a little bit overkill for just a business running a handful of clients. It's just too much.
05:47Whereas MemSearch hits the exact combination that we wanted. So it's automatic and it's summarized. So for storage, we're basically keeping Claude's auto memory and adding the MemSearch capture hook on top of it.
05:58Nice and simple. Now how do we get that information that we stored back into every conversation or the most important parts of it. But before we discuss that, if you're getting value from this video, do me a massive favor and hit the subscribe button below.
06:09It genuinely helps my channel. Claude Code's injection choice was hook loaded, so always in, but not capped at character count.
06:17So Hermes handles this in a really smart way. So at the session start, it actually takes a frozen snapshot of pertinent and specific memory files.
06:25So think your identity, your user profile, and your most important recent memories that it saved into the memory. So it's gonna be capped around 1,300 tokens, so it's not gonna bloat the context, and it's cached.
06:36So you're only gonna pay for these tokens once per session. It's not gonna reinject them into every bit of conversation. So why then did we choose Hermes for injection and not MemSearch?
06:45Because MemSearch actually, believe it or not, doesn't have an injection layer at all. It's just a storage and search library. So it stores it really well, but it never decides on how that memory reaches the agent.
06:55So we have to employ something like Hermes to make sure that actually a snapshot is taken and then injected into the context for great short term recall. Now we took the frozen snapshot from Hermes, but actually we didn't take the rest of Hermes because its recall is keyword only.
07:10Now just to remind you, Claude Code's recall choice was no search at all. So you'd have to go back and resume the specific session. So this is the biggest upgrade to the system around memory, and it's where two frameworks are actually combined.
07:24This is where we can add the most value in terms of Claude Code out the box. So the base comes from MemSearch. So everything we stored at the start in the storage stage gets indexed as vectors on your machine.
07:34And because it's on your machine, there's zero API cost. And then the search is completely hybrid. So we're able to do semantic search by meaning and also by keywords.
07:44But the way that this works is actually a multi tier system. So what it's gonna do is check the injected memory first. So it's not gonna go and search deeper if we have the information already in that frozen snapshot.
07:55It will stop at tier zero. But if it needs to recall information at a deeper level, it will basically go through a three tier system delving deeper until eventually it does this hybrid keyword and semantic search to pull back all the full context that's been summarized by MemSearch in the first place. So again, why do we choose MemSearch over Hermes for recall?
08:13Well, it's actually the flip side of the injection choice. So Hermes recall is keyword only, but MemSearch allows us to actually layer on top that semantic, or by-meaning, search. So it's gonna find your conversation no matter which word you use, which is really important if you're handling multiple clients or lots of different projects.
08:30Now why did we actually combine that with some of the elements of GBrain? We combined it because basically MemSearch returns you chunks back.
08:37So it's like 10 bits of text that potentially match what you've asked it for. But you still have to read them and work out the answer or the agent does. So we've basically taken an element from GBrain, and this is a framework that Garry Tan from Y Combinator built.
08:51It basically does two things on top of this same hybrid search. First, it's gonna use a re ranker, which does a second pass over all of the collected chunks and reorder the matches so the best ones are more likely to come first.
09:04And secondly, and this is the one I actually care about, it doesn't return chunks. It returns a written answer with citations. So it's not just throwing back a bunch of information that may be relevant.
09:14It's actually gonna return that written answer with a bunch of citations where it's taking that information from. So most importantly, that's gonna tell you which file or conversation it came from. And if the information isn't there, it's actually gonna tell you it isn't there.
09:26And remember at the start when I said a good memory system should admit what it doesn't know? Well, that is exactly this. So a confident answer with no source is actually worse than useless when you're running real work for a client.
09:39So for our recall tactic then, we've decided to combine both. MemSearch for that hybrid search that finds by meaning and then the GBrain approach on top for the reranking and that really well cited source answer. So when you pull everything together then, it looks a little bit like this when you consider storage, injection, and recall.
09:59So it's a combination of the best practices of multiple frameworks loaded into a custom built framework. But you understand exactly what's under the hood because you've cherry picked the best parts of both. You've decided which parts to use based on the variables that you can choose from, and you're storing everything, injecting just what matters, and able to recall semantically by meaning with the sources too.
10:21So everything you've seen so far, we've already built into our agentic operating system, and it comes shipped out of the box. So you get going from day one with a one line install. If you want that, it's down in the description below.
10:31So there's something critical that we haven't mentioned yet that we're building into our framework in June, and it relies on the fact that mostly you want to use memory like you do with shared folders. You often work collaboratively with others.
10:44So you need other team members to access certain parts of your memory and then the other parts to be completely private to you personally or to each individual user that's got it installed on their machine. Now because we've built everything as these separate blocks that plug and play together and not one locked framework, the same setup can be used to effectively scale to a whole team.
11:04But building it for a team is where things do get harder. So memory has to be shared where it should be and private where it shouldn't. So it's split across clients, departments, projects, etcetera.
11:14And GBrain was actually the inspiration for the architecture behind this. It solves it with what Garry Tan calls a company brain. So one central store that each person gets their own individual slice scoped by their login access token.
11:28So person a, for example, can only access the clients they're actually meant to access. Person b might have more access, and this is all scoped by row level security and the token that we give to that individual user.
11:41There are effectively two ways to do this, and I covered the full team setup in a separate video that I'll link at the end. But here's the short version. The simple way, which is gonna suit some teams, is one memory index, one database per person.
11:55So it's built only from the files they're allowed to sync. So person a builds a memory file from the client files that they're allowed to access. Person b, the same.
12:04Person c, the same. But this comes at the expense of actually then you don't have a shared brain at all. You just have isolated memories between these individuals.
12:13And that's what we've described throughout this video. And then the scalable way is one shared store, something like a Postgres database held on Supabase, for example, with row level security. So every memory row is tagged by client or by project or by department, and every query then is filtered by who's asking.
12:31So every query person a makes, we're able to restrict access in that memory. So only those permitted actually get access.
12:38So this, as you can imagine, is significantly harder to set up, but you get one shared brain across the whole team, and that's the one that we're implementing personally. And I'd highly recommend if you work across multiple users to do that too. So quick recap.
12:52I pulled the best ideas from each memory framework and rebuilt them inside Claude Code. Now why did I do that? Because no single framework had a comprehensive answer to the three jobs of storage, injection, and recall.
13:04So we've just taken the best of each framework and pulled it into one comprehensive system that's plug and play. You can use it with Claude Code, Codex, and any other harness. So next video, I'm walking you through the full team operating system that this memory system plugs into.
13:18And if you want to just get this all out the box with setup tutorials, then check out the academy link in the description below. See you in the next one.
The Hook

The bait, then the rug-pull.

Perfect memory for an AI agent should find a decision you made six months ago even when you cannot remember the exact words, load the right context without being asked, cite every answer, and admit when it does not know. No single framework does all four, so this one steals from three.

Frameworks

Named ideas worth stealing.

01:03 · model

Storage / Injection / Recall matrix

The three jobs of AI memory, each with its own decision variables (trigger and form for storage, loading and cap for injection, search method for recall). Used throughout to evaluate and compare frameworks.

Steal for: Evaluating any memory architecture or explaining agentic memory to a non-technical audience
07:17 · model

3-Tier recall cascade

  1. Tier 0: check frozen snapshot (zero cost)
  2. Tier 1: grep context/memory/transcripts (zero LLM cost)
  3. Tier 2: MemSearch hybrid vector search
  4. Tier 3: full semantic plus re-ranker plus cited answer

Cost-gated recall that stops as soon as a tier satisfies the query.

Steal for: Any RAG pipeline where you want to minimize API calls on shallow queries
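The cascade can be expressed as an ordered list of lookups that stops at the first hit. The sketch below is illustrative only: `recall`, `snapshot_tier`, and `grep_tier` are hypothetical names, the substring checks stand in for whatever "satisfies the query" means in the real system, and the more expensive tiers are left as plain callables to be supplied by the caller.

```python
from typing import Callable, List, Optional, Tuple

Lookup = Callable[[str], Optional[str]]


def snapshot_tier(snapshot: str) -> Lookup:
    """Tier 0: answer from the already-injected snapshot if it mentions the query."""
    def lookup(query: str) -> Optional[str]:
        return snapshot if query.lower() in snapshot.lower() else None
    return lookup


def grep_tier(lines: List[str]) -> Lookup:
    """Tier 1: plain substring match over local log lines, no model call."""
    def lookup(query: str) -> Optional[str]:
        hits = [ln for ln in lines if query.lower() in ln.lower()]
        return "\n".join(hits) if hits else None
    return lookup


def recall(query: str, tiers: List[Tuple[str, Lookup]]) -> Tuple[str, Optional[str]]:
    """Try tiers cheapest-first and return (tier_name, answer) for the first hit.

    Later tiers are never called once an earlier one answers, which is
    what makes the cascade cost-gated.
    """
    for name, lookup in tiers:
        answer = lookup(query)
        if answer is not None:
            return name, answer
    return "none", None
```

Because each tier is just a function from query to optional answer, a vector search or a re-ranked, cited answer can be appended as later tiers without changing `recall`.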
08:35 · concept

GBrain re-rank plus citation pattern

Second pass over chunks to reorder by relevance, then synthesize a written answer with file-level citations.

Steal for: Any retrieval system where the consumer needs to trust or audit the answer
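A toy version of the re-rank plus citation pattern is sketched below. It is not GBrain's code: `overlap_scorer` is a hypothetical stand-in for a real re-ranker model, the "written answer" is simple concatenation rather than model synthesis, and the `min_score` threshold is an arbitrary assumption. What it does show is the shape of the output: numbered claims, a source list, and an explicit "not found" when nothing clears the bar.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Chunk:
    source: str   # file or conversation the text came from
    text: str
    score: float  # first-pass retrieval score (ignored by the re-ranker)


def overlap_scorer(query: str, text: str) -> float:
    """Hypothetical re-ranker stand-in: fraction of query words found in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def rerank(query: str, chunks: List[Chunk],
           scorer: Callable[[str, str], float] = overlap_scorer) -> List[Chunk]:
    """Second pass: reorder chunks by the scorer, best first."""
    return sorted(chunks, key=lambda c: scorer(query, c.text), reverse=True)


def answer_with_citations(query: str, chunks: List[Chunk],
                          scorer: Callable[[str, str], float] = overlap_scorer,
                          min_score: float = 0.34, top_k: int = 2) -> str:
    """Return numbered statements plus a source list, or admit nothing was found."""
    ranked = rerank(query, chunks, scorer)
    kept = [c for c in ranked[:top_k] if scorer(query, c.text) >= min_score]
    if not kept:
        return "I could not find this in memory."
    body = [f"{c.text} [{i}]" for i, c in enumerate(kept, start=1)]
    sources = [f"[{i}] {c.source}" for i, c in enumerate(kept, start=1)]
    return "\n".join(body + ["Sources:"] + sources)
```

Returning a fixed "not found" string instead of the best weak match is the part that corresponds to the video's point about admitting what the system does not know.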
CTA Breakdown

How they asked for the click.

VERBAL ASK
10:17 · product
If you want that, it's down in the description below.

Soft mid-video subscribe ask at 6:14, product CTA at 10:17 and 13:22. Restrained and credible.

MENTIONED ON CAMERA
11:02 · tool · Supabase
Storyboard

Visual structure at a glance.

00:00 · hook · hook
01:03 · promise · 3 jobs intro
02:30 · value · STORE diagram
03:20 · value · INJECT diagram
04:39 · value · MemSearch storage flow
07:17 · value · RECALL 3-tier cascade
09:52 · value · full architecture diagram
13:06 · cta · CTA