Big Idea

The argument in one line.

Claude Code's native memory system is fundamentally weak at retrieval, but combining MemSearch's automatic full-transcript capture with Hermes's curated frozen-snapshot injection and a four-tier progressive recall chain creates a hybrid that beats all three individual systems.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…

You're building Claude agents or workflows and currently lose context between sessions or across long projects.
You're a developer using Claude Code and want a practical hybrid memory setup without building from scratch or waiting for Anthropic updates.
You've experimented with single memory approaches (MemSearch, Hermes, or default Claude memory) and hit limitations you need to work around.

SKIP IF…

You're using Claude through the web interface for one-off conversations — this is built for persistent agent architectures and multi-session workflows.
You haven't integrated Claude into a codebase or custom system yet: the setup assumes developer-level comfort with hooks, injecting context into prompts, and managing the context window.

TL;DR

The full version, fast.

Claude Code's built-in memory lags behind open-source agentic systems because storage is selective, injection is limited to CLAUDE.md, and recall has no real mechanism beyond checking saved memory files or scanning past sessions. The fix is a hybrid built around three questions every memory system must answer: how information is stored, how it gets injected into context at session start, and how it gets recalled later. Combine MemSearch's Stop hook, which captures every turn into a locally vectorized database, with Hermes's curated MEMORY.md, USER.md, and SOUL.md files, injected each session as a cached frozen snapshot. For recall, check that in-context snapshot first, then escalate through semantic vector search, expanded chunks, and raw transcripts only when needed.

Chapters

Where the time goes.

00:00 – 03:20

01 · Cold Open -- 3 Questions Every Memory System Must Answer

Frames the whole video around Store, Inject, and Recall. Introduces MemSearch and Hermes as the two strongest open-source challengers. Sets expectation: this is not about more context -- it is about the right context.

03:20 – 10:05

02 · STORE -- How the Three Systems Capture Information

Side-by-side comparison. Claude Code: auto-memory, sparse, promotes to global after 3+ repeats. MemSearch: Stop hook after every turn, Haiku bullets, Milvus vector DB (local CPU, no embedding API cost). Hermes: agent-driven add/replace/remove, MEMORY.md (2200 char) + USER.md (1375 char) + SQLite raw transcript + 7-day curator.
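A minimal sketch of the MemSearch-style capture step, under stated assumptions: the hook receives a JSON payload on stdin with `session_id` and `transcript_path` fields (our understanding of Claude Code's hook input), `summarize_turn` is a hypothetical placeholder that keeps the last few non-empty lines instead of calling Haiku, and the `memory/YYYY-MM-DD.md` naming and `<!-- session: ... -->` anchor are illustrative, not MemSearch's actual file format.

```python
import json
import sys
from datetime import date
from pathlib import Path
from typing import List, Optional

MEMORY_DIR = Path.home() / ".memsearch" / "memory"  # hypothetical location


def summarize_turn(text: str, max_bullets: int = 3) -> List[str]:
    """Hypothetical stand-in for the Haiku call: keep the last few non-empty lines."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return lines[-max_bullets:]


def append_turn_notes(memory_dir: Path, session_id: str, bullets: List[str],
                      day: Optional[date] = None) -> Path:
    """Append bullets to <memory_dir>/<YYYY-MM-DD>.md under a per-session anchor."""
    day = day or date.today()
    memory_dir.mkdir(parents=True, exist_ok=True)
    path = memory_dir / f"{day.isoformat()}.md"
    anchor = f"<!-- session: {session_id} -->"  # hypothetical anchor format
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    # Find the most recent anchor so interleaved sessions each get their own heading.
    last_anchor = None
    for line in existing.splitlines():
        if line.startswith("<!-- session:"):
            last_anchor = line.strip()
    with path.open("a", encoding="utf-8") as f:
        if last_anchor != anchor:
            f.write(f"\n{anchor}\n")
        for bullet in bullets:
            f.write(f"- {bullet}\n")
    return path


def main() -> None:
    # Assumed Stop-hook payload shape: {"session_id": "...", "transcript_path": "...", ...}
    payload = json.load(sys.stdin)
    # The transcript is a JSONL file; a real hook would extract the last exchange
    # and send it to Haiku. This placeholder just hands over the raw tail.
    tail = Path(payload["transcript_path"]).read_text(encoding="utf-8")[-4000:]
    append_turn_notes(MEMORY_DIR, payload["session_id"], summarize_turn(tail))
```

Saved as a hook script, the file would end with a call to `main()` and be registered under the `Stop` event in Claude Code's settings. Appending plain markdown first keeps the files as the source of truth; embedding them into a vector database can happen later in a separate indexing step.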

10:05 – 13:08

03 · INJECT -- How Memory Reaches the Agent at Session Start

Claude Code loads CLAUDE.md + conditional memory file injection via pre-tool-use hook. MemSearch has NO injection layer. Hermes loads a frozen snapshot of SOUL.md + USER.md + MEMORY.md (~1300 tokens, prefix-cached) once per session.
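The frozen-snapshot behaviour can be modelled in a few lines. This is a hypothetical sketch, not Hermes's code: it reads `SOUL.md`, `USER.md`, and `MEMORY.md` once when a session starts (the file order and `## name` headers are assumptions), keeps that text fixed for the whole session so the prompt prefix stays identical and cacheable, and sends mid-session writes to disk only. The four-characters-per-token estimate is a rough heuristic for English text, not a tokenizer. In Claude Code, one plausible delivery route is a `SessionStart` hook that prints the snapshot to stdout, which Claude Code can add to the session context.

```python
from pathlib import Path

SNAPSHOT_FILES = ("SOUL.md", "USER.md", "MEMORY.md")  # order is an assumption


def build_frozen_snapshot(memory_dir: Path) -> str:
    """Read the curated files once and join them into one block for the session prefix."""
    sections = []
    for name in SNAPSHOT_FILES:
        path = memory_dir / name
        if path.exists():
            sections.append(f"## {name}\n{path.read_text(encoding='utf-8').strip()}")
    return "\n\n".join(sections)


def rough_token_estimate(text: str) -> int:
    """Rough heuristic (~4 characters per English token); not a real tokenizer."""
    return len(text) // 4


class Session:
    """Hypothetical session wrapper: the snapshot is taken once and never refreshed,
    so the prompt prefix stays byte-identical (and prefix-cache friendly)."""

    def __init__(self, memory_dir: Path) -> None:
        self.memory_dir = memory_dir
        self.snapshot = build_frozen_snapshot(memory_dir)

    def remember(self, note: str) -> None:
        # Mid-session writes go to disk only; they appear in the NEXT session's snapshot.
        with (self.memory_dir / "MEMORY.md").open("a", encoding="utf-8") as f:
            f.write(f"\n- {note}")
```

Freezing the snapshot trades freshness for cost: a fact saved mid-session is not visible until the next session, but the cached prefix means the curated files are not re-billed at full price on every message.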

13:08 – 19:04

04 · RECALL -- How the Agent Retrieves Past Information

Claude Code: checks auto-memory files; if something was not saved there, it is lost -- no search, no grep, no vectors. MemSearch: 3-tier retrieval (L1 hybrid vector+keyword search, L2 expand chunk context, L3 raw session transcript). Hermes: Tier-0 in-context MEMORY.md check, then an FTS5 keyword query, then Gemini Flash summarisation of the top 3 sessions.
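MemSearch's L1 step merges a dense (meaning) ranking with a BM25 (keyword) ranking into one list. A common way to merge ranked lists is reciprocal rank fusion (RRF); the sketch below assumes RRF with the conventional constant k = 60, which may not match how MemSearch or Milvus actually weight results, and the chunk IDs are invented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Merge ranked lists of chunk IDs: each list adds 1 / (k + rank) to a chunk's score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy example for the query "pricing" (IDs are invented).
dense = ["revenue-notes", "pricing-call", "stripe-setup"]  # nearest by meaning
bm25 = ["pricing-call", "pricing-page-copy"]               # literal keyword hits
fused = reciprocal_rank_fusion([dense, bm25])
# "pricing-call" appears in both lists, so it comes out first.
```

Fusing by rank rather than by raw score sidesteps the fact that cosine similarities and BM25 scores live on different scales.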

19:04 – 23:12

05 · Recommended Hybrid Setup -- Taking the Best of All Three

Store: auto-memory + MemSearch stop hook + agent writes MEMORY.md/USER.md + nightly memsearch index cron. Inject: Hermes frozen snapshot (~3000 tokens cached). Recall: Tier-0 in-context, L1 MemSearch hybrid, L2 expand, L3 raw transcript. Free plan.md available.
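The recommended recall chain is a cheapest-first loop: try each tier in order and stop at the first one that answers. A minimal sketch, assuming every tier is a function that returns an answer or `None`; `tier0_snapshot` is a deliberately naive word-overlap check standing in for the model reading its own context, and the L1 to L3 lambdas are hypothetical stand-ins for MemSearch's search, expand, and raw-transcript lookups.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# A tier takes the question and returns an answer, or None for "not found here".
Tier = Callable[[str], Optional[str]]


@dataclass
class RecallResult:
    answer: Optional[str]
    tier: Optional[str]  # name of the tier that answered, if any


def progressive_recall(question: str, tiers: List[Tuple[str, Tier]]) -> RecallResult:
    """Try tiers cheapest-first and stop at the first one that returns an answer."""
    for name, tier in tiers:
        answer = tier(question)
        if answer is not None:
            return RecallResult(answer, name)
    return RecallResult(None, None)


def tier0_snapshot(snapshot: str) -> Tier:
    """Tier 0 (naive stand-in): word overlap against text already in context, no DB call."""
    def words(text: str) -> set:
        return {w.strip("?.,:").lower() for w in text.split() if len(w.strip("?.,:")) > 3}

    def check(question: str) -> Optional[str]:
        wanted = words(question)
        for line in snapshot.splitlines():
            if wanted & words(line):
                return line.strip()
        return None
    return check


# Usage with hypothetical stand-ins for the MemSearch tiers.
snapshot = "- Payments: Stripe, not PayPal\n- Landing page: skool.com/scrapes"
tiers: List[Tuple[str, Tier]] = [
    ("tier0 in-context", tier0_snapshot(snapshot)),
    ("L1 hybrid search", lambda q: None),  # pretend: no confident hit
    ("L2 expand", lambda q: "(expanded context around the pricing hit)"),
    ("L3 raw transcript", lambda q: "(full session dialogue)"),
]
result = progressive_recall("What did we decide about pricing last Tuesday?", tiers)
# result.tier == "L2 expand": tier 0 and L1 found nothing, so L3 was never touched.
```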

Atomic Insights

Lines worth screenshotting.

Claude Code's built-in memory barely saves anything — it only captures what the model explicitly decides is worth remembering, which is a fraction of what actually matters.
Every agentic memory system comes down to three questions: how does information get stored, how does it get injected into the next session, and how does it get recalled on demand.
MemSearch captures every turn of every conversation automatically using a stop hook, vectorizes the results locally at zero API cost, and makes them semantically searchable by meaning rather than keyword.
Hermes takes the opposite approach — the agent decides what to save, enforces character limits that force consolidation, and purges duplicates so the curated memory stays lean and injectable.
Injecting a frozen snapshot of memory.md, user.md, and soul.md at session start costs about 1,300 cached tokens and produces dramatically better recall than relying on the default CLAUDE.md alone.
The fatal flaw of strong storage with no recall system is that information you saved is effectively lost — if you can't surface the right thing at the right time, storage is theater.
MemSearch's three-tier progressive recall — vector search, expand with metadata, then raw transcript — only goes as deep as it needs to, which keeps routine queries fast.
Tier-zero recall that checks what's already in the session's context before hitting the database is the most overlooked optimization — the cheapest search is the one you don't do.
Combining MemSearch's automated capture with Hermes's curated injection produces a system that never loses anything and never bloats the active context window.
A nightly indexing job that re-indexes the accumulated markdown memory files into the vector database means the system compounds in depth every day without manual intervention.
The reason Claude Code's memory lags behind open source solutions is structural — the built-in system was designed for single-session completeness, not multi-project longitudinal recall.
Client work from six months ago being instantly retrievable by semantic search is the difference between an AI assistant and an AI that actually compounds knowledge over time.
Markdown as the source of truth for memory is the right design — it means the vector database is rebuildable from scratch if it gets corrupted, because the raw files are always the ground state.
Hermes's curator runs every seven days and prunes raw transcripts down to consolidated facts — which keeps the injectable memory small enough to be useful without being lossy.
Handing Claude Code a plan.md that describes the entire hybrid memory setup and asking it to implement itself is the correct workflow — building memory infrastructure by hand defeats the purpose.
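The markdown-as-source-of-truth point is what makes the vector index disposable. A sketch of rebuilding chunk records from the files alone, assuming blank-line chunking and a truncated SHA-256 of (file name, chunk text) as the chunk ID; MemSearch's real chunker and ID scheme may differ. Because the IDs are deterministic, re-running the rebuild on unchanged files yields identical keys, so upserting them into a vector database (not shown) is idempotent.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def chunk_markdown(text: str) -> List[str]:
    """Split on blank lines; a simple stand-in for a real heading-aware chunker."""
    return [block.strip() for block in text.split("\n\n") if block.strip()]


def chunk_id(source: str, chunk: str) -> str:
    """Deterministic ID: the same text in the same file always hashes to the same key."""
    return hashlib.sha256(f"{source}\n{chunk}".encode("utf-8")).hexdigest()[:16]


def rebuild_index(memory_dir: Path) -> Dict[str, Dict[str, str]]:
    """Re-derive every chunk record from the markdown files alone.

    The records would then be embedded and upserted into the vector DB (not shown).
    Identical chunks within one file collapse to a single record.
    """
    index: Dict[str, Dict[str, str]] = {}
    for path in sorted(memory_dir.glob("*.md")):
        for chunk in chunk_markdown(path.read_text(encoding="utf-8")):
            index[chunk_id(path.name, chunk)] = {"source": path.name, "text": chunk}
    return index
```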

Takeaway

Steal the three-verb framework.

Memory architecture playbook

Every memory decision in any agent system maps to just three questions: Store, Inject, Recall -- and Claude Code out of the box fails at two of them.

Add the MemSearch stop hook today -- it captures everything your auto-memory misses at near-zero extra cost (Haiku is cheap, and embeddings run locally).
Split your CLAUDE.md into SOUL.md / USER.md / MEMORY.md right now -- the frozen snapshot injection pattern gives you Hermes-style recall quality for roughly 1,300 cached tokens per session.
Build the Tier-0 check: before any vector DB query, check what is already in context -- instant and free.
The free plan.md at scrapeshq.notion.site/claude-memory-systems is a paste-in blueprint -- hand it to Claude Code and let it self-install.
Frame any AI memory content you create around Store / Inject / Recall -- it is the clearest mental model for this category and Joe could own it in the creator space.

Glossary

Terms worth knowing.

Agentic memory system: A framework that lets an AI agent save, retrieve, and reuse information across separate conversations, so it can remember facts, decisions, and context instead of starting fresh each session.
Claude Code: Anthropic's command-line coding assistant that runs Claude inside a terminal session, with hooks, slash commands, and project-level configuration files for customizing how the model behaves.
MemSearch: An open-source memory layer for Claude Code that captures every conversation turn to markdown, then indexes the content into a local vector database for semantic search and progressive recall.
Hermes agent: An open-source AI agent whose memory system uses agent-curated memory files plus a periodic curator job to consolidate facts, then injects a frozen snapshot of those files at the start of each session.
System prompt: The hidden instruction block sent to a language model before the user's message, used to set persistent rules, persona, and reference material the model should treat as always-on context.
CLAUDE.md: A markdown file Claude Code automatically loads into its system prompt at session start, used to store project-specific instructions, conventions, and context that should persist across conversations.
Context window: The total amount of text a language model can hold in active memory during a single conversation, measured in tokens, beyond which older content gets dropped or compressed.
Token: The basic unit a language model uses to count text, roughly equal to a short word or word fragment, which determines both context-window limits and API pricing.
Stop hook: A Claude Code hook that automatically fires when a conversation turn finishes, letting the user run scripts to log, summarize, or process the exchange without manual intervention.
Pre-tool-use hook: A Claude Code hook that runs immediately before the agent calls a tool, often used to inject extra context, check permissions, or load relevant memory files into the conversation.
Haiku: Anthropic's smallest and cheapest Claude model, fast enough to run on every conversation turn for lightweight jobs like summarization, classification, or short bullet extraction.
Gemini Flash: Google's smaller, low-latency Gemini model, used here as a cheap summarizer that condenses retrieved session transcripts into a short answer before passing them back to the main agent.
Vector database: A storage system that holds text as numerical embeddings so content can be searched by meaning rather than exact wording, enabling semantic recall of related information.
Milvus: An open-source vector database that stores embeddings locally and supports fast similarity search, used here to run MemSearch's semantic memory layer on the user's own machine with no API costs.
Embedding: A numerical representation of a piece of text that captures its meaning, allowing similar concepts to be found through math even when they share no exact words.
Chunk: A small slice of a larger document, sized so a model or search index can process it independently, typically used as the unit that gets embedded and stored in a vector database.
Hash: A short fixed-length string generated from a piece of content, used as a unique identifier so the system can detect duplicates and link chunks back to their source.
Semantic search: Retrieval that matches by meaning rather than exact keywords, so a query about pricing can surface notes that mention revenue or monetization even when the word pricing never appears.
BM25: A classic ranking algorithm that scores documents by keyword overlap and frequency, often paired with vector search in hybrid retrieval systems to balance exact matching with semantic similarity.
Hybrid search: A retrieval approach that runs keyword matching and semantic vector search together, then merges the results so the system catches both literal term hits and conceptually related content.
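To make the BM25 and hybrid-search entries concrete, here is a textbook Okapi BM25 scorer. It uses naive whitespace tokenization and the common defaults k1 = 1.5 and b = 0.75 (assumptions; Milvus and SQLite FTS5 use their own tokenizers and parameters). Note that a document about revenue scores zero for the query `pricing`, which is exactly the gap the semantic half of hybrid search fills.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each document for the query (lowercased whitespace tokens)."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms weigh more; the +1 inside the log keeps idf positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates (k1) and is normalized by document length (b).
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "we agreed pricing stays the same for existing clients",
    "revenue is down so rethink monetization next quarter",
    "switched payments to stripe and pricing page needs work",
]
scores = bm25_scores("pricing", docs)
# Only documents containing the literal token "pricing" score above zero.
```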

Resources

Things they pointed at.

01:28 · Tool · MemSearch

01:36 · Tool · Hermes

01:15 · Link · skool.com/scrapes

22:27 · Link · Claude Memory Systems plan.md

Quotables

Lines you could clip.

10:08

“It is not about loading more context in. It is about loading the right context at the right time only.”

Clean standalone principle, no setup needed; directly challenges the default instinct to stuff context windows. Use as: TikTok hook.

13:22

“If you can store as much information as you want, but if you cannot get it out at the right time, then it is not worth having a good storage mechanism in the first place.”

Punchy inversion -- storage without recall is worthless. Use as: IG reel cold open.

08:58

“MemSearch and Hermes go 10x further than the basic Claude Code out the box.”

Tight quantified claim, validates the upgrade journey. Use as: newsletter pull-quote.

22:12

“Right now it is far, far behind what you can get from systems that are currently open source and free to access.”

Closing contrast -- Anthropic vs open source. Strong CTA setup. Use as: IG reel cold open.

The Script

Word for word.

00:00 Right now, Claude Code's memory system is still way behind a lot of what the open source community has already figured out. So in a recent video, I broke down these seven levels of Claude memory systems. And whilst researching that, I ended up digging through some really advanced setups that people are building right now.

00:16 Setups like the Hermes agent, MemSearch, and a bunch of others. And to my own surprise, a lot of these systems looked incredibly advanced, but the core ideas underneath them are actually very simple to replicate. So underneath all the complexity, it always comes down to just two questions.

00:32 When and how does information get written to memory, and when and how does it get retrieved again? So in this video, I'm gonna show you what Claude Code's memory looks like today, what the newest systems are actually doing differently, and then the setup I'd actually recommend if you want Claude Code to stop forgetting things.

00:49 And one thing upfront, this isn't about loading more context into Claude Code. It's about keeping context lean, only retrieving the right information when it's actually needed.

00:59 So let's get into it, and we can start off by talking about the three questions that every memory system has got to answer. So firstly, it's all about storage. How does information actually get saved and at what point?

01:11 So what happens when somebody says something to Claude that's worth remembering? How does that actually get stored in the system? So you might say our landing page is skool.com/scrapes, and you want Claude to always remember that information.

01:25 So in some way, we want the agent to actually go away and save that, and we want that to be consistent and reliable. Or a decision like we're using Stripe, not PayPal, same thing. You want that to be saved into the memory and then retrieved at a later date.

01:39 So we wanna understand how this information gets saved with all these different memory systems. Then we wanna understand how information gets injected. So you're probably familiar that the CLAUDE.md file gets injected into the system prompt whenever we prompt Claude, so it's injected every single time.

01:53 So how do we actually take important context of recent memory and push it to the agent during our conversation so that next time you do start a session, you can open Claude Code and the memory of the most recent or most important information is loaded in automatically. But it's only a snippet of that information.

02:11 It's not tens of thousands of tokens. We have a small, curated, always-there set of memory that's pushed in.

02:17 So that, for example, Claude already knows your landing page URL or already knows your Stripe decision because we made that, and that's an important decision. So we've got storage and injection. But then, more importantly for long-term memories, how do we actually go and find and recover past information that we've told it?

02:33 Information that we told it about client X six months ago. That's the information that we need to be able to recall. And this could be as recent as last week or it could be, you know, several years ago or months ago.

02:43 So we might ask, what did we decide about pricing last Tuesday? And it might have a step-by-step process of: let's check what's been loaded in the injection phase. If not, let's go deeper.

02:53 And if not, let's go even deeper. And we need a framework to actually store and retrieve that information from the long-term memory.

03:00 So how does it store? How does it inject? And how does it recall?

03:04 So these are the three themes that we're gonna follow through this video and talk about the different systems like Claude Code out the box, Hermes, and MemSearch, which are two of the best systems that I've found on the market. They often take completely different approaches.

03:17 So let's get into the first section, which is all about storage. So when you have a conversation with Claude, it's actually auto-detecting certain things you say in the background and writing them silently to .md files.

03:29 These are stored at a per-project level in the global space. So we've got the .claude/projects folder, and then we're storing memory folders in there. We then have a MEMORY.md index, which is updated with all the files that it can point to.

03:43 So when you have a conversation in the future with Claude, it can always reference those files. Now this is on a per-project basis, but if you repeat things multiple times and you have certain things, certain preferences that are done three or more times, then it gets promoted to a global .claude/memory folder. And you can actually see this if you go directly into your Claude Code terminal and run /memory.

04:05 It will say, do you want to look at your user memory, which is saved in CLAUDE.md? Do you wanna look at the project memory, which is also a CLAUDE.md? Or do you want to open the auto-memory folder?

04:14 So if you open that auto-memory folder, then you can actually go and see all of the files and the index of files that that's created, and you can see that those actually point to each other. So these are happening automatically in the background, and I wouldn't say they're very comprehensive.

04:29 It's kind of mostly if you're telling it this is a really important thing, but otherwise, it's not really gonna store a huge amount of information. Now let's look at what the open source community has figured out around this. So how do they store and capture information as you go through?

04:43 So MemSearch uses a Claude Code Stop hook. So it's gonna fire after every turn, not just the memory-worthy turns.

04:50 So it's gonna call Haiku, which is gonna summarize each turn into bullets. And it uses Haiku because it's a cheap, fast model, and it's doing it all the time.

04:59 It's gonna append that data to a dated file in the memory folder, with session anchors. So, you know, when you close a session and you have a specific session ID, it's gonna append the notes from that session to a specific memory file.

05:14 So it's storing literally everything. It then periodically runs memsearch index, or you can run this manually. Each bit of information gets chunked and hashed.

05:22 Now the reason it's converting that information into a hash is because it can then embed those chunks and turn them into vectors. Those vectors are then stored in a Milvus vector database, and it's all done locally on your CPU.

05:37 So there's zero API cost. And what this actually means for you, it's not very relevant in terms of what it's being stored as. It's being stored as vectors, so literally a sequence of numbers.

05:47 But what it does is store really effectively a meaning and a bunch of metadata associated with that specific memory. This is great for the retrieval stage later because it means we can actually retrieve information by meaning instead of just by keyword search.

06:01 So not only do we have the markdown files, everything is also indexed and vectorized and put into a database in the back end automatically for us. That is absolutely critical for the retrieval stage later.

06:13 And what's great about this is it basically treats markdown as the source of truth. So everything is appended as markdown, and then everything else is rebuildable later from those markdown files. So if you lost this database, you could actually rebuild it from all the memories that have been appended to that date.

06:29 And the other good thing about it, or good and bad, you could say, is it captures everything. So it's not just what auto-memory from Claude Code thinks is the most relevant thing. It's actually gonna capture absolutely everything.

06:40 Now you might wonder, is that overkill? Well, we can come to what Hermes does in a minute, and you can decide for yourself whether that is overkill, because Hermes actually takes a completely different approach. And it's closer to what Claude Code is doing out the box, because actually the agent is deciding what to save.

06:57 The agent has access to a couple of tools inside Hermes, so add, replace, or remove. And what it's doing is adding those to a MEMORY.md file and a USER.md file. So similar to what you've seen probably in OpenClaw, or if you've set up your own agentic OS, you might have a MEMORY.md and a USER.md, but this isn't the same as Claude's MEMORY.md.

07:16 This is a MEMORY.md with a cap on the number of characters that retains the most important information, and we'll talk about how it does that. So MEMORY.md stores environment information, things you've done, and then USER.md is all about the user profile.

07:31 So anything you say about the way you work or the way that you want to operate, USER.md stores. It also has mechanisms in there for deduplicating. So whenever the agent thinks it's gonna add, replace, or remove something important, it will also check for duplicates and make sure that it's not writing duplicate information to our valuable memory space.
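A hypothetical model of the Hermes write path described here: entries change only through add, replace, and remove; an exact (case-insensitive) duplicate is skipped; and any edit that would exceed the character budget is rejected, which is what forces the agent to consolidate. The class name, the error behaviour, and counting only entry text toward the budget are assumptions, not Hermes's implementation; `char_limit=2200` mirrors the MEMORY.md cap given in the chapter summary.

```python
from typing import List


class CappedMemory:
    """Hypothetical model of a Hermes-style memory file: bullet entries under a hard
    character budget, edited only through add / replace / remove."""

    def __init__(self, char_limit: int) -> None:
        self.char_limit = char_limit
        self.entries: List[str] = []

    def _check(self, candidate: List[str]) -> None:
        # Budget counts entry text only (an assumption); reject rather than truncate.
        size = sum(len(e) for e in candidate)
        if size > self.char_limit:
            raise ValueError(f"over budget ({size}/{self.char_limit} chars): consolidate first")

    def add(self, entry: str) -> bool:
        entry = entry.strip()
        if any(entry.lower() == e.lower() for e in self.entries):
            return False  # case-insensitive exact duplicate: nothing written
        candidate = self.entries + [entry]
        self._check(candidate)
        self.entries = candidate
        return True

    def replace(self, old: str, new: str) -> None:
        idx = self.entries.index(old)  # raises ValueError if `old` is not present
        candidate = self.entries[:idx] + [new.strip()] + self.entries[idx + 1:]
        self._check(candidate)
        self.entries = candidate

    def remove(self, entry: str) -> None:
        self.entries.remove(entry)  # raises ValueError if absent

    def render(self) -> str:
        """Markdown body that would be written to MEMORY.md / USER.md."""
        return "\n".join(f"- {e}" for e in self.entries)


memory = CappedMemory(char_limit=2200)
memory.add("Payments: Stripe, not PayPal")
memory.add("payments: stripe, not paypal")  # skipped as a duplicate
```

Rejecting an over-budget edit, instead of silently dropping old entries, pushes the decision about what to merge or delete back onto the agent, which is the consolidation pressure the video describes.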

07:51 Now all of these are kind of useless unless the information gets injected at some point, which we'll talk about next. But the important thing to know is these caps on characters enforce consolidation.

08:01 So where MemSearch captures absolutely everything, the point of the Hermes memory logic is that it enforces consolidation for when it injects that context later on.

08:10 But in some ways, it is very similar to MemSearch, because every turn it also auto-saves the complete raw transcript to a database in the background. And it uses a curator. So every seven days, it goes through and prunes and consolidates all of the information that we've just talked about.

08:27 So the curator's job is to keep everything clean. What it does is remove the raw transcripts from that information. So whilst MemSearch stores exact raw transcripts, Hermes actually consolidates and prunes that information.

08:39 So they're actually both excellent, especially when you compare them to Claude Code. And if you look in your own MEMORY.md with the auto-memory, it barely saves a thing.

08:48 So MemSearch and Hermes go 10x further than the basic Claude Code out the box. So which one would I actually recommend that you use in this approach? Well, MemSearch captures everything automatically with that Stop hook, but it's raw and uncurated.

09:03 Hermes is gonna capture our curated facts, especially those that are gonna be put into MEMORY.md and USER.md, which is lean and intentionally lean. But if the agent doesn't think to save something, which is kinda like our Claude auto-memory, it's still actually grabbing the full transcript and saving it into a database that we can retrieve from at a later point.

09:25 So my answer to which one you should actually use: I actually think we should combine the logic of both here. We should use automatic capture for completeness and then curated facts for what matters most, because this is really important for the context injection phase. So take the best of both and combine it, so we've got long-term search from this embedded vector database that we can search by meaning, but also the power of choosing specific information to store in MEMORY.md and USER.md.

09:53 So now that we come to the injection phase, we can actually push that information into our context without having to search through a load of raw, uncurated transcripts in the background.

10:03 So memory injection into the context window is quite misunderstood. It's not about loading more context in. Like we always talk about, it's loading the right context at the right time only.

10:16 So the default behavior of Claude Code is when you start a session, you inject the full CLAUDE.md, and that's why we wanna keep the CLAUDE.md ideally under 200 lines. That goes in with the system prompt.

10:29 And then before you use a tool, or before Claude uses a tool, there is actually a pre-tool-use hook which grabs the MEMORY.md index, looks through that list of memory files that were stored earlier, and decides, based on your query, whether it needs to actually go and read one of those memory files and inject that into the context too.

10:50 If it does, it will inject that in as additional context inside the conversation. So this is a pretty decent starting point, but actually we can learn a lot from the way Hermes does this. We already saw that it captured a USER.md and a MEMORY.md file with more information that's periodically updated and consolidated.

11:10 We can actually inject those into the context window. But first let's quickly cover MemSearch, because it might surprise you here, but MemSearch actually has no injection layer at all. It just relies on the default behavior of Claude Code injecting the CLAUDE.md

11:24 and the MEMORY.md. MemSearch is really built for the recall, which we'll come to.

11:29 So think of MemSearch as storage and search, basically a storage and search library that massively improves long-term recall. Whereas Hermes, I think, nails this. So at session start it basically loads a frozen snapshot, similar to the way that Claude uses CLAUDE.md,

11:45 but it will not only use the CLAUDE.md, it will additionally add in the MEMORY.md, the USER.md, and SOUL.md every single time.

11:54 And that comes to around 1,300 tokens that are put into every single conversation window. Now this is per session because it's a frozen snapshot, so it gets cached.

12:04 So you don't spend 1,300 tokens every time you send a message. It's just at the start of a session conversation.

12:11 The session ID will have that context saved. So anything that's saved to MEMORY.md, USER.md, or SOUL.md during the session will be written to disk in the background and will not be loaded into that conversation, but will be loaded into the next conversation. So it's a really obvious choice for what logic we'd like to use for the actual injection layer, and that's: let's use Claude Code's behavior plus Hermes's frozen snapshot to load in the MEMORY.md, USER.md, and SOUL.md, which, as we saw in the storage stage, consolidate recency-biased and most important information inside these three markdown files.

12:51 Now, yes, you are loading in 1,300 tokens every single session, but compared to the huge context windows, the increased performance you're gonna get from recent consolidated memories is, in my opinion, worth it.

13:04 Now this is where stuff gets really interesting in recall, because this is probably the biggest gap that Claude Code has out the box. Most of the time, we're not working just on a task-by-task basis with Claude Code.

13:17 We have a bunch of clients. We have a bunch of projects on the go. And actually storing that information is critical.

13:23 But recall is the most important thing. If you can store as much information as you want, but if you can't get it out at the right time, then it's not worth having a good storage mechanism in the first place. And Claude Code out the box has a really poor, dare I say it, recall system.

13:37 So basically, when the user asks some question about the past, it's gonna check the auto-memory files which we've already seen. And if it's not been saved in there, it's completely lost.

13:47 You might have opened the memory files that you had from earlier inside your project repository. It really is quite selective about what it saves. You probably don't have a huge amount of information stored there.

13:58 So actually recalling past conversations and information means trawling through previous conversations you've had and burning through a load of tokens trying to find relevant information, and it has no methodology for doing so right now. Now you can, of course, use the resume flag to resume a previous conversation, but you have to know which session you actually wanna resume to get that context back.

14:19 So for Claude Code, the storage of information is okay. The injection is basic with just the CLAUDE.md, but the recall is actually really weak, and that's where we can benefit most from external systems. So how does that compare to MemSearch if a user were to ask about something from the past week, the past month, the past six months?

14:37 Well, MemSearch has a really powerful three-tier retrieval system that basically only goes deeper if it needs to. It works on the same principles of progressive disclosure. So the user asks a question about the past and we're gonna use the memsearch search query.

14:52 It's basically going to convert your query into vectors so that it can go and find semantic matches for your query in the vector database where we stored the information earlier. So because it's stored as vectors, a question about pricing will also find matches for monetization, revenue, price.

15:09 So it doesn't have to be exact keyword matches; we're actually searching in the vector database by meaning here. And it even has a method to do that by keywords. So the dense vectors allow it to search by meaning.
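Searching "by meaning" comes down to comparing the query's vector with each stored chunk's vector, commonly by cosine similarity. The three-dimensional vectors below are hand-made toys (imagine axes for money, code, and travel), not real embeddings, which come from an embedding model and have hundreds of dimensions; Milvus performs the same kind of comparison at scale using approximate-nearest-neighbour indexes.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity: 1.0 = same direction, ~0 = orthogonal (unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: List[float], chunks: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Brute-force nearest neighbours; vector DBs use ANN indexes to avoid scanning everything."""
    scored = [(cid, cosine(query, vec)) for cid, vec in chunks.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Toy axes: [money, code, travel]. Real embeddings come from a model, not by hand.
query_pricing = [0.9, 0.1, 0.0]
chunks = {
    "revenue-notes": [0.8, 0.2, 0.0],
    "stripe-setup": [0.5, 0.8, 0.0],
    "flight-booking": [0.0, 0.1, 0.9],
}
hits = top_k(query_pricing, chunks, k=2)
# "revenue-notes" ranks first despite never containing the word "pricing".
```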

15:20 The BM25 keywords allow it to actually keyword match, and then it's basically summarized in one list of the closest matches to the question you asked about the past. Now it will pass that back to the agent first, and if there's nothing that's totally relevant, then it's able to actually go one level deeper.

15:38 So at that point, it could stop and actually find really relevant results, find exactly what we're looking for from information in the past. If that answers the question, great. However, if that does not answer the question, then it jumps to tier two, which is search expand.

15:51 And memsearch expand gives it more context, more metadata, a summary of information around the match that we potentially found. And, again, if that is not good enough and we need the raw dialogue, then it's gonna go to the next tier, level three, which actually has all of the session dialogue that we had. Because if you remember, every single message we send, it's summarized into bullets and then appended to the memory, and then that is indexed.
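A sketch of what an L2 "expand" step could return, assuming the dated memory files mark each session with a `<!-- session: ID -->` comment (a hypothetical format): the lines around the hit, plus the session ID of the nearest preceding anchor, which is the key an L3 lookup would need to pull that session's raw transcript. MemSearch's actual expand output may include different metadata.

```python
import re
from typing import Dict, List, Optional

ANCHOR = re.compile(r"<!-- session: (\S+) -->")  # hypothetical anchor format


def expand_hit(lines: List[str], hit: int, window: int = 2) -> Dict[str, Optional[str]]:
    """L2-style expansion: the lines around a hit plus the session it belongs to."""
    session_id = None
    for line in reversed(lines[: hit + 1]):  # walk back to the nearest anchor
        match = ANCHOR.search(line)
        if match:
            session_id = match.group(1)
            break
    lo, hi = max(0, hit - window), min(len(lines), hit + window + 1)
    return {"session_id": session_id, "context": "\n".join(lines[lo:hi])}


lines = [
    "<!-- session: abc -->",
    "- Discussed client onboarding",
    "- Agreed pricing stays flat this quarter",
    "- Follow up on invoice template",
    "<!-- session: def -->",
    "- Switched payments to Stripe",
]
expanded = expand_hit(lines, hit=2, window=1)
# expanded["session_id"] is "abc": the key an L3 step would use to fetch the raw transcript.
```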

16:16So all of the raw dialogue is actually saved and we can retrieve that with level three if we need to as a last resort. Now all of these take more tokens as we go down, but if you need a reliable system for retrieving information about your client's project six months ago, then MemSearch is gonna be the one. Now you might have identified the limitation in this approach which is if we're asking about the past it immediately thinks okay instead of searching the local context let's go and do a database query.

16:44So that's gonna be slower than just checking our local in context memory. So Hermes uses a really clever approach for this.

16:53First instead of going deeper into the database it's actually just gonna check our memory. Md. That has the question that the user has asked been actually accessible via just the memory.

17:04Md, which means it can actually get it from the context that it's already received.

17:09 The power of injecting this frozen snapshot is that some queries can be answered just from the memory that's already in context.

17:21 That's basically zero cost and instantly available. So, in theory, it should always search the context of the existing conversation before going down the levels and searching the database.

17:33 If the answer isn't found there, it goes deeper and searches past sessions. As we mentioned, those are stored in a database, just like with MemSearch, but instead of a vector database, it's effectively just searching by keywords.

17:47 It then returns the top three matching sessions by relevance, summarizes them using Gemini Flash, and passes that back to the agent. So Hermes is really good at exact keyword matching.

18:00 If we asked it about pricing, it could find things about pricing, but it might not find things about revenue, because that match is by meaning rather than keywords. However, Hermes does one really smart thing that we're going to adapt and use: it injects MEMORY.md into the conversation and then, by default, checks that MEMORY.md as a level zero.
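The following is a minimal sketch of exact keyword ranking over past sessions that keeps the top three. It makes the pricing-versus-revenue limitation concrete. The sketch assumes that sessions are plain strings keyed by a hypothetical session id and that ranking counts shared keywords. Hermes's real session search may rank differently, and the Gemini Flash summarization step is left out.

```python
import re


def keywords(text: str) -> set[str]:
    """Lowercased alphanumeric words in the text (assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_sessions(query: str, sessions: dict[str, str], top_k: int = 3) -> list[str]:
    """Rank past sessions by how many query keywords they contain; keep the top_k ids."""
    wanted = keywords(query)
    scored = []
    for session_id, text in sessions.items():
        hits = len(wanted & keywords(text))
        if hits:  # no shared keyword means no result, however close the meaning
            scored.append((hits, session_id))
    # Most shared keywords first; break ties by session id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [session_id for _, session_id in scored[:top_k]]
```

A session titled "Monetization and revenue plan" never comes back for a "pricing" query, because the two share no keyword.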

18:21 So it checks what's already in context before jumping down into the MemSearch hybrid search, MemSearch expand, and level three. Ideally, we'd take this MEMORY.md check and put it into the MemSearch flow, so we get a hybrid of both.

18:39 We can treat this step as a level zero between Hermes and MemSearch, so we check what's already in context before going deeper into the vector database. When the user asks about the past, the agent checks MEMORY.md and the context already in the window. If the answer isn't found, it moves on to MemSearch and searches the vector database by keyword and meaning, continuing to levels two and three if it needs to.

19:04 That's a lot of information. So how do you actually set this up for yourself and combine the best elements of each system?

19:14 Here's what I'd recommend when taking the best from each system. Let's run through store, inject, and recall across the life cycle of a conversation.

19:24 We will, of course, keep everything already built into Claude Code that works well as best practice. As a conversation happens, we use the built-in auto memory, which saves MEMORY.md files to the global Claude folder for us. But after every turn completes, we add the MemSearch stop hook, which captures the transcripts of our conversations word for word so they can go into a daily memory.
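Below is a minimal sketch of a Stop hook that saves each session's dialogue word for word. It is a hypothetical stand-in, not MemSearch's hook; according to the transcript, MemSearch summarizes turns into bullets. The sketch makes two assumptions, both worth checking against your own Claude Code version:

* Claude Code passes a JSON payload with `session_id` and `transcript_path` on stdin.
* Each transcript line is a JSON object whose `message` has a `role` and a `content`, where `content` is either a string or a list of `{"type": "text"}` blocks.

The `TRANSCRIPT_DIR` location is also hypothetical.

```python
import json
import sys
from datetime import date
from pathlib import Path

# Hypothetical folder for raw transcripts; MemSearch's own hook uses its own layout.
TRANSCRIPT_DIR = Path.home() / ".claude" / "memory" / "transcripts"


def message_text(content) -> str:
    """Flatten message content (a plain string or a list of content blocks) to text."""
    if isinstance(content, str):
        return content
    return "\n".join(block.get("text", "") for block in content
                     if isinstance(block, dict) and block.get("type") == "text")


def capture_session(transcript_path, out_root: Path, session_id: str, day: date) -> Path:
    """Write every user/assistant message of one session to <out_root>/<day>/<session_id>.md."""
    lines = [f"# Session {session_id}"]
    with open(transcript_path, encoding="utf-8") as f:
        for raw in f:
            if not raw.strip():
                continue
            # Assumption: each JSONL entry may carry a "message" with "role" and "content".
            msg = json.loads(raw).get("message")
            if not isinstance(msg, dict) or msg.get("role") not in ("user", "assistant"):
                continue  # skip summaries, metadata and other non-message entries
            text = message_text(msg.get("content") or "").strip()
            if text:
                lines.append(f"- {msg['role']}: {text}")
    out = out_root / day.isoformat() / f"{session_id}.md"
    out.parent.mkdir(parents=True, exist_ok=True)
    # Overwrite, not append: the hook fires after every turn and sees the whole transcript.
    out.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return out


def main() -> None:
    if sys.stdin.isatty():
        return  # run by hand, not as a hook
    raw = sys.stdin.read()
    if not raw.strip():
        return
    payload = json.loads(raw)  # Assumption: Stop hook input arrives as JSON on stdin
    capture_session(payload["transcript_path"], TRANSCRIPT_DIR, payload["session_id"], date.today())


if __name__ == "__main__":
    main()
```

To use it, register the script as a `Stop` hook command, for example `python3 capture_transcript.py`. The script name is arbitrary.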

19:49 We also want to maintain MEMORY.md and USER.md files, so that if the agent decides something is important, it can add, replace, or remove entries in MEMORY.md or USER.md itself instead of relying only on Claude Code.
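The agent-writable side can be sketched as a small tool with add, replace, and remove actions over a size-capped file. The following are all hypothetical:

* the one-entry-per-line format
* the substring matching
* the class name

The caps reuse the 1.4 kB and 2.5 kB figures from the Frameworks section, counted here as characters.

```python
from pathlib import Path

# Size caps from the breakdown (USER.md 1.4 kB, MEMORY.md 2.5 kB), treated as characters.
CAPS = {"USER.md": 1400, "MEMORY.md": 2500}


class CuratedMemory:
    """Hypothetical add/replace/remove tool over a capped file with one '- entry' per line."""

    def __init__(self, path: Path):
        self.path = Path(path)
        self.cap = CAPS[self.path.name]

    def entries(self) -> list[str]:
        if not self.path.exists():
            return []
        lines = self.path.read_text(encoding="utf-8").splitlines()
        return [line[2:] for line in lines if line.startswith("- ")]

    def add(self, entry: str) -> None:
        self._save(self.entries() + [entry])

    def replace(self, match: str, new_entry: str) -> None:
        entries = self.entries()
        entries[self._find_one(entries, match)] = new_entry
        self._save(entries)

    def remove(self, match: str) -> None:
        entries = self.entries()
        del entries[self._find_one(entries, match)]
        self._save(entries)

    @staticmethod
    def _find_one(entries: list[str], match: str) -> int:
        # Require exactly one hit so a vague match can't overwrite the wrong fact.
        hits = [i for i, e in enumerate(entries) if match in e]
        if len(hits) != 1:
            raise ValueError(f"expected one entry containing {match!r}, found {len(hits)}")
        return hits[0]

    def _save(self, entries: list[str]) -> None:
        # Collapse whitespace so each entry stays on a single line.
        text = "".join("- " + " ".join(e.split()) + "\n" for e in entries)
        if len(text) > self.cap:
            raise ValueError(f"{self.path.name} would exceed {self.cap} characters; "
                             "replace or remove an entry first")
        self.path.write_text(text, encoding="utf-8")
```

The tool rejects writes over the cap instead of silently truncating them. This forces the agent to curate: it has to replace or remove an older fact to make room.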

20:03 That covers storing more context so we can retrieve it later. We also use MemSearch's vector database, which consolidates this information into long-term, semantically searchable memory.

20:16 We run a nightly job to consolidate everything that was put into that database: all the raw transcripts get consolidated with memsearch index every night.

20:28 If all of this sounds a bit too complex to set up yourself, I'll show you later where we have a free guide: you give this plan to Claude Code, and it goes through your file system, works out how to implement it, and does all the installation for you.

20:43 Now for injection, we use Hermes's logic. When the session starts, we want to inject a bit more context than just CLAUDE.md.

20:51 We want to inject SOUL.md, USER.md, MEMORY.md, and possibly today's log; you could also inject yesterday's log if you think it's relevant. That's about 3,000 tokens, cached at the start of every session, which becomes really important when we come to recall.
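Below is a minimal sketch of the injection step: a script that concatenates the snapshot files. It assumes the script is registered as a Claude Code `SessionStart` hook, whose standard output is added to the session context. The `~/.claude/memory` folder and the `daily/YYYY-MM-DD.md` log naming are hypothetical. The four-characters-per-token estimate is only a rough check on the ~3,000 token budget.

```python
import sys
from datetime import date, timedelta
from pathlib import Path

# Hypothetical memory folder; point this at wherever your files actually live.
MEMORY_DIR = Path.home() / ".claude" / "memory"
CORE_FILES = ["SOUL.md", "USER.md", "MEMORY.md"]


def build_snapshot(memory_dir: Path, today: date, include_yesterday: bool = False) -> str:
    """Concatenate the core files plus the daily log(s) that exist, each under a header."""
    days = [today - timedelta(days=1)] if include_yesterday else []
    days.append(today)
    names = CORE_FILES + [f"daily/{d.isoformat()}.md" for d in days]
    sections = []
    for name in names:
        path = memory_dir / name
        if path.exists():  # missing files are simply skipped
            sections.append(f"## {name}\n{path.read_text(encoding='utf-8').strip()}")
    return "\n\n".join(sections)


def rough_tokens(text: str) -> int:
    # Crude ~4 characters per token heuristic, only for watching the budget.
    return len(text) // 4


if __name__ == "__main__":
    # Assumption: run as a SessionStart hook, so stdout becomes session context.
    snapshot = build_snapshot(MEMORY_DIR, date.today())
    print(snapshot)
    print(f"(snapshot ~{rough_tokens(snapshot)} tokens)", file=sys.stderr)
```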

21:08 Then we move on to the recall part of the flow. Here we've combined Hermes's tier zero, where we check MEMORY.md and the daily log first.

21:19 Those are injected into the system prompt with every message we send, but they're cached. So before digging into the vector database to search past history, we check the recent local data already loaded into the conversation. Checking MEMORY.md and the daily log has zero cost, and it's practically immediate because they're already in context.

21:41 If the answer isn't found, we move on to MemSearch's traditional levels one, two, and three. First we search the query using hybrid keyword and semantic (vector) search. Then we expand the hits into their chunks. If we still haven't found the information, we parse the raw transcripts and pass that back to the agent.
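The recall order can be sketched as a chain that tries the cheapest tier first. In practice the model performs tier 0 itself by reading its context, so `in_context_lookup` is only a stand-in that uses word overlap. Tiers 1 to 3 would wrap the MemSearch search, expand, and transcript-parsing steps. All names here are hypothetical.

```python
import re
from typing import Callable, Optional

# A tier takes the query and returns an answer, or None to fall through to the next tier.
Lookup = Callable[[str], Optional[str]]


def in_context_lookup(snapshot: str) -> Lookup:
    """Tier 0 stand-in: return snapshot lines sharing a longer word (4+ chars) with the query."""
    def lookup(query: str) -> Optional[str]:
        words = {w for w in re.findall(r"[a-z0-9]+", query.lower()) if len(w) > 3}
        hits = [line for line in snapshot.splitlines()
                if words & set(re.findall(r"[a-z0-9]+", line.lower()))]
        return "\n".join(hits) or None
    return lookup


def recall(query: str, tiers: list[tuple[str, Lookup]]) -> tuple[str, Optional[str]]:
    """Try tiers cheapest-first and stop at the first one that returns an answer."""
    for name, lookup in tiers:
        answer = lookup(query)
        if answer:
            return name, answer
    return "none", None
```

In this design, the deeper tiers run only when the earlier ones come back empty. A question already answered by the injected MEMORY.md therefore never touches the database.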

22:02 This setup lets us search recent local files quickly and prioritize them. It also lets us search further back in history to recall old knowledge, right down to pulling out the raw dialogue at the end. The one thing I want you to take away is that none of this is complicated individually; it's all about following best practice for storage, injection, and recall, so we can massively improve memory inside your Claude Code sessions.

22:33 If you're working on projects across multiple clients, this is an absolute must-have. I know Anthropic is working on its own memory systems, but right now they're far behind what you can get from open-source systems that are free to use.

22:47 I'll link a completely free plan.md document below that you can pass to Claude to set this up yourself. If you want it straight out of the box, done for you, and known to work well, we'll be implementing it inside our own agentic operating system next week.

23:03 That's also linked inside the academy in the description below. If you want to see what other memory options I considered, check out the next video.

23:11 Thanks for watching.

The Hook

The bait, then the rug-pull.

Claude Code's default memory is losing information you cannot afford to lose. Simon Scrapes dug through the most advanced open-source setups -- Hermes and MemSearch -- and found that the core ideas underneath all the complexity are actually simple. This teardown shows you exactly what is missing and how to fix it.

Frameworks

Named ideas worth stealing.

01:00 model

Store / Inject / Recall

Store -- how does info get written to memory?
Inject -- how does it reach the agent during a session?
Recall -- how does the agent find old info when asked?

Three-question framework for evaluating any agentic memory system. Every decision about memory architecture maps to one of these three verbs.

Steal for: any MCN+ lesson on AI agent setup, or a CLAUDE.md audit checklist

14:07 model

MemSearch 3-Tier Retrieval

L1: memsearch search -- hybrid dense vectors + BM25 keywords + RRF fusion
L2: memsearch expand chunk_hash -- full markdown section around match
L3: parse-transcript session.json -- raw dialogue as last resort

Progressive disclosure retrieval: only go deeper when needed. Semantic search means a query about pricing can find monetization notes without an exact keyword match.

Steal for: any long-term memory layer for JoeFlow or MCN agents
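In L1, Reciprocal Rank Fusion merges the dense and BM25 result lists. Below is a minimal sketch of the standard formula, score(d) = Σ 1 / (k + rank). It assumes ranks start at 1 and uses the commonly cited k = 60. MemSearch's own constant and tie-breaking may differ, and `rrf` is a hypothetical name.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: each list adds 1 / (k + rank) to every doc it ranks (rank from 1)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Suppose `monetization_notes` appears only on the dense list and `pricing_call` appears on both lists. Fusing them puts `pricing_call` first, while the semantic-only hit still stays in the merged results.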

11:30 model

Hermes Frozen Snapshot Injection

SOUL.md (~1.8 kB) -- agent identity / operating principles
USER.md (1.4 kB cap) -- user profile, preferences, working style
MEMORY.md (2.5 kB cap) -- curated project facts, decisions, context
Daily log -- optional, today's session context

~3000 tokens loaded once at session start, prefix-cached. Mid-session writes persist to disk but take effect NEXT session (frozen snapshot principle).

Steal for: a CLAUDE.md architecture upgrade -- split a monolithic CLAUDE.md into SOUL / USER / MEMORY files
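The frozen-snapshot principle can be sketched as a session object that reads MEMORY.md once at the start. Later writes go through to disk without touching that in-memory copy. The class and method names are hypothetical.

```python
from pathlib import Path


class Session:
    """Hypothetical model of the frozen-snapshot rule: read once, write through to disk."""

    def __init__(self, memory_path: Path):
        self.memory_path = Path(memory_path)
        # Read once at session start; this is the text that sits in the cached prompt prefix.
        self.snapshot = (self.memory_path.read_text(encoding="utf-8")
                         if self.memory_path.exists() else "")

    def system_prompt(self) -> str:
        return f"## MEMORY.md\n{self.snapshot}"

    def remember(self, entry: str) -> None:
        # Persist immediately, but leave self.snapshot untouched so the prefix stays identical.
        with open(self.memory_path, "a", encoding="utf-8") as f:
            f.write(f"- {entry}\n")
```

The injected text stays byte-identical for the whole session, which is what lets the cached prefix keep hitting. A new entry only shows up when the next session reads the file.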

19:04 model

Hybrid Memory Architecture

STORE: auto-memory + MemSearch stop hook + agent MEMORY.md/USER.md writes + nightly cron index
INJECT: Hermes frozen snapshot (~3000 tokens cached per session)
RECALL: Tier-0 in-context, L1 MemSearch hybrid, L2 expand, L3 raw

Combines completeness (MemSearch captures everything) with quality (Hermes curates what matters most) and speed (in-context check before any DB query).

Steal for: a blueprint for the MCN+ agentic OS memory layer, or a paid workshop module

CTA Breakdown