The argument in one line.
Claude Code's native memory system is fundamentally weak at retrieval, but combining MemSearch's automatic full-transcript capture with Hermes's curated frozen-snapshot injection and a four-tier progressive recall chain creates a hybrid that beats all three individual systems.
Read if. Skip if.
- You're building Claude agents or workflows and currently lose context between sessions or across long projects.
- A developer using Claude Code who wants a practical hybrid memory setup without building from scratch or waiting for Anthropic updates.
- You've experimented with single memory approaches (MemSearch, Hermes, or default Claude memory) and hit limitations you need to work around.
- You're using Claude through the web interface for one-off conversations — this is built for persistent agent architectures and multi-session workflows.
- You haven't integrated Claude into a codebase or custom system yet — the setup assumes developer-level comfort with prompt injection and context management.
The full version, fast.
Claude Code's built-in memory lags behind open-source agentic systems because storage is selective, injection is limited to CLAUDE.md, and recall has no real mechanism beyond scanning past sessions. The fix is a hybrid built around three jobs every memory system must answer: how information is stored, how it gets injected into context at session start, and how it gets recalled later. Combine MemSearch's stop-hook to capture every turn into a locally-vectorized database with Hermes's curated memory.md, user.md, and soul.md files that get injected as a cached frozen snapshot each session. For recall, check that in-context snapshot first, then escalate through semantic vector search, expanded chunks, and raw transcripts only when needed.
Chat with this breakdown.
Modern Creator members can chat with any breakdown — ask for the hook, quote a framework, find the exact transcript moment. Unlocks at T2: refer 3 friends + add your own API key.
Create a free account →Where the time goes.

01 · Cold Open -- 3 Questions Every Memory System Must Answer
Frames the whole video around Store, Inject, and Recall. Introduces MemSearch and Hermes as the two strongest open-source challengers. Sets expectation: this is not about more context -- it is about the right context.

02 · STORE -- How the Three Systems Capture Information
Side-by-side comparison. Claude Code: auto-memory, sparse, promotes to global after 3+ repeats. MemSearch: stop hook after every turn, Haiku bullets, Milvus vector DB (local CPU, zero cost). Hermes: agent-driven add/replace/remove, MEMORY.md (2200 char) + USER.md (3375 char) + SQLite raw transcript + 7-day curator.

03 · INJECT -- How Memory Reaches the Agent at Session Start
Claude Code loads CLAUDE.md + conditional memory file injection via pre-tool-use hook. MemSearch has NO injection layer. Hermes loads a frozen snapshot of SOUL.md + USER.md + MEMORY.md (~1300 tokens, prefix-cached) once per session.

04 · RECALL -- How the Agent Retrieves Past Information
Claude Code: checks auto-memory files, if not saved it is lost -- no search, no grep, no vectors. MemSearch: 3-tier retrieval (L1 hybrid vector+keyword search, L2 expand chunk context, L3 raw session transcript). Hermes: Tier-0 in-context MEMORY.md check, FTSS keyword query, Gemini Flash summarisation of top 3 sessions.

05 · Recommended Hybrid Setup -- Taking the Best of All Three
Store: auto-memory + MemSearch stop hook + agent writes MEMORY.md/USER.md + nightly memsearch index cron. Inject: Hermes frozen snapshot (~3000 tokens cached). Recall: Tier-0 in-context, L1 MemSearch hybrid, L2 expand, L3 raw transcript. Free plan.md available.
Lines worth screenshotting.
- Claude Code's built-in memory barely saves anything — it only captures what the model explicitly decides is worth remembering, which is a fraction of what actually matters.
- Every agentic memory system comes down to three questions: how does information get stored, how does it get injected into the next session, and how does it get recalled on demand.
- MemSearch captures every turn of every conversation automatically using a stop hook, vectorizes the results locally at zero API cost, and makes them semantically searchable by meaning rather than keyword.
- Hermes takes the opposite approach — the agent decides what to save, enforces character limits that force consolidation, and purges duplicates so the curated memory stays lean and injectable.
- Injecting a frozen snapshot of memory.md, user.md, and soul.md at session start costs about 1,300 cached tokens and produces dramatically better recall than relying on the default CLAUDE.md alone.
- The fatal flaw of strong storage with no recall system is that information you saved is effectively lost — if you can't surface the right thing at the right time, storage is theater.
- MemSearch's three-tier progressive recall — vector search, expand with metadata, then raw transcript — only goes as deep as it needs to, which keeps routine queries fast.
- Tier-zero recall that checks what's already in the session's context before hitting the database is the most overlooked optimization — the cheapest search is the one you don't do.
- Combining MemSearch's automated capture with Hermes's curated injection produces a system that never loses anything and never bloats the active context window.
- A nightly consolidation job that re-indexes all raw transcripts into the vector database means the system compounds in depth every day without manual intervention.
- The reason Claude Code's memory lags behind open source solutions is structural — the built-in system was designed for single-session completeness, not multi-project longitudinal recall.
- Client work from six months ago being instantly retrievable by semantic search is the difference between an AI assistant and an AI that actually compounds knowledge over time.
- Markdown as the source of truth for memory is the right design — it means the vector database is rebuildable from scratch if it gets corrupted, because the raw files are always the ground state.
- Hermes's curator runs every seven days and prunes raw transcripts down to consolidated facts — which keeps the injectable memory small enough to be useful without being lossy.
- Handing Claude Code a plan.md that describes the entire hybrid memory setup and asking it to implement itself is the correct workflow — building memory infrastructure by hand defeats the purpose.
Steal the three-verb framework.
Every memory decision in any agent system maps to just three questions: Store, Inject, Recall -- and Claude Code out of the box fails at two of them.
- Add the MemSearch stop hook today -- it captures everything your auto-memory misses with zero extra cost (Haiku is cheap).
- Split your CLAUDE.md into SOUL.md / USER.md / MEMORY.md right now -- the frozen snapshot injection pattern gives you Hermes recall quality for free.
- Build the Tier-0 check: before any vector DB query, check what is already in context -- instant and free.
- The free plan.md at scrapeshq.notion.site/claude-memory-systems is a paste-in blueprint -- hand it to Claude Code and let it self-install.
- Frame any AI memory content you create around Store / Inject / Recall -- it is the clearest mental model for this category and Joe could own it in the creator space.
Terms worth knowing.
- Agentic memory system
- A framework that lets an AI agent save, retrieve, and reuse information across separate conversations, so it can remember facts, decisions, and context instead of starting fresh each session.
- Claude Code
- Anthropic's command-line coding assistant that runs Claude inside a terminal session, with hooks, slash commands, and project-level configuration files for customizing how the model behaves.
- MemSearch
- An open-source memory layer for Claude Code that captures every conversation turn to markdown, then indexes the content into a local vector database for semantic search and progressive recall.
- Hermes agent
- An open-source Claude Code memory setup that uses agent-curated memory files plus a periodic curator job to consolidate facts, then injects a frozen snapshot of those files at the start of each session.
- System prompt
- The hidden instruction block sent to a language model before the user's message, used to set persistent rules, persona, and reference material the model should treat as always-on context.
- CLAUDE.md
- A markdown file Claude Code automatically loads into its system prompt at session start, used to store project-specific instructions, conventions, and context that should persist across conversations.
- Context window
- The total amount of text a language model can hold in active memory during a single conversation, measured in tokens, beyond which older content gets dropped or compressed.
- Token
- The basic unit a language model uses to count text, roughly equal to a short word or word fragment, which determines both context-window limits and API pricing.
- Stop hook
- A Claude Code hook that automatically fires when a conversation turn finishes, letting the user run scripts to log, summarize, or process the exchange without manual intervention.
- Pre-tool-use hook
- A Claude Code hook that runs immediately before the agent calls a tool, often used to inject extra context, check permissions, or load relevant memory files into the conversation.
- Haiku
- Anthropic's smallest and cheapest Claude model, fast enough to run on every conversation turn for lightweight jobs like summarization, classification, or short bullet extraction.
- Gemini Flash
- Google's smaller, low-latency Gemini model, used here as a cheap summarizer that condenses retrieved session transcripts into a short answer before passing them back to the main agent.
- Vector database
- A storage system that holds text as numerical embeddings so content can be searched by meaning rather than exact wording, enabling semantic recall of related information.
- Milvus
- An open-source vector database that stores embeddings locally and supports fast similarity search, used here to run MemSearch's semantic memory layer on the user's own machine with no API costs.
- Embedding
- A numerical representation of a piece of text that captures its meaning, allowing similar concepts to be found through math even when they share no exact words.
- Chunk
- A small slice of a larger document, sized so a model or search index can process it independently, typically used as the unit that gets embedded and stored in a vector database.
- Hash
- A short fixed-length string generated from a piece of content, used as a unique identifier so the system can detect duplicates and link chunks back to their source.
- Semantic search
- Retrieval that matches by meaning rather than exact keywords, so a query about pricing can surface notes that mention revenue or monetization even when the word pricing never appears.
- BM25
- A classic ranking algorithm that scores documents by keyword overlap and frequency, often paired with vector search in hybrid retrieval systems to balance exact matching with semantic similarity.
- Hybrid search
- A retrieval approach that runs keyword matching and semantic vector search together, then merges the results so the system catches both literal term hits and conceptually related content.
Things they pointed at.
Lines you could clip.
“It is not about loading more context in. It is about loading the right context at the right time only.”
“If you can store as much information as you want, but if you cannot get it out at the right time, then it is not worth having a good storage mechanism in the first place.”
“MemSearch and Hermes go 10 x further than the basic claw code out the box.”
“Right now it is far, far behind what you can get from systems that are currently open source and free to access.”
Word for word.
The bait, then the rug-pull.
Claude Code default memory is losing information you cannot afford to lose. Simon Scrapes spent time digging through the most advanced open-source setups -- Hermes and MemSearch -- and found that the core ideas underneath all the complexity are actually simple. This is the teardown that shows you exactly what is missing and how to fix it.
Named ideas worth stealing.
Store / Inject / Recall
- Store -- how does info get written to memory?
- Inject -- how does it reach the agent during a session?
- Recall -- how does the agent find old info when asked?
Three-question framework for evaluating any agentic memory system. Every decision about memory architecture maps to one of these three verbs.
MemSearch 3-Tier Retrieval
- L1: memsearch search -- hybrid dense vectors + BM25 keywords + RRF fusion
- L2: memsearch expand chunk_hash -- full markdown section around match
- L3: parse-transcript session.json -- raw dialogue as last resort
Progressive disclosure retrieval: only go deeper when needed. Semantic search means pricing finds monetization without exact keyword match.
Hermes Frozen Snapshot Injection
- SOUL.md (~1.8 kB) -- agent identity / operating principles
- USER.md (1.4 kB cap) -- user profile, preferences, working style
- MEMORY.md (2.5 kB cap) -- curated project facts, decisions, context
- Daily log -- optional, today session context
~3000 tokens loaded once at session start, prefix-cached. Mid-session writes persist to disk but take effect NEXT session (frozen snapshot principle).
Hybrid Memory Architecture
- STORE: auto-memory + MemSearch stop hook + agent MEMORY.md/USER.md writes + nightly cron index
- INJECT: Hermes frozen snapshot (~3000 tokens cached per session)
- RECALL: Tier-0 in-context, L1 MemSearch hybrid, L2 expand, L3 raw
Combines completeness (MemSearch captures everything) with quality (Hermes curates what matters most) and speed (in-context check before any DB query).
How they asked for the click.
“I will link below a completely free plan.md document for you to pass into Claude and set it up for yourself.”
Soft lead-in referencing the paid agentic OS, then free plan.md as the accessible on-ramp. Clean two-tier CTA: free DIY vs done-for-you community.











































































