Modern Creator
AI Stack Engineer · YouTube

OpenCode Persistent Memory Across Sessions, 10x Token Savings

A 9-minute motion-graphics walkthrough of how ClaudeMem bolts persistent local memory onto OpenCode, and why its three-layer retrieval design is claimed to cut token use roughly tenfold.

Posted
2 days ago
Duration
Format
Tutorial
educational
Views
12.2K
421 likes
Big Idea

The argument in one line.

Every coding agent session starts cold because there is nowhere to store what was learned. ClaudeMem closes that gap with a local observation database and a three-layer retrieval design that is claimed to cost about a tenth of the tokens of loading full history.

Who This Is For

Read if. Skip if.

READ IF YOU ARE…
  • A developer who uses OpenCode, Claude Code, or any terminal agent daily and finds themselves re-explaining the same project context at the start of every session.
  • Someone whose context window budget is getting eaten by re-hydrating old decisions rather than doing actual work.
  • Anyone running multi-week projects with a coding agent where continuity across sessions matters more than any single run.
SKIP IF…
  • You use one-off, throwaway coding sessions and never return to the same codebase with an agent.
  • You are already satisfied with manually managed CLAUDE.md or system-prompt context injection and do not want an automated background service.
TL;DR

The full version, fast.

Coding agents reset every session, turning context re-entry into pure token overhead. ClaudeMem plugs into OpenCode via lifecycle hooks, records what the agent does (files opened, edits, commands, API calls), compresses observations with AI into a local SQLite database, and uses a vector search index for semantic retrieval. A three-layer search workflow — cheap index first, timeline context second, full detail only when needed — is claimed to use one-tenth the tokens of loading full records. Install is a single command. All data stays local by default, with a private tag to exclude secrets from capture.
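
The capture loop described above can be sketched in a few lines. This is a minimal illustration, not ClaudeMem's actual code: the `HookBus` class, the event names, the `observations` table, and the `summarize` stand-in for the AI compression step are all hypothetical, chosen only to mirror the four hook moments the video lists.

```python
import sqlite3
import time
from typing import Callable

# Hypothetical event names modeled on the four moments the breakdown lists.
EVENTS = ("session_start", "prompt_sent", "tool_run", "session_end")


class HookBus:
    """Tiny registry standing in for an agent's lifecycle hooks."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = {e: [] for e in EVENTS}

    def on(self, event: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event].append(handler)

    def fire(self, event: str, payload: dict) -> None:
        for handler in self._handlers[event]:
            handler(payload)


def summarize(payload: dict) -> str:
    """Placeholder for the AI compression step: keep one short line."""
    text = f"{payload.get('tool', 'note')}: {payload.get('detail', '')}"
    return text[:80]


def attach_recorder(bus: HookBus, db: sqlite3.Connection) -> None:
    """Subscribe one recording handler per event so capture needs no user action."""
    db.execute("CREATE TABLE IF NOT EXISTS observations (ts REAL, event TEXT, summary TEXT)")

    def record(event: str) -> Callable[[dict], None]:
        def handler(payload: dict) -> None:
            db.execute(
                "INSERT INTO observations VALUES (?, ?, ?)",
                (time.time(), event, summarize(payload)),
            )
            db.commit()
        return handler

    for event in EVENTS:
        bus.on(event, record(event))


db = sqlite3.connect(":memory:")  # a real setup would use a file on disk
bus = HookBus()
attach_recorder(bus, db)
bus.fire("session_start", {"detail": "opened project"})
bus.fire("tool_run", {"tool": "edit", "detail": "renamed handler in api/routes.py"})
bus.fire("session_end", {"detail": "closed project"})
rows = db.execute("SELECT event, summary FROM observations ORDER BY rowid").fetchall()
```

The point of the shape is that the agent only fires events; everything about storage lives in the subscriber, which is why capture can run without manual input.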

Chapters

Where the time goes.

00:00 - 00:55

01 · The cold-start problem

Names the pain: agents forget everything between sessions, turning context re-entry into token waste. Branded slide sequence S01-S04.

00:55 - 01:43

02 · OpenCode and the strange boost

Introduces OpenCode as a provider-agnostic terminal agent. Notes that Anthropic's January 2026 block on third-party tools accelerated adoption by making provider agnosticism look like insurance.

01:43 - 02:48

03 · What ClaudeMem does

Silently watches agent activity (files, edits, commands, API calls), compresses to summaries, stores locally, injects relevant pieces at next session start.

02:48 - 03:45

04 · Local database and vector search

Architecture: SQLite for storage plus a vector search index for semantic retrieval — plain-language queries surface memories even when phrased differently than originally recorded.

03:45 - 04:37

05 · The 3-layer retrieval design

Cheap index first (~50-100 tokens), timeline context second, full detail last and only for specific items. Claimed 10x token savings vs. loading full records.

04:37 - 03:57

06 · Lifecycle hooks

How capture is automated: hooks fire at session start, prompt sent, tool run, session end. No manual input required.

03:57 - 05:35

07 · One-line install on OpenCode

npx claude-mem install --ide opencode. Installer handles Bun and uv if missing. Requires Node 20+ and OpenCode pre-installed.

05:35 - 06:52

08 · Web viewer and first-session reality

Worker runs at localhost:37701. Dashboard shows no items on fresh install by design. Memory builds as sessions accumulate.

06:52 - 08:02

09 · What actually changes

Session two onwards: agent stops re-pitching ruled-out options, remembers bug patterns, matches code style. Cold vs. warm prompt comparison illustrates the gap.

08:02 - 08:34

10 · Interface, privacy, and edge features

MCP tools expose search to the agent. Private tags exclude secrets from capture. Data stays local. Beta: endless mode + OpenClaw gateway for Slack/Discord/Telegram.

08:34 - 09:12

11 · The honest part and the bigger picture

Caveats: wrong assumptions get persisted; pause during throwaway sessions; prune stale memories. Closing argument: persistent memory is the line between a one-off helper and a weeks-long build partner.

Atomic Insights

Lines worth screenshotting.

  • Every token spent re-explaining project context to a fresh agent session is pure overhead that burns budget before real work begins.
  • OpenCode grew after Anthropic blocked third-party tools from Claude consumer subscriptions in January 2026 — provider-agnosticism became insurance, not a nice-to-have.
  • A vector search index on top of SQLite means you can describe a past decision in completely different words and still surface the right memory.
  • The three-layer retrieval pattern — cheap index, timeline context, full detail — applies each filter only to items that passed the previous one, not to everything at once.
  • The first session after install feels identical to before because the memory is still empty; the value compounds from session two onward.
  • A wrong agent assumption that gets compressed and saved can carry forward as a persistent false belief — the database inherits the agent's mistakes, not just its insights.
  • Local-only storage is the privacy default: no project history leaves the machine, and private tags exclude secrets from capture entirely.
  • Memory that compounds across weeks is what separates an agent that is handy for a one-off task from one that genuinely keeps pace with a production codebase.
  • Running the same prompt cold vs. memory-warm reveals the difference immediately: generic defaults vs. context-aware output that matches your patterns on the first try.
  • Bun and uv are both auto-installed if missing — the single-command install is genuinely one command with no manual dependency setup.
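
The private-tag idea from the insights above is easy to picture as a filter that runs before anything is written to disk. The sketch below is a hypothetical illustration: the `<private>` markup, the `[redacted]` placeholder, and the fail-closed handling of an unclosed tag are assumptions made for the example, not a description of ClaudeMem's parser.

```python
import re

# Assumed tag syntax for illustration; check the tool's docs for the real markup.
PRIVATE_BLOCK = re.compile(r"<private>.*?</private>", re.DOTALL | re.IGNORECASE)


def strip_private(text: str) -> str:
    """Drop every private span before the text is stored."""
    cleaned = PRIVATE_BLOCK.sub("[redacted]", text)
    # Fail closed: an unclosed opening tag hides everything after it.
    open_at = cleaned.lower().find("<private>")
    if open_at != -1:
        cleaned = cleaned[:open_at] + "[redacted]"
    return cleaned


raw = "Deploy with key <private>sk-test-123</private> to staging."
stored = strip_private(raw)
unclosed = strip_private("token <private>abc")
```

Filtering at capture time, rather than at retrieval time, means the secret never reaches the database at all.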
Takeaway

Why every agent session starting cold is a compounding tax.

WHAT TO LEARN

Re-explaining project context to a fresh agent session is not just friction — it is a measurable token cost that compounds across every day of development on the same codebase.

  • Every session that starts cold forces the agent to guess at decisions you already made — the corrections you give it are tokens spent going backwards, not forwards.
  • A three-layer retrieval pattern — cheap index first, timeline context second, full detail only for specific items — keeps memory injection from cannibalizing the context window you need for actual work.
  • Vector search on past session observations means you can describe a prior decision in plain language and surface the right memory even if the phrasing is completely different from how it was originally captured.
  • The quality of persistent memory is bounded by the quality of what the agent did during sessions — a wrong assumption that gets compressed and saved becomes a persistent false belief that requires deliberate correction.
  • Local-only storage removes the cloud dependency that would make a background memory service a single point of failure for production workflows, and it is the privacy default, not an opt-in.
  • The compound effect of memory only becomes visible after the second session — expecting immediate results from a fresh install is the wrong mental model for evaluating whether the tool works.
  • Pausing memory capture during throwaway or experimental branches is not optional hygiene — it prevents the permanent library from accumulating dead-end context that will mislead future sessions.
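
The last two habits, pausing capture and pruning stale memories, can be sketched against a toy store. Everything here is hypothetical: the `Recorder` class, the `paused` flag, the `last_used_day` column, and the idle-days rule are illustrative assumptions, since the video does not show how ClaudeMem implements either control.

```python
import sqlite3

SCHEMA = (
    "CREATE TABLE observations "
    "(id INTEGER PRIMARY KEY, summary TEXT, last_used_day INTEGER)"
)


class Recorder:
    """Writes observations unless capture is paused (for throwaway sessions)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.paused = False

    def record(self, summary: str, today: int) -> bool:
        if self.paused:
            return False  # nothing is persisted while paused
        self.conn.execute(
            "INSERT INTO observations (summary, last_used_day) VALUES (?, ?)",
            (summary, today),
        )
        return True


def prune(conn: sqlite3.Connection, today: int, max_idle_days: int) -> int:
    """Delete observations not used within max_idle_days; return how many went."""
    cur = conn.execute(
        "DELETE FROM observations WHERE ? - last_used_day > ?",
        (today, max_idle_days),
    )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
rec = Recorder(conn)
rec.record("chose Postgres over MongoDB", today=95)
rec.record("spike: tried GraphQL, abandoned", today=12)
rec.record("kebab-case filenames", today=88)

rec.paused = True  # experimental branch: capture nothing
skipped = rec.record("scratch experiment", today=100)
removed = prune(conn, today=100, max_idle_days=30)
remaining = [r[0] for r in conn.execute("SELECT summary FROM observations ORDER BY id")]
```

An idle-time rule is only one possible pruning policy; deleting by hand after reviewing the dashboard fits the "tool you steer" framing just as well.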
Glossary

Terms worth knowing.

ClaudeMem (claude-mem)
An open-source background service that captures coding agent session activity, compresses it into a local database, and injects relevant context into future sessions. Supports OpenCode, Claude Code, Gemini CLI, and others.
OpenCode
An open-source terminal-based AI coding agent that is provider-agnostic, able to run Anthropic, OpenAI, Google, or local Ollama models interchangeably.
Lifecycle hooks
Trigger points built into an agent framework that fire at defined moments such as session start, prompt sent, tool run, and session end. Used by ClaudeMem to capture observations automatically without user action.
Vector search index
A database layer that stores text as mathematical embeddings so that semantically similar phrases match even when the exact words differ, enabling plain-language queries against past session memory.
3-layer retrieval
ClaudeMem's search strategy: a cheap index of IDs and tiny summaries, then timeline context around interesting results, then full detail fetched only for specific items. Applied in sequence to minimize token cost.
Endless mode
A beta ClaudeMem feature designed to keep memory coherent across very long stretches of continuous work, beyond normal session boundaries.
OpenClaw gateway
An integration that runs ClaudeMem as a persistent memory layer on a gateway server, with the ability to stream live observations to external services like Discord, Slack, or Telegram.
Bun
A JavaScript runtime used by ClaudeMem to run the background worker process that captures and compresses session observations.
uv
A Python package and project manager used by ClaudeMem to run the Python side that powers the vector search component. Auto-installed during setup if not already present.
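
To make the "vector search index on top of SQLite" entry concrete, here is a minimal sketch of similarity search over embeddings stored in a SQLite table. The `memories` schema, the JSON-encoded vectors, and the hand-written three-dimensional embeddings are all assumptions for illustration; a real system would get its vectors from an embedding model and would likely use a dedicated index rather than a full scan.

```python
import json
import math
import sqlite3


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, summary TEXT, embedding TEXT)")

# Hand-written 3-d vectors stand in for real embeddings
# (axes, loosely: database, auth, code style).
rows = [
    ("Chose Postgres over MongoDB for relational joins", [0.9, 0.1, 0.0]),
    ("Fixed token refresh bug in login flow", [0.1, 0.9, 0.0]),
    ("Adopted 2-space indentation and kebab-case files", [0.0, 0.1, 0.9]),
]
db.executemany(
    "INSERT INTO memories (summary, embedding) VALUES (?, ?)",
    [(summary, json.dumps(vec)) for summary, vec in rows],
)


def search(query_vec: list[float], k: int = 1) -> list[str]:
    """Score every stored memory against the query and return the top k summaries."""
    scored = [
        (cosine(query_vec, json.loads(emb)), summary)
        for summary, emb in db.execute("SELECT summary, embedding FROM memories")
    ]
    return [summary for _, summary in sorted(scored, reverse=True)[:k]]


# A query like "which datastore did we pick?" shares no keywords with the
# stored summary; the assumed query vector simply sits near the database axis.
top = search([0.8, 0.2, 0.1])
```

The match comes from vector direction, not shared words, which is what lets a differently phrased question surface the original decision.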
Quotables

Lines you could clip.

00:31
Every word of re-explaining is burning tokens just to get back to the starting line you were already at.
Standalone, punchy, no setup needed · TikTok hook
08:44
Persistent memory is quietly becoming the line between an agent that's handy for a one-off task and one you can actually build with over weeks.
Strong thesis close, self-contained · IG reel cold open
08:00
Treat it like a tool you steer, not one you set loose and forget.
Aphoristic, memorable, honest framing · newsletter pull-quote
The Script

Word for word.

metaphor
00:00So you build something good with your coding agent, you close the terminal, and the next day, it's like the two of you never met. That's the quiet frustration nobody warns you about when you start using AI agents in the terminal. The agent that knew your whole codebase yesterday, the one that finally understood your naming style and the weird workaround you needed for that one service, wakes up the next morning as a total stranger.
00:24It forgot the architecture. It forgot the bug you killed together. It forgot the thing you corrected it on four times.
00:31So you start over, typing the same context back in, and every word of that re-explaining is burning tokens just to get back to the starting line you were already at. Alright. So OpenCode is one of the best terminal agents out there right now.
00:46And that's exactly why this gap stings. It lives in your terminal, it's open source, and it isn't chained to one AI provider. So you can run Anthropic, OpenAI, Google, or even a local model through Ollama.
00:59Millions of developers reach for it every month, and the numbers keep climbing. A lot of that growth got a strange boost back in January 2026 when Anthropic blocked third-party tools from using Claude through consumer subscriptions.
01:13Instead of slowing OpenCode down, that pushed even more people toward it, because being provider-agnostic suddenly looked less like a nice-to-have and more like insurance. But for all that flexibility, OpenCode has the same hole every agent has.
01:29Each session starts cold. Your project history lives in your head, not in the tool. And once you close a session, whatever the agent learned basically evaporates.
01:39ClaudeMem is what closes that gap, and it now plugs straight into OpenCode. The simple way to put it: it hands your agent a real long-term memory. While you work, it quietly watches what the agent actually does.
01:53The files it opens, the edits it writes, the commands it runs, the calls it makes. Then it takes all of that and uses AI to compress it down into clean little summaries and saves them into a database that sits on your own machine. Next time you open OpenCode in that same project, it pulls the relevant pieces back in on its own.
02:13So the agent shows up to the new session already knowing the story so far, instead of asking you to retell it. What makes this more than a fancy notepad is how it stores and finds things. Everything goes into a local SQLite database, and on top of that, there's a vector search index, which means it isn't just matching exact words.
02:33You can ask about something in plain language, and it'll surface the right memory even if you describe it totally differently than how it got recorded the first time. There's also a search system the agent itself can reach for. So mid-task, it can glance back through your project history and pull up the part that's actually relevant without dumping your entire past into the context window.
02:55That last point is the part I really want you to get because it's where the token savings live. The search runs in layers. First, it does a cheap lookup that returns a short index, basically just IDs and tiny summaries, costing almost nothing.
03:10Then if something looks worth a closer look, it can grab a timeline around that moment to see what else was happening at the time. And only then, for the specific items that matter, does it pull the full detail. The whole design is built to avoid loading everything at once.
03:25The makers say this layered approach saves roughly 10 times the tokens compared to grabbing full records up front. So your context budget stays open for the real work instead of getting eaten by old history. Under the hood, all of this is wired into the agent through lifecycle hooks, little trigger points that fire when a session starts, when you send a prompt, when a tool runs, and when a session ends.
03:48That's how the capturing happens automatically without you lifting a finger. Installing it on OpenCode is genuinely one line.
03:57Open your terminal and run npx claude-mem install --ide opencode. That flag points the installer straight at OpenCode.
04:07And here's a detail I really like. If you just run npx claude-mem install with no flag, it actually scans your machine for coding agents you already have. So it'll pop up a list with options like Claude Code, Gemini CLI, OpenCode, and a few others, and let you multi-select which ones to wire up.
04:27For Claude Code, you could even add it as a plugin from inside the tool. But since we're focused on OpenCode here, the command with the opencode flag is the clean, direct path, so use that one. The installer does the heavy lifting so you don't have to.
04:42It runs a quick runtime check, and if Bun or uv are missing, it just installs them for you. Bun is the JavaScript runtime that runs the background worker. uv handles the Python side that powers the vector search.
04:55Before you run anything, make sure you've got Node version 20 or higher and OpenCode itself already installed. Everything else, the database, the runtimes, it sorts out during setup. Once it finishes, you'll see a message telling you the worker is running at a local address.
05:11That worker is the small background service that does all the capturing and compressing while you code. And that address is your web viewer. Open it in your browser, and you get a clean dashboard for your memory.
05:23Right after install, it'll just say no items to display because you haven't built up any memory yet, which is exactly what you should expect on a fresh setup. As you start working, that empty screen fills up with observations streaming in live. Now here's the part most demos skip, what actually changes after you install it.
05:43The first session won't feel different because the memory is still empty. The magic shows up on the second session and every one after. You open OpenCode the next day, and the agent already has the context loaded.
05:56It remembers you picked one database approach over another and why. So it stops pitching the option you already ruled out. It remembers a bug pattern you hit before.
06:05So when it shows up again, it goes straight to the fix instead of debugging from zero. It remembers your code style and your folder structure, so its edits land closer to what you actually want on the first try.
06:17The continuity compounds. The longer you run it on a project, the sharper it gets because it's slowly building a real picture of how that codebase works and how you like things done. A clean way to feel the difference yourself is to run the same prompt twice.
06:33Once on a fresh OpenCode session with no memory and once with ClaudeMem active on a project it's already seen. The cold one gives you something generic, missing your patterns, repeating default choices, needing a few rounds of correction. The one with memory comes out closer on the first try because it isn't guessing at your context, it already has it.
06:54ClaudeMem exposes its search through MCP tools so the agent talks to your memory through a clean, standard interface. And if you set those up, you get sharper results pulling from your project history. There's a privacy feature that matters more than people expect.
07:10You can wrap sensitive content like keys or secrets in private tags, and it'll skip storing that stuff entirely. Since the database lives on your own machine, your project history isn't getting shipped off to some server either, which is the right default.
07:25There's also a beta channel with experimental stuff, including something they call endless mode, built for keeping memory coherent across really long stretches of work, and you can flip between stable and beta right from that web viewer's settings. And if you live further out on the edge, there's even an OpenClaw gateway integration, which runs ClaudeMem as a persistent memory layer on a gateway and can feed live observations out to places like Discord, Slack, or Telegram.
07:53So I want to be honest about the part that needs care because this isn't magic and you can trip yourself up. The memory is only as good as what goes into it. If the agent makes a wrong assumption during a session and that gets compressed and saved, it can carry that mistake forward.
08:09So on a serious production codebase, be deliberate. Pause it when you're doing something throwaway or experimental, and clean out memories that aren't pulling their weight.
08:20Treat it like a tool you steer, not one you set loose and forget. The bigger picture is that persistent memory is quietly becoming the line between an agent that's handy for a one-off task and one you can actually build with over weeks.
08:34OpenCode already gave you the freedom to run any model you want in your terminal. Bolting a memory layer on top means it stops resetting on you every single morning. For anyone shipping real software, that's the difference between an assistant that helps in the moment and one that genuinely keeps pace with your project over time.
08:54And since it's open source and runs entirely on your own machine, there's almost nothing standing between you and trying it on whatever you're building today. Alright. So that's it from the video, and I hope you enjoyed it.
09:06If you did, please like this video and subscribe to the channel, and I'll see you in the next video.
The Hook

The bait, then the rug-pull.

The quiet frustration nobody warns you about when you start using AI agents in the terminal: the agent that finally understood your naming style and the weird workaround you needed for that one service wakes up the next session as a total stranger. Every word of re-explaining burns tokens just to reach the starting line you were already at.

Frameworks

Named ideas worth stealing.

04:15model

The 3-Layer Memory Retrieval Workflow

  1. search — compact index, IDs + tiny summaries (~50-100 tokens)
  2. timeline — chronological context around interesting observations (low cost)
  3. get_observations — full detail only for filtered IDs (~500-1,000 tokens)

ClaudeMem's token-efficient memory lookup applies three sequential filters before fetching expensive full-detail records, claiming ~10x savings vs. naive full-record loading.
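
The three steps above can be sketched as a small pipeline with rough token accounting. The function names `search`, `timeline`, and `get_observations` come from the framework as listed; everything else (the in-memory `STORE`, the `Observation` fields, the word-count token estimate, and the sample data) is a hypothetical stand-in, so the resulting numbers illustrate the shape of the saving rather than reproduce the claimed 10x.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    id: int
    ts: int        # minutes since project start, used for timeline ordering
    summary: str   # short index entry
    detail: str    # full record, expensive to load


def tokens(text: str) -> int:
    """Crude token estimate: one token per whitespace-separated word."""
    return len(text.split())


SUMMARIES = [
    "picked Postgres over MongoDB",
    "added users table migration",
    "fixed login token refresh",
    "renamed routes to kebab-case",
    "tuned Postgres connection pool",
    "wrote retry wrapper for billing",
    "removed unused feature flag",
    "pinned Node version in CI",
    "split settings into modules",
    "documented deploy checklist",
]
# Each full record is padded to 600 words to mimic a long observation.
STORE = [
    Observation(i, i * 10, summary, "detail " * 600)
    for i, summary in enumerate(SUMMARIES, start=1)
]


def search(query: str) -> list[tuple[int, str]]:
    """Layer 1: compact index of IDs and tiny summaries."""
    q = query.lower()
    return [(o.id, o.summary) for o in STORE if q in o.summary.lower()]


def timeline(anchor_id: int, window: int = 1) -> list[tuple[int, str]]:
    """Layer 2: summaries of observations chronologically around one hit."""
    ordered = sorted(STORE, key=lambda o: o.ts)
    pos = next(i for i, o in enumerate(ordered) if o.id == anchor_id)
    lo, hi = max(0, pos - window), pos + window + 1
    return [(o.id, o.summary) for o in ordered[lo:hi]]


def get_observations(ids: list[int]) -> list[Observation]:
    """Layer 3: full detail, only for the IDs that survived filtering."""
    wanted = set(ids)
    return [o for o in STORE if o.id in wanted]


def cost(entries: list[tuple[int, str]]) -> int:
    """Token cost of index-style entries: one token for the ID plus the summary."""
    return sum(1 + tokens(summary) for _, summary in entries)


index = search("postgres")
context = timeline(index[0][0])
full = get_observations([index[0][0]])

layered = cost(index) + cost(context) + sum(tokens(o.detail) for o in full)
naive = sum(tokens(o.detail) for o in STORE)
```

With this toy store the layered path spends 620 estimated tokens against 6,000 for loading every record; the actual ratio depends on how many records exist and how many survive each filter.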

Steal for: Any RAG or agent memory system where context budget is a constraint. The progressive filter pattern applies broadly.
CTA Breakdown

How they asked for the click.

VERBAL ASK
09:00subscribe
If you did, please like this video and subscribe to the channel, and I'll see you in the next video.

Minimal single-sentence close after the main content. No product pitch, no newsletter, no sponsor.

Storyboard

Visual structure at a glance.

00:00 · hook · open: VS Code cold start
00:13 · hook · You build something good. Then it forgets you.
00:24 · promise · Everything you built together, erased.
00:58 · context · OpenCode: not chained to one provider
02:13 · problem · each session starts cold
02:33 · solution · ClaudeMem GitHub repo
02:48 · value · it quietly watches, compresses, saves
03:34 · value · local DB + vector search
04:15 · value · 3-layer MCP retrieval workflow
05:03 · value · it never loads everything at once
05:58 · cta · one-line install on GitHub
07:27 · demo · post-install: worker running at localhost:37701
08:20 · demo · web viewer: no items on fresh install
10:00 · proof · cold guesses vs. memory already has it
10:50 · value · MCP tools / private tags / local-only
11:40 · caveat · this is not magic: you can trip yourself up
12:27 · thesis · memory = the dividing line
09:08 · cta · open source, runs on your machine: try it today