How to Build a Persistent Memory System for Claude Code: Short-Term, Long-Term, and Scoped Access
Claude Code forgets everything between sessions. Learn how to build a three-layer memory system with source citation, semantic search, and team-scoped access.
Why Claude Code Keeps Starting from Zero
Every time you close a Claude Code session, everything it learned about your project disappears. The variable naming conventions you mentioned last Tuesday, the architecture decision you explained twice last week, the reason you chose that particular database schema — gone. You’re back to square one.
This statelessness isn’t a bug. It’s how Claude Code is designed. But it creates a real problem for anyone building serious software: you waste time re-explaining context, your agent makes inconsistent decisions, and the longer a project runs, the more painful each new session becomes.
Building a persistent memory system for Claude Code solves this. With the right architecture, Claude Code can recall what it learned last week, apply project-specific rules automatically, and know which decisions belong to which team or codebase. This guide walks through a three-layer approach: short-term session memory, long-term semantic memory, and scoped access that keeps the right context available to the right agent.
The Three-Layer Memory Model
Most memory systems for AI agents fail because they treat memory as a single flat thing — either you have it or you don’t. A more useful model separates memory into three distinct layers:
Short-term (session context): Information that’s relevant for the current task. Examples: the file you’re editing, recent tool outputs, a debugging thread you’re following. This should be fast and cheap to access, but it doesn’t need to survive session resets.
Long-term (persistent knowledge base): Information that stays relevant across sessions. Examples: architectural decisions, team conventions, recurring patterns in the codebase, past bug resolutions. This needs semantic search — you don’t always know exactly what you’re looking for.
Scoped access (project or team): Information segmented by who it belongs to. Examples: rules that apply only to the frontend codebase, secrets that only certain agents should see, memory tied to a specific team’s workflow.
Getting all three right means Claude Code behaves like a developer who actually remembers your project — rather than a consultant showing up fresh each morning.
Building Short-Term Memory with CLAUDE.md
The simplest starting point is Claude Code’s built-in support for CLAUDE.md files. Claude Code reads these automatically at the start of each session, which makes them the easiest hook for injecting short-term, session-scoped context.
Setting Up Your CLAUDE.md Structure
You can use a hierarchy of CLAUDE.md files to scope context to different levels:
- ~/.claude/CLAUDE.md: global rules that apply across all projects
- /project-root/CLAUDE.md: project-level conventions
- /project-root/subdirectory/CLAUDE.md: subsystem-specific rules
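To see how the hierarchy layers up, here is a minimal sketch that lists which CLAUDE.md files would apply to a given working directory, from most general to most specific. It only illustrates the scoping idea; Claude Code's own loading rules may differ, and the function name and parameters are hypothetical.

```python
from pathlib import Path


def applicable_context_files(cwd: Path, project_root: Path, home: Path) -> list[Path]:
    """Return existing CLAUDE.md files, ordered from global to most specific.

    Illustration only: this is not Claude Code's actual loader.
    Raises ValueError if cwd is not inside project_root.
    """
    root = project_root.resolve()
    relative = cwd.resolve().relative_to(root)

    # Global file first, then the project root, then each directory down to cwd.
    candidates = [home / ".claude" / "CLAUDE.md", root / "CLAUDE.md"]
    current = root
    for part in relative.parts:
        current = current / part
        candidates.append(current / "CLAUDE.md")

    return [path for path in candidates if path.is_file()]
```

Directories without a CLAUDE.md are simply skipped, so a subsystem only needs its own file when its rules differ from the project's.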
A well-structured project-level CLAUDE.md might include:
# Project Context
## Architecture
- Backend: Node.js + Express, deployed on AWS Lambda
- Database: PostgreSQL via Prisma ORM
- Frontend: Next.js 14 with App Router
## Conventions
- Always use named exports, never default exports
- Error handling follows the Result pattern (see /lib/result.ts)
- All API routes require JWT validation middleware
## Active Work
- Current sprint: Authentication refactor (ticket AUTH-112)
- Do not modify /auth/legacy until migration is complete
This gives Claude Code immediate session context without any infrastructure. But it’s static — you have to update it manually, and it doesn’t grow with your project.
Automating CLAUDE.md Updates
To make short-term memory actually dynamic, you can write a lightweight script that updates the “Active Work” section of your CLAUDE.md at session start. A simple approach:
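#!/bin/bash
# update-context.sh: run from the project root before starting Claude Code
set -euo pipefail
RECENT_COMMITS=$(git log --oneline -5)
MODIFIED_FILES=$(git diff --name-only HEAD)
# Drop the old "## Active Work" section. The range runs to end of file, so this
# assumes Active Work is the LAST section of CLAUDE.md, and anything written
# there by hand is overwritten.
# GNU sed syntax; on macOS/BSD sed use: sed -i '' '/^## Active Work/,$d' CLAUDE.md
sed -i '/^## Active Work/,$d' CLAUDE.md
# Append the regenerated section
cat >> CLAUDE.md << EOF
## Active Work
### Recent Git Activity
$RECENT_COMMITS
### Modified Files
$MODIFIED_FILES
EOF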
This isn’t semantic memory; it’s just automated context injection. But it removes much of the “what are we even working on?” overhead that eats time at session start.
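If "Active Work" is not the last section of your file, truncating to end of file would delete whatever follows it. A possible alternative, sketched here in Python, replaces only the body under one level-2 heading and leaves the other sections alone. The function name `replace_section` is hypothetical, and the sketch assumes sections are delimited by lines starting with `## `.

```python
def replace_section(markdown: str, heading: str, new_body: str) -> str:
    """Replace the body under a level-2 `heading`, keeping all other sections.

    If the heading is missing, the section is appended at the end.
    """
    out: list[str] = []
    skipping = False
    replaced = False
    for line in markdown.splitlines():
        if line.strip() == heading:
            out.append(line)
            out.extend(new_body.strip().splitlines())
            skipping, replaced = True, True
            continue
        if skipping and line.startswith("## "):
            # The next level-2 heading ends the section being replaced.
            skipping = False
            out.append("")
        if not skipping:
            out.append(line)
    if not replaced:
        out.extend(["", heading, *new_body.strip().splitlines()])
    return "\n".join(out) + "\n"
```

Because the next `## ` line ends the section, any sub-headings in the generated body should use `### ` so that a later run replaces them too.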
Building Long-Term Memory with Semantic Search
Short-term memory handles session setup. Long-term memory is where the real value lives — and it requires a different approach entirely.
The challenge with long-term project knowledge is retrieval. You can’t predict exactly what Claude Code will need. So instead of keyword search (which requires precise matching), you want semantic search: given a query like “how do we handle database migrations,” return the most relevant stored knowledge, even if the stored entry says “schema changes” instead.
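As a rough illustration of why this works, semantic search compares embedding vectors rather than words. The toy sketch below ranks two stored entries against a query by cosine similarity. The three-dimensional vectors and the entry texts are made up for the example; a real embedding model produces vectors with hundreds of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hand-written stand-ins for embeddings; these numbers are invented.
stored = {
    "Schema changes go through reviewed migration files": [0.9, 0.1, 0.0],
    "Always use named exports": [0.0, 0.2, 0.9],
}
# Stands in for the embedding of "how do we handle database migrations".
query_vector = [0.8, 0.3, 0.1]

ranked = sorted(
    stored,
    key=lambda text: cosine_similarity(stored[text], query_vector),
    reverse=True,
)
print(ranked[0])
```

The query shares no keywords with the top entry; it ranks first only because the two vectors point in a similar direction.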
Choosing a Vector Store
For most Claude Code setups, the pragmatic options are:
- ChromaDB — local, no infrastructure required, good for single-developer setups
- Qdrant — local or self-hosted, better performance at scale
- Pinecone or Weaviate — hosted options, good for team environments
- SQLite with sqlite-vec — surprisingly capable for small to medium knowledge bases, no separate service needed
For a solo developer or small team, ChromaDB or SQLite with vector extensions is the least-friction option.
Structuring Memory Entries
Each memory entry should include more than just content. The schema matters:
{
  "id": "mem_20240915_arch_001",
  "content": "We decided to use Prisma over raw SQL because the team has mixed SQL experience and type safety is more important than query flexibility for this project.",
  "metadata": {
    "type": "architecture_decision",
    "project": "payment-service",
    "created_at": "2024-09-15T14:32:00Z",
    "source": "architecture-review-2024-09-15.md",
    "source_line": 47,
    "author": "team",
    "tags": ["database", "orm", "prisma"]
  },
  "embedding": [0.023, -0.41, ...]
}
The source and source_line fields are critical. Without them, you get memory outputs like “you decided to use Prisma” with no way to verify or understand the original context. Source citation turns retrieved memory from assertion into traceable fact.
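One way to enforce this is to reject entries without a source before they reach the store. The sketch below mirrors the schema above as a dataclass. The class name `MemoryEntry` and its `flat_metadata` method are hypothetical, and joining tags into a string rests on the assumption that your vector store only accepts scalar metadata values, which is worth checking in its documentation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MemoryEntry:
    """Hypothetical in-code mirror of the JSON schema above."""
    id: str
    content: str
    type: str
    project: str
    created_at: str
    source: str
    source_line: Optional[int] = None
    author: str = "team"
    tags: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Refuse entries that could never be cited back to a document.
        if not self.source.strip():
            raise ValueError("memory entry needs a non-empty source")

    def flat_metadata(self) -> dict:
        """Return metadata with scalar values only (tags joined by commas)."""
        meta = {
            "type": self.type,
            "project": self.project,
            "created_at": self.created_at,
            "source": self.source,
            "author": self.author,
            "tags": ",".join(self.tags),
        }
        if self.source_line is not None:
            meta["source_line"] = self.source_line
        return meta
```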
Implementing Retrieval with Source Citation
Here’s a minimal retrieval function in Python using ChromaDB:
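import chromadb
from anthropic import Anthropic  # used by the extraction step later on
client = chromadb.PersistentClient(path="./memory_store")
# Use cosine distance so that (1 - distance) is a cosine similarity.
# ChromaDB's default space is squared L2, where that subtraction is not meaningful.
collection = client.get_or_create_collection(
    "project_memory",
    metadata={"hnsw:space": "cosine"},
)
def retrieve_with_citation(query: str, n_results: int = 5) -> str:
    """Return the top matches for a query, each followed by its source citation."""
    results = collection.query(
        query_texts=[query],
        n_results=n_results,
        include=["documents", "metadatas", "distances"]
    )
    formatted = []
    for doc, meta, dist in zip(
        results["documents"][0],
        results["metadatas"][0],
        results["distances"][0]
    ):
        meta = meta or {}  # entries stored without metadata come back as None
        # Clamp at 0 so weak matches never show a negative percentage
        relevance = round(max(0.0, 1 - dist) * 100, 1)
        citation = f"[Source: {meta.get('source', 'unknown')}"
        if meta.get('source_line'):
            citation += f", line {meta['source_line']}"
        citation += f" | Relevance: {relevance}%]"
        formatted.append(f"{doc}\n{citation}")
    return "\n\n---\n\n".join(formatted)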
When Claude Code calls this before generating a response, it gets content plus provenance. The agent can say “according to the architecture review from September 15th” rather than making claims that feel invented.
Writing Memories Automatically
Retrieval is only half the system. You also need a way to write new knowledge into the store without constant manual intervention.
One approach: after each significant Claude Code session, run a post-session summarizer that extracts decisions, patterns, and resolutions and writes them to the store.
from datetime import datetime
def extract_and_store_memory(session_transcript: str, source_file: str):
    extraction_prompt = """
    Review this Claude Code session transcript. Extract:
    1. Architecture or design decisions made
    2. Bugs found and how they were resolved
    3. Conventions or patterns established
    4. Any explicit "remember this" statements
    Return a bulleted list with one single, standalone fact per item.
    """
    anthropic = Anthropic()
    response = anthropic.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        messages=[
            {"role": "user", "content": extraction_prompt + "\n\n" + session_transcript}
        ]
    )
    # parse_facts is a helper you supply: it splits the reply into one string per fact
    facts = parse_facts(response.content[0].text)
    now = datetime.now()
    # Include the time of day so two sessions on the same date never reuse an ID
    batch = now.strftime("%Y%m%d%H%M%S")
    for i, fact in enumerate(facts):
        collection.add(
            documents=[fact],
            metadatas=[{
                "type": "session_extract",
                "source": source_file,
                "created_at": now.isoformat()
            }],
            ids=[f"mem_{batch}_{i:03d}"]
        )
This closes the loop: sessions write knowledge in, future sessions pull knowledge out.
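The extraction function calls a `parse_facts` helper that the article does not define. One possible sketch is below. It assumes the model replies with a bulleted or numbered list and silently ignores any other lines, such as an introductory sentence; a reply in free-form prose would yield an empty list.

```python
import re

# Matches "- item", "* item", "• item", "1. item" or "2) item".
_LIST_ITEM = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+(.*\S)\s*$")


def parse_facts(text: str) -> list[str]:
    """Split a model reply into standalone facts, one per list item."""
    facts = []
    for line in text.splitlines():
        match = _LIST_ITEM.match(line)
        if match:
            facts.append(match.group(1))
    return facts
```

If you need something sturdier, asking the model for JSON and parsing that is a common alternative, at the cost of handling malformed output.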
Implementing Scoped Access
When you’re working across multiple projects or with a team, undifferentiated memory creates more problems than it solves. The agent needs to know that the authentication conventions from the payment service shouldn’t apply to the marketing site.
Scoped access means organizing memory so Claude Code retrieves context appropriate to its current task — and doesn’t pull in irrelevant or conflicting information.
Project-Level Scoping
The simplest form of scoping is namespacing your ChromaDB collections by project:
def get_collection(project_id: str):
    return client.get_or_create_collection(f"memory_{project_id}")
def retrieve_for_project(query: str, project_id: str, n_results: int = 5):
    collection = get_collection(project_id)
    return collection.query(query_texts=[query], n_results=n_results)
You can extend this with a global collection that holds shared conventions, then merge results:
def retrieve_with_scope(query: str, project_id: str) -> str:
    project_results = retrieve_for_project(query, project_id, n_results=3)
    global_results = retrieve_for_project(query, "global", n_results=2)
    # Combine, tagging each result with its scope.
    # tag_results and format_results are small helpers you define yourself.
    all_results = (
        tag_results(project_results, "project")
        + tag_results(global_results, "global")
    )
    return format_results(all_results)
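The merge step relies on two helpers, `tag_results` and `format_results`, that the article leaves undefined. A minimal sketch of both follows. It assumes each query result is a dict holding `documents`, `metadatas`, and `distances` as lists of lists, which is the shape the earlier retrieval function indexes into; the output format is an arbitrary choice.

```python
def tag_results(results: dict, scope: str) -> list[dict]:
    """Flatten one query result into a list of hits labelled with their scope."""
    docs = results["documents"][0]
    metas = results["metadatas"][0]
    dists = results["distances"][0]
    return [
        {"content": doc, "metadata": meta or {}, "distance": dist, "scope": scope}
        for doc, meta, dist in zip(docs, metas, dists)
    ]


def format_results(hits: list[dict]) -> str:
    """Render hits closest-first, each followed by its scope and source."""
    blocks = []
    for hit in sorted(hits, key=lambda item: item["distance"]):
        source = hit["metadata"].get("source", "unknown")
        blocks.append(f"{hit['content']}\n[Scope: {hit['scope']} | Source: {source}]")
    return "\n\n---\n\n".join(blocks)
```

Sorting the merged list by distance assumes both collections use the same embedding model and distance metric; otherwise the numbers are not comparable across scopes.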
Team-Scoped Access
For team environments, you need memory that’s shared across developers working on the same project. The pattern here is a shared vector store (hosted or self-hosted) with access control at the metadata level.
Each memory entry gets a team field in its metadata:
{
  "content": "Never use async/await in the legacy worker files — they predate our Node version upgrade.",
  "metadata": {
    "team": "backend",
    "project": "core-api",
    "access": "backend-team",
    "created_by": "carlos",
    "verified": true
  }
}
At retrieval time, you filter by team:
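def retrieve_for_team(query: str, project_id: str, team: str):
    # `team` must match the entry's "access" value, e.g. "backend-team"
    collection = get_collection(project_id)
    return collection.query(
        query_texts=[query],
        n_results=5,
        # Return entries restricted to this team plus entries open to everyone
        where={"$or": [
            {"access": team},
            {"access": "all"}
        ]}
    )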
The verified flag is worth adding. Memory entries marked as verified by a human have higher trust than ones extracted automatically from sessions. You can use this to weight results or flag uncertainty in retrieved content.
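As one way to weight results, the sketch below adds a small distance penalty to unverified entries before sorting, so a human-verified entry can outrank a slightly closer auto-extracted one. The hit shape (`content`, `metadata`, `distance`), the function name, and the 0.1 penalty are all assumptions to tune for your own data.

```python
def rank_with_trust(hits: list[dict], unverified_penalty: float = 0.1) -> list[dict]:
    """Sort hits by distance, nudging unverified entries down the ranking."""
    scored = []
    for hit in hits:
        verified = bool(hit["metadata"].get("verified"))
        # Lower score is better; unverified entries pay a fixed penalty.
        score = hit["distance"] + (0.0 if verified else unverified_penalty)
        scored.append({**hit, "score": score, "verified": verified})
    return sorted(scored, key=lambda item: item["score"])
```

The returned dicts carry a `verified` flag, so the formatting step can label auto-extracted entries as unverified when it builds the memory block.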
Handling Memory Conflicts
In long-running projects, you’ll eventually have conflicting memories: an early decision that got reversed, a convention that changed. Without conflict resolution, Claude Code will confidently apply outdated rules.
A simple mitigation: timestamp all entries and allow them to be marked as superseded.
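from datetime import datetime
def supersede_memory(old_id: str, new_content: str, reason: str):
    # Mark old entry as superseded
    old = collection.get(ids=[old_id])
    if not old["ids"]:
        raise KeyError(f"no memory entry with id {old_id}")
    old_meta = old["metadatas"][0] or {}
    old_meta["superseded"] = True
    old_meta["superseded_reason"] = reason
    collection.update(ids=[old_id], metadatas=[old_meta])
    # Add new entry with reference to old
    collection.add(
        documents=[new_content],
        metadatas=[{
            "supersedes": old_id,
            # Explicit False so retrieval filters on this flag behave predictably
            "superseded": False,
            "created_at": datetime.now().isoformat()
            # Add source, project, and access fields here as for other entries
        }],
        ids=[f"mem_{datetime.now().strftime('%Y%m%d%H%M%S')}"]
    )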
At retrieval time, filter out superseded entries:
results = collection.query(
    query_texts=[query],
    n_results=5,
    where={"superseded": {"$ne": True}}
)
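Because superseded entries stay in the store, you can still trace how a decision evolved. The sketch below follows `supersedes` links forward from any entry to its newest replacement. It works on a plain dict that maps entry IDs to metadata, which is a hypothetical shape you would build from your store's contents, and the function name is an assumption.

```python
def latest_version(entries: dict[str, dict], start_id: str) -> str:
    """Follow 'supersedes' links forward from start_id to the newest entry ID."""
    # Map each superseded ID to the ID of the entry that replaced it.
    replaced_by = {
        meta["supersedes"]: entry_id
        for entry_id, meta in entries.items()
        if "supersedes" in meta
    }
    current = start_id
    seen = {start_id}
    while current in replaced_by:
        current = replaced_by[current]
        if current in seen:
            raise ValueError("supersession links form a cycle")
        seen.add(current)
    return current
```

This sketch assumes each entry is superseded at most once; if two entries claim to supersede the same ID, only one of them is followed.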
Wiring It All Together: A Reference Architecture
With all three layers in place, a complete session flow looks like this:
Session start:
- Run context script to update CLAUDE.md with recent git activity and open issues
- Load project metadata into session context (tech stack, active sprint, known constraints)
During session (per query):
- Intercept the user query
- Run semantic search against project + global memory collections
- Format retrieved memories with source citations
- Inject into Claude Code’s context as a “memory” block before the user message
Session end:
- Extract decisions and resolutions from the session transcript
- Write new entries to the vector store with appropriate scope and source metadata
- Flag any entries that supersede existing knowledge
Periodic maintenance:
- Review auto-extracted entries for accuracy (the verified field workflow)
- Prune stale or low-relevance entries
- Consolidate duplicate entries that say the same thing
This isn’t a complex system — a working implementation runs to maybe 300–400 lines of Python. The complexity is in the discipline of maintaining it, not building it.
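The per-query injection step in the flow above can be sketched as a small function that places retrieved memories in a block ahead of the user message. The `<memory>` tag name, the function name, and the character budget are assumptions; how you intercept the query depends on how you wire Claude Code into your setup.

```python
def build_prompt(user_message: str, memories: list[str], max_chars: int = 4000) -> str:
    """Prepend retrieved memories to the user message within a size budget.

    Memories are assumed to arrive most-relevant first; once the budget is
    reached, the remaining ones are dropped.
    """
    kept = []
    used = 0
    for memory in memories:
        if used + len(memory) > max_chars:
            break
        kept.append(memory)
        used += len(memory)
    if not kept:
        return user_message
    block = "\n\n".join(kept)
    return f"<memory>\n{block}\n</memory>\n\n{user_message}"
```

Capping the block keeps retrieved context from crowding out the actual task, which matters more as the store grows.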
Where MindStudio Fits
If you want this memory architecture without wiring up Python scripts, vector stores, and session hooks manually, MindStudio’s Agent Skills Plugin offers a faster path.
The @mindstudio-ai/agent npm SDK lets Claude Code call pre-built capabilities as simple method calls. Instead of standing up a ChromaDB instance and writing retrieval functions from scratch, you can use the plugin’s workflow primitives to handle the infrastructure layer — rate limiting, retries, auth — while your agent focuses on what to remember and when.
A concrete example: instead of building a post-session summarizer from scratch, you can define a MindStudio workflow that takes a session transcript, extracts key decisions, and writes them to a structured memory store — then expose that workflow as a callable endpoint Claude Code can hit at session end.
For teams, MindStudio’s scoped workflow approach also maps naturally to the team-scoped memory model described above. You can build separate workflows with different access controls for different teams, then expose them as typed capabilities the agent calls without needing to know what’s behind them.
You can try MindStudio free at mindstudio.ai.
Common Mistakes to Avoid
Storing too much. Every session produces a lot of text. If you write all of it to the memory store, retrieval quality degrades fast. Only store decisions, patterns, and resolutions — not process.
Skipping source citation. Memory without provenance is just assertion. When Claude Code says “the team decided X,” you want to be able to verify that. Source citations are the difference between trustworthy memory and confident hallucination.
Not scoping at the start. It’s much harder to add scoping retroactively once you have hundreds of entries. Define your scope model — project ID, team, global — before you write your first entry.
Ignoring memory conflicts. Over months, a project accumulates reversed decisions. Without supersession logic, Claude Code will confidently apply rules that no longer apply.
Using memory as a substitute for good documentation. The memory system works best as a cache of active, working knowledge — not a replacement for proper docs. Keep your architecture docs in version control; use the memory system to make them accessible to the agent in context.
Frequently Asked Questions
Does Claude Code have built-in persistent memory?
Not across sessions. Claude Code reads CLAUDE.md files at session start, which provides a form of persistent context, but it doesn’t remember anything from previous sessions natively. Everything you want it to know between sessions must be explicitly injected — either through CLAUDE.md files or a more sophisticated memory retrieval system.
What’s the difference between short-term and long-term memory for AI agents?
Short-term memory is context that lives for the duration of a task or session — the files you’re editing, recent tool outputs, the current problem you’re solving. Long-term memory persists across sessions and is retrieved semantically — you search for it based on relevance to the current query, not because you explicitly called it up. For Claude Code, short-term memory is handled by CLAUDE.md and session context; long-term memory requires a vector store.
What vector database should I use for Claude Code memory?
For a single developer, ChromaDB or SQLite with the sqlite-vec extension are the lowest-friction options — no separate service to run, easy to embed in a project. For teams needing shared memory, Qdrant (self-hosted) or Pinecone (hosted) are solid choices. The database matters less than the schema and retrieval logic you build on top of it.
How do I prevent Claude Code from retrieving outdated or conflicting memories?
Add a superseded boolean field to each memory entry’s metadata. When a decision changes, mark the old entry as superseded and add a new one with a reference back. At retrieval time, filter out superseded entries. This keeps the store accurate without deleting historical context you might want to audit later.
How does scoped memory work for teams?
Each memory entry gets metadata tags indicating which project and team it belongs to. At retrieval time, you filter by the current agent’s project and team context, then also pull from a shared global collection for cross-project conventions. This way, an agent working on the frontend codebase doesn’t get confused by backend-specific rules, and company-wide conventions are still available to everyone.
Can I use this memory system with other AI coding tools, not just Claude Code?
Yes. The architecture — CLAUDE.md-style context files plus a vector store with semantic retrieval — is model-agnostic. The same retrieval functions work whether you’re injecting context into Claude Code, Cursor, GitHub Copilot Workspace, or any other tool that accepts system context. The main variable is how each tool accepts injected context at session start.
Key Takeaways
- Claude Code is stateless by design — persistent memory requires explicit architecture, not assumptions.
- Three layers cover the full spectrum: short-term session context (CLAUDE.md + automation), long-term semantic memory (vector store + retrieval), and scoped access (namespaced collections with metadata filtering).
- Source citation is non-negotiable: memory without provenance creates confident, unverifiable claims.
- Supersession logic prevents outdated decisions from corrupting future sessions.
- The system writes memory automatically (post-session extraction) and retrieves it semantically — reducing manual overhead without sacrificing accuracy.
If you want to build agents that remember, reason, and act across sessions without managing all this infrastructure manually, MindStudio is worth a look — especially for teams who need scoped, shareable agent capabilities without standing up separate services.

