How to Build a Persistent Memory System for Claude Code: Short-Term, Long-Term, and Scoped Access

Claude Code forgets everything between sessions. Learn how to build a three-layer memory system with source citation, semantic search, and team-scoped access.

MindStudio Team

Why Claude Code Keeps Starting from Zero

Every time you close a Claude Code session, everything it learned about your project disappears. The variable naming conventions you mentioned last Tuesday, the architecture decision you explained twice last week, the reason you chose that particular database schema — gone. You’re back to square one.

This statelessness isn’t a bug. It’s how Claude Code is designed. But it creates a real problem for anyone building serious software: you waste time re-explaining context, your agent makes inconsistent decisions, and the longer a project runs, the more painful each new session becomes.

Building a persistent memory system for Claude Code solves this. With the right architecture, Claude Code can recall what it learned last week, apply project-specific rules automatically, and know which decisions belong to which team or codebase. This guide walks through a three-layer approach: short-term session memory, long-term semantic memory, and scoped access that keeps the right context available to the right agent.


The Three-Layer Memory Model

Most memory systems for AI agents fail because they treat memory as a single flat thing — either you have it or you don’t. A more useful model separates memory into three distinct layers:

Short-term (session context): Information that’s relevant for the current task. Examples: the file you’re editing, recent tool outputs, a debugging thread you’re following. This should be fast and cheap to access, but it doesn’t need to survive session resets.

Long-term (persistent knowledge base): Information that stays relevant across sessions. Examples: architectural decisions, team conventions, recurring patterns in the codebase, past bug resolutions. This needs semantic search — you don’t always know exactly what you’re looking for.

Scoped access (project or team): Information segmented by who it belongs to. Examples: rules that apply only to the frontend codebase, secrets that only certain agents should see, memory tied to a specific team’s workflow.

Getting all three right means Claude Code behaves like a developer who actually remembers your project — rather than a consultant showing up fresh each morning.


Building Short-Term Memory with CLAUDE.md

The simplest starting point is Claude Code’s built-in support for CLAUDE.md files. Claude Code reads these automatically at the start of each session, which makes them the easiest hook for injecting short-term, session-scoped context.

Setting Up Your CLAUDE.md Structure

You can use a hierarchy of CLAUDE.md files to scope context to different levels:

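  • ~/.claude/CLAUDE.md: global rules that apply across all projects
  • /project-root/CLAUDE.md: project-level conventions
  • /project-root/subdirectory/CLAUDE.md: subsystem-specific rules (nested files are picked up when Claude Code works with files in that subdirectory, rather than at launch)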
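
To audit which of these files actually exist for a given working directory, a small helper can walk from the project root down to the current directory. This is only a sketch: the function name and the broad-to-narrow ordering are assumptions made for illustration, and it does not claim to mirror how Claude Code itself discovers or prioritizes the files.

```python
from pathlib import Path


def candidate_claude_files(cwd: Path, project_root: Path, home: Path) -> list[Path]:
    """List existing CLAUDE.md paths from broadest to narrowest scope.

    Raises ValueError if cwd is not inside project_root.
    """
    cwd, project_root = cwd.resolve(), project_root.resolve()
    candidates = [home / ".claude" / "CLAUDE.md", project_root / "CLAUDE.md"]
    # Visit each directory between the project root and cwd, in order
    current = project_root
    for part in cwd.relative_to(project_root).parts:
        current = current / part
        candidates.append(current / "CLAUDE.md")
    return [path for path in candidates if path.is_file()]
```

Calling `candidate_claude_files(Path.cwd(), Path("/project-root"), Path.home())` before a session is a quick way to spot a stale or forgotten context file.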

A well-structured project-level CLAUDE.md might include:

# Project Context

## Architecture
- Backend: Node.js + Express, deployed on AWS Lambda
- Database: PostgreSQL via Prisma ORM
- Frontend: Next.js 14 with App Router

## Conventions
- Always use named exports, never default exports
- Error handling follows the Result pattern (see /lib/result.ts)
- All API routes require JWT validation middleware

## Active Work
- Current sprint: Authentication refactor (ticket AUTH-112)
- Do not modify /auth/legacy until migration is complete

This gives Claude Code immediate session context without any infrastructure. But it’s static — you have to update it manually, and it doesn’t grow with your project.

Automating CLAUDE.md Updates

To make short-term memory actually dynamic, you can write a lightweight script that updates the “Active Work” section of your CLAUDE.md at session start. A simple approach:

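#!/bin/bash
# update-context.sh: run from the project root before starting Claude Code
set -euo pipefail

RECENT_COMMITS=$(git log --oneline -5)
# Files with uncommitted changes (staged or unstaged) relative to HEAD
MODIFIED_FILES=$(git diff --name-only HEAD)

cat > .claude_context.md << EOF
## Recent Git Activity
$RECENT_COMMITS

## Modified Files
$MODIFIED_FILES
EOF

# Delete everything from "## Active Work" to the end of CLAUDE.md, then rebuild it.
# "## Active Work" must therefore be the last section in the file.
# Note: this is GNU sed syntax; on macOS use: sed -i '' '/## Active Work/,$d' CLAUDE.md
sed -i '/## Active Work/,$d' CLAUDE.md
echo "## Active Work" >> CLAUDE.md
cat .claude_context.md >> CLAUDE.md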

This isn’t semantic memory; it’s just automated context injection. But it removes much of the “what are we even working on?” overhead that eats time at session start.
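
If "Active Work" is not the last section in your CLAUDE.md, deleting to the end of the file would remove whatever follows it. One alternative is to replace only that section in place. The sketch below assumes level-2 (`##`) headings delimit sections, so any sub-headings inside the generated body should be level 3 (`###`); the function name is hypothetical.

```python
import re


def replace_section(markdown: str, heading: str, body: str) -> str:
    """Replace one '## heading' section, leaving the other sections untouched.

    If the heading is missing, the section is appended at the end.
    """
    new_section = f"## {heading}\n{body.strip()}\n"
    # Match from the heading up to the next level-2 heading or the end of the text
    pattern = re.compile(
        rf"^## {re.escape(heading)}[ \t]*\n.*?(?=^## |\Z)",
        flags=re.MULTILINE | re.DOTALL,
    )
    if pattern.search(markdown):
        # A function replacement avoids backslash escapes being interpreted
        updated = pattern.sub(lambda _: new_section + "\n", markdown, count=1)
        return updated.rstrip("\n") + "\n"
    return markdown.rstrip("\n") + "\n\n" + new_section
```

Running it twice with the same body produces the same file, so it is safe to call at every session start.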


Building Long-Term Memory with Semantic Search

Short-term memory handles session setup. Long-term memory is where the real value lives — and it requires a different approach entirely.

The challenge with long-term project knowledge is retrieval. You can’t predict exactly what Claude Code will need. So instead of keyword search (which requires precise matching), you want semantic search: given a query like “how do we handle database migrations,” return the most relevant stored knowledge, even if the stored entry says “schema changes” instead.
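
The difference can be shown with a toy example. The three-dimensional vectors below are invented by hand purely for illustration; in a real system an embedding model produces them, with hundreds of dimensions. The stored sentences are also made up.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Invented toy vectors standing in for real embeddings
entries = {
    "Schema changes need a reviewed script": [0.9, 0.1, 0.0],
    "Use named exports, never default exports": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.3, 0.1]  # stands in for "how do we handle database migrations"

# Keyword search finds nothing because no entry contains the word "migrations"
keyword_hits = [text for text in entries if "migrations" in text.lower()]
# Vector search still ranks the schema entry first
best = max(entries, key=lambda text: cosine_similarity(query_vec, entries[text]))

print(keyword_hits)
print(best)
```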

Choosing a Vector Store

For most Claude Code setups, the pragmatic options are:

  • ChromaDB — local, no infrastructure required, good for single-developer setups
  • Qdrant — local or self-hosted, better performance at scale
  • Pinecone or Weaviate — hosted options, good for team environments
  • SQLite with sqlite-vec — surprisingly capable for small to medium knowledge bases, no separate service needed

For a solo developer or small team, ChromaDB or SQLite with vector extensions is the least-friction option.

Structuring Memory Entries

Each memory entry should include more than just content. The schema matters:

{
  "id": "mem_20240915_arch_001",
  "content": "We decided to use Prisma over raw SQL because the team has mixed SQL experience and type safety is more important than query flexibility for this project.",
  "metadata": {
    "type": "architecture_decision",
    "project": "payment-service",
    "created_at": "2024-09-15T14:32:00Z",
    "source": "architecture-review-2024-09-15.md",
    "source_line": 47,
    "author": "team",
    "tags": ["database", "orm", "prisma"]
  },
  "embedding": [0.023, -0.41, ...]
}

The source and source_line fields are critical. Without them, you get memory outputs like “you decided to use Prisma” with no way to verify or understand the original context. Source citation turns retrieved memory from assertion into traceable fact.
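
One way to keep entries consistent is to build them from a small typed structure. The sketch below mirrors the fields in the schema above; the class and method names are hypothetical. Some vector stores only accept scalar metadata values (strings, numbers, booleans), so the list-valued tags field is flattened to a comma-joined string here. Check what your store allows before relying on that.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class MemoryEntry:
    id: str
    content: str
    type: str
    project: str
    source: str
    source_line: Optional[int] = None
    author: str = "team"
    tags: list[str] = field(default_factory=list)
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def flat_metadata(self) -> dict:
        """Metadata with scalar values only; tags become one comma-joined string."""
        meta = {
            "type": self.type,
            "project": self.project,
            "source": self.source,
            "author": self.author,
            "created_at": self.created_at,
            "tags": ",".join(self.tags),
        }
        # Leave the key out entirely when there is no line number to cite
        if self.source_line is not None:
            meta["source_line"] = self.source_line
        return meta
```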

Implementing Retrieval with Source Citation

Here’s a minimal retrieval function in Python using ChromaDB:

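import chromadb
from anthropic import Anthropic

client = chromadb.PersistentClient(path="./memory_store")
# Request cosine distance so that 1 - distance is a cosine similarity.
# Chroma's default space is squared L2, where that conversion is not meaningful.
# The space is set when the collection is first created.
collection = client.get_or_create_collection(
    "project_memory", metadata={"hnsw:space": "cosine"}
)

def retrieve_with_citation(query: str, n_results: int = 5) -> str:
    results = collection.query(
        query_texts=[query],
        n_results=n_results,
        include=["documents", "metadatas", "distances"]
    )
    
    formatted = []
    for doc, meta, dist in zip(
        results["documents"][0],
        results["metadatas"][0],
        results["distances"][0]
    ):
        meta = meta or {}  # entries stored without metadata come back as None
        # Clamp at zero: cosine distance can exceed 1 for opposed vectors
        relevance = round(max(0.0, 1 - dist) * 100, 1)
        citation = f"[Source: {meta.get('source', 'unknown')}"
        if meta.get('source_line') is not None:
            citation += f", line {meta['source_line']}"
        citation += f" | Relevance: {relevance}%]"
        
        formatted.append(f"{doc}\n{citation}")
    
    return "\n\n---\n\n".join(formatted)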

When Claude Code calls this before generating a response, it gets content plus provenance. The agent can say “according to the architecture review from September 15th” rather than making claims that feel invented.

Writing Memories Automatically

Retrieval is only half the system. You also need a way to write new knowledge into the store without constant manual intervention.

One approach: after each significant Claude Code session, run a post-session summarizer that extracts decisions, patterns, and resolutions and writes them to the store.

import uuid
from datetime import datetime

from anthropic import Anthropic

def extract_and_store_memory(session_transcript: str, source_file: str):
    extraction_prompt = """
    Review this Claude Code session transcript. Extract:
    1. Architecture or design decisions made
    2. Bugs found and how they were resolved
    3. Conventions or patterns established
    4. Any explicit "remember this" statements
    
    Format each as a single, standalone fact on its own line.
    """
    
    anthropic = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = anthropic.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        messages=[
            {"role": "user", "content": extraction_prompt + "\n\n" + session_transcript}
        ]
    )
    
    # parse_facts is a helper you supply: it splits the reply into a list of facts
    facts = parse_facts(response.content[0].text)
    
    now = datetime.now()
    for fact in facts:
        collection.add(
            documents=[fact],
            metadatas=[{
                "type": "session_extract",
                "source": source_file,
                "created_at": now.isoformat()
            }],
            # Random suffix keeps IDs unique across several sessions on the same day
            ids=[f"mem_{now.strftime('%Y%m%d')}_{uuid.uuid4().hex[:8]}"]
        )

This closes the loop: sessions write knowledge in, future sessions pull knowledge out.
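
The `parse_facts` helper called in the extraction function is not defined in the article. A minimal sketch, assuming the model returns one fact per line (possibly with bullets or numbering, and with heading lines that end in a colon), could look like this:

```python
import re


def parse_facts(text: str) -> list[str]:
    """Split model output into one fact per line, dropping list markers and blanks."""
    facts = []
    for line in text.splitlines():
        # Strip a leading bullet ("-", "*", "•") or numbering ("1.", "2)")
        cleaned = re.sub(r"^\s*(?:[-*•]|\d+[.)])\s+", "", line).strip()
        # Skip empty lines and heading-style lines such as "Decisions:"
        if cleaned and not cleaned.endswith(":"):
            facts.append(cleaned)
    return facts
```

Asking the model for structured output (for example, a JSON list) and validating it would be more robust than line splitting; this version is just the smallest thing that works with the prompt as written.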


Implementing Scoped Access

When you’re working across multiple projects or with a team, undifferentiated memory creates more problems than it solves. The agent needs to know that the authentication conventions from the payment service shouldn’t apply to the marketing site.

Scoped access means organizing memory so Claude Code retrieves context appropriate to its current task — and doesn’t pull in irrelevant or conflicting information.

Project-Level Scoping

The simplest form of scoping is namespacing your ChromaDB collections by project:

def get_collection(project_id: str):
    return client.get_or_create_collection(f"memory_{project_id}")

def retrieve_for_project(query: str, project_id: str, n_results: int = 5):
    collection = get_collection(project_id)
    return collection.query(query_texts=[query], n_results=n_results)

You can extend this with a global collection that holds shared conventions, then merge results:

def retrieve_with_scope(query: str, project_id: str) -> str:
    project_results = retrieve_for_project(query, project_id, n_results=3)
    global_results = retrieve_for_project(query, "global", n_results=2)
    
    # Combine, tagging each result with its scope
    all_results = tag_results(project_results, "project") + \
                  tag_results(global_results, "global")
    
    return format_results(all_results)
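
The `tag_results` and `format_results` helpers used in `retrieve_with_scope` are not shown in the article. The sketch below is one possible shape for them, assuming the query result is a dictionary with nested `documents`, `metadatas`, and `distances` lists for a single query, as in the retrieval code earlier. The output format is an arbitrary choice.

```python
def tag_results(results: dict, scope: str) -> list[dict]:
    """Flatten one single-query result into rows labeled with a scope."""
    docs = results["documents"][0]
    metas = results["metadatas"][0]
    dists = results["distances"][0]
    return [
        {"content": doc, "metadata": meta or {}, "distance": dist, "scope": scope}
        for doc, meta, dist in zip(docs, metas, dists)
    ]


def format_results(rows: list[dict]) -> str:
    """Sort rows by distance (closest first) and render each with scope and source."""
    lines = []
    for row in sorted(rows, key=lambda r: r["distance"]):
        source = row["metadata"].get("source", "unknown")
        lines.append(f"[{row['scope']}] {row['content']} (source: {source})")
    return "\n".join(lines)
```

Sorting the merged rows by distance lets a highly relevant global convention outrank a weak project match, while the scope label keeps it clear where each entry came from.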

Team-Scoped Access

For team environments, you need memory that’s shared across developers working on the same project. The pattern here is a shared vector store (hosted or self-hosted) with access control at the metadata level.

Each memory entry gets a team field in its metadata:

{
  "content": "Never use async/await in the legacy worker files — they predate our Node version upgrade.",
  "metadata": {
    "team": "backend",
    "project": "core-api",
    "access": "backend-team",
    "created_by": "carlos",
    "verified": true
  }
}

At retrieval time, you filter on the access field, so an agent sees its own team’s entries plus anything marked "all":

def retrieve_for_team(query: str, project_id: str, team: str):
    collection = get_collection(project_id)
    return collection.query(
        query_texts=[query],
        n_results=5,
        where={"$or": [
            {"access": team},
            {"access": "all"}
        ]}
    )

The verified flag is worth adding. Memory entries marked as verified by a human have higher trust than ones extracted automatically from sessions. You can use this to weight results or flag uncertainty in retrieved content.
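
One way to apply that weighting is to add a small penalty to the distance of unverified entries before sorting, and to label each entry when it is rendered. This is a sketch under assumptions: rows are dictionaries with `content`, `metadata`, and `distance` keys, and the penalty value of 0.1 is arbitrary and would need tuning against your own data.

```python
def rerank_by_trust(rows: list[dict], unverified_penalty: float = 0.1) -> list[dict]:
    """Sort rows by distance, penalizing entries not marked as verified."""
    def adjusted(row: dict) -> float:
        verified = bool(row.get("metadata", {}).get("verified", False))
        return row["distance"] + (0.0 if verified else unverified_penalty)

    return sorted(rows, key=adjusted)


def render_with_trust(row: dict) -> str:
    """Append a trust label so the agent can flag uncertainty in its answer."""
    verified = bool(row.get("metadata", {}).get("verified", False))
    label = "verified" if verified else "unverified"
    return f"{row['content']} [{label}]"
```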

Handling Memory Conflicts

In long-running projects, you’ll eventually have conflicting memories: an early decision that got reversed, a convention that changed. Without conflict resolution, Claude Code will confidently apply outdated rules.

A simple mitigation: timestamp all entries and allow them to be marked as superseded.

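from datetime import datetime

def supersede_memory(old_id: str, new_content: str, reason: str):
    old = collection.get(ids=[old_id])
    if not old["ids"]:
        raise KeyError(f"No memory entry with id {old_id}")
    old_meta = dict(old["metadatas"][0] or {})
    
    # The new entry inherits the old entry's metadata (project, team, access)
    # so that scope filters still match it
    new_meta = {
        **old_meta,
        "supersedes": old_id,
        "superseded": False,
        "created_at": datetime.now().isoformat()
    }
    
    # Mark old entry as superseded
    old_meta["superseded"] = True
    old_meta["superseded_reason"] = reason
    collection.update(ids=[old_id], metadatas=[old_meta])
    
    # Add new entry with reference to old
    collection.add(
        documents=[new_content],
        metadatas=[new_meta],
        ids=[f"mem_{datetime.now().strftime('%Y%m%d%H%M%S')}"]
    )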

At retrieval time, filter out superseded entries:

results = collection.query(
    query_texts=[query],
    n_results=5,
    where={"superseded": {"$ne": True}}
)
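
How a store treats entries that lack the `superseded` key under a not-equal filter can differ between stores and versions, so it is worth checking against your own data. A client-side pass is one way to be safe regardless. The sketch assumes the same nested result shape as the query above; the function name is hypothetical.

```python
def drop_superseded(results: dict) -> list[tuple[str, dict]]:
    """Return (document, metadata) pairs whose metadata is not marked superseded.

    Entries with no metadata, or without the key, are treated as current.
    """
    pairs = zip(results["documents"][0], results["metadatas"][0])
    return [
        (doc, meta or {})
        for doc, meta in pairs
        if not (meta or {}).get("superseded", False)
    ]
```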

Wiring It All Together: A Reference Architecture

With all three layers in place, a complete session flow looks like this:

Session start:

  1. Run the context script to update CLAUDE.md with recent git activity and modified files
  2. Load project metadata into session context (tech stack, active sprint, known constraints)

During session (per query):

  1. Intercept the user query
  2. Run semantic search against project + global memory collections
  3. Format retrieved memories with source citations
  4. Inject into Claude Code’s context as a “memory” block before the user message
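
The per-query steps can be sketched as a single function that takes any retriever, such as `retrieve_with_citation` from earlier, and prepends its output to the query. The `<memory>` tag name, the instruction line, and the character limit are all arbitrary choices made for illustration. How the resulting string reaches Claude Code (a wrapper script, a hook, or something else) depends on your setup and is not covered here.

```python
from typing import Callable


def build_prompt_with_memory(
    user_query: str,
    retrieve: Callable[[str], str],
    max_chars: int = 4000,
) -> str:
    """Prepend retrieved memories to the user query inside a labeled block."""
    memories = retrieve(user_query).strip()
    if not memories:
        # Nothing relevant was found: pass the query through unchanged
        return user_query
    if len(memories) > max_chars:
        # Keep the memory block from crowding out the rest of the context
        memories = memories[:max_chars].rstrip() + "\n[truncated]"
    return (
        "<memory>\n"
        "Retrieved project memory. Cite the listed source when relying on an entry.\n\n"
        f"{memories}\n"
        "</memory>\n\n"
        f"{user_query}"
    )
```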

Session end:

  1. Extract decisions and resolutions from the session transcript
  2. Write new entries to the vector store with appropriate scope and source metadata
  3. Flag any entries that supersede existing knowledge

Periodic maintenance:

  1. Review auto-extracted entries for accuracy (the verified field workflow)
  2. Prune stale or low-relevance entries
  3. Consolidate duplicate entries that say the same thing
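
For the consolidation step, a first pass can flag near-identical entries for human review. The sketch below compares normalized text with the standard library's `difflib`; the 0.9 threshold is an assumption to tune. It compares every pair, so it suits small stores, and it will miss paraphrases, which comparing embeddings would catch.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_near_duplicates(
    entries: dict[str, str], threshold: float = 0.9
) -> list[tuple[str, str, float]]:
    """Return (id_a, id_b, ratio) for entry pairs whose normalized text is similar."""
    # Lowercase and collapse whitespace so trivial differences do not matter
    normalized = {key: " ".join(text.lower().split()) for key, text in entries.items()}
    pairs = []
    for a, b in combinations(sorted(normalized), 2):
        ratio = SequenceMatcher(None, normalized[a], normalized[b]).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 3)))
    return pairs
```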

This isn’t a complex system — a working implementation runs to maybe 300–400 lines of Python. The complexity is in the discipline of maintaining it, not building it.


Where MindStudio Fits

If you want this memory architecture without wiring up Python scripts, vector stores, and session hooks manually, MindStudio’s Agent Skills Plugin offers a faster path.

The @mindstudio-ai/agent npm SDK lets Claude Code call pre-built capabilities as simple method calls. Instead of standing up a ChromaDB instance and writing retrieval functions from scratch, you can use the plugin’s workflow primitives to handle the infrastructure layer — rate limiting, retries, auth — while your agent focuses on what to remember and when.

A concrete example: instead of building a post-session summarizer from scratch, you can define a MindStudio workflow that takes a session transcript, extracts key decisions, and writes them to a structured memory store — then expose that workflow as a callable endpoint Claude Code can hit at session end.

For teams, MindStudio’s scoped workflow approach also maps naturally to the team-scoped memory model described above. You can build separate workflows with different access controls for different teams, then expose them as typed capabilities the agent calls without needing to know what’s behind them.

You can try MindStudio free at mindstudio.ai.


Common Mistakes to Avoid

Storing too much. Every session produces a lot of text. If you write all of it to the memory store, retrieval quality degrades fast. Only store decisions, patterns, and resolutions — not process.

Skipping source citation. Memory without provenance is just assertion. When Claude Code says “the team decided X,” you want to be able to verify that. Source citations are the difference between trustworthy memory and confident hallucination.

Not scoping at the start. It’s much harder to add scoping retroactively once you have hundreds of entries. Define your scope model — project ID, team, global — before you write your first entry.

Ignoring memory conflicts. Over months, a project accumulates reversed decisions. Without supersession logic, Claude Code will confidently apply rules that no longer apply.

Using memory as a substitute for good documentation. The memory system works best as a cache of active, working knowledge — not a replacement for proper docs. Keep your architecture docs in version control; use the memory system to make them accessible to the agent in context.


Frequently Asked Questions

Does Claude Code have built-in persistent memory?

Not across sessions. Claude Code reads CLAUDE.md files at session start, which provides a form of persistent context, but it doesn’t remember anything from previous sessions natively. Everything you want it to know between sessions must be explicitly injected — either through CLAUDE.md files or a more sophisticated memory retrieval system.

What’s the difference between short-term and long-term memory for AI agents?

Short-term memory is context that lives for the duration of a task or session — the files you’re editing, recent tool outputs, the current problem you’re solving. Long-term memory persists across sessions and is retrieved semantically — you search for it based on relevance to the current query, not because you explicitly called it up. For Claude Code, short-term memory is handled by CLAUDE.md and session context; long-term memory requires a vector store.

What vector database should I use for Claude Code memory?

For a single developer, ChromaDB or SQLite with the sqlite-vec extension are the lowest-friction options — no separate service to run, easy to embed in a project. For teams needing shared memory, Qdrant (self-hosted) or Pinecone (hosted) are solid choices. The database matters less than the schema and retrieval logic you build on top of it.

How do I prevent Claude Code from retrieving outdated or conflicting memories?

Add a superseded boolean field to each memory entry’s metadata. When a decision changes, mark the old entry as superseded and add a new one with a reference back. At retrieval time, filter out superseded entries. This keeps the store accurate without deleting historical context you might want to audit later.

How does scoped memory work for teams?

Each memory entry gets metadata tags indicating which project and team it belongs to. At retrieval time, you filter by the current agent’s project and team context, then also pull from a shared global collection for cross-project conventions. This way, an agent working on the frontend codebase doesn’t get confused by backend-specific rules, and company-wide conventions are still available to everyone.

Can I use this memory system with other AI coding tools, not just Claude Code?

Yes. The architecture — CLAUDE.md-style context files plus a vector store with semantic retrieval — is model-agnostic. The same retrieval functions work whether you’re injecting context into Claude Code, Cursor, GitHub Copilot Workspace, or any other tool that accepts system context. The main variable is how each tool accepts injected context at session start.


Key Takeaways

  • Claude Code is stateless by design — persistent memory requires explicit architecture, not assumptions.
  • Three layers cover the full spectrum: short-term session context (CLAUDE.md + automation), long-term semantic memory (vector store + retrieval), and scoped access (namespaced collections with metadata filtering).
  • Source citation is non-negotiable — memory without provenance creates confident, unverifiable claims.
  • Supersession logic prevents outdated decisions from corrupting future sessions.
  • The system writes memory automatically (post-session extraction) and retrieves it semantically — reducing manual overhead without sacrificing accuracy.

If you want to build agents that remember, reason, and act across sessions without managing all this infrastructure manually, MindStudio is worth a look — especially for teams who need scoped, shareable agent capabilities without standing up separate services.
