Why your coding agent forgets everything — and what actually gives it memory
AI coding agent memory is a workaround, not a feature of the model: LLMs are stateless by design, so every session starts with a blank context window. Whatever your agent learned about your codebase yesterday — the architecture, the conventions, the decision you explained twice — is gone, and it pays tokens to rediscover it. This guide maps the memory tooling landscape by category, with honest tradeoffs, and puts a number on the cost of forgetting.
Why does your coding agent forget everything between sessions?
Because the model has no storage. An LLM's only "memory" is the context window of the current request; close the session and it's empty. Agent products layer persistence on top — files loaded at startup, summaries, retrieval — but the model itself never remembers. Each session begins with your agent knowing nothing about your repo.
The visible symptom is the re-read tax: the agent re-lists directories, re-opens the same files, re-derives the same understanding at the start of every session. Measured across real agent workloads, reading and navigation is ~76% of all agent tokens — and a meaningful share of that is re-discovery of things a previous session already knew. At typical agent costs of $400–1,500/month for heavy users, the forgetting itself is a line item.
What about compaction — doesn't the agent summarize its context?
Compaction keeps a session alive; it doesn't preserve knowledge. When the window fills (Claude Code auto-compacts near ~95% capacity), the history is summarized — and detail is lost. Early decisions, subtle constraints, the why behind a change: summaries keep conclusions and drop reasoning.
Two practical notes from the field: facts in a rules file survive compaction while facts in chat history don't — so durable invariants belong in files, not conversation — and compaction can fail outright on very long conversations, forcing a context-clearing restart at exactly the moment the accumulated understanding was most valuable.
What are the options for giving an agent memory?
Four categories, each with a real failure mode:
| Approach | What it is | Where it breaks |
|---|---|---|
| Rules files (CLAUDE.md, AGENTS.md, .cursor/rules) | Markdown loaded every session | Manual upkeep; goes stale; taxes every session it bloats |
| Built-in auto-memory | Agent-maintained notes (e.g. MEMORY.md) | Lossy — captures what seemed important, capped at startup load |
| Memory MCP servers | External store served over MCP | Quality varies by what's stored and how it's retrieved |
| Vector RAG over the repo | Embedding search on code | Falling out of favor for code — similarity ≠ structure |
Rules files are the converging standard — AGENTS.md was adopted across major agents in 2026 — and they're genuinely good for durable invariants. Their structural weakness is maintenance: hand-written notes capture what you thought mattered, and developers skip writing them ~40% of the time. Stale memory is worse than no memory; the agent trusts it.
Why is vector RAG falling out of favor for code?
Because code questions are structural, not semantic. "Who calls this function?" has an exact answer; embedding similarity gives an approximate one. Notably, the top-scoring SWE-bench Verified agent systems don't use vector retrieval over the repo — they navigate with file trees, grep and exact reads instead.
RAG still earns its keep on prose — docs, tickets, design discussions — where similarity is the right primitive. For code itself, the alternatives are the agent's stock navigation (exact but token-hungry: whole files, grep dumps) or a code graph — a structural index that answers "definition, callers, slice" directly. We compare the two approaches in depth in RAG vs code graph, and the token difference is what our benchmark measures: −86% navigation / −90% read tokens against stock navigation, fidelity-gated.
What should live where? (a working split)
Keep rules files small and durable: conventions, invariants, commands — the things that are true every session. Let session knowledge (what the agent learned, decided, and got blocked on) live in a system that maintains itself, because anything manual will rot. Use retrieval for prose, and structure for code.
The test for any memory setup is the Monday-morning question: when a fresh session starts on a codebase the team worked on all last week, does the agent know — the architecture, the conventions the team accreted, what failed here before — or does it spend the first 50,000 tokens finding out? That's the gap unerr is built for: a local, continuously-updated map of the codebase plus the team's conventions and session history, served to Claude Code, Cursor, Copilot or any MCP agent — memory that doesn't depend on anyone remembering to write it down. What it costs when agents don't have it is exactly the re-read tax — estimate yours here.
Related: token optimization for coding agents · reduce Claude Code costs · the benchmark methodology · FAQ.