Skip to content
← All posts
Guides
June 6, 2026 · Updated June 12, 2026 · 4 min read

Why your coding agent forgets everything — and what actually gives it memory

AI coding agent memory is a workaround, not a feature of the model: LLMs are stateless by design, so every session starts with a blank context window. Whatever your agent learned about your codebase yesterday — the architecture, the conventions, the decision you explained twice — is gone, and it pays tokens to rediscover it. This guide maps the memory tooling landscape by category, with honest tradeoffs, and puts a number on the cost of forgetting.

Why does your coding agent forget everything between sessions?

Because the model has no storage. An LLM's only "memory" is the context window of the current request; close the session and it's empty. Agent products layer persistence on top — files loaded at startup, summaries, retrieval — but the model itself never remembers. Each session begins with your agent knowing nothing about your repo.

The visible symptom is the re-read tax: the agent re-lists directories, re-opens the same files, re-derives the same understanding at the start of every session. Measured across real agent workloads, reading and navigation is ~76% of all agent tokens — and a meaningful share of that is re-discovery of things a previous session already knew. At typical agent costs of $400–1,500/month for heavy users, the forgetting itself is a line item.

What about compaction — doesn't the agent summarize its context?

Compaction keeps a session alive; it doesn't preserve knowledge. When the window fills (Claude Code auto-compacts near ~95% capacity), the history is summarized — and detail is lost. Early decisions, subtle constraints, the why behind a change: summaries keep conclusions and drop reasoning.

Two practical notes from the field: facts in a rules file survive compaction while facts in chat history don't — so durable invariants belong in files, not conversation — and compaction can fail outright on very long conversations, forcing a context-clearing restart at exactly the moment the accumulated understanding was most valuable.

What are the options for giving an agent memory?

Four categories, each with a real failure mode:

ApproachWhat it isWhere it breaks
Rules files (CLAUDE.md, AGENTS.md, .cursor/rules)Markdown loaded every sessionManual upkeep; goes stale; taxes every session it bloats
Built-in auto-memoryAgent-maintained notes (e.g. MEMORY.md)Lossy — captures what seemed important, capped at startup load
Memory MCP serversExternal store served over MCPQuality varies by what's stored and how it's retrieved
Vector RAG over the repoEmbedding search on codeFalling out of favor for code — similarity ≠ structure

Rules files are the converging standard — AGENTS.md was adopted across major agents in 2026 — and they're genuinely good for durable invariants. Their structural weakness is maintenance: hand-written notes capture what you thought mattered, and developers skip writing them ~40% of the time. Stale memory is worse than no memory; the agent trusts it.

Why is vector RAG falling out of favor for code?

Because code questions are structural, not semantic. "Who calls this function?" has an exact answer; embedding similarity gives an approximate one. Notably, the top-scoring SWE-bench Verified agent systems don't use vector retrieval over the repo — they navigate with file trees, grep and exact reads instead.

RAG still earns its keep on prose — docs, tickets, design discussions — where similarity is the right primitive. For code itself, the alternatives are the agent's stock navigation (exact but token-hungry: whole files, grep dumps) or a code graph — a structural index that answers "definition, callers, slice" directly. We compare the two approaches in depth in RAG vs code graph, and the token difference is what our benchmark measures: −86% navigation / −90% read tokens against stock navigation, fidelity-gated.

What should live where? (a working split)

Keep rules files small and durable: conventions, invariants, commands — the things that are true every session. Let session knowledge (what the agent learned, decided, and got blocked on) live in a system that maintains itself, because anything manual will rot. Use retrieval for prose, and structure for code.

The test for any memory setup is the Monday-morning question: when a fresh session starts on a codebase the team worked on all last week, does the agent know — the architecture, the conventions the team accreted, what failed here before — or does it spend the first 50,000 tokens finding out? That's the gap unerr is built for: a local, continuously-updated map of the codebase plus the team's conventions and session history, served to Claude Code, Cursor, Copilot or any MCP agent — memory that doesn't depend on anyone remembering to write it down. What it costs when agents don't have it is exactly the re-read tax — estimate yours here.


Related: token optimization for coding agents · reduce Claude Code costs · the benchmark methodology · FAQ.

See it on your own repo

Free to start. One install, your codebase, real numbers.