RAG vs code graph: how should a coding agent know your codebase?

RAG and code graphs both address the same problem: how does a coding agent know a codebase it can't fit in context? They differ in the kind of query they serve. Vector RAG retrieves by semantic similarity: "what looks like this?" A code graph indexes structure: "what calls this, what does this import, where is this defined?" For coding agents, that difference decides both correctness and cost.

How does each approach work?

Vector RAG chunks the repo, embeds the chunks, and at query time retrieves the most similar ones into context. It's the default architecture for document Q&A, applied to code.
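The chunk, embed, retrieve loop can be sketched in a few lines. This is a toy under stated assumptions, not a production pipeline: the "embedding" is a bag of words rather than a learned model, the chunker is a fixed three-line window, and `REPO`, `chunk`, `embed` and `retrieve` are hypothetical names invented for the illustration.

```python
import math
import re
from collections import Counter

def chunk(text: str, size: int = 3) -> list[str]:
    """Split text into fixed-size line windows (a deliberately naive chunker)."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]

def embed(text: str) -> Counter:
    """Toy embedding: bag of lowercase words. Real systems use a learned model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

# Hypothetical three-function "repo" used only for this illustration.
REPO = """def process_payment(order):
    charge(order.total)
    return receipt(order)
def refund_payment(order):
    credit(order.total)
    return receipt(order)
def render_invoice(order):
    lines = format_lines(order)
    return join_lines(lines)
"""

CHUNKS = chunk(REPO)
top = retrieve("where is payment handled", CHUNKS, k=2)
for c in top:
    print(c.splitlines()[0])
```

Note what comes back: ranked chunks that mention "payment", with no guarantee that the set is complete or that the ranking reflects what the agent actually needed.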

A code graph parses the repo into its actual structure — definitions, references, imports, call edges — and serves exact answers: this function's body, its callers, the file's outline. No similarity, no ranking; the graph either has the edge or it doesn't.
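As a rough sketch of the idea, Python's standard `ast` module is enough to extract definitions and call edges from a single file. The `SOURCE` snippet and `build_graph` helper are hypothetical, and the sketch only resolves direct calls by bare name; a real code graph would also handle methods, imports and cross-file references.

```python
import ast
from collections import defaultdict

# Hypothetical source file used only for this illustration.
SOURCE = '''
def charge(total):
    return total

def process_payment(order):
    return charge(order)

def checkout(cart):
    return process_payment(cart)

def retry_checkout(cart):
    return process_payment(cart)
'''

def build_graph(source: str):
    """Return (definitions, callers).

    definitions maps a function name to the line where it is defined.
    callers maps a callee name to the set of functions that call it directly.
    """
    tree = ast.parse(source)
    definitions = {}
    callers = defaultdict(set)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            definitions[node.name] = node.lineno
            for inner in ast.walk(node):
                # Only direct calls by bare name are recorded in this sketch.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    callers[inner.func.id].add(node.name)
    return definitions, callers

definitions, callers = build_graph(SOURCE)
print(sorted(callers["process_payment"]))
print(definitions["process_payment"])
```

The lookup is a dictionary access, not a ranking: every recorded caller is returned, and a name that was never indexed returns nothing rather than a plausible neighbor.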

Aspect	Vector RAG	Code graph
Query model	"What's similar to X?"	"What calls/defines/imports X?"
Answer type	Approximate, ranked chunks	Exact entities and edges
Strong at	Prose: docs, tickets, discussions	Structure: navigation, blast radius
Weak at	Precise structural questions	Fuzzy conceptual search
Failure mode	Plausible-but-wrong chunk	Missing index = no answer (visible)
Freshness	Re-embed on change	Re-parse on change

Why is RAG losing ground for code navigation?

Because agent questions about code are overwhelmingly structural, and similarity approximates what structure answers exactly. "Who calls processPayment?" has a correct answer; the top-5-similar-chunks version of it can miss a caller — and a silently missing caller is how agents ship breaking changes. It's notable that the top-scoring SWE-bench Verified agent systems don't use vector retrieval over the repo; they navigate with file trees, grep and exact reads.

But stock navigation has its own cost problem: it is exact, but token-hungry. Whole-file reads and grep dumps are why reads account for roughly 76% of agent tokens. The practical contest isn't RAG vs grep; it's expensive-exact (raw reads) vs cheap-exact (a graph serving just the slice).

What does the token bill say?

Retrieval cost differs less than what enters the context afterward. RAG injects k chunks whether or not they're right — and wrong chunks both waste their own tokens and trigger follow-up reads. Raw navigation injects whole files. A graph injects the entity: the function body, the caller list, the outline.

That last difference is what our benchmark measures head-to-head on real repos: serving navigation and reads from a code graph cut navigation tokens by 86% and read tokens by 90% against stock tools, fidelity-gated with a real tokenizer, meaning the compressed answer had to contain every fact the task needed. And because every admitted token re-bills on every subsequent roundtrip, the context saving compounds across the whole session.
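The compounding effect is easy to work through with a simple model. The sketch below assumes that context only grows, that the full context is re-sent as input on every roundtrip, and that there is no prompt caching or truncation. The per-read sizes (4,000 and 400 tokens) and the ten-turn session are made-up illustration values chosen to mirror a 90% per-read reduction, not benchmark data.

```python
def billed_input_tokens(admitted_per_turn: list[int]) -> int:
    """Total input tokens billed when context accumulates and is re-sent each roundtrip."""
    total = 0
    context = 0
    for admitted in admitted_per_turn:
        context += admitted  # new tokens join the context
        total += context     # the whole context is billed on this roundtrip
    return total

TURNS = 10  # hypothetical session length

raw = billed_input_tokens([4000] * TURNS)    # hypothetical whole-file reads
sliced = billed_input_tokens([400] * TURNS)  # hypothetical 90% smaller slices

one_off_saving = (4000 - 400) * TURNS  # saving if each token were billed once
billed_saving = raw - sliced           # saving once re-billing is counted

print(raw, sliced, one_off_saving, billed_saving)
```

Under these assumptions the percentage saved stays at 90%, but the absolute number of tokens saved grows with session length, because a token admitted on turn 1 is paid for on every later turn as well.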

So when is RAG still the right tool?

When the corpus is prose and the question is fuzzy. Design docs, past incidents, tickets, discussions — similarity is the right primitive there, and a graph has nothing to offer prose. Conceptual code questions with no structural anchor ("where do we handle retries, roughly?") also suit hybrid setups: retrieve to locate, then walk the graph to verify.
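One way to picture "retrieve to locate, then walk the graph to verify" is a two-step lookup: a fuzzy match picks an anchor function, then an exact structural query lists its callers. Everything here is a hypothetical sketch: `SOURCE`, `stems`, `locate` and `callers_of` are invented names, and the four-letter-prefix matching is a crude stand-in for real embedding similarity.

```python
import ast
import re

# Hypothetical source file used only for this illustration.
SOURCE = '''
def retry_with_backoff(fn):
    """Retry a failing call with exponential backoff."""
    return fn()

def fetch_user(uid):
    return retry_with_backoff(lambda: uid)

def render_banner():
    """Show the welcome banner."""
    return "hi"

def fetch_order(oid):
    return retry_with_backoff(lambda: oid)
'''

def stems(text: str) -> set[str]:
    """Crude stand-in for an embedding: the first four letters of each word."""
    return {w[:4] for w in re.findall(r"[a-z]+", text.lower())}

def locate(query: str, tree: ast.Module) -> str:
    """Fuzzy step: pick the function whose name and docstring best overlap the query."""
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]

    def score(fn: ast.FunctionDef) -> int:
        text = fn.name + " " + (ast.get_docstring(fn) or "")
        return len(stems(query) & stems(text))

    return max(funcs, key=score).name

def callers_of(name: str, tree: ast.Module) -> set[str]:
    """Exact step: every function that contains a direct call to `name`."""
    found = set()
    for fn in ast.walk(tree):
        if isinstance(fn, ast.FunctionDef):
            for node in ast.walk(fn):
                if (isinstance(node, ast.Call)
                        and isinstance(node.func, ast.Name)
                        and node.func.id == name):
                    found.add(fn.name)
    return found

tree = ast.parse(SOURCE)
anchor = locate("where do we handle retries, roughly?", tree)
print(anchor, sorted(callers_of(anchor, tree)))
```

The fuzzy step is allowed to be approximate because its only job is to find a starting point; the caller list that follows comes from the parsed structure and is exact for what the sketch indexes.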

The clean division of labor: RAG for knowledge, graph for structure — and for coding agents, whose questions are mostly structural, that puts the graph at the center of context ops. That's the architecture bet unerr makes: a local graph of your codebase, served to every MCP agent your team runs, with prose-shaped memory layered on top (memory integration) rather than embeddings underneath.
