Benchmarks
June 5, 2026 · 2 min read

How unerr cuts code-navigation tokens 86–90%

Your coding agent spends most of its tokens reading your code, not writing it. Every task starts with the same ritual — list files, open files, grep, re-open the same module it saw ten minutes ago. That navigation-and-read loop is the dominant token cost of agentic coding, and it's what unerr attacks.

This post is the methodology and the numbers. The short version:

Tool       Navigation tokens   Read tokens
unerr      −86%                −90%
graphify   −43%                −81%
RTK        −30%                −49%

Same repos, same tasks, same tokenizer, head to head.

What we measured

The benchmark replays a fixed set of code-navigation and code-reading workloads — the calls an agent actually makes while working: find a symbol, trace its callers, read the relevant slice of a file, get the structure of a module. Each workload runs twice:

  1. Baseline — the agent's stock tools (full-file reads, grep-style search).
  2. Tool under test — the same information need served by unerr, graphify, or RTK.

We count the tokens the model would ingest in each case, with a real tokenizer — not character-count approximations.
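The arithmetic behind a savings figure can be sketched as follows. This is a minimal illustration, not the harness's code: the `WorkloadResult` shape, the function names, and the choice to aggregate by summing tokens across workloads (rather than averaging per-workload percentages) are all assumptions made for the example.

```typescript
// Hypothetical shape of one workload's measurements; field names are illustrative.
interface WorkloadResult {
  name: string;
  baselineTokens: number; // tokens ingested with the agent's stock tools
  toolTokens: number;     // tokens ingested with the tool under test
}

// Percentage saved relative to baseline.
// Multiplying before dividing keeps the arithmetic exact for whole-number inputs.
function savingsPercent(baselineTokens: number, toolTokens: number): number {
  if (baselineTokens <= 0) {
    throw new RangeError("baselineTokens must be positive");
  }
  return ((baselineTokens - toolTokens) * 100) / baselineTokens;
}

// Assumed aggregation: sum tokens over all workloads, then compute one percentage,
// so larger workloads carry more weight than small ones.
function aggregateSavings(results: WorkloadResult[]): number {
  const baseline = results.reduce((sum, r) => sum + r.baselineTokens, 0);
  const tool = results.reduce((sum, r) => sum + r.toolTokens, 0);
  return savingsPercent(baseline, tool);
}
```

With made-up numbers, a workload that drops from 1,000 baseline tokens to 140 saves 86%. The token counts themselves would come from a real tokenizer, which this sketch leaves out.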

The fidelity gate

Compression is trivial if you're allowed to drop information. Every result above is fidelity-gated: the compressed output must still contain the facts the workload needs (the symbol, its signature, its callers, the code slice). A response that saves tokens but loses the answer scores zero. The percentages are savings at equal usefulness, not savings at any cost.
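One way to picture the gate is as a check that runs before any savings are credited. The sketch below is an assumption-laden illustration: the `GatedRun` type is hypothetical, and plain substring matching stands in for whatever fact-checking the real harness performs.

```typescript
// Hypothetical record for one workload run; not the harness's real types.
interface GatedRun {
  baselineTokens: number;
  toolTokens: number;
  output: string;          // what the tool returned to the agent
  requiredFacts: string[]; // facts the workload needs, e.g. a symbol name or a caller
}

// Passes only if every required fact appears in the output.
// Substring matching is a simplification chosen for this example.
function passesFidelityGate(run: GatedRun): boolean {
  return run.requiredFacts.every((fact) => run.output.includes(fact));
}

// Savings are credited only when the gate passes; a lossy response scores zero.
function gatedSavingsPercent(run: GatedRun): number {
  if (!passesFidelityGate(run)) {
    return 0;
  }
  return ((run.baselineTokens - run.toolTokens) * 100) / run.baselineTokens;
}
```

The point of the design is that a tool cannot score well by returning less than the workload needs: dropping one required fact zeroes the result no matter how few tokens were used.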

Why the gap

unerr serves agents from a local code graph built ahead of time — entities, references, structure — so a "who calls this?" question is a graph lookup, not a repo-wide grep plus a stack of full-file reads. Reads return the relevant slice with structure, not the whole file. The graph answers in under 5 ms locally, supports 8 languages, and ships 22 MCP tools that agents (Claude Code, Cursor, Copilot, Windsurf, and any MCP-compatible agent) call natively.
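To illustrate why a prebuilt graph turns "who calls this?" into a lookup, here is a toy reverse index of call edges. It is a sketch of the general technique only: the `Edge` and `CallGraph` names are hypothetical, and nothing here reflects unerr's actual data model or MCP tool interface.

```typescript
// Toy call edge: `caller` invokes `callee`. Names are illustrative.
type Edge = { caller: string; callee: string };

class CallGraph {
  // Index built once, ahead of time: callee -> set of callers.
  private callersOf = new Map<string, Set<string>>();

  constructor(edges: Edge[]) {
    for (const { caller, callee } of edges) {
      let callers = this.callersOf.get(callee);
      if (!callers) {
        callers = new Set<string>();
        this.callersOf.set(callee, callers);
      }
      callers.add(caller);
    }
  }

  // Answering "who calls this?" is a single map lookup, not a repo-wide search.
  whoCalls(symbol: string): string[] {
    const callers = this.callersOf.get(symbol);
    return callers ? Array.from(callers).sort() : [];
  }
}

// Usage: build the index once, then query it as often as needed.
const graph = new CallGraph([
  { caller: "main", callee: "parse" },
  { caller: "lint", callee: "parse" },
  { caller: "parse", callee: "tokenize" },
]);
const parseCallers = graph.whoCalls("parse");
```

The response to the agent is then a short list of caller names instead of every file that happens to contain the string, which is where the token savings on navigation would come from.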

Scope — what this number is and isn't

The −86%/−90% is measured on navigation and read workloads — the dominant cost, but not your whole bill. We publish 75%+ as the conservative public figure for overall savings, and the per-agent translation (longer sessions on Claude Code, credits going further on Cursor, smaller API bills on BYO-key agents) varies with each agent's billing model. We'd rather you verify than trust:

Reproduce it on your repo

$ tsx benchmarks/agent-token-bench/run.ts <your-repo>

The harness clones nothing and uploads nothing: it runs locally against your working copy and prints the per-workload table. The fidelity gate is in the open too. If your numbers differ from ours, we want to know.

Benchmark run against unerr v0.2.11.
