How to reduce Claude Code costs: where the tokens actually go
If you want to reduce Claude Code costs, start with where the tokens actually go: not the code it writes, but the code it reads. Claude Code token usage is dominated by navigation — listing files, opening files, re-opening the same module it saw ten minutes ago — and because the API is stateless, that entire reading history is re-sent on every single message. This guide covers how the pricing works, why the limits feel tighter than they are, and what actually moves the bill.
Why is Claude Code so expensive?
Claude Code is expensive because every message re-sends the full conversation history, including every file it has read, as input tokens. Cumulative input cost therefore grows with roughly the square of a session's length, not linearly. Most of that growth is file reading and re-reading, which published research measures at 76% of agent tokens.
Three mechanics compound:
- The re-send tax. The model has no memory between calls. Message 201 carries messages 1–200 with it. A 20-step session at ~1,000 tokens per step bills roughly 210,000 cumulative input tokens — not 20,000.
- The re-read loop. Agents re-open files they already saw. One GitHub issue documents cache-read tokens consuming up to 99.93% of a user's usage quota — quota spent on context the model had already seen.
- Subagent multiplication. Agent teams hold one context window each. Multi-agent sessions run around 7× the tokens of a single thread.
We measured the re-read loop directly in our benchmark: code navigation and reads are the dominant token cost of agentic coding, and they compress 86–90% without losing fidelity.
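The re-send tax arithmetic can be reproduced in a few lines. This is a minimal sketch under simplifying assumptions that are not part of any real billing formula: every step adds the same number of tokens, nothing is compacted, and cache discounts are ignored. The function name is hypothetical.

```python
def cumulative_input_tokens(steps: int, tokens_per_step: int) -> int:
    """Total input tokens billed when every call re-sends all prior steps.

    Call k carries k steps of history, so the total is
    tokens_per_step * (1 + 2 + ... + steps).
    Assumes uniform step size and no caching or compaction.
    """
    return tokens_per_step * steps * (steps + 1) // 2


if __name__ == "__main__":
    naive = 20 * 1_000                           # what you might expect to pay for
    billed = cumulative_input_tokens(20, 1_000)  # what the re-send model bills
    print(f"naive: {naive:,}  billed: {billed:,}")
    # naive: 20,000  billed: 210,000
```

Doubling the step count roughly quadruples the total, which is the quadratic growth described above.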
How does Claude Code pricing work: Pro vs Max vs API?
Claude Code runs on either a subscription (Pro $20/mo, Max 5× $100/mo, Max 20× $200/mo) or pay-as-you-go API billing. Subscriptions are flat-rate with usage limits; the API has no limits but bills every token. For heavy daily use, subscriptions are typically 2–2.5× cheaper than the equivalent API spend.
| Plan | Price | What you get |
|---|---|---|
| Pro | $20/mo | ~6–7M Sonnet input tokens/mo equivalent; 5-hour + weekly limits |
| Max 5× | $100/mo | 5× Pro's limits |
| Max 20× | $200/mo | 20× Pro's limits (~33M+ tokens/mo equivalent) |
| API | per token | No limits; Sonnet $3/$15, Opus $5/$25 per M input/output |
The break-even is well documented: API-only beats Pro only below roughly 50 sessions a month. One widely shared case: a developer's 10B tokens over 8 months would have cost ~$15,000 on the API — they paid ~$800 on Max.
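To check the break-even against your own volume, a rough comparison might look like the sketch below. It uses only the Sonnet list prices from the table above. The function names are hypothetical, and the model ignores cache pricing and the fact that subscriptions are capped by usage limits, so treat the result as a first approximation.

```python
SONNET_INPUT_USD_PER_M = 3.0    # per million input tokens, from the table above
SONNET_OUTPUT_USD_PER_M = 15.0  # per million output tokens, from the table above


def api_cost_usd(input_tokens: int, output_tokens: int,
                 in_price: float = SONNET_INPUT_USD_PER_M,
                 out_price: float = SONNET_OUTPUT_USD_PER_M) -> float:
    """Pay-as-you-go cost in USD at per-million-token list prices."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def cheaper_option(input_tokens: int, output_tokens: int, plan_usd: float) -> str:
    """Return 'api' or 'subscription', whichever costs less for this monthly volume."""
    api = api_cost_usd(input_tokens, output_tokens)
    return "api" if api < plan_usd else "subscription"


if __name__ == "__main__":
    # Light month: 5M input + 100k output tokens, compared with Pro at $20.
    print(api_cost_usd(5_000_000, 100_000), cheaper_option(5_000_000, 100_000, 20.0))
    # Heavy month: 50M input + 2M output tokens, compared with Max 5x at $100.
    print(api_cost_usd(50_000_000, 2_000_000), cheaper_option(50_000_000, 2_000_000, 100.0))
```

Note that $20 divided by $3 per million input tokens is about 6.7M tokens, which lines up with the "~6–7M Sonnet input tokens/mo equivalent" figure for Pro in the table.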
What does Claude Code actually cost per developer?
Anthropic's own published figure is about $6 per developer per day, with 90% of users under $12/day. Enterprise deployments average ~$13 per active day — $150–250 per developer per month. Heavy agentic users report far more: $400–1,500/mo is a common range once agent teams and large monorepos are involved.
The spread between the median and the heavy tail is the real story. The median user chats; the expensive user runs long autonomous sessions on a big repo — exactly the workload where re-reads and context re-sending compound. If you want to estimate your own workload, the interactive cost model on our LLM API optimization page lets you set requests, model and context levers and watch the bill move.
How do Claude Code usage limits work — and why do you hit them so fast?
Limits are two-tier: a rolling 5-hour session window plus a 7-day weekly cap, shared across claude.ai, Claude Code and Desktop. You hit them fast because everything counts against them — including cache reads of context the model already processed.
This is why limits became the loudest complaint of early 2026: in January users measured a ~60% effective reduction after a holiday limit increase reverted, and by March Anthropic publicly called the limits problem its top priority, with Max 5× users reporting windows draining in ~90 minutes.
The part you control: burn rate. A quota consumed by re-reads is a quota you can get back.
What are the hidden token drains?
The three drains most guides skip: file re-reads (the agent rediscovering your codebase every session), a bloated CLAUDE.md (loaded into context at the start of every session — a recurring per-session tax), and full-file reads where a targeted slice would do. None of these are fixed by prompting discipline alone.
A 2,000-line CLAUDE.md costs you on every session forever. So does an agent whose only navigation tools are "list directory" and "read whole file." This is a tooling-layer problem — the agent needs a cheaper way to know your codebase than re-reading it. That is the design goal behind unerr's approach: a local map of the code that serves the agent slices and structure instead of full files.
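To make "slices instead of full files" concrete, here is an illustrative sketch that returns one function from a Python module instead of the whole file. It uses only the standard-library `ast` module and is a stand-in for the idea, not a description of how unerr or Claude Code is implemented. The helper name and the sample module are hypothetical, and character counts are only a rough proxy for tokens.

```python
import ast
from typing import Optional


def read_function_slice(source: str, name: str) -> Optional[str]:
    """Return the source text of the first function called `name`, or None.

    Decorators are not included, because the node's span starts at `def`.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            return ast.get_source_segment(source, node)
    return None


# Hypothetical module contents, standing in for a file on disk.
MODULE = '''\
import os

def load(path):
    with open(path) as f:
        return f.read()

def parse(text):
    return [line.split(",") for line in text.splitlines()]
'''

snippet = read_function_slice(MODULE, "parse")
print(snippet)
print(len(snippet), "of", len(MODULE), "characters sent to the model")
```

On a real module of several hundred lines the gap between the slice and the full file is far larger than in this toy example.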
How do you reduce Claude Code token usage? (ranked by impact)
Highest impact first:
- Fix navigation, not prompts. Reading is ~76% of agent tokens. Giving the agent a structural map of the repo (so it fetches the function, not the file) is the biggest single lever: in our benchmark, navigation tokens dropped 86% and read tokens dropped 90% without losing fidelity.
- Keep sessions short and scoped. /clear between tasks. The re-send tax makes one long session far more expensive than three short ones.
- Right-size CLAUDE.md. Keep it to durable invariants. Move task context elsewhere.
- Use /compact deliberately (before the auto-trigger) so summaries happen on your terms.
- Match the model to the task. Sonnet vs Opus is a 1.7–5× price gap; not every edit needs the frontier model. (Though cost per successful task beats cost per token: a cheap model that fails wastes everything.)
- Be deliberate with subagents. 7× token multiplication should buy real parallelism.
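The "one long session versus three short ones" claim can be checked with the same re-send model used earlier in this guide. This is a sketch under assumptions: ~1,000 tokens per step, no caching, and no cost for re-establishing context after each /clear, which in practice eats into the saving. The function name is hypothetical.

```python
def session_input_tokens(steps: int, tokens_per_step: int = 1_000) -> int:
    """Cumulative input tokens for one session where call k re-sends k steps."""
    return tokens_per_step * steps * (steps + 1) // 2


one_long = session_input_tokens(60)         # 60 steps in a single session
three_short = 3 * session_input_tokens(20)  # same 60 steps, cleared twice

print(f"one long session:     {one_long:,}")
print(f"three short sessions: {three_short:,}")
print(f"ratio: {one_long / three_short:.1f}x")
# one long session:     1,830,000
# three short sessions: 630,000
# ratio: 2.9x
```

The ratio grows with the number of splits, so the saving is largest for long autonomous runs that can be broken into independent tasks.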
How do you monitor and attribute spend?
Use the built-ins first: /usage and /stats in-session, plus a monthly spend limit on API billing. For teams, per-project attribution is the view that matters (which repo or workflow is the quiet budget-burner), and it's exactly what team-level visibility tooling is for.
The pattern to watch for: one repo, one workflow, or one developer's agent configuration consuming a multiple of the median. That's almost never "they work more" — it's a structural token leak, usually navigation.
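If you can export token usage per project, the "multiple of the median" check is easy to automate. The sketch below is illustrative only: the project names and numbers are made up, the 3× threshold is an arbitrary assumption to tune, and the function name is hypothetical.

```python
from statistics import median


def flag_token_leaks(usage_by_project: dict[str, int], multiple: float = 3.0) -> list[str]:
    """Return projects whose token usage exceeds `multiple` times the median."""
    if not usage_by_project:
        return []
    threshold = multiple * median(usage_by_project.values())
    return sorted(name for name, tokens in usage_by_project.items() if tokens > threshold)


# Hypothetical monthly token totals per repository.
usage = {
    "web-app": 4_000_000,
    "billing": 5_000_000,
    "docs": 3_000_000,
    "monorepo": 40_000_000,
}

print(flag_token_leaks(usage))  # ['monorepo']
```

The median is used rather than the mean so that a single heavy project does not raise the baseline it is compared against.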
Costs and limits cited as of June 2026; plans and prices change — check Anthropic's pricing and cost docs for current figures. For the methodology behind our savings numbers, see the benchmark write-up; for pricing, see plans.