What is context ops?
Context ops is the discipline of controlling what enters an LLM's context window: which files, how much of them, in what form, and for how long they stay. It matters because context size is the multiplier on every other cost lever — each roundtrip re-sends the whole window, so every token admitted taxes every subsequent step of the session.
The scale of the problem is measured: coding agents spend ~76% of their tokens on reading and navigating code, and the default reading unit — the whole file — is almost always larger than the information need. Reading a 500-line file to answer a 20-line question doesn't cost once; it costs on every roundtrip until the session ends.
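To make the recurring cost concrete, here is a rough sketch of the arithmetic. It assumes a deliberately simplified model: content admitted to the window is re-sent on every remaining roundtrip, with no caching and no eviction. The `resend_cost` helper, the figure of 10 tokens per line, and the 30-roundtrip session are illustrative assumptions, not measurements.

```python
def resend_cost(admitted_tokens: int, remaining_roundtrips: int) -> int:
    """Total input tokens attributable to content that stays in the window.

    Simplified model (assumption): the content is re-sent once per
    remaining roundtrip, with no caching and no eviction.
    """
    if admitted_tokens < 0 or remaining_roundtrips < 0:
        raise ValueError("arguments must be non-negative")
    return admitted_tokens * remaining_roundtrips


TOKENS_PER_LINE = 10  # assumed average, for illustration only
ROUNDTRIPS = 30       # assumed number of roundtrips left in the session

whole_file = resend_cost(500 * TOKENS_PER_LINE, ROUNDTRIPS)  # read all 500 lines
slice_only = resend_cost(20 * TOKENS_PER_LINE, ROUNDTRIPS)   # read the 20 needed lines

print(whole_file)                            # 150000
print(slice_only)                            # 6000
print(f"{1 - slice_only / whole_file:.0%}")  # 96%
```

Under these assumptions the 480 unneeded lines account for 144,000 of the 150,000 input tokens. Real sessions differ (caching discounts repeated prefixes, compaction evicts content), but the shape of the multiplier is the same.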
Context ops in practice:
- Admit slices, not files — serve the function, the callers, the structure; this is where a code graph beats raw reads and where our measured −86%/−90% reductions come from
- Evict deliberately — fresh sessions per task; compaction on your terms, not the auto-trigger
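- Keep the prefix stable so caching fires: prompt caches match on an unchanged leading span of the window, so a change early in the context invalidates what follows; see prompt caching vs context trimming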
- Right-size the standing load — rules files load every session; bloat is a recurring tax
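As an illustration of the first practice, admitting slices rather than files, here is a minimal sketch using Python's standard `ast` module. It is a stand-in for a real code graph, not the tooling behind the reductions quoted above: it handles only Python source, and it does not resolve callers. The helper names `outline` and `extract_function` and the `SAMPLE` source are hypothetical.

```python
import ast
from typing import List, Optional


def outline(source: str) -> List[str]:
    """List top-level functions and classes with their line ranges."""
    tree = ast.parse(source)
    entries = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            entries.append(f"{node.name} (lines {node.lineno}-{node.end_lineno})")
    return entries


def extract_function(source: str, name: str) -> Optional[str]:
    """Return the source text of the first function named `name`, or None."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            return ast.get_source_segment(source, node)
    return None


# Hypothetical file contents, used only to exercise the helpers.
SAMPLE = '''\
import math

def area(r):
    return math.pi * r ** 2

def perimeter(r):
    return 2 * math.pi * r
'''

print(outline(SAMPLE))
print(extract_function(SAMPLE, "area"))
```

The outline gives the agent the structure of the file in a few tokens; the extracted function is the slice it actually needs. Only when neither is enough does the whole file have to enter the window.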
Context ops is the highest-leverage branch of token ops: rates are set by providers, but the window is yours. The agent-specific playbook is in token optimization for AI coding agents; the four-meter model shows why the multiplier dominates.