What is token ops?

Token ops is the practice of managing the unit economics of LLM token consumption: measuring where tokens go, attributing them to teams and workloads, and engineering them down without losing output quality. It treats tokens the way FinOps treats cloud spend — as a metered resource that nobody owns until someone does.

The founding observation of token ops for coding workloads: tokens don't go where intuition says they do. Measured on real agent runs, 76.1% of tokens are spent on read-type operations (reading and navigating code), not on generating it. Optimizing the per-token model rate while ignoring read volume tunes the smaller of the two numbers.
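As a rough sketch of how a read share like this could be computed from an agent's own logs, the example below sums tokens by operation type. The event records, the operation names, and the `READ_OPS` classification are all hypothetical placeholders, not measured data or a real logging schema.

```python
from collections import Counter

# Hypothetical log records: (operation type, tokens consumed).
# Names and numbers are illustrative only, not measured data.
events = [
    ("read_file", 4200),
    ("search_code", 1800),
    ("generate_patch", 1500),
    ("read_file", 2500),
]

# Assumed classification of which operations count as read-type.
READ_OPS = {"read_file", "search_code"}


def read_share(events):
    """Return the fraction of total tokens spent on read-type operations."""
    totals = Counter()
    for op, tokens in events:
        totals["read" if op in READ_OPS else "other"] += tokens
    total = sum(totals.values())
    return totals["read"] / total if total else 0.0


share = read_share(events)
print(f"read share: {share:.1%}")
```

Which operations count as "read-type" is a judgment call; the split only means something if that classification stays the same across runs.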

A working token-ops loop has three parts:

Measure — per-task, per-repo, per-team attribution (not just a monthly total)
Attribute — find the structural leaks: the workflow or repo consuming a multiple of the median
Engineer — fix the dominant meter first; for agents that's context volume, per the four-meter breakdown
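The first two steps of the loop can be sketched in a few lines, assuming per-task token counts are already being logged with a team and repo label. The records, the function names, and the `multiple=3.0` threshold are hypothetical choices for illustration, not a prescribed method.

```python
from collections import defaultdict
from statistics import median

# Hypothetical per-task records: (team, repo, tokens used by one task).
tasks = [
    ("payments", "billing-api", 40_000),
    ("payments", "billing-api", 44_000),
    ("search", "indexer", 38_000),
    ("search", "indexer", 42_000),
    ("platform", "monorepo", 210_000),
    ("platform", "monorepo", 190_000),
]


def tokens_per_task(tasks):
    """Measure: mean tokens per task for each (team, repo) pair."""
    grouped = defaultdict(list)
    for team, repo, tokens in tasks:
        grouped[(team, repo)].append(tokens)
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}


def find_leaks(per_task, multiple=3.0):
    """Attribute: flag groups using at least `multiple` times the median group.

    Returns a mapping of (team, repo) to its ratio over the median.
    """
    baseline = median(per_task.values())
    return {
        key: usage / baseline
        for key, usage in per_task.items()
        if usage >= multiple * baseline
    }


per_task = tokens_per_task(tasks)
leaks = find_leaks(per_task)
for (team, repo), ratio in leaks.items():
    print(f"{team}/{repo}: {ratio:.1f}x the median tokens per task")
```

The Engineer step is deliberately left out of the sketch: once a group is flagged, the fix depends on which meter dominates for that workload. Comparing against the median rather than the mean keeps one very large outlier from raising the baseline and hiding itself.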

Token ops sits inside LLM ops and leans on context ops for its biggest lever. For the coding-agent case — where the discipline matters most per dollar — start with token optimization for AI coding agents, or estimate your own waste directly.
