What is LLM ops?
LLM ops is the operational discipline of running LLM-powered systems in production: managing the cost, reliability, observability and governance of the model calls that your software and your developers' tools make every day. Where MLOps managed the training of models, LLM ops manages their consumption: API spend, token budgets, model routing, caching, quality monitoring and policy.
The discipline exists because LLM consumption became a real budget line: 37% of enterprises now spend over $250K/year on LLM APIs, and 72% expect that to grow. Spend that is this large, this variable, and this easy for a single workload to inflate gets its own operations function, the same way cloud spend got FinOps.
In practice LLM ops decomposes into narrower disciplines:
- Token ops — managing the unit economics of token consumption
- Agent ops — operating autonomous agents specifically
- Context ops — controlling what enters the context window
- Memory ops — persisting knowledge across sessions
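To make the token ops idea concrete, here is a minimal sketch of per-call cost accounting, a hard budget guard, and a trivial routing rule. Everything named in it is an assumption for illustration: the model names, the per-million-token prices, the `BudgetTracker` class and the length-based `route` rule are hypothetical and do not come from any provider or product.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens; real prices vary by provider.
PRICES = {
    "small-model": {"input": 0.50, "output": 1.50},
    "large-model": {"input": 5.00, "output": 15.00},
}


class BudgetExceeded(RuntimeError):
    """Raised when a call would push spend past the configured limit."""


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one call under the assumed price table."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def route(prompt_tokens: int, threshold: int = 2000) -> str:
    """Illustrative routing rule: short prompts go to the cheaper model."""
    return "small-model" if prompt_tokens <= threshold else "large-model"


@dataclass
class BudgetTracker:
    limit_usd: float
    spent_usd: float = 0.0

    def record(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Add one call's cost to the running total, or refuse if over budget."""
        cost = call_cost(model, input_tokens, output_tokens)
        if self.spent_usd + cost > self.limit_usd:
            raise BudgetExceeded(
                f"call costing ${cost:.4f} would exceed ${self.limit_usd:.2f} limit"
            )
        self.spent_usd += cost
        return cost


if __name__ == "__main__":
    tracker = BudgetTracker(limit_usd=1.00)
    model = route(prompt_tokens=800)
    cost = tracker.record(model, input_tokens=800, output_tokens=300)
    print(model, round(cost, 6), round(tracker.spent_usd, 6))
```

A production setup would likely route on more than prompt length and would read token counts from the provider's usage metadata rather than trusting caller-supplied numbers. The sketch only shows the shape of the accounting loop.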
For engineering teams, the largest and least-governed LLM workload is usually coding agents. See token optimization for AI coding agents for the mechanics, and how unerr runs this loop for one system that does the routing, shrinking and guarding in one place.