What is LLM ops?

LLM ops is the operational discipline of running LLM-powered systems in production: managing the cost, reliability, observability and governance of the model calls that your software and your developers' tools make every day. Where MLOps manages the training of models, LLM ops manages their consumption: API spend, token budgets, model routing, caching, quality monitoring and policy.
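To make three of those concerns concrete (caching, model routing and spend tracking), here is a minimal sketch of a gateway that sits in front of model calls. Everything in it is an assumption for illustration: the model names, the prices, the routing threshold, the rough 4-characters-per-token estimate and the stubbed response are hypothetical, not any provider's API.

```python
from dataclasses import dataclass, field

# Hypothetical per-million-token prices in USD; real prices vary by provider.
PRICE_PER_MTOK = {"small-model": 0.50, "large-model": 10.00}


@dataclass
class LLMGateway:
    """Toy gateway: caches repeated prompts, routes by prompt size, tracks spend."""

    route_threshold_tokens: int = 200
    cache: dict = field(default_factory=dict)
    spend_usd: float = 0.0

    def estimate_tokens(self, prompt: str) -> int:
        # Crude assumption: about 4 characters per token.
        return max(1, len(prompt) // 4)

    def route(self, prompt: str) -> str:
        # Short prompts go to the cheaper model, long ones to the larger one.
        tokens = self.estimate_tokens(prompt)
        if tokens < self.route_threshold_tokens:
            return "small-model"
        return "large-model"

    def call(self, prompt: str) -> str:
        if prompt in self.cache:
            return self.cache[prompt]  # cache hit: no new spend
        model = self.route(prompt)
        tokens = self.estimate_tokens(prompt)
        self.spend_usd += tokens * PRICE_PER_MTOK[model] / 1_000_000
        response = f"[{model}] stub response"  # stand-in for a real API call
        self.cache[prompt] = response
        return response
```

Calling `LLMGateway().call(prompt)` twice with the same prompt charges only once. A production gateway would count tokens with the provider's tokenizer and route on more than prompt length, but the shape of the loop is the same.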

The discipline exists because LLM consumption became a real budget line: 37% of enterprises now spend over $250K/year on LLM APIs, and 72% expect that to grow. Spend that is this large, this variable, and this easy for a single workload to inflate gets its own operations function, the same way cloud spend got FinOps.

In practice LLM ops decomposes into narrower disciplines:

Token ops — managing the unit economics of token consumption
Agent ops — operating autonomous agents specifically
Context ops — controlling what enters the context window
Memory ops — persisting knowledge across sessions
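As an illustration of the first item, token ops, the unit economics come down to simple arithmetic: tokens in and out, multiplied by a price per token, checked against a budget. The sketch below assumes separate input and output prices per million tokens. The `Pricing`, `TokenBudget` and `BudgetExceeded` names are hypothetical, and the prices in any usage are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical prices in USD per million tokens."""

    input_usd_per_mtok: float
    output_usd_per_mtok: float


def request_cost_usd(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    # Cost = (input tokens * input price + output tokens * output price) / 1M.
    total = (
        input_tokens * pricing.input_usd_per_mtok
        + output_tokens * pricing.output_usd_per_mtok
    )
    return total / 1_000_000


class BudgetExceeded(Exception):
    """Raised when a charge would push spend past the limit."""


class TokenBudget:
    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> float:
        # Reject the charge before recording it, so spend never exceeds the limit.
        if self.spent_usd + cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"charge of ${cost_usd:.2f} exceeds remaining "
                f"${self.limit_usd - self.spent_usd:.2f}"
            )
        self.spent_usd += cost_usd
        return self.limit_usd - self.spent_usd  # remaining budget
```

With placeholder prices of $2 per million input tokens and $8 per million output tokens, a request with 1,000,000 input tokens and 500,000 output tokens costs (2 + 4) = $6, so a $10 budget covers one such request but rejects a second.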

For engineering teams, the largest and least-governed LLM workload is usually coding agents. See "token optimization for AI coding agents" for the mechanics, and "how unerr runs this loop" for one system that handles the routing, shrinking and guarding in one place.
