Research · The cost + governance engine
Every cost optimizer reads the words in your request. We read the relations in your data.
LLM spend and LLM risk are the same problem, and they're priced by the same signal: what the work actually touches. unerr is the engine that routes every request, shrinks every context, and checks every change off one relation graph — in one loop.
§1
The four meters
Every product built on an LLM API pays on the same four meters. Know these and the rest of this page is just arithmetic.
Roundtrips
A roundtrip is one request to the model and its reply. An agent does many per task — and the model has no memory between them, so the entire context is re-sent on every single roundtrip.
Which model
Model choice is the widest price lever: the gap between a low-cost and a frontier model is roughly 5–25×, and wider on output tokens.
Cache write vs cache read
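A cache read of an unchanged prefix is ~10× cheaper than sending it fresh. The first send of that prefix (the cache write) bills at or above the normal input price, depending on the provider, and the cache is per model, per exact prefix. Switch models and the saving is gone.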
Context size
Context size is the multiplier on every other meter: each roundtrip's bill is proportional to how big the context has grown. Reading a whole file into context taxes every future step, not just this one.
Most teams optimize none of these. Most tools optimize one. The four compound — and that's the engine's job.
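How the four meters compound can be sketched in a few lines. This is a minimal model under stated assumptions, not unerr's formula: the function name `task_cost`, the linear context growth, and the prices in the usage lines are all hypothetical; only the ~10× cache-read discount and the "whole context re-sent every roundtrip" behavior come from the text above. Output tokens and cache-write surcharges are ignored.

```python
def task_cost(roundtrips, base_tokens, growth_tokens,
              input_price_per_mtok, cache_read_discount=10.0):
    """Input-side cost of one task, in dollars.

    Assumptions (illustrative only):
    - the context starts at base_tokens and grows by growth_tokens per roundtrip;
    - the whole context is re-sent on every roundtrip;
    - tokens already sent on the previous roundtrip are cache reads, billed at
      input price / cache_read_discount; only new tokens bill at full price.
    """
    price = input_price_per_mtok / 1_000_000
    total = 0.0
    cached = 0            # tokens the cache already holds
    context = base_tokens
    for _ in range(roundtrips):
        fresh = context - cached
        total += fresh * price + cached * price / cache_read_discount
        cached = context
        context += growth_tokens
    return total


# Hypothetical prices per million input tokens: 3.00 (frontier), 0.30 (low-cost).
naive = task_cost(roundtrips=13, base_tokens=5_000, growth_tokens=3_000,
                  input_price_per_mtok=3.00)
lean = task_cost(roundtrips=6, base_tokens=2_000, growth_tokens=500,
                 input_price_per_mtok=0.30)
print(f"naive loop: ${naive:.4f}   lean loop: ${lean:.4f}")
```

Setting `cache_read_discount=1.0` gives the uncached bill, which grows with the sum of every context size sent. That is why a whole file read early taxes every later roundtrip.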
§2
A worked example you can audit
One task — “add a remember-me checkbox” — run two ways, every roundtrip on the bill. Same task, same outcome, public API prices. The ledgers are rounded and internally consistent; expand them and check the arithmetic yourself.
Flow A — one frontier model, naive loop
$0.29
13 roundtrips
- The gather is spread over roundtrips 1–4 and 9–10 — the same files re-enter the context twice.
- The break at roundtrip 8 was avoidable: the model edited a function without knowing another file called it, and paid three roundtrips of rework.
- Every roundtrip re-sends everything accumulated so far, at frontier input prices.
Flow B — gather once, route, gate
$0.03
6 roundtrips
- The context is gathered once, in code, before the first model turn — with the dependency warning included, so the third edit happens on purpose, not as rework.
- Mechanical steps run on a low-cost model; anything risky escalates deliberately, with the reason logged.
- The same check that priced the work gates every edit before it lands.
Worth saying plainly: caching is already pulling its weight in Flow A. Uncached, those ~280K re-read tokens would bill at full input price and the task would cost ~$1.02 instead of ~$0.29. That floor exists in every good tool. We keep it — the gap between $0.29 and $0.03 is what comes on top of it.
Same pattern, bigger scale: a real session of this shape ran 158 roundtrips, re-read its prefix ~11×, hit cache 91% of the time, and cost ~$15.19 — it would have been ~$56.69 uncached (ours, measured — session log 7d39c737).
Illustrative and best-case on purpose: this task stayed entirely on the low-cost model, and the model gap was near its widest. The honest blended target is lower. §6 states what we won't claim.
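The ratios behind this section's figures can be checked directly. The snippet below uses only the dollar amounts quoted above; because those amounts are rounded, the percentages are approximate. The helper name `saving` is ours, for illustration.

```python
def saving(before, after):
    """Fraction of `before` that is saved by paying `after` instead."""
    return 1 - after / before


# Dollar figures quoted in this section.
flow_a_uncached, flow_a, flow_b = 1.02, 0.29, 0.03
session_uncached, session = 56.69, 15.19

print(f"caching inside Flow A:    {saving(flow_a_uncached, flow_a):.0%}")
print(f"Flow B on top of Flow A:  {saving(flow_a, flow_b):.0%}")
print(f"caching in the session:   {saving(session_uncached, session):.0%}")
```

With the rounded figures this prints roughly 72%, 90% and 73%: caching alone removes about three quarters of the naive bill in both the toy task and the measured session, and the best-case Flow B removes about nine tenths of what is left.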
§3
The cost model, with every lever exposed
From 2023 to 2025, the pitch was: it's just an API call. Wrap it and ship. The teams who shipped learned otherwise: a prototype is an afternoon; a production system is a quarter — and the difference is a cockpit's worth of parameters that the demo never showed you. Industry analyses now put the true operating cost of an LLM feature at 2.3–4.1× the raw API price. None of that overhead is on the pricing page.
Below is the calculator — four layers of levers, A through D. Layer A is your numbers: how many requests, what kind of work, which model. Open the rest and watch what each operational decision does to the bill. Every multiplier is cited or labeled as modeled, and the formula is inspectable. This is the complexity the “just call the API” era skipped — and the reason cost optimization is an operational discipline, not a pricing-page choice.
A · Your numbers
Inputs: workload, model tier
Naive-wrapper bill / month: $3.6k
Every operational lever at its unmanaged default. This is the bill the “three inputs” mental model predicts.
Sixteen levers, four meters, every one interacting — and this model is the simplified version. Some teams staff this. The rest put an engine in the loop that flies the marked levers continuously, and reviews what it can't.
§4
The problem cheap optimization creates
Optimizing cost on words alone has a failure mode, and it's not a smaller bill.
- (a) Word-counting optimizers send your request to the cheapest model whose answer looks plausible. “Looks plausible” is judged on the words, not on what the change touches.
- (b) Cheaper models are cheaper for a reason: they miss more. On easy work that's free money. On work that touches the wrong thing, it's a production incident with a smaller invoice attached.
- (c) And the failure is silent. The output still reads well. In one study of AI-generated code changes, engineers skipped reviewing advisory warnings 58% of the time when the diff looked clean (CodeCompass, 2025).
- (d) The expensive part of an LLM mistake was never the tokens. Industry post-incident analyses routinely price a single bad production change in the tens of thousands of dollars, four to five orders of magnitude above the tokens saved on the request that caused it.
- (e) So a cost optimizer that can't see what the work touches isn't reducing your spend. It's moving it from the invoice you watch to the incident budget you don't.
Easy isn't safe.
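Points (a) through (e) amount to comparing expected cost per request rather than invoice cost. A minimal sketch, assuming a single incident price and per-model miss probabilities: the $20,000 incident cost stands in for "tens of thousands of dollars", the token costs reuse the §2 flow totals, and the miss probabilities are invented for illustration, not measured.

```python
def expected_cost(token_cost, p_incident, incident_cost):
    """Invoice cost plus the probability-weighted cost of a bad change."""
    return token_cost + p_incident * incident_cost


def break_even_extra_risk(token_saving, incident_cost):
    """Extra incident probability at which a token saving is fully wiped out."""
    return token_saving / incident_cost


INCIDENT_COST = 20_000.0  # hypothetical stand-in for "tens of thousands"

# Hypothetical miss rates on a change that touches something critical.
frontier = expected_cost(token_cost=0.29, p_incident=0.0001,
                         incident_cost=INCIDENT_COST)
low_cost = expected_cost(token_cost=0.03, p_incident=0.001,
                         incident_cost=INCIDENT_COST)

print(f"frontier: ${frontier:.2f}   low-cost: ${low_cost:.2f}")
print(f"break-even extra risk: {break_even_extra_risk(0.26, INCIDENT_COST):.6f}")
```

Under these assumptions the cheaper invoice carries the higher expected cost, and a $0.26 saving is cancelled by roughly one extra incident per 77,000 such requests. On work that touches nothing critical, `p_incident` is near zero for both models and the low-cost route wins outright.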
§5
One graph, one loop
The engine builds a relation graph of your system — what calls what, what reads what, what a change here reaches over there. Every request then makes three moves off that one graph, in one loop.
1
Route
Before any model is called, the engine asks the graph what this request actually touches. A change that reaches nothing critical goes to a low-cost model. A change whose relations fan out into payment paths or auth goes to a frontier model — deliberately, with the reason logged.
2
Shrink
The graph knows which relations matter for this request, so the context carries those rows — not whole files. Smaller context, every roundtrip, on every meter at once.
3
Guard
After the model answers, the same rows that priced the work check it. If the change touches an edge the request never declared, it's flagged before it lands — not in the postmortem.
One loop, one relation graph: all three moves read the same rows.
The point is the row: the same relation that makes routing safe is the one that makes the context small and the check cheap. Spend and risk aren't two products. They're one read of the same data.
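The three moves can be sketched over a toy graph. This is an illustration under assumptions, not the engine's implementation: the relation rows, the `CRITICAL` labels, and the functions `reach`, `route` and `guard` are all hypothetical, and plain breadth-first reachability stands in for whatever traversal the engine actually performs.

```python
from collections import deque

# Hypothetical relation rows: (source, relation, target).
RELATIONS = [
    ("login_form", "calls", "create_session"),
    ("create_session", "calls", "set_cookie"),
    ("checkout", "calls", "charge_card"),
    ("create_session", "reads", "user_table"),
]
CRITICAL = {"charge_card", "user_table"}  # assumed labels for payment/auth paths


def reach(start, relations):
    """Shrink: return only the relation rows reachable from the start nodes."""
    outgoing = {}
    for row in relations:
        outgoing.setdefault(row[0], []).append(row)
    seen, rows, queue = set(start), [], deque(start)
    while queue:
        node = queue.popleft()
        for row in outgoing.get(node, []):
            rows.append(row)
            if row[2] not in seen:
                seen.add(row[2])
                queue.append(row[2])
    return rows


def route(start, rows):
    """Route: pick a tier from what the rows touch, and keep the reason."""
    touched = set(start) | {row[2] for row in rows}
    hits = sorted(touched & CRITICAL)
    return ("frontier" if hits else "low-cost"), hits


def guard(start, rows, changed):
    """Guard: return changed nodes that the routed rows never declared."""
    declared = set(start) | {row[2] for row in rows}
    return sorted(set(changed) - declared)


request = ["login_form"]
rows = reach(request, RELATIONS)          # 3 of the 4 rows enter the context
tier, reason = route(request, rows)       # escalates because of user_table
undeclared = guard(request, rows, ["create_session", "charge_card"])
print(tier, reason, undeclared)
```

The same `rows` value feeds all three calls, which is the point of the section: the context, the routing reason, and the check are one read of the graph.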
That's also why this can't be bolted onto a word-counting tool later. A proxy that never sees your relations has nothing to route on, nothing to shrink with, and nothing to check against. The graph isn't a feature of the engine. It's the doorway the engine walks through.
§6
What we claim, and what we don't
We claim
60–75% reduction in production inference spend where context dominates the bill — modeled, eval-gated before any contract cites it.
20–35% in dev-and-test workloads, where contexts are smaller and caching already helps — modeled, same gate.
Every routing decision logged with the relation rows that made it — auditable per request, by design.
We don't claim
No graph, no edge. On a system we haven't mapped, we are exactly as blind as everyone else — and we'll say so.
On trivial, relation-free tasks, expect parity with table-stakes routing. The engine earns its keep where the work touches things.
No number on this page is a result until our public eval says it is. Until then, every figure is labeled what it is: a model.
The metric that matters isn't cost per token. It's cost per finished outcome — the task done, the change safe, the bill explained. That's the only number we're building toward.
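One way to read "cost per finished outcome" is total spend, failed attempts included, divided by the number of tasks that actually finished. The sketch below is that reading only; the function name and the attempt records are hypothetical, and the page does not define the metric more precisely than the sentence above.

```python
def cost_per_finished_outcome(attempts):
    """attempts: list of (cost_in_dollars, finished) pairs.

    Failed attempts still count toward spend; only finished ones count
    as outcomes.
    """
    spend = sum(cost for cost, _ in attempts)
    finished = sum(1 for _, ok in attempts if ok)
    if finished == 0:
        raise ValueError("no finished outcomes to divide by")
    return spend / finished


# Hypothetical log: two cheap finished tasks and one expensive failed one.
attempts = [(0.03, True), (0.29, False), (0.03, True)]
print(f"${cost_per_finished_outcome(attempts):.3f} per finished outcome")
```

A per-token metric would ignore the failed attempt entirely; this one makes it visible in the average.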
§7
One engine, any relation-shaped domain
The engine doesn't know what a function is. It knows what a relation is. Anywhere your data has relations the words don't carry, the same three moves apply.
| Domain | The relations | What the guard catches |
|---|---|---|
| Code | Calls, imports, deploy paths | A cheap edit that silently reaches a payment path |
| Data | Lineage, schema, downstream jobs | A column rename that breaks nightly pipelines |
| Support | Customer ↔ contract ↔ entitlement | A refund promised outside the customer's actual plan |
| Legal | Clause ↔ precedent ↔ obligation | A summary that drops a cross-referenced obligation |
Code is where we proved it first, because code is where the relations are densest and the failures are loudest. Where your domain has no graph yet, you get table-stakes routing and no worse — and we'll tell you that before you pay us.
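The domain-agnostic claim can be illustrated with one traversal applied to two kinds of rows. The rows and the function `downstream` are hypothetical examples loosely following the Code and Data lines of the table; nothing here is taken from the engine itself.

```python
def downstream(node, relations):
    """Every node reachable from `node` by following (src, kind, dst) rows."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for src, _kind, dst in relations:
            if src == current and dst not in found:
                found.add(dst)
                stack.append(dst)
    return found


# Hypothetical rows for two domains; the traversal never inspects `kind`.
code_rows = [
    ("edit_button", "calls", "apply_discount"),
    ("apply_discount", "calls", "charge_card"),
]
data_rows = [
    ("orders.amount", "feeds", "daily_rollup"),
    ("daily_rollup", "feeds", "nightly_report"),
]

print(downstream("edit_button", code_rows))
print(downstream("orders.amount", data_rows))
```

The function has no notion of a function call or a column. It only follows rows, so the same check that finds a payment path behind a cheap edit finds the nightly job behind a column rename.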
The coding-agent product built on this engine lives at unerr.dev.
Bring your domain. We'll wire the engine.
We run a small number of design-partner pilots: your workload, your invoice, our graph. You see every routing decision and every dollar — and you keep the analysis either way.
Numbers on this page are modeled from public API prices and our own measured sessions, and are labeled as such. Ask us for the methodology — we'll send the spreadsheet.