The AI Cost Iceberg (Runrate framework): visible API spend is the 10% of cost you see; hidden inference, storage, observability, retries, and human review are the 90% you don't.
In the next five years, every large enterprise will have a silent war between engineering teams and finance teams over how to measure AI cost. Engineers will say cost-per-token. CFOs will say cost-per-outcome. The framing you choose determines who controls the AI budget and how much the company actually spends on AI.
The Engineer's Frame: Cost Per Token
Cost-per-token is the engineer's natural frame. It's precise, measurable, and vendor-neutral. OpenAI charges $10 per 1 million input tokens and $30 per 1 million output tokens for GPT-4 Turbo. That's a concrete number. A 1,000-token input request costs $0.01. If you process 1 billion input tokens per month, your cost is $10,000. Simple arithmetic.
From an engineering standpoint, cost-per-token is the right metric. It maps directly onto model selection, prompt optimization, and upgrade decisions. If you switch from GPT-4 ($0.03 per 1K input tokens) to GPT-4 Turbo ($0.01 per 1K input tokens), you know exactly how much you're saving. If you cut your average prompt from 2,000 tokens to 1,500 tokens, you know you're cutting prompt cost by 25%.
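The arithmetic is simple enough to check in a few lines of Python. A minimal sketch using the list prices quoted above; your negotiated rates will differ:

```python
# Token-cost arithmetic for the figures above. Rates are published
# list prices used for illustration, not anyone's contract rates.
GPT4_TURBO_INPUT = 10.00 / 1_000_000  # $ per input token ($10 / 1M)
GPT4_INPUT = 30.00 / 1_000_000        # $ per input token ($30 / 1M)

# One 1,000-token input request:
print(f"Per request: ${1_000 * GPT4_TURBO_INPUT:.4f}")            # $0.0100

# One billion input tokens per month:
monthly_tokens = 1_000_000_000
print(f"Per month:   ${monthly_tokens * GPT4_TURBO_INPUT:,.0f}")  # $10,000

# Switching GPT-4 -> GPT-4 Turbo at the same volume:
saved = monthly_tokens * (GPT4_INPUT - GPT4_TURBO_INPUT)
print(f"Model switch saves: ${saved:,.0f}/month")                 # $20,000

# Trimming the average prompt from 2,000 to 1,500 tokens:
print(f"Prompt trim: {1 - 1_500 / 2_000:.0%} off prompt cost")    # 25%
```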
Cost-per-token is engineer-relevant, precisely measurable, and directly actionable.
The CFO's Frame: Cost Per Outcome
Cost-per-outcome is the CFO's natural frame. It answers the question: "What did it cost to process this customer ticket, adjudicate this claim, or originate this loan?" It includes not just tokens but integrations, infrastructure, human review, and opportunity cost.
Klarna's AI customer service agent costs $0.19 per resolved ticket. That's cost-per-outcome. It includes API cost, but also:
- The infrastructure to route tickets
- The integrations to look up customer history
- The vector database to search prior solutions
- The human escalation path for complex issues
- The training and evaluation to improve accuracy
You can't break down that $0.19 into constituent parts because they're bound together operationally. The system costs $0.19 to produce one resolved ticket.
Cost-per-outcome is CFO-relevant, operations-relevant, and financially meaningful.
Why the Framing Matters
The framing determines the incentive structure:
Cost-per-token framing:
- Incentive: Minimize tokens per request
- Technique: Reduce prompt length, use cheaper models, limit context
- Result: Engineers optimize for efficiency, sometimes at the expense of quality
- Control: Engineering team owns the budget and optimization
Cost-per-outcome framing:
- Incentive: Minimize cost per resolved ticket, claim, or application
- Technique: Improve accuracy (fewer escalations), reduce retries (better integrations), optimize human review time
- Result: Organization optimizes for business value, not token efficiency
- Control: Finance team owns the budget and ties it to operational outcomes
These are fundamentally different optimization targets. An engineer optimizing for cost-per-token might reduce context length, which hurts accuracy, which increases human review time, which actually increases total cost. But the engineer's dashboard shows lower token cost, so they think they're winning.
A CFO optimizing for cost-per-outcome would increase context length (higher token cost) to improve accuracy (lower human review cost), because the end-to-end math is better.
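A toy version of that end-to-end math, with hypothetical per-ticket numbers:

```python
# Hypothetical per-ticket costs for two configurations (illustrative only).
short_context = {"tokens": 0.04, "human_review": 0.09}  # cheap tokens, more escalations
long_context = {"tokens": 0.06, "human_review": 0.04}   # pricier tokens, fewer escalations

for name, costs in (("short context", short_context), ("long context", long_context)):
    print(f"{name}: ${sum(costs.values()):.2f} per resolved ticket")
# short context: $0.13 -- wins on the token dashboard
# long context:  $0.10 -- wins on the end-to-end math
```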
The Benchmarking Problem
Here's why the framing battle matters in practice:
When Klarna says "our agent costs $0.19 per resolved ticket," they're using cost-per-outcome framing. But they don't break down the $0.19 into components. A CFO wants to know: is $0.19 expensive or cheap? They can't tell without comparable benchmarks.
When an engineer says "our model uses 1,500 tokens per request at $0.10 per 1K tokens, so each request costs $0.15 in tokens," they're using cost-per-token framing. But that tells you nothing about whether the request actually resolves the customer's problem, whether it requires human escalation, or what the true business cost is.
- Klarna: $0.19 per resolved ticket (business outcome)
- Engineer: $0.15 in tokens per request (technical input)
These are incomparable numbers. The CFO thinks the engineer's estimate is conservative; the engineer thinks the CFO's benchmark is inflated. Neither is wrong; they're measuring different things.
The Iceberg Resolves the Conflict
The AI Cost Iceberg provides the bridge between the two framings:
- Cost-per-token: $0.05 per request (visible tip)
- Cost-per-outcome: $0.19 per resolved ticket (full iceberg)
The iceberg explains where the $0.14 difference comes from:
- Integrations: $0.03
- Retries: $0.02
- Human escalation: $0.06
- Infrastructure and observability: $0.03
Once you're explicit about these hidden costs, the engineer and the CFO can have a productive conversation. The engineer might say: "I can reduce token cost from $0.05 to $0.04 by using a cheaper model." The CFO can reply: "But if that reduces accuracy and increases human escalations from $0.06 to $0.09, the total cost goes up to $0.21. Don't do it."
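That conversation, as a sketch over the illustrative figures above:

```python
# The iceberg bridge, using the illustrative per-ticket figures above.
baseline = {
    "tokens": 0.05,              # the visible tip
    "integrations": 0.03,
    "retries": 0.02,
    "human_escalation": 0.06,
    "infra_observability": 0.03,
}
print(f"Cost-per-outcome: ${sum(baseline.values()):.2f}")         # $0.19

# The engineer's proposal: a cheaper model saves a cent in tokens
# but, if accuracy drops, adds three cents in human escalation.
cheaper_model = {**baseline, "tokens": 0.04, "human_escalation": 0.09}
print(f"With cheaper model: ${sum(cheaper_model.values()):.2f}")  # $0.21
```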
The Framing Battle in Practice
In a typical enterprise:
Year 1: Engineering team builds an agent. They quote cost-per-token. CFO budgets based on token cost. Agent is deployed.
Year 2: Actual spend is 5–10x higher than budget because of hidden costs. CFO is shocked. Engineering team claims the CFO didn't understand how much infrastructure costs. Finance team claims engineering underestimated.
Year 3: CFO insists on cost-per-outcome reporting. Engineering team resists because it "doesn't account for efficiency gains." Battle ensues.
Year 4: Organization implements work-item-level cost attribution. Every agent outcome is tagged with its full cost. Engineering team and finance team finally have a shared language.
The organization that skips to Year 4 wins. Everyone else wastes time in Years 1–3 arguing about how to measure cost.
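What work-item-level attribution looks like in practice is roughly this. A minimal sketch; the field names are hypothetical, not Runrate's schema or any particular vendor's:

```python
from dataclasses import dataclass, field

@dataclass
class WorkItemCost:
    """One agent outcome, tagged with every cost layer that produced it.
    Illustrative sketch only; field names are hypothetical."""
    work_item_id: str
    outcome: str  # e.g. "ticket_resolved", "claim_adjudicated"
    costs: dict[str, float] = field(default_factory=dict)  # layer -> dollars

    @property
    def cost_per_outcome(self) -> float:
        # The number engineering and finance can finally agree on.
        return sum(self.costs.values())

ticket = WorkItemCost(
    work_item_id="T-1042",
    outcome="ticket_resolved",
    costs={"tokens": 0.05, "integrations": 0.03, "retries": 0.02,
           "human_escalation": 0.06, "infra_observability": 0.03},
)
print(f"{ticket.work_item_id}: ${ticket.cost_per_outcome:.2f} per {ticket.outcome}")
```

Aggregate records like these per agent and per outcome type, and cost-per-token and cost-per-outcome become two views of the same data.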
The Vendor's Game
Vendors deliberately exploit this framing ambiguity:
- To engineers: "Our agent uses state-of-the-art model optimization. Token cost is $0.02 per request."
- To CFOs: "Our agent resolves customer issues at $0.50 per ticket."
Both statements can be true, but they measure different things. The engineer benchmarks a proof-of-concept at $0.02 in tokens per request. The CFO deploys it and sees $0.50 cost-per-outcome. Neither party was lied to; they're just measuring different layers of the iceberg.
When evaluating vendors, insist on cost-per-outcome using the exact definition: "What is the all-in cost to produce one resolved outcome, including API, infrastructure, integrations, human review, and compliance overhead?"
The Future: Outcome-Based Pricing
The next generation of AI vendors will price by outcome, not by token. Instead of charging per million tokens, they'll charge $0.50 per resolved ticket. This forces the vendor to absorb the hidden costs and incentivizes them to optimize the full stack: API, retries, human review, everything.
Outcome-based pricing aligns vendor incentives with CFO incentives: both are optimizing for cost-per-outcome. The vendor that can deliver resolved tickets more cheaply than its competitors wins the business.
Token-based pricing aligns vendor incentives with engineer incentives but misaligns with CFO incentives. That's why the framing battle exists: different parts of the org are optimizing different metrics.
What to Do Next
When you hear an agent cost estimate, immediately ask: "Is that cost-per-token or cost-per-outcome?" If it's cost-per-token, ask what the outcome is and what the full cost is including all hidden layers. If it's cost-per-outcome, ask what's included, and in particular whether human review time is counted.
Use the AI Cost Iceberg to translate between the two framings. Once you're explicit about hidden costs, the engineer and the CFO can optimize together instead of at cross-purposes.
For a deeper walkthrough of cost attribution and how to surface cost-per-outcome across your entire agent fleet, request the CFO Field Guide or a demo with Runrate.
Want to see this in your stack?
Book a 30-minute walkthrough with a Runrate founder.