The True Cost of AI Agents (and Why Your Bill Is 10x Your Token Spend)

Updated 2026-05-02

The gap between what you think you're paying for AI agents and what you're actually paying is the defining financial problem of the agentic enterprise. Vendors quote token cost. CFOs budget based on token cost. Actual spend is typically 5x to 15x that number. This isn't a rounding error; it's the business model of every vendor pitching you an AI agent.

The Token Cost Illusion

An AI vendor tells you: "Our agent processes claims at $0.02 per API call." That's token cost only. It assumes:

  • Zero failures (no retries)
  • No integrations (no third-party API calls)
  • No storage (no vector DB, no logging)
  • No human review (no labor cost)
  • No evaluation (no testing cost)
  • No infrastructure overhead (no observability, no security)

None of those assumptions hold. In production, true cost is token cost plus all of those things. The gap between assumption and reality is the AI Cost Iceberg.

How the Iceberg Was Discovered

This framework draws on Revenium's research into real-world AI costs, which reports that tokens are less than 1% of true AI cost for complex agents. The observation: even expensive APIs and massive model runs are dwarfed by human review time, infrastructure overhead, and integration costs.

Klarna's AI customer service agent runs at $0.19 per resolved ticket. A plausible reverse-engineering of that number: Klarna probably spends $0.03–$0.05 on LLM API calls per ticket. The rest, $0.14–$0.16, is infrastructure, integrations, training, and scale-related overhead. On that estimate, token cost is roughly 16–26% of the total. Everything else is hidden.
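
The share calculation behind that estimate can be sketched in a few lines. The $0.19 figure is the one cited above; the $0.03–$0.05 token range is this article's estimate rather than a disclosed number, and `token_share` is a hypothetical helper name.

```python
def token_share(token_cost: float, total_cost: float) -> float:
    """Return token cost as a fraction of total cost per outcome."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return token_cost / total_cost


TOTAL_PER_TICKET = 0.19               # per resolved ticket, as cited above
TOKEN_LOW, TOKEN_HIGH = 0.03, 0.05    # estimated LLM spend per ticket (assumption)

low = token_share(TOKEN_LOW, TOTAL_PER_TICKET)
high = token_share(TOKEN_HIGH, TOTAL_PER_TICKET)
print(f"token share: {low:.0%} to {high:.0%}")
```

Under these assumptions, tokens account for somewhere between about a sixth and about a quarter of the per-ticket cost.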

Why the 10x Multiplier Is Real

Take a concrete example: a healthcare organization evaluating an insurance claims agent.

The vendor quotes: "$0.02 per API call. You process 50,000 claims/month. Your monthly cost is $1,000."

The CFO budgets $12,000/year and thinks they're done.

Here's what actually happens:

Visible API cost: The vendor's quote is based on a specific LLM, a specific prompt, a specific input/output length. In production:

  • Some claims need longer context windows (unusual diagnoses, prior denials)
  • Some agents retry failed parses
  • Some need multi-turn reasoning (coverage verification, coordination of benefits)

True API cost: $0.04–$0.08 per claim (2–4x the vendor quote)
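
One rough way to see how a $0.02 quote drifts into the $0.04–$0.08 range is to multiply the quote by the three effects listed above. This is a minimal sketch, not a measured model: the retry rate, turn count, and context factor below are hypothetical placeholders, and `effective_api_cost` is an illustrative name.

```python
def effective_api_cost(quoted_cost: float, retry_rate: float,
                       avg_turns: float, context_factor: float) -> float:
    """Estimate per-claim API cost from a vendor's single-call quote.

    retry_rate:     fraction of calls retried once (first-order approximation)
    avg_turns:      average LLM calls per claim, before retries
    context_factor: average token inflation from longer-than-quoted context
    """
    return quoted_cost * (1 + retry_rate) * avg_turns * context_factor


QUOTED = 0.02  # vendor quote per API call, from the example above

# All three inputs are illustrative assumptions, not measured values.
estimate = effective_api_cost(QUOTED, retry_rate=0.20, avg_turns=1.5, context_factor=1.3)
multiplier = estimate / QUOTED
print(f"${estimate:.4f} per claim ({multiplier:.2f}x the quote)")
```

With these placeholder inputs the estimate lands a little above 2x the quote. Replacing them with rates measured from production logs is what turns the sketch into a usable number.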

Integrations: The agent needs to call:

  • Your claims database (internal API, free but consumes compute)
  • The insurance company's eligibility API ($0.01–$0.02 per call)
  • An external medical necessity review service ($0.05–$0.10 per call)
  • The company's EHR (internal, but generates logs and audit trail cost)

Integration costs: $0.08–$0.12 per claim

Observability and logging: Every decision needs to be logged for audit and compliance. At 50,000 claims/month, that is 500 GB to 1 TB of logs, all of which must be stored, indexed, and queried.

Observability cost: $1,500–$2,500/month = $0.03–$0.05 per claim

Vector database (for historical claims context): Pinecone Pro or similar.

Database cost: $800–$1,200/month = $0.016–$0.024 per claim
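
The observability and vector database figures are fixed monthly bills converted to a per-claim cost. A minimal sketch of that conversion, using the monthly ranges and the 50,000 claims/month volume from this example (`per_claim` is a hypothetical helper name):

```python
def per_claim(monthly_cost: float, claims_per_month: int) -> float:
    """Spread a fixed monthly bill across the month's claim volume."""
    if claims_per_month <= 0:
        raise ValueError("claims_per_month must be positive")
    return monthly_cost / claims_per_month


CLAIMS_PER_MONTH = 50_000

# (low, high) monthly bills in dollars, taken from the figures above
monthly_bills = {
    "observability": (1_500, 2_500),
    "vector_db": (800, 1_200),
}

per_claim_costs = {
    layer: (per_claim(low, CLAIMS_PER_MONTH), per_claim(high, CLAIMS_PER_MONTH))
    for layer, (low, high) in monthly_bills.items()
}

for layer, (low, high) in per_claim_costs.items():
    print(f"{layer}: ${low:.3f} to ${high:.3f} per claim")
```

Because these bills are largely fixed, the per-claim figure moves with volume: the same monthly spend costs twice as much per claim at 25,000 claims/month.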

Human review: In healthcare, a claims adjudicator reviews edge cases or high-dollar claims before payment. Assume 15–25% escalation rate, 3 minutes per review, $30/hour.

Human review cost: $1,875–$3,125/month = $0.04–$0.06 per claim (on escalated portion) or $0.006–$0.015 per claim (blended across all claims)

Security and compliance: PII redaction, audit logging, encryption, SOC 2.

Compliance cost: 10–15% adder = $0.01–$0.015 per claim

Testing and evaluation: A/B testing prompts, evaluating accuracy, updating rules.

Testing cost: 5–10% adder = $0.005–$0.01 per claim

Sum of all cost layers:

| Layer | Cost Per Claim |
| --- | --- |
| Token cost (quoted) | $0.02 |
| Token cost (actual, with retries) | $0.06 |
| Integrations | $0.10 |
| Observability | $0.04 |
| Vector DB | $0.02 |
| Human review | $0.01 |
| Compliance | $0.01 |
| Testing | $0.01 |
| Total (actual token cost plus hidden layers) | $0.25 |

The vendor quoted $0.02. True cost is about $0.25. That's a 12.5x multiplier.

Across 50,000 claims/month: $12,500/month or $150,000/year. The CFO budgeted $12,000/year. The actual cost is 12.5x the budget.
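
The roll-up can be reproduced with a short script. This sketch sums the rounded per-claim figures from the table, leaving out the quoted token cost because the actual token cost row replaces it. The dictionary keys are illustrative names, not a standard schema.

```python
QUOTED_TOKEN_COST = 0.02    # vendor quote per claim
CLAIMS_PER_MONTH = 50_000

# Rounded per-claim figures from the table (quoted token cost excluded)
layers = {
    "token_cost_actual": 0.06,
    "integrations": 0.10,
    "observability": 0.04,
    "vector_db": 0.02,
    "human_review": 0.01,
    "compliance": 0.01,
    "testing": 0.01,
}

# Round to whole cents to avoid floating-point noise in the sum
true_cost = round(sum(layers.values()), 2)
multiplier = true_cost / QUOTED_TOKEN_COST
monthly = true_cost * CLAIMS_PER_MONTH
annual = monthly * 12

print(f"true cost per claim: ${true_cost:.2f}")
print(f"multiplier vs quote: {multiplier:.1f}x")
print(f"monthly: ${monthly:,.0f}  annual: ${annual:,.0f}")
```

With the rounded rows as listed, the layers come to $0.25 per claim, or 12.5x the quote. Swapping in your own per-layer estimates is the quickest way to see which layer dominates.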

Why Vendors Quote Token Cost

Vendors quote token cost because it's the only number they can control. They can't control your retry rate, your integration complexity, your regulatory burden, or your willingness to pay for observability. Token cost is predictable and defensible.

The other reason: quoting total cost of ownership would kill the deal. Roughly $0.25 per claim sounds expensive. $0.02 per claim sounds cheap. Vendors optimize for the conversation they want to have, not the conversation you need to have.

The Iceberg by Industry

The multiplier varies by industry:

  • Customer service (low stakes, few integrations): 3–5x multiplier
  • Loan origination (moderate stakes, many integrations, compliance): 5–8x multiplier
  • Insurance claims (high stakes, many integrations, mandatory review): 8–12x multiplier
  • Healthcare (highest stakes, most integrations, strictest compliance): 10–15x multiplier
  • Legal review (longest human review, most compliance): 15–20x multiplier

The pattern is clear: regulated industries carry multipliers roughly 3–4x higher than low-stakes customer service, because human review and compliance overhead dominate.
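
As a first-pass sanity check, the multiplier ranges above can be applied to a vendor's token-cost quote to get a rough annual budget band. This is a sketch only: the ranges are the ones listed in this section, and `budget_range` and the dictionary keys are hypothetical names.

```python
from typing import Dict, Tuple

# (low, high) multipliers on quoted token cost, from the list above
MULTIPLIERS: Dict[str, Tuple[int, int]] = {
    "customer_service": (3, 5),
    "loan_origination": (5, 8),
    "insurance_claims": (8, 12),
    "healthcare": (10, 15),
    "legal_review": (15, 20),
}


def budget_range(quoted_per_unit: float, units_per_month: int,
                 industry: str) -> Tuple[float, float]:
    """Return a (low, high) annual budget implied by a token-cost quote."""
    low_mult, high_mult = MULTIPLIERS[industry]
    quoted_annual = quoted_per_unit * units_per_month * 12
    return quoted_annual * low_mult, quoted_annual * high_mult


low, high = budget_range(0.02, 50_000, "healthcare")
print(f"healthcare: ${low:,.0f} to ${high:,.0f} per year")
```

For the $0.02 quote at 50,000 claims/month, the healthcare range implies $120,000 to $180,000 per year against a quoted $12,000. A band like this is a prompt for a bottom-up model, not a substitute for one.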

The CFO's Response

When you discover the 10x gap, three responses are available:

  1. Accept the cost and optimize within it. Can you reduce the escalation rate by improving the agent? Can you negotiate cheaper integrations? Can you batch process work to reduce API calls?

  2. Reduce the scope. Instead of reviewing 25% of claims, review only 10%. Use a faster, cheaper model for routine decisions. Reduce the complexity of the agent.

  3. Reallocate headcount. If true cost is roughly $0.25 per claim and a claims adjudicator costs $0.40–$0.60 per claim, the AI agent is still roughly 2x cheaper. But the comparison is now honest.

Most CFOs choose a mix: optimize the agent to reduce cost, reduce scope slightly, and reallocate some headcount. None of them accept the vendor's token-cost quote as the actual cost.
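
The headcount comparison in the third option reduces to a ratio and a monthly difference. A minimal sketch, assuming a fully loaded agent cost of about $0.25 per claim from the worked example and the $0.40–$0.60 adjudicator range above (`cost_advantage` is an illustrative name):

```python
def cost_advantage(ai_cost: float, human_cost: float) -> float:
    """How many times cheaper the agent is than the human baseline."""
    if ai_cost <= 0:
        raise ValueError("ai_cost must be positive")
    return human_cost / ai_cost


AI_COST = 0.25                   # fully loaded per-claim cost (assumption from the example)
HUMAN_COST_RANGE = (0.40, 0.60)  # adjudicator cost per claim, from the text above
CLAIMS_PER_MONTH = 50_000

ratios = [cost_advantage(AI_COST, h) for h in HUMAN_COST_RANGE]
savings = [(h - AI_COST) * CLAIMS_PER_MONTH for h in HUMAN_COST_RANGE]

for h, ratio, saved in zip(HUMAN_COST_RANGE, ratios, savings):
    print(f"human ${h:.2f}/claim: agent is {ratio:.1f}x cheaper, "
          f"saves ${saved:,.0f}/month")
```

On these inputs the agent is 1.6x to 2.4x cheaper. Run the same comparison against the $0.02 quote and it shows 20x to 30x, which is the distortion the token-cost figure introduces.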

The Iceberg Framework in Practice

The AI Cost Iceberg organizes true cost into visible and hidden layers:

  • Visible (10%): Token cost, paid to OpenAI/Anthropic/Google
  • Hidden (90%): Inference at scale, integrations, storage, observability, human review, compliance, testing

When evaluating vendors or building internal agents, use the Iceberg to pressure-test cost estimates. If someone quotes a number without accounting for hidden layers, they don't understand their cost, or they're hiding it intentionally.
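
A pressure test can be as simple as checking a quote for missing layers. The sketch below assumes a quote is represented as a mapping from layer name to per-outcome cost; the layer names mirror the visible and hidden layers listed above, and `missing_layers` is a hypothetical helper.

```python
from typing import List, Mapping

# Layers from the Iceberg: one visible, the rest hidden
REQUIRED_LAYERS = (
    "tokens",
    "inference_at_scale",
    "integrations",
    "storage",
    "observability",
    "human_review",
    "compliance",
    "testing",
)


def missing_layers(quote: Mapping[str, float]) -> List[str]:
    """Return the Iceberg layers a cost quote does not account for."""
    return [layer for layer in REQUIRED_LAYERS if layer not in quote]


vendor_quote = {"tokens": 0.02}  # a token-only quote, as in the example
gaps = missing_layers(vendor_quote)
print(f"{len(gaps)} of {len(REQUIRED_LAYERS)} layers unaccounted for: {gaps}")
```

A token-only quote leaves seven of the eight layers unaccounted for. Each gap is a question to put to the vendor, or a line item to estimate internally.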

What to Do Next

Use the AI Cost Iceberg to build a bottom-up cost model for your agents. Start with token cost, add integration costs, add observability, add human review, add compliance, and add testing. The sum is your true cost per outcome. Compare that to the cost of the human headcount you'd need to do the same work.

For a step-by-step cost attribution framework, see the pillar article on AI agent cost.
