How Much Does an AI Agent Actually Cost? A Buyer's Deep Dive

8 min read · Updated 2026-05-02


The cost of an AI agent looks deceptively simple: check the vendor's per-token price, multiply by your usage, and you're done. In practice, that number is the tip of the iceberg. Most finance teams budgeting for AI agents see only the visible API spend—often less than 10% of the actual cost—and underestimate by a factor of 10 or more.

The Visible Cost: What Vendors Show You

When an AI agent vendor like Intercom or Sierra quotes a price, or an early adopter like Klarna publicizes its numbers, the figure usually reflects the cost of API calls to the underlying large language model (LLM). Klarna's AI customer service agent runs at $0.19 per resolved ticket. Intercom Fin, Intercom's generative AI support assistant, runs around $0.99 per resolution. Sierra, a more complex customer service platform, averages $1.50 per resolution. These numbers sound attractive.

Here's the catch: those prices are the API bill only. They assume perfect efficiency, zero retries, no integrations, and no human review. None of those assumptions hold in production.

The AI Cost Iceberg: 90% Hidden Below the Surface

The AI Cost Iceberg is Runrate's model for understanding true agent cost. It breaks AI spend into visible and hidden layers.

The visible layer is what you see on the vendor bill: API calls to OpenAI, Anthropic, Google, or a self-hosted model. This is typically 10% of your total AI cost.

The hidden layers are what actually drive the budget. They include:

  • Inference at scale — Running models on larger batches, longer contexts, or more complex reasoning tasks than simple API calls. A model that costs $0.02 per 1,000 tokens at baseline might cost $0.15 per 1,000 tokens when you add vision, multi-turn reasoning, or retrieval-augmented generation (RAG).
  • Vector database and storage — Every AI agent that learns from company documents needs a vector database to store embeddings. A Pinecone or Weaviate instance running 24/7 might cost $500–$2,500/month before your first AI agent fires.
  • Retries on failure — When an agent fails to parse an API response or hits a timeout, it retries, and each retry is a new API call. In multi-step workflows the effect compounds: a failure late in a chain can force re-running earlier steps, so even a 5% per-step failure rate can inflate token spend by 50% or more.
  • Tool calls to third-party APIs — When an agent calls Stripe to charge a customer or Twilio to send an SMS, those calls carry fees. A Twilio SMS runs about $0.0075, and payment, verification, and messaging APIs each add per-call or per-transaction charges. Multiply by thousands of daily transactions, and the fees climb into the thousands per month.
  • Human-in-the-loop review — High-stakes domains (healthcare, insurance, legal, financial services) require a human to review AI decisions before execution. A claims adjudicator reviewing an AI-processed claim might take 2–5 minutes. At $30/hour, that's $1–$2.50 per claim in labor cost.
  • Observability and logging — You need to log every agent action for debugging, audit, and compliance. Logs for 100,000 daily agent runs can exceed 50 GB/month. Storage, indexing, and query time add up.
  • Evaluation and quality assurance — Running test suites to validate agent accuracy, measuring hallucination rates, and A/B testing prompt changes costs time and compute.
  • Security and compliance overhead — PII redaction, data residency constraints, encryption, audit logging, and SOC 2 compliance add engineering and operational cost.
  • Training data licensing — If you're fine-tuning an agent on proprietary data or using vendor-specific training data, you're paying licensing fees on top of the base model cost.

The sum of these hidden layers typically exceeds your visible API spend by 5x to 15x, depending on your use case.
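The retry layer deserves special attention because it compounds in multi-step workflows. A rough sketch, assuming (pessimistically) that any step failure forces a full re-run of the chain and that failures are independent per step:

```python
# Sketch: how per-step failure rates amplify token spend when a failure
# forces a full re-run of a multi-step agent chain. The 5% rate and the
# restart-everything model are illustrative assumptions.

def chain_cost_multiplier(p_step: float, steps: int) -> float:
    """Expected full-chain runs until one succeeds.

    A run succeeds only if every step succeeds: (1 - p_step) ** steps.
    Expected attempts follow a geometric distribution, so the token
    multiplier versus a zero-failure run is 1 / p_success.
    """
    p_success = (1.0 - p_step) ** steps
    return 1.0 / p_success

# A 5% per-step failure rate across a ten-step workflow:
print(round(chain_cost_multiplier(0.05, 10), 2))  # → 1.67
```

Under these assumptions, a failure rate that looks like a rounding error per step adds two-thirds again to the token bill across the whole workflow.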

Real-World Benchmark Costs

The table below summarizes publicly disclosed or market-standard cost-per-outcome benchmarks from vendors and early adopters:

| Agent / Use Case | Cost Per Outcome | Notes |
| --- | --- | --- |
| Klarna AI customer service | $0.19 per resolved ticket | Public; includes API, infrastructure, training |
| Intercom Fin (email resolution) | ~$0.99 per resolution | Market estimate; email support, no callbacks |
| Sierra (contact center) | ~$1.50 per resolution | Complex workflows, integrations, human-in-loop |
| Decagon (loan origination) | ~$0.80–$1.20 per application | Market estimate; includes compliance verification |
| Devin (autonomous coding) | ~$2.25 per ticket resolved | Vendor-reported; includes model runtime + infrastructure |

These numbers span a 12x range ($0.19 to $2.25), illustrating how dramatically use case complexity, tooling requirements, and human review processes affect true cost.

Why Hidden Costs Explode: A Concrete Example

Take a mid-market insurance company processing 50,000 claims per month with an AI adjudication agent. The CFO sees the API cost:

  • 50,000 claims × 3,000 tokens per claim (average prompt + response) = 150 million tokens
  • At $0.01 per 1K tokens (a typical Claude or GPT-4 rate): $1,500/month

The CFO budgets $18,000/year and calls it done.

In reality, the true cost is closer to $15,000–$25,000/month:

  • API cost: $1,500
  • Inference at scale (complex routing logic, retrieval, multi-turn reasoning): +$2,000
  • Vector DB and document storage: +$1,200/month (Pinecone Pro)
  • Retries (5% failure rate on integrations): +$750
  • Stripe/integration API calls (payments, status updates): +$800
  • Human review (adjudicator spot-checks roughly 1 in 6 flagged claims, 3 min each @ $30/hr): +$12,500
  • Logging and observability (DataDog, custom logs): +$1,800
  • Evaluation, testing, prompt tuning: +$1,500
  • Compliance and audit overhead: +$1,000

Total: $23,050/month, or $276,600 annually. The initial $18,000 estimate was off by more than 15x.
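As a sanity check, the line items can be summed directly (figures are the article's illustrative monthly estimates):

```python
# The insurance example as arithmetic. Line items are the article's
# illustrative monthly figures for 50,000 claims/month.
line_items = {
    "api": 1_500,
    "inference_at_scale": 2_000,
    "vector_db_storage": 1_200,
    "retries": 750,
    "integration_api_calls": 800,
    "human_review": 12_500,
    "observability": 1_800,
    "evaluation_tuning": 1_500,
    "compliance_audit": 1_000,
}

monthly_total = sum(line_items.values())
annual_total = monthly_total * 12
naive_annual_budget = 1_500 * 12  # the API-only estimate

print(monthly_total)  # 23050
print(annual_total)   # 276600
print(round(annual_total / naive_annual_budget, 1))  # 15.4x the naive budget
```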

The Maturity Curve: Why Most Finance Teams Get Cost Wrong

The reason so many CFOs miss this is structural: most organizations are at Stage 1 or 2 on the AI Cost Maturity Curve. They have no cost attribution system at all. AI spend is buried in cloud bills, software subscriptions, and headcount. When asked "what did we spend on AI last month?", the CFO can't answer because there's nowhere to look.

Without a cost-tracking system, teams default to the vendor's quoted price and hope for the best. That's why the iceberg exists: cost visibility is an organizational problem, not a math problem.

The Three Questions Every CFO Should Ask

When evaluating an AI agent vendor or building an internal agent, use these three questions to pressure-test the cost estimate:

  1. What is the cost per outcome, and what's included? If the vendor quotes only the API cost, ask for total cost of ownership. If they can't provide it, they don't know it.
  2. What happens when the agent fails? Each failed request roughly doubles its own token cost (one failed attempt, one retry). What's the expected failure rate? How does the vendor reduce it?
  3. Who reviews the output, and what's the cost? Even high-accuracy agents need human review in regulated domains. A 2-minute review at $30/hour adds $1 per decision. Budget for it.
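The three questions translate directly into arithmetic. A sketch, with every input a placeholder to swap for your own vendor's numbers:

```python
# Pressure-testing a vendor quote with the three questions, as arithmetic.
# All inputs are illustrative assumptions, not real quotes.

def cost_per_outcome(api_quote: float, failure_rate: float,
                     review_minutes: float, reviewer_hourly: float) -> float:
    retry_overhead = api_quote * failure_rate            # Q2: failed calls re-spend tokens
    review_cost = review_minutes / 60 * reviewer_hourly  # Q3: human-in-the-loop labor
    return api_quote + retry_overhead + review_cost      # Q1: total, not just API

# A $0.99/resolution quote with a 5% failure rate and a 2-minute review at $30/hr
print(cost_per_outcome(0.99, 0.05, 2, 30.0))  # roughly 2.04, double the quoted price
```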

Unit Economics: Comparing Agents to Headcount

A useful mental model is to treat AI agents like full-time employees (FTEs) on your payroll. If an AI agent processes 10,000 claims per month at $2 per claim, the agent "costs" $20,000/month. Handling the same volume with humans takes four to five claims adjudicators: each handles 2,000–3,000 claims/month and costs $4,500–$6,000 in salary plus roughly 40% benefits, or $6,300–$8,400/month fully loaded. That's $25,000–$42,000/month in headcount, so the agent is 20–50% cheaper at the same volume, and the gap widens as volume grows.
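The FTE comparison is a throughput calculation. A sketch using the article's illustrative figures:

```python
# FTE-equivalence sketch: the headcount needed to match the agent's
# monthly volume, using the article's illustrative figures.
import math

volume = 10_000             # claims per month
agent_cost = volume * 2.00  # $2 per claim

human_throughput = (2_000, 3_000)  # claims per adjudicator per month
loaded_monthly = (6_300, 8_400)    # salary plus ~40% benefits

# Pair low throughput with low salary and high with high to bracket a range
ftes_needed = [math.ceil(volume / t) for t in human_throughput]
human_cost = [n * c for n, c in zip(ftes_needed, loaded_monthly)]

print(agent_cost)   # 20000.0
print(ftes_needed)  # [5, 4]
print(human_cost)   # [31500, 33600]
```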

But efficiency only matters if you measure it. Most finance teams don't have a cost-per-claim KPI yet. They're still operating in the shadow-cost phase, where AI spend exists on a credit card or buried in vendor contracts.

How to Move from API Cost to True Cost

The path from "I don't know" to "I know my cost per outcome" has five steps, corresponding to Runrate's AI Cost Maturity Curve:

  1. Get visibility — Centralize all AI vendor bills and API spend into one P&L line. Use a tool like Runrate, CloudZero, or Vantage to aggregate.
  2. Break it down — Split AI cost by application, vendor, and business unit. This is the showback phase.
  3. Attribute it — Link each unit of work (a claim, a ticket, an application) to its AI cost. This requires logging at the transaction level.
  4. Optimize it — Once you know the cost per outcome, you can optimize. Reduce retries, improve accuracy, negotiate vendor discounts, choose cheaper models for commodity tasks.
  5. Govern it — Set SLOs (service-level objectives) on cost per outcome, implement anomaly detection, and report monthly to the board.
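Step 3 is the technical crux: each unit of work needs its own cost record. A minimal sketch, with hypothetical field names, of what transaction-level attribution might look like:

```python
# Step 3 (attribute it) in miniature: a cost record per unit of work so
# cost-per-outcome becomes a query, not a guess. Field names are
# hypothetical; adapt them to your own transaction log.
from dataclasses import dataclass

@dataclass
class AgentTransaction:
    outcome_id: str        # the claim, ticket, or application ID
    business_unit: str
    vendor: str
    api_cost: float = 0.0
    tool_call_cost: float = 0.0
    review_minutes: float = 0.0
    reviewer_hourly: float = 30.0

    @property
    def total_cost(self) -> float:
        review_cost = self.review_minutes / 60 * self.reviewer_hourly
        return self.api_cost + self.tool_call_cost + review_cost

txns = [
    AgentTransaction("claim-001", "claims", "openai", api_cost=0.03, review_minutes=3),
    AgentTransaction("claim-002", "claims", "openai", api_cost=0.03),
]
print(round(sum(t.total_cost for t in txns) / len(txns), 2))  # → 0.78
```

Note how a single reviewed claim dominates the average: without the labor field, the log would report a cost-per-outcome of three cents.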

Most enterprise teams are at step 1. Runrate moves them to steps 3 or 4 immediately.

The Economics of Accuracy vs. Cost

Here's a hidden tension: more accurate agents are more expensive agents. A model that hallucinates less requires larger context windows, more reasoning steps, or a pricier model tier. A frontier model call can cost several times more per token than a smaller model's, but it makes fewer mistakes. A model fine-tuned on your data costs more upfront but might reduce human review time.

The CFO's job is to optimize the tradeoff. Suppose upgrading the model costs $0.50 more per outcome but cuts human review from 3 minutes to 1.5 minutes. At $30/hour ($0.50 per minute), the 1.5 minutes saved is worth $0.75 in labor, so the upgrade nets $0.25 per outcome. The math works.
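The tradeoff reduces to a one-line break-even check, sketched here with the $30/hour reviewer rate used earlier (deltas are illustrative):

```python
# Break-even check: is a pricier model worth the review time it saves?
# Uses the $30/hour reviewer rate from earlier; deltas are illustrative.

def net_saving_per_outcome(model_cost_delta: float, minutes_saved: float,
                           reviewer_hourly: float = 30.0) -> float:
    labor_saved = minutes_saved / 60 * reviewer_hourly
    return labor_saved - model_cost_delta

# Review drops from 3.0 to 1.5 minutes; the model costs $0.50 more per outcome
print(round(net_saving_per_outcome(0.50, 1.5), 2))  # → 0.25
```

A positive result says upgrade; a negative one says the cheaper model plus longer review wins.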

Without cost attribution, you can't do this math. Most teams guess, overspend on safety, and leave efficiency on the table.

What to Do Next

Start with one agent, one use case, and one outcome metric. If you're evaluating a contact center agent, pick "cost per resolved ticket" and measure it for a month. Include API cost, infrastructure, integrations, and an estimate of human review time (even if it's blended into salary cost today). Compare that to the cost of the human headcount you'd need to handle the same volume.

If the number surprises you (it usually does), you've found your iceberg. The next step is to decide whether to optimize the agent, negotiate better vendor pricing, or reallocate headcount. You can't optimize what you don't measure.

The CFO Field Guide to AI Costs walks through the line-item model, the vendor evaluation checklist, and the board-deck talking points for this conversation.
