When evaluating AI agents, your procurement team usually sees one number: the price per seat, per conversation, or per resolution. You budget against that one number. By month 3, your actual cost is 3–4x higher because you're paying for 11 other things that weren't on the quote.
This checklist walks through the 12 line items that show up on a true total cost of ownership for AI agents. Use this when evaluating vendors, building business cases, and forecasting budgets. Most organizations capture 2–3 of these 12 and are shocked when the other 9 or 10 appear on the bill.
1. Model API Tokens (The Visible Cost)
This is the only cost your CFO usually knows about: the price of renting inference from model providers—OpenAI, Anthropic, Google, Mistral, or others. You pay per token (roughly per word) for input tokens (what you send to the model) and output tokens (what the model generates).
Typical cost: $0.0001–$0.03 per 1K input tokens, depending on model. GPT-4o: $0.005 per 1K input tokens. Claude 3.5 Sonnet: $0.003 per 1K input tokens. Claude 3.7 Thinking: $0.03 per 1K thinking input tokens (10x more expensive for reasoning).
For 1,000 queries/month with 2,000 average tokens per query: (1,000 × 2,000 × $0.003) / 1,000 = $6/month on Claude Sonnet. $60/month on Claude Thinking.
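The arithmetic above can be sketched as a small helper. This is a minimal illustration, not a billing tool: the function name `monthly_token_cost` is hypothetical, and it applies one flat per-1K-token price to every token, as the worked example does, rather than pricing input and output tokens separately.

```python
def monthly_token_cost(queries_per_month: int,
                       tokens_per_query: int,
                       price_per_1k_tokens: float) -> float:
    """Monthly API spend at a single flat per-1K-token price."""
    total_tokens = queries_per_month * tokens_per_query
    return total_tokens / 1_000 * price_per_1k_tokens


# Figures from the worked example: 1,000 queries/month, 2,000 tokens each.
sonnet = monthly_token_cost(1_000, 2_000, 0.003)
thinking = monthly_token_cost(1_000, 2_000, 0.03)
print(f"Sonnet: ${sonnet:.2f}/month, Thinking: ${thinking:.2f}/month")
```

Swapping in your own query volume and the current price sheet for your model is the quickest way to see how sensitive this line item is to model choice.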
What buyers usually do: They budget for this and only this. They see $6/month, multiply by 12, budget $72/year, and call it good. They miss the other 11 items.
2. Inference Compute at Scale (If You're Self-Hosting)
If you're not using managed APIs (OpenAI, Anthropic), you're running inference on your own infrastructure: AWS SageMaker, Google Vertex, an on-premise GPU cluster, or a specialized inference provider like Together, Replicate, or Baseten.
Typical cost: $2–$15 per GPU-hour, depending on GPU type (H100: $3–$5/hour; A100: $2–$3/hour). A 24/7 agent handling 10,000 queries/day might need only 1–2 GPU-hours of actual compute per day, depending on model size and batch efficiency, but a dedicated instance bills for every hour it is up.
For a full-time agent: 1 GPU × 24 hours × $5/hour × 30 days = $3,600/month.
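A rough sketch of the same calculation, which also shows why utilization matters. The helper names are hypothetical, and the 2 busy GPU-hours per day is the upper end of the estimate quoted above, not a measured value.

```python
def monthly_gpu_cost(gpus: int, hourly_rate: float,
                     billed_hours_per_day: float = 24.0,
                     days: int = 30) -> float:
    """Cost of keeping GPUs provisioned, whether or not they are busy."""
    return gpus * hourly_rate * billed_hours_per_day * days


def utilization(busy_hours_per_day: float,
                billed_hours_per_day: float = 24.0) -> float:
    """Fraction of billed GPU time spent doing useful inference."""
    return busy_hours_per_day / billed_hours_per_day


always_on = monthly_gpu_cost(gpus=1, hourly_rate=5.0)
per_query = always_on / (10_000 * 30)  # 10,000 queries/day for 30 days
print(f"Always-on GPU: ${always_on:,.0f}/month")
print(f"Cost per query: ${per_query:.3f}")
print(f"Utilization at 2 busy hours/day: {utilization(2.0):.0%}")
```

Under these assumptions most of the bill pays for idle capacity, which is the gap between "GPU-hours needed" and "GPU-hours billed" that self-hosting budgets tend to miss.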
What buyers usually do: If they choose self-hosting, they often underestimate utilization and think "one agent = 0.5 GPU-hour per day," underbudgeting by 50–70%. They also don't account for GPU over-provisioning (you can't run at 100% utilization constantly without hitting latency issues).
3. Vector Database Storage and Retrieval
If your agent retrieves documents (customer history, policy documents, knowledge articles), it uses a vector database to store embeddings and perform semantic search. Examples: Pinecone, Weaviate, Milvus, Elasticsearch.
Typical cost: $0.10–$0.25 per million vectors per month for managed services, plus per-query read costs. Embedding generation costs $0.0001 per 1K tokens.
For a knowledge base of 10,000 documents with 5 embeddings each (50,000 vectors): storage at the rates above is about $0.005–$0.0125/month, which is effectively zero. Embedding generation (if refreshing 10% of documents weekly): 1,000 documents × 5 embeddings = 5,000 embeddings/week × 1,000 tokens/embedding × $0.0001 per 1K tokens = $0.50/week, or about $2/month. At 1,000 queries/month with 3 document retrievals per query, retrieval cost is typically $0.01–$0.05 per query, or $10–$50/month. Total: roughly $12–$52/month, dominated by retrieval.
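To sanity-check a vector database estimate, it helps to compute storage, embedding refresh, and retrieval separately from the unit rates quoted above. The sketch below is illustrative only: the function name and parameters are hypothetical, a month is approximated as 4 weeks, and the example uses the upper-end rates ($0.25 per million vectors, $0.05 per query).

```python
def vector_db_monthly_cost(documents: int,
                           embeddings_per_doc: int,
                           storage_per_million_vectors: float,
                           weekly_refresh_fraction: float,
                           tokens_per_embedding: int,
                           embedding_price_per_1k: float,
                           queries_per_month: int,
                           retrieval_cost_per_query: float,
                           weeks_per_month: int = 4) -> dict:
    """Break a monthly vector DB bill into its three components."""
    vectors = documents * embeddings_per_doc
    storage = vectors / 1_000_000 * storage_per_million_vectors

    # Embeddings regenerated each week when a fraction of documents changes.
    refreshed = documents * weekly_refresh_fraction * embeddings_per_doc
    refresh_tokens = refreshed * tokens_per_embedding
    embedding = refresh_tokens / 1_000 * embedding_price_per_1k * weeks_per_month

    retrieval = queries_per_month * retrieval_cost_per_query
    return {
        "vectors": vectors,
        "storage": storage,
        "embedding": embedding,
        "retrieval": retrieval,
        "total": storage + embedding + retrieval,
    }


cost = vector_db_monthly_cost(
    documents=10_000, embeddings_per_doc=5,
    storage_per_million_vectors=0.25,
    weekly_refresh_fraction=0.10, tokens_per_embedding=1_000,
    embedding_price_per_1k=0.0001,
    queries_per_month=1_000, retrieval_cost_per_query=0.05,
)
for name, value in cost.items():
    print(f"{name}: {value}")
```

Keeping the three components separate makes it easier to see which one grows when you add a second embedding model or refresh documents more often.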
What buyers usually do: They assume vector database cost is negligible and ignore it. Then they're surprised by $200–$500/month bills for knowledge base storage and retrieval, especially after adding multiple embedding models or refreshing documents frequently.
4. Embedding Generation (If Separate from Vector DB)
Some organizations use third-party embedding services (OpenAI Embeddings API, Cohere Embed, or Voyage AI, which Anthropic recommends because it does not offer its own embedding model) instead of self-hosting embeddings. This is separate from vector database cost.
Typical cost: $0.00002–$0.001 per 1K tokens, depending on provider and model. OpenAI: from $0.02 per 1M tokens. Other hosted providers: around $0.10 per 1M tokens.
For 50 million tokens of embedding per month (typical for an agent with a growing knowledge base): 50,000 × 1K tokens × $0.0001 = $5/month.
What buyers usually do: They often don't separate this cost from vector database cost and miss it entirely. Or they choose the embedding provider based on quality, then are surprised by the monthly bill.
5. Retries on API Failure and Fallback Logic
Agents fail. APIs timeout, return stale data, or throw errors. When they fail, agents retry. Each retry is another API call, another set of tokens, another chance to fail.
Typical retry rate: 2–5% of queries trigger retries, depending on integration reliability. If 3% of your queries retry (which is typical for agents with 3–5 backend integrations), you're paying 3% extra on top of your base token cost.
For the $6/month token cost agent: Add 3% = $0.18/month extra. At scale (100 agents, 1,000 queries/month each), retry cost adds up to $216/year.
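The retry overhead can be modeled as a simple multiplier on base token spend. This is a sketch under the same simplification the text uses (each retried query costs one extra full call); `retry_overhead` and its `retries_per_failure` parameter are hypothetical names added for illustration.

```python
def retry_overhead(base_monthly_cost: float,
                   retry_rate: float,
                   retries_per_failure: float = 1.0) -> float:
    """Extra monthly spend caused by retried queries."""
    return base_monthly_cost * retry_rate * retries_per_failure


per_agent = retry_overhead(base_monthly_cost=6.0, retry_rate=0.03)
fleet_annual = per_agent * 100 * 12  # 100 agents, 12 months
print(f"Per agent: ${per_agent:.2f}/month, fleet: ${fleet_annual:.2f}/year")

# If flaky integrations need three attempts on average, the overhead triples.
print(f"Three retries per failure: ${retry_overhead(6.0, 0.03, 3.0):.2f}/month")
```

Tracking the observed retry rate and average attempts per failure lets you replace both defaults with measured values.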
What buyers usually do: They don't track retry rate and don't forecast it. Then they're shocked when the bill is 15% higher than expected due to integration flakiness or complex edge cases that require retries.
6. Tool Calls to Third-Party APIs (Stripe, Twilio, Plaid, etc.)
Agents that do real work call third-party APIs. A refund agent calls Stripe. A notification agent calls Twilio. A KYC agent calls Plaid. A data agent calls your internal API.
Typical cost: Highly variable. Stripe refunds: 2.2% + $0.30 per transaction. Twilio SMS: $0.0075 per message. Plaid verification: $2–$5 per verification. Internal API calls: often free, but you're paying for the infrastructure.
For an agent issuing 50 refunds/month, sending 200 SMS, and running 30 verifications: (50 × average $10 refund × 2.2%) + (50 × $0.30) + (200 × $0.0075) + (30 × $3) = $11 + $15 + $1.50 + $90 = $117.50/month.
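A short sketch of the tool-call arithmetic, using the per-unit prices quoted above and a $3 verification as the midpoint assumption. The function name and parameters are hypothetical; real provider fee schedules vary by plan and region.

```python
def tool_call_costs(refunds: int, avg_refund_usd: float,
                    pct_fee: float, fixed_fee_usd: float,
                    sms_count: int, sms_price_usd: float,
                    verifications: int, verification_price_usd: float) -> dict:
    """Monthly third-party API spend, broken out by tool."""
    # Each refund pays a percentage of its amount plus a fixed per-transaction fee.
    payment_fees = refunds * (avg_refund_usd * pct_fee + fixed_fee_usd)
    sms = sms_count * sms_price_usd
    kyc = verifications * verification_price_usd
    return {"payment_fees": payment_fees, "sms": sms, "kyc": kyc,
            "total": payment_fees + sms + kyc}


costs = tool_call_costs(refunds=50, avg_refund_usd=10.0,
                        pct_fee=0.022, fixed_fee_usd=0.30,
                        sms_count=200, sms_price_usd=0.0075,
                        verifications=30, verification_price_usd=3.0)
for name, value in costs.items():
    print(f"{name}: ${value:.2f}")
```

In this example the verification step accounts for most of the spend, which is why per-ticket vendor pricing alone understates the cost of an agent that does real work.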
What buyers usually do: They focus on the agent vendor's pricing and forget that the agent is just an orchestration layer. The real cost is in the backend API calls. A vendor might charge $0.99/resolved ticket, but if each ticket requires a $2 Plaid verification and a $1 SMS notification, the real cost per ticket is $3.99, and the vendor's cut is only 25%.
7. Human-in-the-Loop Review and Escalation
Agents rarely achieve 100% confidence or 100% accuracy. At some threshold, they escalate to a human for review.
Typical cost: Depends on salary and time. A $50/hour claims adjudicator spending 3 minutes per escalation costs $2.50 per escalation. A $20/hour customer service rep spending 30 seconds per escalation costs $0.17 per escalation.
For an agent with 10% escalation rate processing 500 claims/month (50 escalations × $2.50): $125/month. At scale (10,000 claims/month), that's $2,500/month in human review cost.
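The escalation math is easy to rerun for different escalation rates, which is where forecasts tend to go wrong. A minimal sketch, with a hypothetical function name and only the reviewer's direct time counted (no triage or management overhead):

```python
def escalation_cost(items_per_month: int,
                    escalation_rate: float,
                    hourly_wage: float,
                    minutes_per_escalation: float) -> float:
    """Monthly cost of human review for escalated items."""
    escalations = items_per_month * escalation_rate
    return escalations * hourly_wage * minutes_per_escalation / 60


small = escalation_cost(500, 0.10, 50.0, 3.0)
large = escalation_cost(10_000, 0.10, 50.0, 3.0)
print(f"500 claims/month: ${small:,.2f}")
print(f"10,000 claims/month: ${large:,.2f}")

# Sensitivity: the same 500 claims at 5% and 15% escalation.
for rate in (0.05, 0.15):
    print(f"{rate:.0%} escalation: ${escalation_cost(500, rate, 50.0, 3.0):,.2f}")
```

Because the cost is linear in the escalation rate, budgeting at 5% when the real rate is 15% understates this line item by a factor of three.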
What buyers usually do: They underestimate escalation rate (assume 5%, when it's actually 10–15%). They also don't budget for the management of escalations—someone has to triage them, prioritize them, and ensure quality. That's 0.25–0.5 FTE of overhead per agent.
8. Evaluation Runs and Test Data Infrastructure
Before deploying an agent to production, you need to evaluate it. Evaluation means running the agent against a test dataset, measuring accuracy, and iterating.
Typical cost: Test data management ($50–$200/month), evaluation orchestration ($100–$500/month), metrics visualization ($50–$200/month), and the raw inference cost of running evaluations (typically 2–3x the production cost of inference during development, because you're running more frequent experiments).
For an agent in development: Add 50–100% to inference cost for evaluation overhead. Once in production, amortize the cumulative evaluation cost across 12 months.
What buyers usually do: They assume evaluation is "free" because it's engineering time. They don't budget the infrastructure or the repeated inference cost of running evaluations. By month 2, they've spent $1,000 on evaluation inference they didn't plan for.
9. Observability, Logging, and Monitoring Infrastructure
You need to know what your agent is doing. Is it hallucinating? Is it calling the wrong APIs? Is it slow? Is it costing more than expected?
Typical cost:
- Log ingestion (Datadog, New Relic, etc.): $300–$1,000/month for heavy logging.
- Trace and span collection: $100–$500/month.
- Custom dashboards and reporting: $50–$200/month.
- Alerting and incident management: $100–$500/month.
Total observability cost: $550–$2,200/month per agent depending on logging volume and platform.
What buyers usually do: They try to use the vendor's built-in logging and assume that's sufficient. When they need to correlate agent cost with business outcomes, they realize the vendor's logging doesn't give them what they need. They then add a second observability platform, doubling cost.
10. AI Gateway and Rate-Limiting Infrastructure
To prevent agents from making unlimited API calls (and spiraling your cost), you deploy an AI gateway: Helicone, Portkey, LiteLLM, or a custom service that enforces quota limits, rate limits, and cost guardrails.
Typical cost: $50–$500/month depending on query volume and sophistication of controls.
What buyers usually do: They skip this initially, then add it after an incident where an agent went haywire and cost $5,000 in a single day. By then it's too late.
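To make the idea of a cost guardrail concrete, here is a minimal in-process sketch of a daily spend cap. It is not how any particular gateway product works: `DailyBudgetGuard` and `BudgetExceeded` are hypothetical names, the caller is assumed to estimate the cost of each call before making it, and a real deployment would need shared state across processes.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


class BudgetExceeded(RuntimeError):
    """Raised when a call would push spend past the daily limit."""


@dataclass
class DailyBudgetGuard:
    daily_limit_usd: float
    spent_usd: float = 0.0
    day: date = field(default_factory=date.today)

    def charge(self, cost_usd: float, today: Optional[date] = None) -> None:
        """Record a call's estimated cost, or refuse it if over budget."""
        today = today or date.today()
        if today != self.day:
            # A new day starts a fresh budget window.
            self.day, self.spent_usd = today, 0.0
        if self.spent_usd + cost_usd > self.daily_limit_usd:
            raise BudgetExceeded(
                f"${self.spent_usd + cost_usd:.2f} would exceed "
                f"${self.daily_limit_usd:.2f} daily limit")
        self.spent_usd += cost_usd


guard = DailyBudgetGuard(daily_limit_usd=100.0, day=date(2025, 1, 1))
guard.charge(60.0, today=date(2025, 1, 1))      # allowed: 60 of 100 spent
try:
    guard.charge(50.0, today=date(2025, 1, 1))  # refused: would reach 110
    blocked = False
except BudgetExceeded:
    blocked = True
guard.charge(50.0, today=date(2025, 1, 2))      # allowed: new day, counter reset
print(f"blocked={blocked}, spent today=${guard.spent_usd:.2f}")
```

Even a crude cap like this turns a runaway agent from an open-ended bill into a bounded one, which is the main reason to put a guardrail in place before the first incident rather than after it.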
11. Security and Compliance Review (SOC 2, HIPAA, GDPR, Audit)
Regulated industries require security and compliance infrastructure.
Typical cost:
- SOC 2 audit and certification: $15,000–$50,000 one-time, amortized.
- HIPAA or GDPR compliance infrastructure: $5,000–$20,000 one-time, then $500–$2,000/month operational.
- Bias and fairness testing: $500–$2,000/quarter.
- Model explainability and audit: $1,000–$5,000/year.
- Penetration testing and security reviews: $5,000–$20,000/year.
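For a healthcare agent: summing the items above gives roughly $34,000–$127,000 in the first year (one-time costs plus twelve months of operations and recurring testing), then about $1,200–$4,750/month ongoing.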
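One way to sanity-check a compliance budget is to roll up the line items listed above into first-year and ongoing figures. The sketch below does only that; the grouping into one-time and recurring items and the variable names are assumptions made for illustration, and it assumes every listed item applies to your deployment.

```python
# (low, high) estimates in USD, taken from the list above.
ONE_TIME = {
    "SOC 2 audit and certification": (15_000, 50_000),
    "HIPAA/GDPR infrastructure setup": (5_000, 20_000),
}
ANNUAL_RECURRING = {
    "HIPAA/GDPR operations": (500 * 12, 2_000 * 12),     # monthly figure x 12
    "Bias and fairness testing": (500 * 4, 2_000 * 4),   # quarterly figure x 4
    "Model explainability and audit": (1_000, 5_000),
    "Penetration testing and reviews": (5_000, 20_000),
}


def totals(items: dict) -> tuple:
    """Sum the low and high ends of a set of (low, high) ranges."""
    low = sum(lo for lo, _ in items.values())
    high = sum(hi for _, hi in items.values())
    return low, high


one_lo, one_hi = totals(ONE_TIME)
rec_lo, rec_hi = totals(ANNUAL_RECURRING)
first_year = (one_lo + rec_lo, one_hi + rec_hi)
ongoing_monthly = (rec_lo / 12, rec_hi / 12)

print(f"First year: ${first_year[0]:,}-${first_year[1]:,}")
print(f"Ongoing: ${ongoing_monthly[0]:,.0f}-${ongoing_monthly[1]:,.0f}/month")
```

Dropping items that do not apply (for example, SOC 2 when the vendor already holds the certification) changes the totals substantially, so it is worth editing the two dictionaries to match your situation.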
What buyers usually do: They deploy the agent, then realize they need SOC 2 or HIPAA compliance and scramble. By then the agent is in production and changing it is expensive.
12. Vendor Management and Ongoing Operations Overhead
Someone needs to manage the agent: update rules, retrain the model, investigate errors, manage vendor escalations, and keep the agent aligned with business changes.
Typical cost: 0.5–1.5 FTE of engineering or operations time. At $100,000/year fully-loaded: $4,167–$12,500/month.
What buyers usually do: They assume "once we deploy, it runs itself." When they realize they need someone dedicated to the agent, they either don't hire for it (and the agent degrades) or they scramble to find budget mid-year.
Putting It All Together: A Real TCO Example
For a mid-market insurance company deploying an AI claims agent processing 1,000 claims/month:
| Item | Cost/Month |
| --- | --- |
| 1. API tokens (Claude Sonnet, 1,000 claims × 2K tokens) | $6 |
| 2. Inference compute at scale (not applicable, using API) | $0 |
| 3. Vector DB storage and retrieval (10K documents) | $15 |
| 4. Embedding generation (weekly refresh) | $8 |
| 5. Retries (3% failure rate) | $1 |
| 6. Tool calls (Stripe, Plaid, SMS for integrations) | $150 |
| 7. Human review (10% escalation, $50/hour adjudicator, 3 min/escalation) | $250 |
| 8. Evaluation (amortized) | $100 |
| 9. Observability (Datadog + custom dashboards) | $600 |
| 10. AI gateway (rate limiting, cost controls) | $200 |
| 11. Security/compliance (amortized HIPAA + audit) | $1,500 |
| 12. Vendor management + ops overhead (0.5 FTE, rounded up from $4,167) | $5,000 |
| Total TCO | $7,830/month |
Per-claim cost: $7.83/claim.
Visible cost (API tokens only): $0.006/claim.
Hidden cost ratio: 1,300x the visible cost.
If you budgeted only on the vendor's quoted price (say, $0.99 per resolved claim) and processed 1,000 claims, you'd budget $990/month and be $6,840/month short of the true cost.
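The roll-up above is simple enough to keep in a script so the totals update whenever an estimate changes. A minimal sketch using the monthly figures from the example; the dictionary keys and variable names are illustrative.

```python
# Monthly cost per line item (USD), from the claims-agent example.
LINE_ITEMS = {
    "API tokens": 6,
    "Inference compute": 0,
    "Vector DB": 15,
    "Embedding generation": 8,
    "Retries": 1,
    "Tool calls": 150,
    "Human review": 250,
    "Evaluation": 100,
    "Observability": 600,
    "AI gateway": 200,
    "Security/compliance": 1_500,
    "Vendor management + ops": 5_000,
}
CLAIMS_PER_MONTH = 1_000
QUOTED_PRICE_PER_CLAIM = 0.99

total = sum(LINE_ITEMS.values())
per_claim = total / CLAIMS_PER_MONTH
visible_per_claim = LINE_ITEMS["API tokens"] / CLAIMS_PER_MONTH
ratio = total / LINE_ITEMS["API tokens"]
shortfall = total - QUOTED_PRICE_PER_CLAIM * CLAIMS_PER_MONTH

print(f"Total TCO: ${total:,}/month (${per_claim:.2f}/claim)")
print(f"Visible cost: ${visible_per_claim:.3f}/claim, ratio {ratio:,.0f}x")
print(f"Shortfall vs quoted price: ${shortfall:,.0f}/month")

# Largest items first, to show where the budget actually goes.
for name, cost in sorted(LINE_ITEMS.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{name}: {cost / total:.0%} of TCO")
```

Sorting by share of total makes the point of the example visible at a glance: in this scenario people and compliance, not tokens, drive the bill.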
Using This Checklist
When evaluating an AI agent or building your budget:
- Start with the vendor's quoted price. That's line item #1 (API tokens) or the vendor's specific pricing model.
- Walk through items 2–12 with your technical and operations teams.
- For each item, estimate the cost for your specific use case.
- Add them up. That's your true TCO.
- Compare to your current cost (manual labor, outsourced vendor, or status quo).
- Calculate ROI based on true TCO, not just the vendor's price.
Most AI agent deals still make economic sense once you see the full cost, but a meaningful share do not. The difference is visibility. Use this checklist to get there.
For a deeper understanding of how these costs fit into the broader cost structure of agents, see the AI Cost Iceberg. For profitability analysis, check When AI Agents Are Profitable.
The CFO's job is to see the full iceberg, not just the tip. This checklist helps you do that.