Runrate Framework
The AI Cost Iceberg
Visible API spend (10%) vs hidden inference, storage, observability, retries, human review (90%).
Your OpenAI bill shows $8,000 last month. You're budgeting $100,000 for AI agents next year. But the true cost of running AI agents is not $8,000 or $100,000—it's closer to $80,000–$500,000, depending on how much integration, verification, and operational overhead you've layered on top of the raw API cost. The AI Cost Iceberg is the framework for understanding why: the visible tip of AI cost (API tokens from OpenAI, Anthropic, Google) is roughly 10% of the true cost. The hidden 90% lives in infrastructure, integrations, failure recovery, compliance, and human review.
Most CFOs and executive teams are budgeting and forecasting against the tip. They're looking at the visible API line item and assuming that's the full cost. It isn't. This is why AI projects routinely exceed their cost forecasts by 3–5x. The iceberg is there; most organizations just can't see below the waterline.
The Visible 10%: API Tokens and Inference Cost
The visible cost of running AI agents is the cost of renting compute from model providers—OpenAI, Anthropic, Google, Mistral, and others. You pay per token (a token is roughly a word or subword):
- Input tokens (the tokens the model reads): $0.003 per 1K tokens ($3 per million) for Claude 3.5 Sonnet.
- Output tokens (the tokens the model generates): $0.015 per 1K tokens ($15 per million) for Claude 3.5 Sonnet.
- Reasoning tokens (for newer "thinking" models): billed at the output-token rate, so extended thinking on Claude 3.7 Sonnet also runs $0.015 per 1K tokens.
For a 1,000-query/month agent with 2,000 input tokens per query and 500 output tokens per query, the visible API cost is:
(2,000,000 input tokens × $0.003/1K) + (500,000 output tokens × $0.015/1K) = $6.00 + $7.50 = $13.50/month
This is the number that appears on your OpenAI or Anthropic invoice. It's real. It's the cost of renting the model compute. But it's only the tip of the iceberg. Everything below the waterline—integrations, retries, verification, human review, compliance—is invisible in the invoice but visible in the operational cost.
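The visible-cost arithmetic above can be sketched as a small helper. The default rates below assume Claude 3.5 Sonnet's published per-token prices; substitute your provider's current price sheet:

```python
# Monthly visible API spend in dollars for a token-priced agent.
# Default rates are Claude 3.5 Sonnet's published per-1K-token prices
# ($3/M input, $15/M output); pass your own rates for other models.
def visible_api_cost(queries, input_tokens, output_tokens,
                     input_price_per_1k=0.003, output_price_per_1k=0.015):
    input_cost = queries * input_tokens / 1_000 * input_price_per_1k
    output_cost = queries * output_tokens / 1_000 * output_price_per_1k
    return input_cost + output_cost

# 1,000 queries/month, 2,000 input + 500 output tokens per query
print(round(visible_api_cost(1_000, 2_000, 500), 2))  # 13.5
```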
The Hidden 90%: Nine Cost Layers Below the Waterline
Layer 1: Inference Compute at Scale ($0.50–$5 per 1,000 queries)
If you deploy agents on your own infrastructure instead of using managed APIs, you run inference compute yourself. You might use AWS SageMaker, Google Vertex, an on-premise GPU cluster, or a specialized inference provider like Together, Replicate, or Baseten. The cost of running your own inference at scale is typically $3–$10 per hour of GPU compute.
For a 24/7 agent handling 1,000 queries per day, you're running inference continuously. Even with efficient batching and caching, you're looking at 0.5–2 hours of GPU time per 1,000 queries, depending on model size and latency SLAs. At $5/GPU-hour, that's $2.50–$10 per 1,000 queries, or $75–$300/month at that volume. And the GPU bills by the hour whether or not queries arrive: a single always-on instance at $5/GPU-hour costs $3,600/month before a single query runs, so at low utilization the effective cost per query can be many times what the same workload would cost on a managed API.
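A minimal sketch of the self-hosting trade-off, using the article's illustrative $5/GPU-hour rate (the function names and structure here are ours, not a vendor API):

```python
# Self-hosted inference cost model: a dedicated GPU bills by the hour
# whether or not queries arrive, so effective cost depends on utilization.
def self_hosted_cost_per_1k(gpu_hours_per_1k_queries, gpu_rate_per_hour=5.0):
    """Marginal GPU cost per 1,000 queries at full utilization."""
    return gpu_hours_per_1k_queries * gpu_rate_per_hour

def always_on_monthly(gpu_rate_per_hour=5.0, hours=720):
    """Fixed monthly cost of one GPU running 24/7 (720 hours)."""
    return gpu_rate_per_hour * hours

# 0.5-2 GPU-hours per 1,000 queries, as in the article's range
print(self_hosted_cost_per_1k(0.5), self_hosted_cost_per_1k(2.0))  # 2.5 10.0
print(always_on_monthly())  # 3600.0
```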
Most organizations don't do this. But when they do—especially in regulated industries where data residency or model weights ownership matters—the hidden cost layer is enormous.
Layer 2: Vector Database Storage and Embedding Generation ($0.02–$2 per query)
Agents that retrieve context—insurance agents fetching claim history, legal agents pulling contract snippets, support agents surfacing FAQ answers—use vector databases to store embeddings and retrieve relevant documents.
Embedding generation costs money, though far less than inference. Each document you embed costs tokens: a 2,000-word document is roughly 2,500–3,000 tokens. At typical embedding-model rates of $0.02–$0.13 per million tokens, embedding 100 documents per day (a reasonable refresh rate for a live knowledge base) costs well under a cent a day. Embedding becomes a real line item only when you re-embed large corpora, run multiple embedding strategies, or refresh millions of documents.
Storage cost varies. Pinecone charges $0.10 per million vectors per month for standard storage. A knowledge base of 10,000 documents at 1 embedding per document is 10,000 vectors, costing $0.001/month in storage—negligible. Even if you're embedding with 5–10 different retrieval strategies (one embedding for the full document, one for the abstract, one for key entities) and storing 50,000–100,000 vectors, raw storage is still under a cent a month. The real vector database cost is in query serving, not storage.
Per query, if your agent retrieves an average of 3 documents (standard for RAG agents), each retrieval involves one embedding call and one vector database query—cheap individually, but meaningful in aggregate. At scale, vector database cost typically runs $0.02–$0.10 per query ($20–$100/month for 1,000 queries).
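A rough monthly estimate combining the storage and per-query figures above (the $0.02-per-query rate is an assumed blended serving cost, not a vendor quote):

```python
# Monthly vector database cost: per-million-vector storage plus an
# assumed blended per-query serving charge (article's illustrative rates).
def vector_db_monthly(n_vectors, queries_per_month,
                      storage_per_million=0.10, cost_per_query=0.02):
    storage = n_vectors / 1_000_000 * storage_per_million
    serving = queries_per_month * cost_per_query
    return storage + serving

# 10,000-document knowledge base, one vector each, 1,000 queries/month:
# storage is negligible, serving dominates
print(round(vector_db_monthly(10_000, 1_000), 2))  # 20.0
```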
Layer 3: Tool Calls to Third-Party APIs ($0.01–$50 per outcome)
Agents that do real work call third-party APIs: Stripe to issue refunds, Twilio to send SMS, Plaid to verify bank accounts, Salesforce to update opportunities, internal systems to query databases or trigger workflows.
These calls have two costs: the direct fee from the vendor, and the operational cost of maintaining the integration.
A refund tool might call Stripe (the API call itself is free, but money movement isn't: standard card processing runs 2.9% + $0.30, and Stripe does not return the original processing fee when you refund, so a refunded $100 charge leaves roughly $3.20 in unrecovered fees; payouts and transfers are priced separately). A contact tool might call Twilio (a few cents per SMS). An account verification tool might call Plaid ($2–$5 per verification). An escalation tool might call your internal API (free, but you're paying for the infrastructure).
For an insurance claims agent that processes 100 claims/month and issues 70 payouts, integration cost is:
- 70 payouts × Stripe payout fees (a small fixed fee plus a fraction of a percent per payout) = $15–$30/month depending on payout size
- 40 verification calls to Plaid × $2.50 = $100/month
- 20 escalation emails via SendGrid = $0.60/month
- Total: $115–$130/month in direct third-party tool costs
For 100 claims, that's $1.15–$1.30 per claim in direct tool cost—many times the visible API cost per claim.
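The claims-agent tool bill above, as a sketch (all fee figures are the article's illustrative rates, not current vendor pricing):

```python
# Direct third-party tool cost for the example claims agent.
# Defaults mirror the article's illustrative monthly volumes and fees.
def monthly_tool_cost(payouts=70, payout_fee_avg=0.30,
                      verifications=40, verification_fee=2.50,
                      escalations=20, email_fee=0.03):
    payout_cost = payouts * payout_fee_avg          # avg fee per payout
    verify_cost = verifications * verification_fee  # bank verification calls
    email_cost = escalations * email_fee            # escalation emails
    return payout_cost + verify_cost + email_cost

print(round(monthly_tool_cost(), 2))  # 121.6
```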
Layer 4: Retries on Failure and Fallback Logic ($0.01–$0.50 per outcome)
Agents fail. APIs time out, return stale data, or throw errors. When they fail, agents retry. Each retry is another API call, another set of tokens, another chance at success. At scale, retry logic is expensive.
If your agent has a 2% failure rate and retries once, you've paid for 102% of the queries you actually completed. When failures are independent, expected paid attempts form a truncated geometric series: a 5% failure rate with up to two retries means paying for about 105% of your queries, and a 10% failure rate with up to three retries about 111%. Add timeouts that burn tokens before failing and partial responses you discard, and the real-world overhead runs higher still.
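Treating each attempt as failing independently with probability p, the expected number of paid attempts is a truncated geometric series, which a few lines make concrete:

```python
# Expected cost multiplier with up to `retries` retries, each attempt
# failing independently with probability p:
#   attempts = 1 + p + p^2 + ... + p^retries
def retry_cost_multiplier(p, retries):
    return sum(p ** k for k in range(retries + 1))

print(round(retry_cost_multiplier(0.02, 1), 3))   # 1.02
print(round(retry_cost_multiplier(0.05, 2), 4))   # 1.0525
print(round(retry_cost_multiplier(0.10, 3), 3))   # 1.111
```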
Retry cost typically adds 5–15% to your base inference cost, depending on integration reliability. For the light-volume example agent above that's trivial in absolute terms, but the percentage scales with volume: a production agent running 1,000 queries per day pays the same 5–15% on a base inference bill thirty times larger.
Layer 5: Human-in-the-Loop Review and Escalation ($1–$10 per outcome)
Agents rarely achieve 100% accuracy or 100% confidence. At some threshold—"the agent is 85% confident" or "the agent encountered an edge case"—agents escalate to a human for review.
Human review cost depends on salary and time. A $60,000/year customer service representative ($30/hour all-in) spending 2 minutes reviewing an agent decision costs $1 per decision. A $120,000/year loan officer ($60/hour) spending 3 minutes reviewing an agent's recommendation costs $3 per decision.
For a claims agent processing 100 claims/month with 10% escalation rate (10 claims escalated for human review, each requiring 5 minutes of review time from a $50/hour claims adjudicator), the cost is:
10 claims × 5 minutes × $50/hour / 60 minutes = $41.67/month, or $0.42 per claim in human review cost.
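The same escalation arithmetic as a reusable helper (assumes a fully loaded hourly rate and review time in minutes):

```python
# Human-in-the-loop review cost: escalated items x review minutes
# x fully loaded hourly rate, spread over all items processed.
def review_cost(items, escalation_rate, minutes_per_review, hourly_rate):
    escalated = items * escalation_rate
    monthly = escalated * minutes_per_review / 60 * hourly_rate
    return monthly, monthly / items  # (monthly cost, cost per item)

# 100 claims, 10% escalated, 5 min each, $50/hour adjudicator
monthly, per_claim = review_cost(100, 0.10, 5, 50)
print(round(monthly, 2), round(per_claim, 2))  # 41.67 0.42
```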
This is often called "human-in-the-loop" cost, and it's substantial. For high-stakes domains (healthcare, finance, legal), where accuracy is non-negotiable, human review cost can exceed the agent's API cost by 10–100x.
Layer 6: Evaluation and Testing Infrastructure ($0.50–$5 per 1,000 queries)
Before deploying an agent to production, you need to evaluate it. Evaluation means running the agent against a test dataset (100–1,000 examples), measuring accuracy, recall, and precision, and iterating. Each evaluation run is a full inference pass.
If you run 5 evaluation passes during development and refinement before production (reasonable for a claims agent or a loan origination agent), and each pass processes 500 test examples, that's 2,500 test queries. At the per-query rates above, that's only a few tens of dollars in visible inference cost—but the operational cost of building, maintaining, and running the evaluation pipeline is much higher.
Evaluation infrastructure (test data management, result tracking, accuracy metrics, visualization) costs $500–$2,000/month for the shared pipeline. Amortized over a single agent's 5–10 monthly evaluation runs, that's $50–$400/month per agent in evaluation overhead. For a portfolio of 20 agents, that's $1,000–$8,000/month in evaluation infrastructure cost.
Layer 7: Observability, Logging, and Monitoring ($200–$2,000 per agent per month)
You need to know what your agent is doing. Is it hallucinating? Is it calling the wrong APIs? Is it slow? Is it costing more than expected? This requires logging every inference, every API call, every decision, and every error.
Logging infrastructure (Datadog, New Relic, Grafana, or a custom pipeline) costs money. For a production agent:
- Log ingestion: ~$5–$20 per 1 GB of logs per month. A busy agent generates 100 MB of logs per day (trace logs from each inference step), or 3 GB per month, costing $15–$60/month.
- Tracing and analytics: $300–$1,000/month for a full observability platform covering 5–10 agents.
- Alerting and incident management: $100–$500/month.
- Custom dashboards and reporting: $0–$500/month depending on complexity.
Total observability cost: $400–$2,000/month per agent, depending on scale and platform.
Layer 8: Security, Compliance, and Audit ($500–$5,000 per agent deployment)
Regulated industries (healthcare, finance, legal) require security and compliance infrastructure:
- SOC 2 certification and audit: $10,000–$50,000 one-time cost, amortized across your agent portfolio.
- HIPAA or GDPR compliance infrastructure (data encryption, access controls, audit logs): $1,000–$5,000/agent one-time cost, $500–$2,000/agent/month operational cost.
- Bias and fairness testing: $500–$2,000 per agent per quarter.
- Model explainability and audit: $1,000–$5,000 per agent per year.
- Regular penetration testing and security reviews: $5,000–$20,000 per year.
For a healthcare claims agent, annual compliance cost is easily $10,000–$30,000, or $800–$2,500 per month.
Layer 9: AI Gateway, Rate Limiting, and Cost Control Infrastructure ($100–$1,000 per month)
To control costs and prevent runaway spending, you deploy an AI gateway (Helicone, LiteLLM, Portkey, or custom) or a rate-limiting service. This infrastructure prevents agents from making unlimited API calls, enforces quota limits, and catches anomalies.
Cost: $100–$1,000/month depending on scale and platform.
Putting It Together: A Real Cost Example
For a mid-market insurance company deploying an AI claims agent processing 100 claims/month:
- Visible API cost: 100 claims × 5,000 tokens average ≈ 500K tokens, roughly $2/month at blended Claude 3.5 Sonnet rates
- Vector database (if used): $20/month
- Tool calls (Stripe payouts, verification APIs): $120/month
- Retries (8% retry rate): $10/month
- Human-in-the-loop review (10% escalation, 5 min/claim): $40/month
- Evaluation infrastructure (amortized): $100/month
- Observability and logging: $800/month
- Compliance and audit (amortized): $1,200/month
- AI gateway: $200/month
True cost: $2,492/month. Visible cost (API only): roughly $2/month. Hidden cost ratio: roughly 1,200x the visible API cost.
The CFO sees the $2 API bill and budgets a few dollars for claims agent cost. The true cost is $2,492/month—more than 1,000x higher.
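The whole example rolls up into a simple layered model (figures are the article's illustrative monthly numbers; the visible-API estimate assumes blended Claude 3.5 Sonnet rates):

```python
# Monthly cost by iceberg layer for the example 100-claim/month agent.
LAYERS = {
    "visible_api": 2.00,    # ~500K tokens at blended Sonnet rates (assumed)
    "vector_db": 20.00,
    "tool_calls": 120.00,
    "retries": 10.00,
    "human_review": 40.00,
    "evaluation": 100.00,
    "observability": 800.00,
    "compliance": 1200.00,
    "gateway": 200.00,
}

CLAIMS = 100
total = sum(LAYERS.values())
hidden = total - LAYERS["visible_api"]
print(total, round(total / CLAIMS, 2))        # 2492.0 24.92
print(round(hidden / LAYERS["visible_api"]))  # 1245
```

The ratio is the point: every layer except the first is invisible on the provider invoice.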
How Runrate Operationalizes the Iceberg
Runrate's work-item-level cost attribution breaks the iceberg into visible and hidden layers, assigns each cost layer to a specific work item (a resolved claim, a processed loan, a closed support ticket), and surfaces the true cost-per-outcome. Instead of seeing "AI spend: $2,490/month" and losing visibility to the drivers, your finance team sees:
- Cost per claim: $24.92
- Cost per claim by layer: $0.02 visible API + $0.20 vector database + $1.20 tool calls + $0.10 retries + $0.40 review + $1.00 amortized evaluation + $8.00 observability + $12.00 compliance + $2.00 gateway
- Cost trends by layer: "Human review time is up 15% this month—escalation rate or claim complexity?"
- Cost anomalies: "Vector database cost spiked 300% Tuesday—which agent is over-retrieving?"
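A hypothetical version of that anomaly check: flag any cost layer whose latest daily spend jumps well above its trailing average (the threshold, function name, and data shape here are ours, for illustration):

```python
# Flag cost layers whose most recent daily spend exceeds
# `threshold` times the mean of all prior days.
def flag_spikes(daily_costs, threshold=3.0):
    """daily_costs: {layer_name: [day1_cost, day2_cost, ...]}"""
    flagged = []
    for layer, series in daily_costs.items():
        history, latest = series[:-1], series[-1]
        baseline = sum(history) / len(history)
        if baseline > 0 and latest > threshold * baseline:
            flagged.append(layer)
    return flagged

costs = {
    "vector_db": [0.60, 0.70, 0.65, 2.80],  # Tuesday spike
    "gateway":   [6.50, 6.60, 6.40, 6.70],  # steady
}
print(flag_spikes(costs))  # ['vector_db']
```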
This granularity is what allows CFOs and operations teams to actually manage AI cost. When you're budgeting against the visible tip, you're flying blind. When you can see and manage the iceberg, cost becomes predictable and controllable.
The AI Cost Iceberg is not a bug—it's an artifact of deploying real, operational agents that integrate with production systems, pass regulatory scrutiny, and serve actual customers. Every hidden layer serves a purpose. The CFO's job is to see them, measure them, and optimize across them.