The AI Cost Iceberg
Visible API spend (10%) vs hidden inference, storage, observability, retries, human review (90%).
When a vendor demos an AI agent to a CFO, they show response times, accuracy, and integration breadth. What they don't show is the cost hidden below the waterline: failed API calls, retry storms, third-party integration costs, human review overhead, and the infrastructure needed just to see what your agents are doing.
The API Call That Costs Twice as Much
An AI agent that fails silently doesn't stay silent for long. When an agent's first attempt fails—a timeout, a parsing error, a rate limit—it retries. That retry is a second API call. If the second attempt also fails, it's a third call. With a 10% failure rate and one retry per failure, you pay for 110 calls to complete 100 requests: every failed request is billed in full, and so is its retry. That puts your token spend roughly 10% above the success-only baseline, and higher still when retries themselves fail.
Compound this across a fleet of agents. If you run 10 agents, each with a 5% failure rate and one retry per failure, your fleet's true token cost is about 5% higher than the sum of your successful calls, and higher failure rates or multi-retry chains widen the gap quickly. The CFO sees $10,000/month in API spend while the fleet actually consumes $10,500 or more in tokens.
Vendors don't advertise their failure rates. They advertise accuracy on happy-path demos. But in production, network timeouts, rate limits, and parsing errors are endemic. Budget for a minimum 5–10% retry multiplier on top of your baseline token cost.
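The retry arithmetic can be made concrete in a few lines. This is a sketch under stated assumptions: every failed attempt is billed in full, each failure triggers one retry, and a retry can itself fail at the same rate. `retry_multiplier` is a hypothetical helper, not a vendor API.

```python
# Sketch: expected billed calls per logical request, assuming every
# failed attempt is billed in full and each failure triggers one retry,
# which can itself fail at the same rate.

def retry_multiplier(failure_rate: float, max_retries: int = 1) -> float:
    calls = 1.0
    p = failure_rate
    for _ in range(max_retries):
        calls += p          # each outstanding failure adds one more billed call
        p *= failure_rate   # the retry can fail too
    return calls

# A 10% failure rate with one retry raises cost ~10%; deeper retry
# chains barely move it, because the first retry dominates:
print(round(retry_multiplier(0.10, max_retries=1), 3))  # 1.1
print(round(retry_multiplier(0.10, max_retries=3), 4))  # 1.111
```

At a $10,000/month baseline, that 1.1x multiplier is an extra $1,000 the invoice never itemizes.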
The API Call That Isn't to OpenAI
When an AI agent executes a decision, it often needs to call external systems: Stripe to process a refund, Twilio to send a notification, Salesforce to update a record, a claims database to fetch a patient's history. Each of these integrations costs money.
Twilio SMS costs about $0.0075 per message, and other metered services bill fractions of a cent per request (the $0.005–$0.01 per-call figures used below are illustrative). A call to an internal API might not have a direct monetary cost, but it does consume rate limits and infrastructure capacity. At scale, it matters.
Consider a customer service agent handling 10,000 customer inquiries per month. If 30% of resolutions require sending a notification (Twilio SMS or email), that's 3,000 notifications. At $0.01 per notification, that's $30/month in direct API costs. Small. But if 40% of resolutions require a Stripe charge reversal, that's 4,000 Stripe calls at $0.005 each = $20/month. If 50% require updating a CRM, that's 5,000 CRM calls, which might have their own API costs or rate-limit implications.
The sum of integration costs can rival or exceed the LLM API cost. Most CFOs don't budget for this line item because it lives in a different GL code (operations, customer service, infrastructure) and isn't labeled as "AI cost."
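As a sketch, the example's integration line item can be tallied in a few lines. The resolution shares match the scenario above; the per-call prices are illustrative assumptions, not quoted vendor rates.

```python
# Sketch: direct integration spend for 10,000 inquiries/month.
# Per-call prices are illustrative assumptions, not vendor quotes.

inquiries = 10_000

# integration: (share of resolutions that need it, assumed cost per call in USD)
mix = {
    "notification (SMS/email)": (0.30, 0.01),
    "payment reversal":         (0.40, 0.005),
    "CRM update":               (0.50, 0.0),  # internal API: no fee, but rate limits
}

total = 0.0
for name, (share, unit_cost) in mix.items():
    cost = inquiries * share * unit_cost
    total += cost
    print(f"{name}: {inquiries * share:,.0f} calls, ${cost:.2f}/month")
print(f"direct integration spend: ${total:.2f}/month")  # $50.00/month
```

The useful output isn't the dollar figure; it's the per-integration breakdown, which is exactly the line item most budgets never split out.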
The Vector Database That Runs 24/7
Every AI agent that needs to reason about company documents—product manuals, claims history, policy documents, prior legal rulings—needs a vector database. The agent converts documents into embeddings (numerical representations), stores them, and retrieves them during inference using semantic search.
A Pinecone Free tier is limited. Pinecone Pro, which most agents need, costs $600–$1,200/month. A Weaviate deployment costs $500–$2,000/month depending on storage. These databases run 24/7, whether your agents are active or not.
The database cost is fixed; it doesn't scale with usage the way API costs do. A startup with 100 documents and a single agent might pay $600/month for the vector DB but only $50/month in API calls. The database is 12x more expensive. Most CFOs don't realize they've committed to a $7,200/year infrastructure expense before the first agent query runs.
The Observability Bill That Nobody Plans For
You need to see what your agents are doing. Every agent action—every API call, every tool invocation, every decision branch—needs to be logged. A log entry is typically 1–10 KB, and a single transaction can generate anywhere from dozens to over a thousand entries. If you run 100,000 agent transactions per month, that's on the order of 10 GB to 1 TB of logs.
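A quick sizing sketch; the entries-per-transaction and entry-size figures are assumptions to replace with measurements from your own pipeline.

```python
# Sketch: monthly log volume from transaction count, logged actions per
# transaction, and average entry size. All three inputs are assumptions.

def monthly_log_gb(transactions: int, entries_per_txn: int, kb_per_entry: float) -> float:
    return transactions * entries_per_txn * kb_per_entry / 1_000_000  # KB -> GB

# 100,000 transactions, ~500 logged actions each, ~4 KB per entry:
print(f"{monthly_log_gb(100_000, 500, 4):,.0f} GB/month")  # 200 GB/month
```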
Storing, indexing, and querying those logs costs money. Datadog charges roughly $0.10–$0.30 per GB ingested, plus indexing and query costs. Splunk is similar. A self-hosted ELK stack (Elasticsearch, Logstash, Kibana) trades per-GB fees for its own compute, storage, and operational effort.
Rough estimate: observability infrastructure for a fleet of agents processing 100,000+ transactions per month costs $1,000–$3,000/month.
That's on top of the vector DB, on top of the API cost, on top of the integration costs. Most teams discover this cost the hard way: after deploying agents and then realizing they can't debug failures because they didn't budget for observability.
The Human Review Multiplier
In regulated industries, a human has to review some or all agent decisions before they execute. A claims adjudicator reviewing an AI-processed claim might take 2–5 minutes. A loan officer reviewing an AI underwriting recommendation might take 5–10 minutes. A lawyer reviewing an AI contract analysis might take 15–30 minutes.
At $30/hour, a 3-minute review costs $1.50 per decision. At $50/hour, it costs $2.50. Multiply by 10,000 decisions per month and you're spending $15,000–$25,000/month in labor cost. This cost is not infrastructure; it's headcount. But it's 100% driven by the AI agent decision volume.
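The labor math reduces to one formula. `review_cost` is an illustrative helper; the wage and minutes figures are the ones from the paragraph above.

```python
# Sketch: review labor cost, driven entirely by agent decision volume.

def review_cost(decisions: int, minutes_per_review: float, hourly_wage: float) -> float:
    return decisions * minutes_per_review / 60 * hourly_wage

# 10,000 decisions/month at 3 minutes per review:
print(review_cost(10_000, 3, 30))  # 15000.0
print(review_cost(10_000, 3, 50))  # 25000.0
```

Note what the formula scales with: decision volume, not headcount. Doubling agent throughput doubles this line item automatically.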
Most CFOs budget for agents as if they reduce headcount. In practice, the first generation of agents creates a hybrid role: a human reviewer who oversees AI decisions. You haven't eliminated the headcount; you've changed the role and added infrastructure cost on top.
The Testing and Evaluation Cost That Scales
Before you deploy an agent, you test it. Before you optimize it, you evaluate it. Both of these activities consume API calls. You run a prompt A/B test? That's 2x the API calls for the test period. You evaluate hallucination rates by running 1,000 test cases through the model? That's 1,000 extra API calls.
At scale, evaluation and testing cost 5–15% of production API cost. A team with a $50,000/month production API budget might be spending $2,500–$7,500/month on testing and tuning. This cost is invisible in the demo because vendors don't run tests with the customer present.
The Security and Compliance Overhead
If your agents handle PII, PHI, or financial data, you need:
- PII detection and redaction (before the agent sees it)
- Audit logging (immutable records of what the agent saw and did)
- Data residency constraints (some data can't leave a jurisdiction)
- Encryption in transit and at rest
- SOC 2 or FedRAMP compliance (if you're selling to enterprises)
Each of these adds engineering time, infrastructure cost, and operational overhead. A rough estimate: security and compliance infrastructure adds 10–20% on top of your baseline API spend.
The Training Data Licensing Cost
If you fine-tune an agent on proprietary data, or license vendor-specific training datasets, you're paying licensing fees. Some vendors charge per-token for fine-tuning. Others charge per-model-update. These costs are negotiated per deal and often hidden in vendor contracts.
Budget for 5–10% of your API cost as training data and licensing fees if you're customizing agents heavily.
Layering the Iceberg
Stack these hidden costs:
- API call cost: $10,000/month
- Retry multiplier (10%): +$1,000
- Integration costs (Stripe, Twilio, CRM): +$2,000
- Vector database: +$1,200
- Observability: +$2,000
- Human review (if regulated): +$8,000
- Testing and evaluation (10%): +$1,000
- Security and compliance (15%): +$1,500
- Training data licensing: +$500
Visible cost: $10,000. True cost: $27,200. That's a 2.7x multiplier, and it's not unrealistic for a regulated industry with multi-step agents and human review.
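The stack above as a worked calculation, with the percentage line items applied to the visible API spend, per the list.

```python
# Sketch: the iceberg stack from the list above. Percentage line items
# are taken as a share of the visible API spend.

visible = 10_000.0

hidden = {
    "retry multiplier (10%)":           visible * 0.10,
    "integrations (Stripe/Twilio/CRM)": 2_000.0,
    "vector database":                  1_200.0,
    "observability":                    2_000.0,
    "human review (regulated)":         8_000.0,
    "testing and evaluation (10%)":     visible * 0.10,
    "security and compliance (15%)":    visible * 0.15,
    "training data licensing":          500.0,
}

true_cost = visible + sum(hidden.values())
print(f"visible ${visible:,.0f} -> true ${true_cost:,.0f} "
      f"({true_cost / visible:.1f}x multiplier)")
# visible $10,000 -> true $27,200 (2.7x multiplier)
```

Swap in your own figures line by line; dropping human review (the largest hidden item here) still leaves a multiplier near 2x.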
What to Do Next
When evaluating an AI agent vendor, ask explicitly about cost of ownership including infrastructure, integrations, human review, and observability. If they give you only the API cost, you're seeing the tip of the iceberg. Request a total cost of ownership estimate that breaks down each hidden layer.
For a framework to build this estimate yourself, see the pillar article on AI agent cost.
Go deeper with the field guide: a step-by-step PDF for implementing AI cost attribution.