Sequoia Capital's billion-dollar question haunts every boardroom: "What will be the first $600B AI company?" But before you chase unicorns, your CFO needs to answer a harder question: what are the actual economics of running AI in our business? Most CFOs are looking at API bills and missing the iceberg of hidden costs beneath them. AI economics looks nothing like software economics — and understanding why is the difference between a defensible AI investment and margin erosion you didn't see coming.
The Zero-Margin-Cost Problem That AI Solves (And Creates)
For three decades, software followed a brutal economic truth: once you built it, the marginal cost of serving one more customer was nearly zero. This is what made software so valuable. It costs Microsoft essentially nothing extra to license Windows to customer ten million versus customer one million. Stripe's marginal cost to process payment one billion is the same as payment one.
This is Aggregation Theory — the insight Ben Thompson at Stratechery uses to explain why the best tech companies become quasi-monopolies. Build once, scale infinitely, capture most of the value. SaaS exploited this ruthlessly. A company with 78% gross margin was not unusual because the cost structure allowed it.
AI reverses this. An AI agent running customer support does not have zero marginal cost. Every ticket it handles — every token it generates, every API call it makes to verify data, every time a human has to review its work — costs real money. This is a return to the economics of services, not software. And it makes the old playbook obsolete.
Why AI Costs More Than Software: The Token Economy
The most visible cost of AI is tokens. Anthropic's Claude 3.5 Sonnet costs around $3 per million input tokens and $15 per million output tokens. OpenAI's GPT-4 costs $10 per million input and $30 per million output. These look like rounding errors at first glance.
Until you scale. A single customer support conversation might use 5,000 input tokens and 2,000 output tokens — roughly $0.05 at Claude pricing and $0.11 at GPT-4 pricing, so call it $0.08 per conversation. At 100 conversations per day, that's $8. At scale across 50 agents handling 50 tickets each per day, you're looking at roughly $6,000 a month in tokens. That's visible.
But token cost is the tip of the iceberg. According to research from Foundation Capital, token costs account for perhaps 10-15% of the true cost of running AI in production. The remaining 85-90% lives in the hidden layers: the infrastructure to run inference at scale, the vector database storage, the observability systems, the human-in-the-loop review, the retries when the model fails, the API calls to third parties, the security and compliance overhead. This is the AI Cost Iceberg, and most CFOs are budgeting against the visible tip.
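The iceberg math is easy to sanity-check in a few lines of Python. This sketch uses the per-million-token prices quoted above and inverts the 10-15% token share into a full-stack multiplier; the function and variable names are illustrative, not any vendor's API.

```python
# Back-of-the-envelope token cost calculator using the prices quoted above.
# The full-stack range inverts the claim that tokens are ~10-15% of true cost.

PRICES = {  # $ per million tokens (input, output)
    "claude-3.5-sonnet": (3.0, 15.0),
    "gpt-4": (10.0, 30.0),
}

def conversation_cost(model, input_tokens=5_000, output_tokens=2_000):
    """Token cost of one support conversation at the given model's pricing."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

for model in PRICES:
    per_conv = conversation_cost(model)
    # 50 agents x 50 tickets/day x 30 days
    monthly_tokens = per_conv * 50 * 50 * 30
    print(f"{model}: ${per_conv:.3f}/conversation, "
          f"${monthly_tokens:,.0f}/month in tokens, "
          f"~${monthly_tokens / 0.15:,.0f}-${monthly_tokens / 0.10:,.0f}/month full-stack")
```

The last column is the point: whatever the token bill says, the budget conversation should start at roughly seven to ten times that number.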
The Inference Cost Problem
Inference is the actual running of the model — taking your prompt and generating an answer. It is not free. Your company is either paying OpenAI, Anthropic, or Google by the token, or paying cloud infrastructure costs (AWS, Azure, GCP) to run a self-hosted model on GPUs — $5,000-$15,000 per month for even a small inference cluster running 24/7.
Klarna's AI customer service agent, which now handles two-thirds of customer conversations, runs at approximately $0.19 per resolved ticket. Intercom's Fin runs around $0.99 per resolution. Sierra, a competitor, runs around $1.50. These are not token costs. These are full-stack costs: tokens, infrastructure, human review, error handling, everything.
Now scale: a mid-market company with 50,000 support tickets per month at Klarna's unit cost is paying $9,500/month just for AI customer service. At Sierra's cost, $75,000/month. This is orders of magnitude larger than the token bill alone.
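The scaling math above is simple enough to reproduce directly; a quick sketch using the per-resolution prices quoted in this section (the vendor labels are shorthand from this article, not official pricing SKUs):

```python
# Monthly full-stack AI support cost at the per-resolution prices quoted above,
# for a mid-market company handling 50,000 tickets per month.

PER_RESOLUTION = {"Klarna (in-house)": 0.19, "Intercom Fin": 0.99, "Sierra": 1.50}
TICKETS_PER_MONTH = 50_000

monthly = {vendor: unit * TICKETS_PER_MONTH for vendor, unit in PER_RESOLUTION.items()}
for vendor, cost in monthly.items():
    print(f"{vendor}: ${cost:,.0f}/month")
```

An 8x spread between the cheapest and most expensive unit cost turns into a $65,000/month spread at this volume, which is why the per-outcome number, not the token price, is the one to negotiate.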
The AI Cost Iceberg Framework
To understand the full cost structure, use the AI Cost Iceberg framework. It divides AI spend into two layers:
Visible AI spend (roughly 10% of total): API bills from vendors — OpenAI, Anthropic, Google Cloud Vertex, Llama via Together.ai. This is the line item you see on the bill.
Hidden AI spend (roughly 90% of total): everything else.
- Inference infrastructure: GPU clusters, or the margin paid to cloud vendors for managed inference.
- Vector database storage: $1,500-$20,000/month, depending on volume.
- Observability and monitoring: $5,000-$15,000/month.
- Retries and failure handling.
- Tool-use API calls: each time your agent calls Stripe, Twilio, or a custom backend, that's a billable event.
- Human-in-the-loop review time: a compliance officer spending 10 minutes reviewing an AI output has just added $100-$300 to the cost of that work.
- Security and compliance infrastructure: logging, encryption, audit trails.
- Training data licensing and curation.
- Evaluation costs: running A/B tests to measure which model version works better.
- Prompt caching, rate-limit, and gateway infrastructure.
One Foundation Capital analysis examined a customer-support AI deployment and found that tokens accounted for $15,000/month but the full stack — including retries, vector storage, observability, human review, and integration work — came to $187,000/month. Tokens were 8% of the true cost.
How McKinsey's Data Shows the Adoption-Impact Gap
According to McKinsey's State of AI 2025 (surveying 1,993 organizations), 88% of enterprises now use AI in at least one business function. This is near-universal adoption. But only 39% report any measurable impact on EBIT, and only 5.5% classify themselves as "AI high performers."
This gap — nearly 9 in 10 companies using AI, but only about 4 in 10 seeing any EBIT impact and barely 1 in 20 performing at the top — reveals the structural challenge. Many companies are running AI pilots that do not hit the unit economics needed to justify scale. They are treating AI like a cost center (a cool new tool) rather than a profit center (a work item with a cost per outcome).
The difference between the 88% using AI and the 5.5% succeeding is usually cost attribution. High performers know exactly what an AI agent or model deployment costs to run, per unit of work. They have work-item-level cost attribution. Most do not.
AI as Payroll: The AI Workforce P&L
Think of an AI agent the way you think of a headcount hire. Each employee has a W-2 cost, a benefits cost, a workspace, a supervisor, and a performance metric. Each AI agent needs equivalent infrastructure: a "token budget" (the cost per query), a "full-stack cost" attribution (what P&L line does this agent serve), a deployment model classification (is it a contractor using third-party APIs, or a W-2 equivalent self-hosted model), and a clear retirement trigger (when does this agent get deprecated).
This is the AI Workforce P&L framework. It forces the question: Are we treating AI like a tool, or like we treat labor? Because if AI is doing work-item-level labor, it needs to be on a payroll. And your CFO does not have a payroll system for it yet.
A mid-market insurance company might run an AI agent that adjudicates claims. The agent handles 500 claims per day at a full-stack cost of $8 per claim. That's $4,000 per working day, or $80,000/month. If it replaces a 15-person claims adjustment team that cost $150,000/month, it is a $70,000/month win. But if the CFO is only budgeting the $8,000/month token bill, and the hidden $72,000/month in infrastructure is buried across IT, vendor management, and support, then the ROI looks opaque. The AI Workforce P&L makes it visible: this agent costs $80,000/month and generates $150,000/month in labor savings. Hire it.
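The AI Workforce P&L can be sketched as a small data structure. The numbers below are this section's insurance example; the class and field names are illustrative, not a real payroll system or Runrate API.

```python
# A one-line "AI Workforce P&L" entry per agent, following the framing above.
from dataclasses import dataclass

@dataclass
class AgentPnL:
    name: str
    units_per_month: int             # claims, tickets, conversations handled
    full_stack_cost_per_unit: float  # tokens + infra + retries + human review
    labor_cost_replaced: float       # monthly cost of the human alternative

    @property
    def monthly_cost(self) -> float:
        return self.units_per_month * self.full_stack_cost_per_unit

    @property
    def monthly_net(self) -> float:
        return self.labor_cost_replaced - self.monthly_cost

    def should_retire(self) -> bool:
        # Retirement trigger: the agent costs more than the labor it replaces.
        return self.monthly_net <= 0

# 500 claims/day x 20 working days, $8 full-stack per claim, vs a $150k/month team
claims_agent = AgentPnL("claims-adjudicator", 500 * 20, 8.0, 150_000)
print(f"{claims_agent.name}: ${claims_agent.monthly_cost:,.0f}/month cost, "
      f"${claims_agent.monthly_net:,.0f}/month net")
```

The retirement trigger is the part most deployments skip: the moment `monthly_net` goes negative — because the model got more expensive, review overhead grew, or a cheaper vendor appeared — the agent should be deprecated like any underperforming hire.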
The Bessemer Thesis: Outcome-Based Pricing
Bessemer Venture Partners has argued that AI pricing models are shifting away from per-seat (how SaaS priced for three decades) and toward outcome-based pricing: you pay for results, not for access.
This matters because it forces vendors to own the cost structure. If Stripe charges you $0.50 per AI-processed transaction, Stripe has to absorb the inference cost, the retries, the human review. Stripe now cares deeply about the cost iceberg, because it is eating the cost. This is different from OpenAI selling tokens at $10 per million — OpenAI collects revenue but does not own whether your deployment is economically viable.
For a CFO, outcome-based pricing flips the unit economics question. Instead of asking "how many tokens do we use," you ask "what is the per-resolution fee," or "per claim adjudicated," or "per conversation completed." This is the language of unit economics, and it is how mature AI vendors will price.
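From the vendor's side, outcome-based pricing is a margin check against the full-stack cost per outcome. A minimal sketch — the $0.50 price and all cost inputs here are made-up illustrative numbers, not figures from this article:

```python
# Vendor-side margin check under outcome-based pricing: the vendor keeps
# whatever is left of the per-outcome fee after the full cost iceberg.

def vendor_margin_per_outcome(price_per_outcome, token_cost, infra_cost,
                              retry_rate, human_review_cost):
    """Margin per resolved outcome. Retries re-incur token + infra cost
    on a fraction of outcomes; review cost is amortized per outcome."""
    full_stack = (token_cost + infra_cost) * (1 + retry_rate) + human_review_cost
    return price_per_outcome - full_stack

# Hypothetical: $0.50/ticket fee, $0.05 tokens, $0.10 infra,
# 20% retry rate, $0.15 amortized human review per ticket.
margin = vendor_margin_per_outcome(0.50, 0.05, 0.10, 0.20, 0.15)
print(f"${margin:.2f} margin per ticket")
```

Run the same function with the buyer's numbers under per-token pricing and the asymmetry is obvious: under token pricing the buyer owns every one of those cost terms; under outcome pricing the vendor does.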
Why AI Adoption Stumbles: The ROI Realization Gap
Deployment costs for AI are deceptive. Building a prototype is cheap. Running it in production at scale is not.
The MIT NANDA "GenAI Divide" finding (cited widely in HBR and MIT Sloan Review) states that 95% of AI pilots fail to deliver P&L impact within 18 months of launch. Why? Most companies deploy the model and assume it will work. They do not:
- Budget for the full cost iceberg. They see the token bill, not the infrastructure bill.
- Instrument cost-per-outcome metrics. They cannot measure whether the AI is economically viable because they do not know what it costs per ticket, per claim, per transaction.
- Plan for human-in-the-loop review. Compliance and quality control steps explode the cost structure. No one budgeted for it.
- Upgrade to outcome-based contracts. They keep paying per-token, which incentivizes the wrong behavior (burn tokens, collect vendor revenue).
High-performing companies do the opposite. They treat AI like a capital investment or a headcount hire: define the target cost per unit of work, evaluate vendors against that metric, and instrument cost attribution from day one.
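What "instrument cost attribution from day one" means in practice is tagging every cost event with the work item it served, then rolling up fully loaded cost per outcome. A minimal sketch with a hypothetical event schema (the work-item IDs, cost types, and dollar figures below are invented for illustration):

```python
# Work-item-level cost attribution: every billable event carries the ID of
# the work item (ticket, claim, transaction) it served, so cost per outcome
# falls out of a simple roll-up.
from collections import defaultdict

cost_events = [
    # (work_item_id, cost_type, dollars) -- hypothetical sample data
    ("ticket-1042", "tokens", 0.05),
    ("ticket-1042", "vector_db", 0.01),
    ("ticket-1042", "retry_tokens", 0.02),
    ("ticket-1042", "human_review", 0.90),
    ("ticket-1043", "tokens", 0.04),
]

per_item = defaultdict(float)
for item_id, _cost_type, dollars in cost_events:
    per_item[item_id] += dollars

for item_id, total in sorted(per_item.items()):
    print(f"{item_id}: ${total:.2f} fully loaded")
```

Note what the roll-up surfaces: one ticket that needed human review costs more than twenty times the token-only ticket. Without per-work-item tagging, that variance disappears into an averaged monthly bill.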
The Vendor Landscape: Who Bears the Cost?
The shape of the vendor landscape tells you something about how cost structure is shifting. When the category was new, everyone charged by the token because it was easiest. This meant customers bore the cost uncertainty.
Now, we are seeing specialist vendors emerge that own different parts of the cost iceberg:
- Klarna, Intercom Fin, Sierra — customer-service AI agents that own the full stack and quote outcome-based pricing ($0.19-$1.50 per resolution). The vendor bears the cost of inference, retries, and human review.
- Decagon, Devin — task-specific agents that quote per-conversation or per-task pricing.
- CloudZero, Vantage — cost observability for engineers (do not own the cost, just measure it).
- Runrate — work-item-level cost attribution for finance and operations, letting companies measure cost per outcome across any AI vendor.
The vendors that will win are the ones that own the full-stack cost and can guarantee a unit cost per outcome. This forces alignment: if you quote $0.50 per ticket handled, you have to make the entire machine (inference, retries, human review, integration) fit that budget.
What This Means for the CFO Function
The playbook for software was: unit economics are good when gross margin is above 75%. The playbook for AI is different. It is: unit economics are good when cost per outcome is competitive with the manual labor it replaces, and the margin per outcome can support the R&D and platform costs of running the AI.
A $0.19-per-ticket cost structure only works if the ticket is worth more than $0.19 to the customer (in satisfaction, retention, or operational efficiency). If you are handling a $5-value ticket, $0.19 is an easy win. If the ticket is only worth $0.10, it is a loss.
This shifts how CFOs evaluate AI investment. Not "is this cool," but "what is the fully-loaded cost per outcome, and does it beat the labor alternative." This requires work-item-level cost attribution — knowing what an AI agent actually costs to run, per unit of work. Most CFOs do not have this yet. Building it is the difference between AI as a cost center and AI as a profit center.
Build the cost attribution first. Everything else follows.
Go deeper with the field guide.
A step-by-step PDF for implementing AI cost attribution.