The FinOps Foundation spent the last five years teaching the cloud industry how to talk about infrastructure cost in business terms—moving away from "optimize your instances" and toward "what does this business capability cost per unit of work?" Now that discipline applies to AI. FinOps for AI means extending the same operational and financial rigor down to the work-item layer: answering what a single resolved support ticket, adjudicated insurance claim, or processed loan application actually costs in AI dollars. Most CFOs are still budgeting against the visible API-bill tip of the iceberg, blind to the hidden complexity underneath. Runrate operationalizes the FinOps Foundation principle for the agentic enterprise—making AI cost visible, allocated, and governed the same way you manage headcount.
What FinOps Foundation got right (and why you need it for AI)
The FinOps Foundation defines FinOps as an operational framework and cultural practice for maximizing the business value of cloud spend. Their State of FinOps 2026 report shows that 98% of organizations now actively manage AI spend, up from 31% two years ago. That's not hype—it reflects real executive pressure. But the question has changed, and it's a simpler one: no longer "how do we lower cloud bills?" but "what did we actually get for that spend?"
The Foundation's breakthrough insight was the "Cost Per Unit of Work" KPI. Instead of measuring cloud cost in compute hours or gigabytes of storage, they reframed it as cost per unit of business outcome: cost per transaction, cost per user, cost per feature shipped. That discipline became the operating standard. For CFOs, it turned cloud cost into a business conversation rather than a purely engineering one. Your CFO and COO could now argue about whether an extra $50,000/month in compute spend was justified if it reduced transaction latency by 200ms and lifted conversion by 1.2%.
AI agents demand the same discipline—only more so. A support AI agent that costs $0.19 per resolved ticket (like Klarna's system) is an entirely different business conversation than one costing $1.50 per ticket (like Sierra's offer). An insurance claims processor running at $5 per claim is either a margin compressor or a margin creator, depending on what the claim's baseline manual cost was. Yet 8 out of 10 CFOs still don't know what their AI agents cost per work item. They know the monthly API bill. They don't know why.
The FinOps Foundation phases, translated for AI
The Foundation's framework defines three iterative phases of FinOps maturity (Inform, Optimize, Operate), with governance more recently emerging as a fourth concern in practice. Here's what each looks like for AI.
Inform: You're building basic visibility into AI spend. Most organizations start here. Your bills from OpenAI, Anthropic, and Google sit on your tech expense line. You know the total is growing—often 40-80% YoY. You don't know which business units are driving it, which use cases generate the most cost, or whether your vector database costs more than your inference. You definitely don't know cost per work item. This is stage 1 of the Maturity Curve: Invisible. The spend exists, but the organizational picture is opaque.
Optimize: You've allocated cost to business units and are applying engineering-led optimization (better prompts, fewer retries, smaller models, caching). CloudZero and Helicone own this mindset. Engineers are chasing token savings. The problem is that tokens are typically only 5-15% of true AI cost. The real cost is in inference infrastructure, retry storms, vector database storage, observability, and human-in-the-loop review steps. Optimizing tokens without seeing the iceberg is like downsizing your cloud instances and wondering why your AWS bill barely moved—because 70% of the cost was actually data transfer, managed services, and observability.
Operate: You've moved cost attribution to the work-item level. You know that your contact-center AI agent costs $0.47 per resolved ticket, and you've tied that cost to your P&L (cost of revenue, support operations, COGS). You've set cost targets: "we want to run this agent at $0.35 per ticket within six months." Now you're managing AI the way you manage customer acquisition cost (CAC) or cost per covered life in insurance. This is stage 4 of the Maturity Curve. This is where Runrate operates.
Govern: You've automated anomaly detection (alert when a single ticket costs 3x the historical average), built SLOs (service-level objectives) around cost per unit, published board-grade cost-governance reports, and hardened access controls. Stage 5 of the Maturity Curve. CFOs with AI cost SLOs and anomaly detection budgets are still rare—maybe 12-15% of mid-market companies. But they're the ones who survive AI margin compression.
The AI Cost Iceberg: What you're actually paying for
API tokens are the visible 10% of AI cost. The other 90% is hidden. The AI Cost Iceberg looks like this:
Visible (the tip): Your monthly API bill from OpenAI GPT-4, Anthropic Claude, Google Gemini. This is what most CFOs budget against. It's rarely more than 10% of total AI spend.
Hidden (the iceberg):
- Inference infrastructure (self-hosted models, edge deployment, inference endpoints)
- Vector database storage (embedding vectors grow continuously; retrieval-augmented generation databases are expensive to scale)
- Observability and monitoring (LLM observability platforms, logging, tracing)
- Retries and failure recovery (when an API call fails, your system retries; each retry costs money)
- Tool calls and third-party API integrations (your AI agent calls Stripe, Twilio, your CRM; each call has a cost)
- Human-in-the-loop review and correction (a claims processor that hands ambiguous cases to humans adds salary cost, QA cost, and latency)
- Security, compliance, and audit infrastructure (SOC 2 compliance, data residency, encryption, audit logging)
- Training data licensing and curation (if you fine-tune or train your own models)
- Prompt caching and context-window overhead (longer prompts cost more; caching reduces this but adds infrastructure)
- Rate limiting and gateway infrastructure (if you build your own API gateway to prevent runaway costs)
A typical mid-market company deploying an AI agent discovers that the API bill is 5-15% of total cost. The rest is infrastructure, people, and operational overhead. CFOs who budget only against the API bill are building systematic underfunding into their plans.
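To make the roll-up concrete, here's a minimal sketch in Python. The dollar figures are illustrative placeholders, not benchmarks; the point is that the visible API line is a small share of the computed total:

```python
# Illustrative monthly AI cost roll-up -- the dollar figures are
# hypothetical placeholders, not benchmarks.
monthly_costs = {
    "api_tokens": 12_000,          # the visible tip: OpenAI/Anthropic/Google bills
    "inference_infra": 40_000,     # self-hosted models, inference endpoints
    "vector_db": 15_000,           # embedding storage and retrieval
    "observability": 18_000,       # tracing, logging, LLM monitoring
    "retries_and_tools": 6_000,    # retry storms, third-party API calls
    "human_review": 15_000,        # human-in-the-loop salary/QA allocation
    "security_compliance": 4_000,  # audit logging, residency, compliance overhead
}

total = sum(monthly_costs.values())
visible_share = monthly_costs["api_tokens"] / total
print(f"Total monthly AI cost: ${total:,}")
print(f"Visible API share: {visible_share:.0%}")  # ~11% -- the tip of the iceberg
```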
Why traditional FinOps tools can't see agent cost
Your existing FinOps tool (or your cloud cost platform) was built for cloud infrastructure: compute, storage, networking. The dimensions of cost are machines, regions, and services. You can drill from "total cloud cost" down to "EC2 cost in us-west-2" to "t3.large instances in us-west-2" to a specific instance ID.
AI agents don't map to that hierarchy. An AI agent incurs cost across five different vendors and at least three cost categories (API calls, vector DB, observability), and a single work item (one resolved ticket) may touch all of them. Your cloud cost platform has no concept of "resolved ticket" or "claim adjudicated" or "contract reviewed." It sees API calls. It sees database queries. It doesn't see what work was actually done, or whether that work was done correctly.
This is the gap Runrate fills. We work backward from the work item. We connect inference cost to observability data to vector DB queries to human review steps, and we attribute the total spend to the outcome. That's work-item-level cost attribution. It's the operational layer that FinOps Foundation principles require but most CFOs don't have yet.
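Here's a minimal sketch of what that attribution join looks like, assuming each vendor cost event has already been tagged with a work-item ID (in practice, the tagging is the hard engineering problem; the roll-up itself is a group-by):

```python
from collections import defaultdict

# Hypothetical cost events from different vendors, each tagged with the
# work item (e.g., a support ticket ID) it served.
events = [
    {"work_item": "TICKET-1041", "vendor": "openai",        "cost": 0.21},
    {"work_item": "TICKET-1041", "vendor": "pinecone",      "cost": 0.04},
    {"work_item": "TICKET-1041", "vendor": "observability", "cost": 0.06},
    {"work_item": "TICKET-1041", "vendor": "human_review",  "cost": 0.16},
    {"work_item": "TICKET-1042", "vendor": "openai",        "cost": 0.18},
]

# Sum every vendor's cost into the work item it served.
cost_per_item = defaultdict(float)
for e in events:
    cost_per_item[e["work_item"]] += e["cost"]

for item, cost in cost_per_item.items():
    print(f"{item}: ${cost:.2f}")  # TICKET-1041: $0.47 across all vendors
```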
The four phases, operationalized: What to build quarter by quarter
Q1 — Inform: Get AI cost visible. Your first goal is to centralize AI spend in one place where your CFO can see it. This means API keys from OpenAI, Anthropic, Google, and any self-hosted inference platforms flow into a cost aggregation layer (Runrate, Vantage, or CloudZero). You want one number: "our AI spend is $X/month." You also want to see the top 10 use cases: which teams, which agents, which models are burning the most budget. You don't need to be perfect. You need visibility. This phase takes 2-4 weeks.
Q2 — Allocate: Charge AI cost back to business units. Now that you can see total AI spend, allocate it backward to the teams driving it. Sales team using an AI SDR? They get charged the cost. Finance team using an AI analyst? They see it on their P&L. Customer success using an AI escalation manager? They own that cost. This creates accountability. It also reveals which use cases are margin-accretive and which are margin-destructive. This phase takes 4-6 weeks and requires finance and ops collaboration.
Q3 — Attribute to work items: Build cost per outcome. Now move from "cost per team" to "cost per unit of work." This is where the real insight emerges. You can answer: "our support AI agent costs $0.38 per ticket resolved, with an average handling time of 3.2 minutes." You can compare that to your manual baseline: "our human agents cost $1.80 per ticket with 8.5 minutes handling time." Now you have an ROI conversation, not a budget conversation. You can also spot broken agents: "this claims processor is costing $12 per claim—what's wrong?" This phase takes 8-12 weeks because it requires engineering and finance to align on cost attribution logic.
Q4 — Govern: Automate and harden. Set SLOs. An SLO is a service-level objective—in this case, for cost. Example: "our contact-center agent runs at $0.40 ± 15% per ticket, flagged if it drifts above $0.46." Automate anomaly detection. Build a weekly cost-governance report for the board. Enforce API key rotation and access controls. Create approval workflows for high-cost experiments. This phase is continuous improvement; it doesn't have a fixed endpoint.
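The SLO check itself is a few lines once attribution exists. A sketch of the $0.40 ± 15% example above (the function name and thresholds are illustrative):

```python
def check_cost_slo(cost_per_ticket: float,
                   target: float = 0.40,
                   tolerance: float = 0.15) -> str:
    """Flag a per-ticket cost that drifts outside the SLO band."""
    upper = target * (1 + tolerance)  # $0.46 for a $0.40 +/- 15% SLO
    lower = target * (1 - tolerance)  # $0.34
    if cost_per_ticket > upper:
        return f"BREACH: ${cost_per_ticket:.2f} above ${upper:.2f} ceiling"
    if cost_per_ticket < lower:
        return f"REVIEW: ${cost_per_ticket:.2f} suspiciously below band"
    return "OK: within SLO band"

print(check_cost_slo(0.48))  # BREACH: $0.48 above $0.46 ceiling
```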
Cost Per Unit of Work is the KPI that matters
FinOps Foundation calls this "Cost Per Unit of Work." Runrate calls it the same thing. It means: cost per business outcome. Not cost per token. Not cost per API call. Cost per ticket, per claim, per transaction, per loan application, per contract reviewed. Pick the unit of work your business cares about.
Here's why this KPI changes behavior. If you manage AI as "tokens cost $0.003 and we need to optimize," you get engineering debates about prompt length and model selection. If you manage AI as "we target $0.35 per ticket resolved, and every ticket below that is margin-accretive," you get business conversations: "which use cases hit the target? Which ones don't? Why?" You also get alignment: engineers want to hit the SLO, finance wants to hit the SLO, the board wants to see progress.
The formula is simple: Cost per unit = total AI cost for the workload ÷ number of units completed.
Example: Your AI claims processor runs at $120,000/month. Last month it processed 24,000 claims. Cost per claim is $120,000 ÷ 24,000 = $5.00 per claim. Your manual baseline is $8.50 per claim. You're saving $3.50 per claim against the manual baseline. That's an $84,000/month margin improvement on baseline volume. That's the conversation a CFO can take to the board.
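In code, the KPI is one division plus a baseline comparison. A sketch using the claims-processor numbers above:

```python
def cost_per_unit(total_cost: float, units: int) -> float:
    """Cost Per Unit of Work: total cost for the workload / units completed."""
    return total_cost / units

ai_cpu = cost_per_unit(120_000, 24_000)          # $5.00 per claim
manual_baseline = 8.50                            # manual cost per claim
monthly_savings = (manual_baseline - ai_cpu) * 24_000
print(f"Cost per claim: ${ai_cpu:.2f}, "
      f"monthly margin improvement: ${monthly_savings:,.0f}")
# -> Cost per claim: $5.00, monthly margin improvement: $84,000
```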
Budgeting AI spend: Why your old model breaks
You've probably budgeted AI spend one of three ways:
- Head count + contingency: "We'll spend roughly 20% of what we'd pay humans." (Doesn't work because you're looking at the API tip and hidden cost is 10x higher.)
- Percentage of cloud budget: "AI will be 5-10% of total cloud." (Breaks because AI infrastructure and cost drivers are different from cloud.)
- Per-model per-month: "GPT-4 API at $50k/month, fine-tuning at $15k/month, etc." (Breaks because you're not tying cost to business outcome.)
The right model for mid-market is outcome-based budgeting: "We're launching three AI agents in Q2. Agent A will handle support escalations at a target of $0.35 per ticket. Agent B will process claims at a target of $6 per claim. Agent C will qualify leads at a target of $12 per lead. Total estimated cost is $240,000/month for 48,000 work items." Then you track actuals against those targets. You course-correct mid-quarter if one agent is running hot.
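Outcome-based budgets are easy to keep honest in a few lines. The per-unit targets below come from the example above; the per-agent volumes are hypothetical, chosen to land near the $240,000 / 48,000-item totals:

```python
# Hypothetical monthly volumes per agent -- the per-unit targets are from
# the text, the volume split is an illustrative assumption.
budget = [
    # (agent, target $/unit, expected units/month)
    ("support_escalations", 0.35, 18_000),
    ("claims_processing",   6.00, 21_000),
    ("lead_qualification", 12.00,  9_000),
]

total_cost = sum(rate * units for _, rate, units in budget)
total_units = sum(units for _, _, units in budget)
print(f"Budget: ${total_cost:,.0f}/month for {total_units:,} work items")
# -> Budget: $240,300/month for 48,000 work items
```

Tracking actuals against this table each month is the whole mid-quarter course-correction loop: any agent whose actual cost per unit drifts above its target row gets investigated.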
Staffing and governance: Who owns AI cost?
In traditional cloud FinOps, the CFO owns the policy, engineering owns the tools, and the product team owns the tradeoff decisions. For AI cost, the structure is similar but the flavor changes.
CFO: Sets the budget, the SLOs, and the anomaly thresholds. Owns board reporting. Asks the hard questions: "why did our cost per claim jump 30% this week?" Decides governance policy: "any experiment requesting >$50k/month in new AI spend requires CFO approval."
COO / Ops Leader: Owns the definition of "work item" and the P&L attribution. If you're a claims company, the COO decides whether a claim that's rejected or kicked back to manual review counts as a "processed claim" (it shouldn't). They also own the cost SLOs and what happens when agents violate them.
Head of AI / AI Engineering: Owns prompt optimization, model selection, and the technical root causes of cost anomalies. When the claims processor is running hot, they diagnose: is it a prompt change? A model update? A retry storm? A cache miss?
Finance / FP&A: Tracks cost actuals against budget, manages the monthly board report, and ensures cost allocation logic stays consistent. They also often own the "what to do if we blow the budget" conversation.
In early-stage AI deployments, these roles might be two people. In mature deployments, they're a whole function. But the governance pattern holds.
Why the three FinOps frameworks matter for your AI strategy
This section ties back to Runrate's three brand frameworks, which serve as the vocabulary for AI cost conversations.
The AI Cost Iceberg forces you to see the full picture. It prevents the trap of "we'll save money by switching models" when the real costs (infrastructure, human review, observability) dwarf the API bill. When you present cost data to the board, frame it using the iceberg. "Our AI spend is $100k/month. The visible API cost is $12k. The hidden cost of infrastructure ($40k), observability ($18k), vector DB ($15k), and human review ($15k) is where the real story is." This changes how leadership thinks about optimization.
The 5-Stage AI Cost Maturity Curve gives you a roadmap for investment. You don't jump from stage 1 (Invisible) to stage 5 (Governed) overnight. You move progressively: visibility, allocation, attribution, SLOs, governance. Each stage requires different tools and team investment. Each stage unlocks different capabilities. Stage 2 (Tracked) is where most companies are today. Stage 4 (Optimized) is where you need to be to protect margin. Stage 5 (Governed) is where you stop getting surprised by cost spikes.
The AI Workforce P&L reframes AI agents as payroll. This is psychologically powerful with CFOs. Instead of "we're spending $250k/month on AI," you say "we have 3 AI agents on the payroll costing $250k/month, replacing 5 human FTEs that would cost $400k/month." Now it's a headcount conversation, which CFOs understand. You're building the "AI org chart" and the "AI salary budget." This framework also makes it clear: if an AI agent isn't worth what you're paying for it, you should decommission it—the same way you'd lay off an underperforming employee. That discipline prevents zombie agents that run forever because "someone built it."
The 30-60-90 day roadmap to FinOps for AI
Most organizations try to implement FinOps all at once and get paralyzed. Here's a phased approach.
Days 1-30 (Inform): Get visible. Centralize AI cost. One dashboard. One number. "We spend $X/month on AI." You'll have 70-80% accuracy. That's fine. The goal is not perfection; it's visibility.
Days 31-60 (Allocate): Distribute cost. Charge it back to teams. "Sales spent $40k on the SDR agent. Support spent $60k on escalation." This creates accountability. You'll discover which teams are biggest spenders, which use cases are most expensive, and where you need deeper investigation.
Days 61-90 (Operate): Build work-item attribution. "Each support ticket cost $0.42. Each claim cost $5.80." Now you have economics. You can do ROI analysis. You can compare to manual baseline. You can make margin decisions with data instead of intuition.
By day 90, you've moved three stages up the maturity curve. You're at stage 3-4 (Allocated/Optimized). You have the infrastructure to say no to experiments that don't have margin math. You can optimize smartly because you see the iceberg, not just the tip. You can talk to your board about AI with confidence instead of guessing.
The companies that move fast through these 90 days are the ones that get cost discipline before cost explodes. The companies that skip these steps are the ones that discover a 150% cost overrun at the end of the year.
AI cost governance checklist: Five things you need to ship
- Cost aggregation and visibility. One dashboard where your CFO can see total AI spend by model, by team, by use case. Updated daily or weekly. No surprises on the monthly bill.
- Work-item attribution. A documented process that ties every dollar of AI cost to a specific business outcome. "This support ticket cost $0.38. This claim cost $6.12." No ambiguity.
- Cost SLOs and anomaly detection. Service-level objectives for cost (we run at $X ± Y%). Automated alerts when a work item costs 3x the norm (see the sketch below). A weekly anomaly report for the CFO.
- Board-grade reporting. A one-page monthly report for the board showing: total AI cost, cost as % of revenue, cost per unit (ticket, claim, etc.), YoY change, and anything that breached SLOs. This is the "AI line item" on the board agenda.
- Approval workflows for new spend. If an engineer wants to run an experiment that might cost $100k/month, there's a documented ask-and-answer process. CFO approves. You avoid surprise overages.
Most mid-market companies have 1 of these 5. Best-in-class have all 5. Runrate is built to accelerate you from 1 to all 5.
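Here's the sketch referenced in item 3: a minimal version of the "3x the norm" rule against a trailing average, assuming you already have per-item costs flowing in:

```python
from statistics import mean

def flag_anomalies(costs: list[float], window: int = 100,
                   factor: float = 3.0) -> list[int]:
    """Return indices of work items costing more than `factor` x the trailing average."""
    flagged = []
    for i, cost in enumerate(costs):
        history = costs[max(0, i - window):i]
        if history and cost > factor * mean(history):
            flagged.append(i)
    return flagged

per_ticket_costs = [0.38, 0.41, 0.36, 0.40, 1.45, 0.39]  # hypothetical
print(flag_anomalies(per_ticket_costs))  # [4] -- the $1.45 ticket
```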
Managing hidden cost: The vector database problem
One of the easiest places to get blindsided is vector database cost. When you build a retrieval-augmented generation (RAG) system, you take your proprietary knowledge base, convert it into vectors (embeddings), and store it in a vector database like Pinecone, Weaviate, or Milvus. The magic: your AI agent can retrieve relevant context before answering a question, which means it gives better answers and needs fewer retries.
The catch: vector storage is cumulative. You generate embeddings once per document, but you store them until you delete the underlying data. Your vector database bill is proportional to stored vectors, not queries. A company with 5M documents in its knowledge base might have 50M-100M vectors (depending on chunking strategy). At managed vector-DB pricing like Pinecone's, that's roughly $3,000-$6,000/month in storage alone, regardless of how much you query it.
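A back-of-envelope estimator makes the math concrete. The per-million-vectors rate below is a placeholder assumption picked to match the range above, not any vendor's actual price list:

```python
def vector_db_cost(n_vectors: int,
                   usd_per_million_per_month: float = 60.0) -> float:
    """Rough managed-vector-DB cost. The per-million rate is a placeholder
    assumption; real pricing varies widely by vendor, index type, and replicas."""
    return n_vectors / 1_000_000 * usd_per_million_per_month

for n in (50_000_000, 100_000_000):
    print(f"{n:,} vectors: ~${vector_db_cost(n):,.0f}/month")
# -> 50,000,000 vectors: ~$3,000/month
# -> 100,000,000 vectors: ~$6,000/month
```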
This is where the iceberg bites. A CFO running cost optimization on "tokens" misses the vector database as an optimization target. Better places to look: (1) Chunk aggressively—fewer, better-targeted chunks reduce vector count by 30-50%. (2) Cull old data—do you really need 10 years of archived documents in the vector DB? Delete what you don't query. (3) Shop vector providers—prices vary 3-5x across platforms. (4) Move to hybrid search—combine vector search (expensive) with keyword search (cheap) to reduce reliance on vectors.
Your CFO's job: ask about vector database cost every quarter. Engineering usually has it buried in a line item. Expose it. Manage it like a line of business.
The human-in-the-loop tax
The other hidden cost that blindsides teams is human review. When you deploy an AI agent to do important work (claims adjudication, underwriting, contract review), you don't run it fully autonomously. Humans review a percentage of decisions: maybe 10%, maybe 30%. That human review has a cost: salary, training, quality assurance, latency.
If your claims processor runs at $5 per claim in direct AI cost but 20% of claims require human review at an average cost of $8 per review, your actual cost per claim is: $5 + (0.20 * $8) = $6.60. That $1.60 in human cost is hidden.
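The blended-cost formula is worth encoding directly in your cost model. A minimal sketch:

```python
def blended_cost_per_item(ai_cost: float,
                          review_rate: float,
                          review_cost: float) -> float:
    """Direct AI cost plus the expected human-review cost per work item."""
    return ai_cost + review_rate * review_cost

print(f"${blended_cost_per_item(5.00, 0.20, 8.00):.2f} per claim")  # $6.60
```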
The CFO's move: model human-in-the-loop cost as a line item. "The agent runs at $5 in AI cost, plus $1.60 in human review cost, equals $6.60 total cost per claim." Then optimize: can you improve the agent so it requires less human review? Can you find higher-confidence signals so you review fewer cases? Can you reduce review cost (batch review instead of real-time)?
This is where AI economics gets real. It's not just about the agent cost. It's about the total cost of the decision, including human touch.
Margin math: The conversation that matters
Here's the conversation every CFO should be having, but most aren't.
You're a mid-market insurance company. Your claims team has 15 people processing 200 claims per day, about 50,000 claims per year over roughly 250 working days. Annual salary cost for the team: $1.2M (including benefits, training, and overhead, about $80k per person). Cost per claim: $1.2M / 50,000 claims per year = $24 per claim.
You deploy an AI claims processor that runs at $5 per claim in direct AI cost; roughly 20% of claims still get a human double-check, handled by the retained team. You shrink the team to 11 people (4 redeployed elsewhere), with one of the remaining staff refocused on AI QA. New annual team cost: roughly $900k. New cost per claim: $18. Savings per claim: $24 - $18 = $6. On 50,000 claims, that's $300,000 in annual labor savings.
AI cost: $5 * 50,000 = $250,000/year. Labor savings: $300,000. Net margin benefit: +$50,000/year. Put differently, every dollar of AI spend returns $1.20 in labor savings.
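The whole board calculation fits in a few lines. A sketch using the numbers above (the team costs are the approximations from this example):

```python
claims_per_year = 50_000
manual_team_cost = 1_200_000   # 15-person claims team, fully loaded
new_team_cost = 900_000        # ~11 people after redeployment (approximate)
ai_cost_per_claim = 5.00       # direct AI cost

labor_savings = manual_team_cost - new_team_cost    # $300,000
ai_spend = ai_cost_per_claim * claims_per_year       # $250,000
net_benefit = labor_savings - ai_spend               # +$50,000
print(f"Net annual margin benefit: ${net_benefit:,.0f} "
      f"(${labor_savings / ai_spend:.2f} of labor savings per AI dollar)")
# -> Net annual margin benefit: $50,000 ($1.20 of labor savings per AI dollar)
```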
That's the conversation the board cares about. That's the narrative that justifies the FinOps investment.
Operationalizing FinOps Foundation for agents
The FinOps Foundation provided the vocabulary and the KPIs. Runrate provides the operational layer. Here's what that means.
The Foundation says: "Measure Cost Per Unit of Work." But how do you measure it when a single work item (a claim) touches 5 vendors? Runrate connects those dots. We log each claim's path through the agent, calculate its total cost across all vendors and infrastructure, and attribute it back to that one claim. We automate it so you don't have spreadsheets and manual reconciliation.
The Foundation says: "Set SLOs and manage to them." But how do you set an SLO when you don't know the variance distribution of your costs? Runrate observes three months of data, calculates the baseline, suggests a reasonable SLO band, and then flags violations automatically. We send you a Slack alert when an agent drifts. You don't have to manually check a dashboard.
The Foundation says: "Build governance." But how do you govern when there's no infrastructure for it? Runrate provides the dashboard, the anomaly detection, the board report, and the approval workflows. We're the execution layer that turns FinOps Foundation principles into operational discipline.
What to ship in your next quarter
If you're starting from scratch, here's the roadmap.
Weeks 1-3: Get AI cost centralized. Connect your API keys (OpenAI, Anthropic, Google, self-hosted), pick a cost aggregation tool, and get your first "total monthly AI spend" number.
Weeks 4-6: Define your core work items (resolved ticket, processed claim, qualified lead, reviewed contract). Build the attribution logic so every work item has a cost.
Weeks 7-9: Calculate cost per unit for Q1. What's your baseline? Which agents hit margin targets? Which ones are outliers?
Weeks 10-12: Set SLOs for Q2. "We want to run support at $0.40 ± 12% per ticket." Build the anomaly detection rules. Schedule a monthly cost-governance board report.
By end of quarter, you'll have moved from "we spend $X/month on AI" (stage 1, Invisible) to "we spend $X/month and run each work item at $Y cost with an SLO of $Z" (stage 4, Optimized). That's a 3-stage maturity jump in 90 days.
The FinOps Foundation set the playbook for cloud. Now it's time to run the same playbook for AI. If you're building the CFO's case for AI cost attribution, the 40-page CFO Field Guide to AI Costs walks through the line-item model and the board-deck talking points.
Go deeper with the field guide.
A step-by-step PDF for implementing AI cost attribution.