If you lead a finance or operations team, your board is asking about AI return on investment, your CEO is making agent-driven spending decisions, and you're being asked to explain concepts like "LLM," "inference," and "AI agent" without sounding like you swallowed a machine learning textbook. This guide breaks down the AI landscape in plain English—no prerequisites, no jargon crutches, just the mental models that let you make real decisions.
What "AI" Actually Means (and Why Your CEO Keeps Using the Word Wrong)
Most of the time, when a business leader says "AI," they're actually describing one of three different technologies layered on top of each other. Understanding the difference between artificial intelligence, machine learning, and generative AI is the first step to having a real conversation about cost, risk, and return.
Artificial intelligence is the broadest umbrella. It simply means a computer system that performs tasks that normally require human judgment. A spam filter is AI. An algorithm that decides which ads to show you is AI. A system that approves a loan application is AI. None of these require anything fancy—they're just rules encoded by a programmer, or patterns learned from data. The key insight: "artificial intelligence" is a label for any system that mimics human decision-making, whether it's a 1990s decision tree or 2025's large language model.
Machine learning is a subset of AI. Instead of humans writing rules ("if the customer's credit score is above 700, approve the loan"), the system learns rules from historical data. You feed it 10,000 past loan approvals and denials, and it finds the patterns. This is more flexible than hard-coded rules because it adapts as your business changes. Most AI systems deployed today use some form of machine learning.
Generative AI is a newer capability within machine learning. It can create new content: text, images, code, audio. ChatGPT is generative AI. An image generator is generative AI. A system that writes contract summaries is generative AI. What makes it "generative" is that it produces something novel, not just classifying or predicting something that already exists. Before 2022, generative AI was a research curiosity. Now it's the flashpoint for every board conversation about AI spend.
The mental model: AI is the category. Machine learning is the technique. Generative AI is the recent capability that's making CFOs nervous about budgets.
Why Generative AI Is Expensive (And Why It Isn't Your Fault for Not Knowing)
Generative AI—particularly large language models like GPT-4, Claude, and Gemini—seems cheap at first glance. You sign up for an API, you get a usage-based bill, and it looks like a rounding error in your software budget. This is where most finance teams make their first mistake.
According to McKinsey's State of AI 2025, 88% of organizations use AI in at least one business function, but only 39% see measurable EBIT impact. The shortfall isn't usually about whether the technology works. It's about the gap between the visible bill and the actual cost. When you run an AI agent at scale—say, 500 customer service conversations a day—the true cost includes not just the API bill (the visible tip of the iceberg) but also infrastructure to handle failures, human review time, data storage, compliance overhead, and retries when the model makes mistakes.
This is the AI Cost Iceberg: the part you see on your AWS bill is roughly 10% of the true operational cost.
Here's why. Every conversation with an LLM costs money in three ways. First, there's the direct cost of the API call itself. Second, there's the cost of running the infrastructure around it—the servers, the caching layers, the databases that store previous conversations so the model doesn't repeat itself, the monitoring systems that catch when things go wrong. Third, there's the human cost: someone has to review the model's work to catch mistakes, especially in high-stakes domains like claims processing or loan approval. A $0.05 API call can easily become a $0.50 or $1.00 work item once you include review time and failure recovery.
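To make that multiplier concrete, here's a minimal sketch of the arithmetic. Every figure below is an illustrative assumption, not a benchmark:

```python
# Illustrative only: how a small API fee becomes a much larger
# fully loaded cost per work item once the hidden costs are included.

api_call = 0.05          # visible: the LLM API fee per conversation (assumed)
infrastructure = 0.12    # servers, caching, monitoring, storage (assumed)
retries = 0.05           # re-running failed or low-quality outputs (assumed)
human_review = 0.45      # a few minutes of review time, amortized (assumed)

loaded_cost = api_call + infrastructure + retries + human_review
visible_share = api_call / loaded_cost

print(f"Loaded cost per work item: ${loaded_cost:.2f}")
print(f"Visible API share: {visible_share:.0%}")  # roughly 7% of the total
```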
This is why Klarna's AI customer service runs at roughly $0.19 per resolved ticket, while Intercom's Fin runs around $0.99, and Sierra around $1.50. The API cost is nearly identical. The difference is infrastructure, reliability, and human review.
Large Language Models: The Engine Under the Hood
An LLM (large language model) is a type of AI trained on enormous amounts of text—books, websites, articles, conversations—to predict the next word in a sentence. That's it. It's a word-prediction machine. You give it a prompt ("summarize this contract"), and it predicts the next word, then the next, generating a response one word at a time.
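A toy sketch of that loop makes the mechanism visible. The `predict_next_word` stub below stands in for billions of learned parameters, and real models work on sub-word tokens rather than whole words, but the one-word-at-a-time loop is the same:

```python
# Toy illustration of autoregressive generation: the model only ever
# does one thing -- pick the next word given everything so far.

def predict_next_word(words):
    # Stand-in for the real model: a lookup of canned continuations.
    canned = {"The": "contract", "contract": "expires", "expires": "in",
              "in": "March.", "March.": "<end>"}
    return canned.get(words[-1], "<end>")

words = ["The"]
while words[-1] != "<end>":
    words.append(predict_next_word(words))

print(" ".join(words[:-1]))  # -> "The contract expires in March."
```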
Here's why this matters for a business leader. An LLM doesn't understand anything. It has no internal representation of meaning. It's finding patterns in language at a statistical level. This is why LLMs sometimes hallucinate (invent citations, make up facts), why they can be biased if the training data was biased, and why they fail predictably on tasks that require novel reasoning that wasn't in their training set.
When you deploy an LLM for business, you're betting on three things: first, that the task is common enough to exist in the training data; second, that the model's statistical patterns are good enough for your use case; and third, that human review can catch the failures. None of these are guaranteed.
The cost structure of an LLM is also different from traditional software. Traditional software has a fixed cost: you build it once, you pay for servers, done. LLMs charge by the token—essentially, by the word. Every time someone uses the model, you pay. At scale, this compounds. If you're running 1,000 customer service conversations a day, and each conversation uses 2,000 tokens on average, and you pay $0.02 per 1,000 tokens, you're spending $40 per day on API costs alone. But that's just the iceberg tip.
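The arithmetic from that example, spelled out (the $0.02 rate is the article's assumed blended price, not any vendor's published rate):

```python
# 1,000 conversations/day at ~2,000 tokens each,
# at an assumed blended rate of $0.02 per 1,000 tokens.

conversations_per_day = 1_000
tokens_per_conversation = 2_000
price_per_1k_tokens = 0.02

daily_api_cost = (conversations_per_day * tokens_per_conversation
                  / 1_000 * price_per_1k_tokens)
print(f"${daily_api_cost:.2f}/day")  # $40.00 -- the visible tip only
```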
What's Actually Meant by "Inference" and Why It Shows Up on Your Bills
Inference is just the operational running of the model. When someone types a question into ChatGPT and gets an answer, that's inference. When your support system uses Claude to triage an incoming ticket, that's inference. Inference is when the model is actually working, as opposed to being trained.
The reason "inference" shows up on your bills is because that's when you pay. You don't usually pay for training—cloud AI vendors like OpenAI, Anthropic, and Google do the training for you and amortize the cost across their customers. You pay for inference: every time the model runs and generates an output, you incur a cost.
This has a direct implication for your budgeting. If you're building an AI agent that processes 100 insurance claims per hour, you're paying for 100 inferences per hour, not a flat annual license. As your usage grows, your bill grows linearly. This is fundamentally different from traditional software, where you might negotiate an annual contract and the marginal cost of additional users is near zero.
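A back-of-envelope comparison shows why that linearity matters. The per-inference cost and flat license fee below are hypothetical figures chosen for illustration:

```python
# Why per-inference billing behaves differently from a flat license.
# All figures are assumptions for illustration only.

cost_per_inference = 0.04      # assumed blended cost per model call
flat_annual_license = 60_000   # hypothetical traditional-software contract

for claims_per_hour in (100, 500, 1_000):
    annual_inferences = claims_per_hour * 8 * 250  # 8h days, 250 working days
    annual_cost = annual_inferences * cost_per_inference
    print(f"{claims_per_hour:>5}/hr -> ${annual_cost:>7,.0f}/yr "
          f"vs ${flat_annual_license:,}/yr flat")
```

At low volume the usage-based model looks cheap; scale it tenfold and it quietly crosses the flat-license line. That crossover is worth modeling before you commit to an architecture.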
AI Agents: The Jump from "Chatbot" to "Worker"
An AI agent is a system that can take actions, not just answer questions. A chatbot answers your question. An agent answers your question, then takes the action you asked for, and then reports back on the outcome.
Here's the distinction that matters for business leaders: a chatbot can tell you how to reset your password. An agent can log into your account, reset your password, and send you a confirmation email. A chatbot can explain a loan denial. An agent can re-evaluate your loan application, ask for additional documentation, resubmit, and notify you of the outcome.
The reason this distinction matters is cost and risk. An agent is more expensive because it's doing more work—it's not just generating text, it's making decisions that affect your business. An agent that's making loan decisions, claims adjudications, or customer support resolutions needs more oversight, more reliability infrastructure, and more human review than a chatbot. An agent that hallucinates or makes a mistake has higher business impact.
Agents are the fastest-growing category of AI deployment in 2025-2026, and they're driving the cost conversation. A chatbot might cost $0.01 per conversation. An agent might cost $0.50 to $2.00 per work item, depending on the domain and failure tolerance.
"Agentic AI" vs. "AI Agents" (Or: Why Search Volume on This Term Tripled)
This is the source of confusion that's driving the most Google searches in 2025. "AI agents" and "agentic AI" are not the same thing, and the conflation is driving real business decisions off the rails.
An AI agent is a discrete system—a worker. You can point to it: "that's our claims-processing agent." It has a clear input (an insurance claim), a clear output (approved or denied), clear costs (tokens, review time, infrastructure), and a clear P&L impact.
Agentic AI is a design philosophy or mode of operation where any AI system—an LLM, a workflow, a multi-step process—is architected to act autonomously, plan its own steps, and iterate toward a goal. Rather than answering a single question, an agentic system breaks a large problem into smaller steps, executes them, checks the results, and adapts. This can be applied to a chatbot, to an internal business process, to a data analysis workflow, or to a customer-facing system.
The business implication: an AI agent is a thing you buy or build. Agentic AI is a way of designing any system to be more autonomous. If your vendor says they're "building agentic AI," they mean they're designing systems that can handle complex multi-step problems, not just single-turn interactions. This usually means higher cost, because more autonomy usually means more tokens, more API calls, and more failure recovery.
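To make "plan, execute, check, adapt" concrete, here's a deliberately minimal sketch of an agentic loop. `call_llm` is a stub standing in for any model API, the quality check is a placeholder, and a production system would add tool calls, guardrails, and human review hooks:

```python
# Minimal agentic loop: plan, execute a step, check the result, adapt.
# Note that every iteration is another paid inference -- this is why
# agentic designs cost more than single-turn chatbots.

def call_llm(prompt: str) -> str:
    return f"draft result for: {prompt}"   # stub; a real call costs tokens

def looks_good(result: str) -> bool:
    return "draft" not in result           # stand-in quality check

def run_agent(goal: str, max_steps: int = 5) -> str:
    plan = call_llm(f"Break this goal into steps: {goal}")  # 1 inference
    result = ""
    for _ in range(max_steps):
        result = call_llm(f"Do the next step of: {plan}")   # +1 per step
        if looks_good(result):                              # check
            break
        plan = call_llm(f"Revise the plan, last attempt failed: {plan}")  # adapt
    return result

print(run_agent("summarize and file this insurance claim"))
```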
How AI Gets the Answers Wrong (And Why It Matters for Your P&L)
LLMs are statistical engines. They're very good at predicting what word comes next based on patterns in training data, which makes them great at generating plausible-sounding text. They're terrible at novel reasoning, factual accuracy when the facts are outside their training data, and logical consistency when the logic is complex.
Here's why this matters. When you deploy an AI agent to approve a loan, or adjudicate a claim, or write a contract, you're staking your company on a system that:
- Sometimes hallucinates (makes up facts). This happens randomly; you can't predict when.
- Sometimes fails on edge cases that are common in your business but rare in the training data.
- Sometimes produces biased outputs if the training data was biased.
- Has no way to know when it doesn't know. It will confidently give you a wrong answer.
This is why every serious AI deployment in high-stakes domains (finance, healthcare, legal) includes human review. An agent processes a claim, a human reviews it, and if the human spots an error, they fix it or reject the AI decision. This review step is not optional—it's mandatory infrastructure.
The cost implication: if you're building an agent to process claims, and 10% of claims need human review because the AI wasn't confident enough, then human review time is part of your cost per outcome. If your service team processes 100 claims a day, 10 of those go to review, and each review takes 5 minutes at a $50/hour labor cost, that's an additional $41.67 per day in human review cost, on top of the API cost.
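The same arithmetic as a reusable calculation, using the figures from the paragraph above:

```python
# Reproducing the review-cost arithmetic: review fraction x review
# minutes x labor rate becomes a daily line item in cost per outcome.

claims_per_day = 100
review_rate = 0.10            # 10% of claims go to a human
minutes_per_review = 5
hourly_labor_cost = 50.0

reviews = claims_per_day * review_rate
daily_review_cost = reviews * minutes_per_review / 60 * hourly_labor_cost
print(f"${daily_review_cost:.2f}/day")  # $41.67, on top of the API bill
```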
The Real Cost Model: From Tokens to Cost Per Work Item
This is the shift that separates amateur AI deployments from professional ones. Amateurs cost AI by the token. Professionals cost AI by the outcome.
A token is roughly a word, or a fraction of a word. When you call an LLM API, you're charged per token. If your model is GPT-4 and you're using it for customer support, you might pay $0.03 per 1,000 input tokens and $0.06 per 1,000 output tokens. This is how the API vendor bills you. But this isn't how you should budget AI from a business perspective.
Here's why. A customer support ticket might use 500 input tokens (the customer's message plus context) and 200 output tokens (the AI's response), costing you roughly $0.03 in API fees. But the true cost includes infrastructure ($0.10), error handling and retries ($0.05), human review time ($0.50), and data storage ($0.02). The actual cost per ticket is roughly $0.70, not $0.03.
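Here's that calculation end to end, from per-token prices to cost per ticket. The token rates match the GPT-4 rates quoted above; the hidden-cost line items are the article's illustrative figures, not benchmarks:

```python
# From per-token billing to cost per resolved ticket.

input_tokens, output_tokens = 500, 200
api_cost = input_tokens / 1_000 * 0.03 + output_tokens / 1_000 * 0.06  # ~$0.027

hidden = {"infrastructure": 0.10, "retries": 0.05,
          "human review": 0.50, "data storage": 0.02}

cost_per_ticket = api_cost + sum(hidden.values())
print(f"API only:  ${api_cost:.3f}")
print(f"True cost: ${cost_per_ticket:.2f} per resolved ticket")  # ~$0.70
```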
The modern AI finance practice is to ignore token costs and instead calculate cost per work item: cost per resolved support ticket, cost per adjudicated claim, cost per processed loan application, cost per contract reviewed. This number is what goes into your P&L, and it's the only number that lets you actually measure AI ROI.
According to FinOps Foundation research, this shift—from token tracking to cost-per-outcome KPIs—is what separates organizations at the "Optimized" stage of the AI Cost Maturity Curve from those stuck at "Tracked."
Why Your Current Cost Tools Are Blind to AI Spend
Your cloud cost tool (Datadog, CloudZero, Apptio, Vantage, or others) can tell you what you're spending on servers, databases, and APIs. But most of these tools have a blind spot: they can't see AI spend at the work-item level. They can tell you that you're spending $50,000 a month on API calls, but they can't tell you whether that $50,000 is generating $500,000 in business value or $50,000.
This is because AI spend is embedded in your infrastructure bill in ways that traditional FinOps can't untangle. The observability tool that watches your LLM inferences can tell you how many inferences you ran and how many tokens you used. But it can't connect those tokens to business outcomes. The cost allocation tool can tell you how much Stripe, Twilio, and OpenAI charged you, but it can't tell you which business unit each charge served.
This is the gap that creates the unowned middle of AI economics. Your CFO is looking at the API bill. Your engineering team is tracking token usage. No one is answering the question: "What does this AI agent actually cost to run, in terms of real business outcomes?"
Solving this problem—adding AI cost attribution at the work-item level—is what moves organizations from the "Tracked" stage of the Maturity Curve to the "Allocated" and "Optimized" stages.
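As a sketch of what work-item-level attribution can look like in practice, here's a minimal pattern for tagging each inference with the business context your cost tools are missing. Field names and figures are hypothetical; a real system would emit these records from the calling service:

```python
# Minimal work-item-level cost attribution: tag every inference with
# the business outcome it served, then aggregate per work item.

from collections import defaultdict

inference_log = [
    {"business_unit": "support", "work_item": "ticket-1042", "cost": 0.027},
    {"business_unit": "support", "work_item": "ticket-1042", "cost": 0.031},  # retry
    {"business_unit": "claims",  "work_item": "claim-7731",  "cost": 0.044},
]

by_work_item = defaultdict(float)
for call in inference_log:
    by_work_item[(call["business_unit"], call["work_item"])] += call["cost"]

for (unit, item), cost in by_work_item.items():
    print(f"{unit:<8} {item:<12} ${cost:.3f}")
```

Once every inference carries a work-item tag, "cost per resolved ticket" stops being an estimate and becomes a query.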
What to Do Next
The mental models in this article are the foundation for every AI decision you need to make: whether to build or buy an agent, whether a particular AI tool is worth the cost, what risks you need to mitigate, and how to explain AI spend to your board.
Start by asking your team three questions:
- "Where is our AI spend actually happening—just the API bills, or are we also counting infrastructure, human review, and data storage?"
- "Can we break down our AI costs by the business outcomes they drive—cost per support ticket, cost per claim processed, cost per work item?"
- "Which of our AI projects are actually measuring ROI, and which ones are we funding on faith?"
These questions should reveal where your organization sits on the 5-Stage AI Cost Maturity Curve. Most mid-market companies are at stage 1 or 2. The path to maturity—and to confident AI investment decisions—runs through work-item-level cost attribution.
Curious where your team sits? Take the 15-question self-assessment and get a personalized report on your AI cost maturity.