Your company probably won't build its own LLM. What you will do is run an existing model — GPT-4, Claude 3, Gemini — on your proprietary data, thousands of times per day. The economics of AI cost are fundamentally shaped by the difference between training (building or customizing a model once) and inference (running that model repeatedly). Understanding this distinction is the foundation of every legitimate AI cost conversation.
Training: The Upfront Cost You Might Actually Pay
Training is the expensive, one-time act of teaching a model to understand patterns in data. When OpenAI trained GPT-4, they spent tens of millions in compute to process trillions of tokens of public text. When you fine-tune a model on your own claims data or customer conversations, you're paying for GPU hours to adjust the weights of an existing model to your specific domain.
For most CFOs, training costs take one of three forms. First, there's the hosted fine-tuning you buy from OpenAI, Anthropic, or Google — you upload your data, they run the training job, you get a custom model back. Hosted fine-tuning is billed per training token (the tokens in your training file times the number of epochs you run); OpenAI's published rates for GPT-4-class models have been on the order of $0.03 per 1K training tokens, and Anthropic's Claude fine-tuning (offered through Amazon Bedrock) is similarly priced — check current rate cards, since these change often. A typical fine-tuning job on 10 million tokens of claims data might cost $300-$600 depending on how many epochs you run. Second, there's open-source model training on your own infrastructure — download Llama, set up a cloud GPU cluster, run it for hours or days. This is cheaper per token but requires engineering effort and carries infrastructure overhead. Third, there's the training work you're already paying for: data labeling, curation, evaluation. These hidden training costs often exceed the compute cost itself.
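To make that arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. The per-1K-token rate and epoch counts are illustrative assumptions taken from the figures above, not a live rate card.

```python
# Back-of-the-envelope fine-tuning cost estimate.
# The $0.03 per 1K training tokens rate is an assumption taken from
# the text above; vendors bill per training token, per epoch.

def fine_tuning_cost(training_tokens: int, epochs: int = 1,
                     rate_per_1k: float = 0.03) -> float:
    """One-time training cost: tokens x epochs x per-1K-token rate."""
    return (training_tokens / 1_000) * epochs * rate_per_1k

# 10M tokens of claims data at one or two epochs: roughly $300-$600.
for epochs in (1, 2):
    print(f"{epochs} epoch(s): ${fine_tuning_cost(10_000_000, epochs):,.0f}")
```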
The critical insight: most enterprises never pay the headline training cost. You're buying access to a model that OpenAI or Anthropic already trained. Your training spend, if any, is on customization — fine-tuning to your domain — and that's optional. The dominant cost driver across your organization is inference.
Inference: The Recurring Cost That Scales With Every Use
Inference is the actual running of the model on new data to produce an output. Every time a customer service agent uses Claude to draft an email, every time a loan officer runs documents through an AI underwriter, every time a claims adjuster prompts GPT-4 for a summary — that's inference. You're executing the model weights to generate a completion.
Inference is priced per token at the vendor level (OpenAI charges roughly $0.003 per 1K input tokens for GPT-4o; Claude 3.5 Sonnet costs $0.003 per 1K input tokens and $0.015 per 1K output tokens). But from a finance operations standpoint, this is where the AI Cost Iceberg lives. The visible cost is the tokens you run through the API. The hidden cost is everything downstream: retries when the model fails, tool calls to third-party APIs (a Stripe charge, a Twilio SMS, a Salesforce lookup), human-in-the-loop review time, observability infrastructure to track what the model did, prompt caching layers to avoid re-processing, API gateway overhead. A claims team running 10,000 inferences per day at $0.02 per inference might report a $200/day token cost to finance, but once you account for retry logic (failed inferences that run again), human review (a claims examiner spends 3 minutes on each flagged output to verify it), and LLM observability infrastructure, the true cost is closer to $500-$700 per day.
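Here is a minimal sketch of that daily true-cost calculation. The retry rate, review sampling rate, reviewer wage, and observability figure are illustrative assumptions chosen to land in the range above; substitute your own telemetry.

```python
# Daily true cost of an inference workload: visible token spend
# plus the hidden layers (retries, human review, observability).
# All parameters below are illustrative assumptions.

def true_daily_cost(inferences: int,
                    cost_per_inference: float,
                    retry_rate: float = 0.10,       # assumed: 10% of calls rerun
                    review_rate: float = 0.015,     # assumed: 1.5% sampled for review
                    review_minutes: float = 3.0,    # from the example above
                    reviewer_hourly: float = 50.0,  # assumed loaded wage
                    observability_daily: float = 50.0) -> dict:
    visible = inferences * cost_per_inference
    retries = inferences * retry_rate * cost_per_inference
    review = inferences * review_rate * (review_minutes / 60) * reviewer_hourly
    total = visible + retries + review + observability_daily
    return {"visible": visible, "retries": retries,
            "human_review": review, "observability": observability_daily,
            "true_total": total}

# 10,000 inferences/day at $0.02: $200 visible, about $645 true cost.
print(true_daily_cost(10_000, 0.02))
```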
The key distinction: training cost is a one-time delta to the model. Inference cost scales linearly (or worse) with every transaction your company processes.
Why Your CFO Sees Only 10% of the Bill
This is the core insight behind the AI Cost Iceberg. Finance teams budget against the API invoice — the tokenized cost of inference. But that invoice is the visible 10% of the true cost. The remaining 90% lives in the hidden layers: retries, human review, observability, vector databases storing embeddings, fine-tuning data preparation, evaluation, and vendor integration costs.
CloudZero's 2025 benchmark found that enterprises report an average AI spend of $85,521 per month, but only 51% can confidently calculate AI ROI. The gap isn't because the math is hard — it's because most teams are only budgeting token cost and calling it AI spend. They're not accounting for the systems cost of running inference at scale.
In a typical insurance underwriting workflow, your visible cost might be $15 per policy (GPT-4 inference on documents). Your hidden cost is another $40 in human underwriter verification time alone ($50/hour × 48 minutes per policy), before you count observability infrastructure and retries. The true cost per policy is at least $55, not $15. If you budget $15, you're planning for a negative margin from day one.
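The same arithmetic as a tiny per-policy helper, using the example's figures. The review time and hourly rate are the assumptions stated above; observability and retries are deliberately left out, so treat the result as a floor.

```python
# Per-policy true cost for the underwriting example above.
# Visible: model inference on policy documents. Hidden: underwriter
# verification time (the dominant term), priced at an hourly rate.

def cost_per_policy(inference_cost: float = 15.0,
                    review_minutes: float = 48.0,
                    underwriter_hourly: float = 50.0) -> float:
    hidden = (review_minutes / 60) * underwriter_hourly  # $40 here
    return inference_cost + hidden

print(f"True cost per policy: ${cost_per_policy():.2f}")  # $55.00, not $15
```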
Training Becomes Strategic When You Own the Data Advantage
There is one scenario where training cost becomes significant: when you control proprietary data that competitors can't access. A healthcare network with 50 years of claims and outcomes data, distilled into a fine-tuned predictive model, has a competitive advantage. An insurance underwriter with 20 years of policy outcomes in the model weights has something hard to replicate. In these cases, the training investment — both in compute and in data curation — is a strategic line item, not a rounding error.
But this is rare. Most enterprises don't have data that justifies custom training. What they have is proprietary workflows and proprietary context — the specific rules, guardrails, and business logic that make the model useful in your domain. That's mostly prompt engineering, not model building, and even where light fine-tuning is involved, its one-time cost is a rounding error next to the recurring inference bill.
The Cost Attribution Principle: You Budget for the Recurring Cost
Here's the practical implication: when you're building a financial case for AI, you should assume training cost is zero or one-time, and your budget should be inference-forward.
If you're evaluating a vendor (Klarna's AI customer service at $0.19 per resolved ticket, Intercom Fin at $0.99, Sierra at $1.50), you're evaluating inference cost. The vendor built the model once; they're amortizing training over thousands of customers. If you're evaluating building internal AI (buying seats for your claims team to use GPT-4 or Claude), you're evaluating inference cost. If you're evaluating fine-tuning a model on your proprietary insurance data, you're paying for training (one-time, a few thousand dollars) plus inference (ongoing, per transaction).
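As a rough way to compare those per-resolution prices, here is a sketch that projects them to monthly spend. The ticket volume is a placeholder assumption; the per-ticket prices are the examples cited above.

```python
# Project per-resolution pricing to monthly spend at an assumed volume.
# Prices are from the examples above; the volume is a placeholder.
vendors = {"Klarna-style": 0.19, "Intercom Fin": 0.99, "Sierra": 1.50}
tickets_per_month = 20_000  # assumption: your actual volume goes here

for name, per_ticket in vendors.items():
    print(f"{name}: ${per_ticket * tickets_per_month:,.0f}/month")
```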
The financial model is simple: training cost is a CapEx line item (one-time cost, amortized). Inference cost is OpEx (recurring, variable with volume). Most AI teams get their budgets rejected because they underprice inference and ignore the 90% hidden cost.
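A minimal sketch of that split: a one-time training cost amortized as a monthly line, plus a recurring inference cost that scales with volume. Every figure here is a placeholder.

```python
# Monthly AI budget = amortized one-time training (CapEx-like)
# + volume x true per-transaction inference cost (OpEx).
# All figures are placeholders; substitute your own.

def monthly_budget(training_cost: float,
                   amortization_months: int,
                   monthly_volume: int,
                   true_cost_per_txn: float) -> float:
    capex = training_cost / amortization_months
    opex = monthly_volume * true_cost_per_txn
    return capex + opex

# One-time $600 fine-tune amortized over 12 months, plus 300k
# transactions at a $0.065 true unit cost (tokens + retries +
# review + tooling): about $19,550/month. The amortized training
# line ($50) is noise next to the recurring inference line.
print(f"${monthly_budget(600, 12, 300_000, 0.065):,.0f}/month")
```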
To move from Token Cost to Cost Per Outcome (the Runrate framework principle), you need to account for both. But inference is where you'll find the margin.
Curious where your team sits on the 5-Stage AI Cost Maturity Curve? Take the 15-question self-assessment and get a personalized report on your path to work-item-level cost attribution.