The AI Cost Iceberg: visible API spend (10%) vs. hidden inference, storage, observability, retries, and human review (90%).
Understanding how AI actually works, at a high level, removes the magic and lets you evaluate AI projects with real judgment instead of hope. This walkthrough covers how machine learning systems learn, how large language models generate text, and why understanding the mechanics matters for your cost model.
The Core Insight: AI Learns From Examples
Modern AI systems, from loan-scoring models to chatbots, work the same way at their core: they learn patterns from examples. You don't program the system with rules. You feed it data, and it finds patterns on its own.
Here's a concrete example. You're building a system to approve loan applications. You could hard-code rules: "if credit score > 700, approve; if debt-to-income ratio > 40%, deny." But what if your business has learned that some people with credit scores of 680 actually have lower default rates, because their profile shows a recent recovery from hardship? What if debt-to-income is less predictive than payment consistency? Hard-coded rules can't adapt to this nuance.
Instead, you feed the system 80,000 historical loan decisions. You tell it: "These 50,000 applications were approved, and these 30,000 were denied. Figure out what separates them." The system then learns patterns. It might discover: "applications with credit scores between 680-720 that show recent payment recovery have a 5% default rate, while applications with credit scores above 720 but recent late payments have a 12% default rate." The system discovered this by analyzing the patterns. You didn't program it.
This is machine learning in a nutshell: instead of coding rules, you feed examples and let the system learn patterns from the examples.
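To see the difference in concrete terms, here is a minimal sketch of what the hard-coded approach looks like. The thresholds and field names are just the ones from the example above, used for illustration:

```python
# Hard-coded rules: every threshold is a human guess, frozen at the moment
# someone wrote it, and blind to nuances like "recent recovery from hardship."
def approve_loan(application: dict) -> bool:
    if application["credit_score"] > 700:
        return True
    if application["debt_to_income"] > 0.40:
        return False
    return False  # everything else needs a human, or yet more rules

# The 680-score applicant with a strong recovery is denied, always.
print(approve_loan({"credit_score": 680, "debt_to_income": 0.25}))
```

The machine-learning alternative, sketched in the next section, replaces these hand-written thresholds with patterns learned from the historical decisions themselves.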
How a Machine Learning System Learns
The learning process has a simple structure, but it's worth understanding because it explains why data quality matters so much to AI projects.
Step 1: Feed training data. You give the system 80,000 historical loan decisions. For each application, you provide the inputs (credit score, debt-to-income, payment history, income, employment tenure, etc.) and the output (approved or denied).
Step 2: Initialize with random weights. The system starts with a guess. Imagine it randomly picks: "If credit score > 650, approve." This is obviously wrong.
Step 3: Test on a few examples. "According to my rule, this application should be approved. The actual outcome was approved. Correct!" "This application should be denied. The actual outcome was approved. Wrong."
Step 4: Adjust the rule. "My rule is wrong. Maybe it's not just credit score. Maybe I need to also look at debt-to-income ratio. Let me adjust." The system tweaks its internal rules slightly.
Step 5: Repeat thousands of times. The system keeps testing, failing, and adjusting. Each time it fails, it learns a tiny bit. After 10,000 iterations, its accuracy improves from 50% (random guessing) to 80% to 90%.
Step 6: Stop when accuracy plateaus. At some point, the accuracy tops out. Adding more iterations doesn't help—you've found the best patterns that exist in the data.
This process is called "training," and the system that emerges is a "trained model." The model isn't a set of explicit rules. It's a mathematical function learned from examples.
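Here is a minimal sketch of that loop in Python using scikit-learn and synthetic data. The feature names and numbers are invented for illustration; a real project would use your own historical decisions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(seed=0)
n = 80_000  # historical loan decisions, as in the example above

# Inputs (synthetic): credit score, debt-to-income ratio, months of on-time payments.
X = np.column_stack([
    rng.normal(690, 50, n),
    rng.uniform(0.05, 0.60, n),
    rng.integers(0, 120, n),
])
# Output (synthetic): 1 = approved, 0 = denied. A stand-in for your real history.
y = ((X[:, 0] > 660) & (X[:, 1] < 0.45) | (X[:, 2] > 60)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# fit() hides steps 2 through 6: start from an initial guess, test, adjust,
# and repeat until accuracy stops improving.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

print(f"accuracy on unseen applications: {model.score(X_test, y_test):.0%}")
```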
Why This Matters: Garbage In, Garbage Out
If your training data is biased, your system will learn biased patterns. If your training data is incomplete or outdated, your system will miss important patterns.
Here's a real example. A bank trains a loan approval system on 10 years of historical data. During those 10 years, the bank approved mortgages to almost everyone—it was a lending boom. The system learns: "if they're asking for a mortgage, approve it." Now it's 2026 and the economy has shifted. The system is still approving every mortgage because it learned from an era of unlimited lending. Its training data wasn't representative of current reality.
This is why data quality and data governance are non-negotiable for AI projects. If your training data is:
- Biased: underrepresenting a demographic that applies for loans, your model will be biased against that demographic.
- Outdated: learned from an old era, your model will fail in a new era.
- Incomplete: missing important features, your model will miss important patterns.
- Mislabeled: wrong labels on the training examples, your model will learn wrong patterns.
A system trained on bad data will fail at runtime, or produce biased/inaccurate results. The cost implication: good AI is not cheap to build, because it requires good data. Many AI projects fail not because the technology is bad, but because the team didn't invest in data quality.
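One cheap check that would have caught the "lending boom" problem above: look at the label balance in the historical data before trusting any accuracy number. A sketch with synthetic data standing in for your records (the "decision" column name is an assumption):

```python
import pandas as pd

# Synthetic stand-in for ten years of decisions made during a lending boom.
history = pd.DataFrame({
    "decision": ["approved"] * 95_000 + ["denied"] * 5_000
})

balance = history["decision"].value_counts(normalize=True)
print(balance)

# A model can hit roughly 95% "accuracy" on this data by approving everything,
# and it will keep approving everything when the economy shifts.
majority_baseline = balance.max()
print(f"always-approve baseline accuracy: {majority_baseline:.0%}")
```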
How LLMs Work (And Why They Hallucinate)
Large language models like GPT-4 and Claude work by predicting the next word in a sequence (more precisely, the next token, a word or fragment of a word). That's genuinely it. You give it text, and it predicts word by word.
Here's how. The model was trained on billions of words of text (books, websites, code, articles). During training, it learned patterns: "when people write 'The Eiffel Tower is in,' the next word is usually 'Paris.'" "When someone writes 'Once upon a,' the next word is usually 'time.'" It learned millions of these statistical associations.
When you ask an LLM to write something, it starts predicting:
- You: "Write a summary of the American Civil War."
- Model thinks: "The most likely next word after 'War' in my training data is… 'was.' Or 'resulted.' Or 'occurred.' I'll pick 'was.'"
- Model outputs: "The American Civil War was"
- Model continues, feeding its own output back in as context: "The American Civil War was a conflict between the North and South from 1861 to" [and now it predicts the next word]…
- Model thinks: "After '1861 to' the next word is usually '1865.'"
- Model outputs: "1865"
This continues until the model reaches a stopping point (usually a period or a token limit). The entire response is generated word by word, each word chosen to be statistically likely given all the previous words.
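Here is a toy, runnable illustration of that loop. The probability table is invented (a real model has a learned function over a vocabulary of tens of thousands of tokens), but the generation logic has the same shape: look at the text so far, pick a statistically likely next token, append it, repeat:

```python
import random

# Toy next-token probabilities, invented for illustration only.
NEXT_TOKEN_PROBS = {
    "The American Civil War":                 {"was": 0.7, "occurred": 0.2, "resulted": 0.1},
    "The American Civil War was":             {"a": 0.8, "fought": 0.2},
    "The American Civil War was a":           {"conflict": 0.9, "war": 0.1},
    "The American Civil War was a conflict":  {".": 1.0},
}

def generate(prompt: str, max_tokens: int = 10) -> str:
    text = prompt
    for _ in range(max_tokens):
        probs = NEXT_TOKEN_PROBS.get(text)
        if probs is None:
            break  # stopping point: no continuation in the toy table
        tokens, weights = zip(*probs.items())
        next_token = random.choices(tokens, weights=weights)[0]
        if next_token == ".":
            text += "."
            break
        text += " " + next_token
    return text

print(generate("The American Civil War"))
```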
Now here's why hallucination happens. The model is predicting based on statistical patterns, not based on truth. If the training data contained the false statement "George Washington was the second president," the model learned that pattern, and it may repeat it because it's statistically likely, even though it's factually wrong. And hallucination doesn't require bad data: even when every source document is accurate, the model can stitch individually plausible words into a sentence that was never true anywhere, such as a confident citation to a paper that doesn't exist.
The model has no concept of truth. It's a pattern-matching engine that generates plausible-sounding text. This is why LLMs are great at generating text that reads well, but terrible at guaranteeing accuracy.
The Inference Pipeline (And Why It Costs Money at Scale)
This is where most executives get confused about AI cost. They see the API bill ($0.02 per inference) and think that's the cost. It's not.
Here's what actually happens when you run an LLM inference at scale:
- Request arrives. An API call comes in with your prompt.
- Authentication and logging. The system verifies your API key and logs the request.
- Tokenization. Your prompt is converted from words to tokens. This is computational.
- Cache check. The system checks if it's seen this exact prompt before and can return a cached result. If yes, you pay less or nothing. If no, continue.
- Model inference. The model runs on a GPU (graphics processor). This is the expensive part. Depending on model size and request complexity, a GPU may be tied up for anywhere from a fraction of a second to several seconds per request, high-end GPUs cost several dollars per hour each, and large models typically need multiple GPUs serving around the clock.
- Token generation. The model generates output, one token at a time. This also uses the GPU.
- Safety filtering and moderation. The system checks whether the output is appropriate (no sexual content, no violence, no violations of usage policy). This is computational.
- Logging and analytics. The system logs the request and response for auditing, compliance, and analytics.
- Return to user. The response is sent back.
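To make the sequence concrete, here is a toy, runnable sketch of those steps in Python. Every function is a trivial stand-in (the names are invented for illustration, not any real provider's SDK), but the shape is the point: every step between "request arrives" and "return to user" does work that someone pays for.

```python
import hashlib

_cache: dict[str, str] = {}

def tokenize(prompt: str) -> list[str]:
    # Stand-in for a real tokenizer, which splits text into sub-word pieces.
    return prompt.lower().split()

def run_model(tokens: list[str]) -> str:
    # Stand-in for GPU inference and token-by-token generation: the expensive part.
    return "stub response for: " + " ".join(tokens)

def moderate(text: str) -> str:
    # Stand-in for safety filtering of the output.
    return text

def handle_inference(api_key: str, prompt: str) -> str:
    if api_key != "demo-key":                       # authentication
        raise PermissionError("invalid API key")
    tokens = tokenize(prompt)                       # tokenization
    key = hashlib.sha256(" ".join(tokens).encode()).hexdigest()
    if key in _cache:                               # cache check: cheap or free
        return _cache[key]
    response = moderate(run_model(tokens))          # inference, generation, filtering
    _cache[key] = response                          # (logging/analytics omitted here)
    return response                                 # return to user

print(handle_inference("demo-key", "Write a summary of the American Civil War."))
```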
Every one of these steps has a cost. The sum of all these costs is why a "$0.02 API call" really costs $0.10-$0.20 once you account for the infrastructure layer.
At scale, the cost multiplies. If you're running 1 million inferences per month, that's 1 million tokenization events, 1 million cache checks, 1 million compliance checks, 1 million logging events. The infrastructure cost becomes massive.
This is the AI Cost Iceberg: the visible API cost is the tip; all the infrastructure underneath is 5-10x larger than the visible cost.
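A back-of-the-envelope sketch of that arithmetic, assuming the $0.02 API price from the example above. Every hidden line item is an illustrative guess you would replace with your own measured numbers:

```python
# Only the $0.02 visible API price comes from the example above; the hidden
# per-call figures below are assumptions for illustration.
visible_api_call = 0.02

hidden_per_call = {
    "extra_inference_and_hosting": 0.03,
    "storage":                     0.01,
    "observability_and_logging":   0.02,
    "retries_and_failed_calls":    0.03,
    "human_review_amortized":      0.05,
}

fully_loaded = visible_api_call + sum(hidden_per_call.values())
monthly_calls = 1_000_000

print(f"visible cost per call:      ${visible_api_call:.2f}")
print(f"fully loaded cost per call: ${fully_loaded:.2f}")
print(f"visible monthly bill:       ${visible_api_call * monthly_calls:,.0f}")
print(f"actual monthly cost:        ${fully_loaded * monthly_calls:,.0f}")
```

Even with modest per-call assumptions, the hidden layers dominate the bill at a million calls per month.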
Why Understanding the Mechanics Helps With Decision-Making
Many AI projects fail because teams don't understand these mechanics. They run a pilot, see success, and then scale it expecting the same unit economics. But at scale:
- Training data biases that didn't affect the pilot suddenly become liability risks.
- Hallucinations that were caught in manual review become operational problems at scale.
- Infrastructure costs that were negligible in the pilot become the dominant cost factor.
- Latency requirements that were fine for internal pilots become unacceptable for customer-facing systems.
Understanding how AI works tells you where the risks are. A system that learns from data will fail if the data is bad. A system that predicts word-by-word will hallucinate if not designed with hallucination detection. A system that scales will hit infrastructure bottlenecks and cost multipliers.
The best AI projects are led by people who understand these mechanics and design around them from the start. The worst AI projects are led by people who see a successful demo and try to scale it without understanding what will break.
What to Do Next
Pick one AI system your organization is running or considering. Walk through the mechanical questions: What data is it trained on? Is that data representative and unbiased? How does it generate output? Does it require accuracy (like loan approval) or is approximation okay (like brainstorming)? At what scale does it run? What infrastructure does it need? What are the failure modes at scale?
These questions will reveal where your real costs and risks are. More importantly, they'll reveal where you need to invest (data quality) and where you can cut (unnecessary infrastructure) before you build.
For the full context on how these mechanics translate to business cost and organizational maturity, see the pillar article on AI for business leaders.