What is an LLM (In Plain English, for Finance Leaders)

6 min read · Updated 2026-05-02

An LLM (large language model) is a machine learning system trained on vast amounts of text to predict the next word in a sequence. GPT-4, Claude, Gemini, and Llama are all LLMs. Understanding how LLMs work—and specifically, how they charge for usage—is essential for any CFO managing AI budgets.

How LLMs Actually Work (The Finance-Grade Version)

An LLM works by predicting the next word, given all the words that came before. You give it a prompt: "Summarize this contract in three sentences:" plus the full contract text. The model looks at all those words and predicts the most likely next word. Then it predicts the word after that, and the word after that, until it reaches a stopping point. The output—the sequence of predicted words—is the model's response to your prompt.
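That loop can be sketched with a toy model. The bigram probability table below is invented for illustration; a real LLM's "table" is billions of learned parameters over an entire vocabulary, and it samples rather than always taking the top word, but the generation loop has the same shape.

```python
# Toy next-word predictor: a hand-built bigram table stands in for the
# billions of learned parameters in a real LLM. Probabilities are invented.
next_word_probs = {
    "summarize": {"this": 0.9, "the": 0.1},
    "this": {"contract": 0.7, "document": 0.3},
    "contract": {"in": 0.8, "for": 0.2},
    "in": {"three": 0.6, "two": 0.4},
    "three": {"sentences": 0.95, "parts": 0.05},
}

def generate(prompt_words, max_words=5):
    words = list(prompt_words)
    for _ in range(max_words):
        candidates = next_word_probs.get(words[-1])
        if candidates is None:  # stopping point: nothing left to predict
            break
        # Take the most likely next word; real models sample from the distribution
        words.append(max(candidates, key=candidates.get))
    return words

print(generate(["summarize"]))
# Each step conditions only on the words so far; nothing checks truth.
```

Note that nothing in the loop verifies facts: the model emits whatever continuation the learned statistics favor.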

This is fundamentally a statistical pattern-matching engine. The model learned by reading billions of words of text, finding patterns in language at a statistical level. In this framing, it doesn't understand anything the way a person does: it has no guaranteed grasp of meaning, no reliable world model, no verified reasoning. It's finding which sequences of words typically co-occur, and using that to predict the next word.

Here's why this matters for finance. An LLM can write plausible-sounding text that matches the statistical patterns in its training data. It's excellent at generating content that looks like it came from a human. It's unreliable at novel reasoning, at factual accuracy on information outside its training set, and at mathematical calculation (ironically, producing arithmetic one predicted token at a time is far more error-prone than producing prose). This means LLMs are great at summarization, research, draft writing, and customer communication. They're risky for technical analysis, regulatory compliance without review, and financial calculations without verification.

The Token Economics

LLM pricing is based on tokens. A token is roughly a word, though technically it's a subword unit, averaging about three-quarters of an English word. The word "hello" is one token. The word "transportation" might be 2-3 tokens. The model counts input tokens (everything you send to the model) and output tokens (everything the model generates), and charges you for both.
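For budgeting sketches, a common rule of thumb is about 4 characters of English text per token. The estimator below is purely a heuristic, not a real tokenizer; actual byte-pair-encoding tokenizers give different counts (they often split "transportation" into 2-3 tokens, where this crude rule says 4).

```python
def estimate_tokens(text: str) -> int:
    # Rule of thumb: ~4 characters per English token.
    # A real BPE tokenizer is needed for exact counts.
    return max(1, round(len(text) / 4))

print(estimate_tokens("hello"))           # 1, matching the real tokenizer
print(estimate_tokens("transportation"))  # 4 by this heuristic; real tokenizers often say 2-3
```

Good enough for order-of-magnitude budgeting; use the provider's tokenizer when you need exact billing numbers.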

Here's where the unit economics matter for your budget. Suppose you're running a customer service agent where each conversation averages 1,500 input tokens and 300 output tokens (a 5:1 ratio; customers tend to write more than agents respond), and you handle 500 conversations per day. That's 900,000 tokens per day. At illustrative rates of $0.03 per 1,000 input tokens and $0.08 per 1,000 output tokens, that's roughly $35 per day in API costs.
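The daily arithmetic, spelled out. The volumes come from the example above; the per-token rates are illustrative assumptions (actual provider pricing varies and changes often).

```python
# Back-of-envelope daily token cost for the customer-service example.
# Rates are illustrative: $0.03 per 1k input tokens, $0.08 per 1k output.
input_tokens_per_conv = 1_500
output_tokens_per_conv = 300
conversations_per_day = 500

input_rate = 0.03 / 1_000    # $ per input token (assumed)
output_rate = 0.08 / 1_000   # $ per output token (assumed)

daily_tokens = (input_tokens_per_conv + output_tokens_per_conv) * conversations_per_day
daily_cost = (input_tokens_per_conv * input_rate
              + output_tokens_per_conv * output_rate) * conversations_per_day

print(f"{daily_tokens:,} tokens/day, ${daily_cost:.2f}/day")
```

Swap in your provider's current rates and your own traffic to get a first-order budget line.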

Scale that to a month (22 business days): $770 per month in token costs. But—and this is critical—the token cost is only the visible tip. You also need to account for:

  • Inference infrastructure: the servers and services that run the model, usually $0.10-$0.50 per inference on top of token cost.
  • Context caching and retrieval: storing and retrieving relevant background information for each request (contracts, policies, past cases), typically adding $0.05-$0.20 per request.
  • Monitoring and observability: tracking model performance, catching failures, logging responses, usually adding 10-20% on top of infrastructure.
  • Retries and error handling: when a request fails, the system retries and pays for the tokens again, roughly doubling the cost of that request.

In practice, the true cost per inference is typically 3-5x the token cost. A $35/day token bill becomes a $105-$175/day true cost. That's $2,310-$3,850 per month, not $770.
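A quick sketch of that visible-tip math, applying the 3-5x multiplier from above to the $35/day token bill over a 22-business-day month:

```python
# True cost range: token bill times the 3-5x all-in multiplier.
token_cost_per_day = 35.0
low_mult, high_mult = 3, 5
business_days = 22

true_daily = (token_cost_per_day * low_mult, token_cost_per_day * high_mult)
true_monthly = tuple(d * business_days for d in true_daily)

print(f"${true_daily[0]:.0f}-${true_daily[1]:.0f}/day, "
      f"${true_monthly[0]:,.0f}-${true_monthly[1]:,.0f}/month")
# vs. $770/month if you budget tokens alone
```

The multiplier itself is the assumption to validate against your own infrastructure, caching, monitoring, and retry bills.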

Context Windows and Why They Matter to Your Bottom Line

Every LLM has a context window—the maximum amount of text it can "see" at once. GPT-4's context window is 128,000 tokens. Claude's is 200,000 tokens. Cheaper models like GPT-3.5 have 4,000-16,000 token windows. A budget of 200,000 tokens is enormous until you start adding real business context.

Here's a concrete example. You're running a claims processing agent. For each claim, the agent needs to see: the claim form (500 tokens), the customer's full claim history (2,000 tokens), relevant policy documents (3,000 tokens), past similar claims for reference (1,000 tokens), compliance guidelines (1,500 tokens). That's 8,000 tokens just for context, before the agent writes anything.

If you're processing 100 claims per day, you're using 800,000 context tokens per day just for background information. At an illustrative context rate of $0.005 per 1,000 tokens, that's $4 per day in context costs. That doesn't sound like much until you realize you also pay for the model's response, infrastructure, monitoring, and human review.
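The claims-agent context accounting, as a sketch. The per-item token counts come from the example above; the $0.005 per 1,000 context tokens rate is an illustrative assumption.

```python
# Context-token accounting for the claims agent example.
context_per_claim = {
    "claim_form": 500,
    "claim_history": 2_000,
    "policy_documents": 3_000,
    "similar_claims": 1_000,
    "compliance_guidelines": 1_500,
}
claims_per_day = 100
context_rate = 0.005 / 1_000  # $ per context token (assumed)

tokens_per_claim = sum(context_per_claim.values())       # 8,000 before any output
daily_context_tokens = tokens_per_claim * claims_per_day  # 800,000/day
daily_context_cost = daily_context_tokens * context_rate

print(f"{tokens_per_claim:,} tokens/claim, "
      f"{daily_context_tokens:,} tokens/day, ${daily_context_cost:.2f}/day")
```

A useful side effect of itemizing context like this: you can see which line item to trim (or retrieve more selectively) when costs climb.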

The finance implication: longer context windows sound like a benefit ("see more information"), but they're also a cost multiplier. An agent with a 200,000-token context window costs more to run than an agent with a 4,000-token window, all else equal. Your choice of model is a cost decision, not just a capability decision.

Hallucination and the Cost of Mistakes

LLMs hallucinate—they generate information that isn't true. They cite sources that don't exist. They make up facts. This isn't a bug. It's a statistical property: the model is predicting the most likely next word based on patterns in training data, which doesn't guarantee truth.

The cost implication: you cannot deploy an LLM in a high-stakes scenario (claims approval, financial advice, legal analysis) without human review. That review step is non-negotiable. It's also a real cost. If each review takes 5 minutes at $50/hour labor, a reviewed claim costs $4.17 in review labor. If 20% of your AI-processed claims get escalated for review due to hallucination risk, that averages out to $0.83 in review cost per claim, on top of the infrastructure and token cost.
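The expected-review-cost arithmetic, using the figures above (5-minute reviews, $50/hour labor, 20% escalation):

```python
# Expected human-review cost per AI-processed claim.
review_minutes = 5
labor_rate_per_hour = 50.0
escalation_rate = 0.20  # share of claims escalated for human review

cost_per_review = review_minutes / 60 * labor_rate_per_hour        # cost when a review happens
expected_review_cost_per_claim = escalation_rate * cost_per_review  # averaged over all claims

print(f"${cost_per_review:.2f} per reviewed claim, "
      f"${expected_review_cost_per_claim:.2f} expected per claim")
```

The escalation rate is the lever to watch: if hallucination pushes it from 20% toward 100%, review labor quickly dominates the token bill.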

According to research cited in the MIT "GenAI Divide" report, 95% of AI pilots fail to deliver P&L impact. A large portion of those failures trace back to hallucination and accuracy costs that weren't factored into the ROI model.

Why More Expensive Models Aren't Always More Expensive to Run

This is counterintuitive but important. GPT-4 costs more per token than GPT-3.5. But GPT-4 is often more accurate and requires less human review. A claim that costs $0.10 in GPT-4 API tokens but needs only 2 minutes of review ($1.67 in labor) is cheaper than a claim that costs $0.05 in GPT-3.5 tokens but needs 10 minutes of review ($8.33).
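Here is that comparison as a total-cost-per-claim sketch, with review labor at the $50/hour rate used earlier. The token costs per claim are the illustrative figures from the paragraph above.

```python
# Total cost per claim = API tokens + human review labor.
labor_per_minute = 50.0 / 60

def cost_per_claim(token_cost, review_minutes):
    return token_cost + review_minutes * labor_per_minute

gpt4_total = cost_per_claim(token_cost=0.10, review_minutes=2)    # pricier tokens, less review
gpt35_total = cost_per_claim(token_cost=0.05, review_minutes=10)  # cheaper tokens, more review

print(f"GPT-4: ${gpt4_total:.2f}/claim, GPT-3.5: ${gpt35_total:.2f}/claim")
# The model with the higher per-token price wins on total cost per outcome.
```

The review-minutes parameter is the one to measure empirically per model; it swamps the token price difference in this example.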

The cost optimization is not just about cheaper APIs. It's about total cost per outcome. The more capable model might be the cheaper choice when you account for all costs.

This is why finance leaders need to stop thinking about "LLM costs" and start thinking about "cost per work item"—the total cost, including tokens, infrastructure, and human review, to process one claim, one support ticket, or one contract.

What to Do Next

For each LLM-based application your team is running or evaluating, calculate the full cost: tokens plus infrastructure plus monitoring plus expected human review time. Compare that cost per unit of work output to the business value the application generates. If your customer service agent costs $0.50 per ticket but customers who get served by the agent spend 20% more over their lifetime (worth roughly $15 in incremental LTV per ticket), the math works. If the agent costs $0.50 per ticket and has no measurable impact on customer behavior, it doesn't.
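The steps above can be condensed into a per-work-item check. The component costs below are hypothetical placeholders summing to the article's $0.50-per-ticket example; substitute your own measured numbers.

```python
# Cost per work item vs. value per work item (all figures illustrative).
def cost_per_item(tokens, infrastructure, monitoring, review):
    return tokens + infrastructure + monitoring + review

ticket_cost = cost_per_item(tokens=0.10, infrastructure=0.25,
                            monitoring=0.05, review=0.10)  # $0.50 total
incremental_value = 15.0  # estimated incremental LTV per ticket served

net_value = incremental_value - ticket_cost
print(f"${ticket_cost:.2f}/ticket cost, ${net_value:.2f}/ticket net value, "
      f"worth running: {net_value > 0}")
```

If `incremental_value` is zero (no measurable impact on customer behavior), the net is negative and the math fails, exactly as the paragraph above concludes.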

For more detail on the economics layer, see the full pillar on AI for business leaders.

Where does your team sit on the maturity curve?

Take the 15-question self-assessment and get a personalized report.

Start the Assessment
