Tokens, Prompts, and Context Windows: The Language of AI Billing

5 min read · Updated 2026-05-02


Your vendor's invoice says you spent $12,450 on Claude this month. But you have no idea what you actually ran. That's because AI billing is denominated in a unit that almost no finance team understands: tokens. A token is roughly three-quarters of a word — the granular building block AI APIs use to meter usage. Understanding tokens, prompts, and context windows is the foundation of reading an AI bill with confidence.

What a Token Actually Is

A token is a unit of text that an AI model processes. Roughly, one token equals 0.75 words, so 1,000 tokens ≈ 750 words. In practice, one token might be a whole word ("hello"), part of a word ("-ing"), whitespace, or punctuation. Different models tokenize differently — GPT-4 and Claude split the same English text into slightly different token counts because they use different tokenization schemes. But the order of magnitude is the same: a typical page of text is 500-750 tokens.

Here's what matters for billing: vendors charge separately for input tokens (the text you send to the model) and output tokens (the text the model generates). As of May 2026, OpenAI's GPT-4o costs $0.005 per 1K input tokens and $0.015 per 1K output tokens. Claude 3.5 Sonnet costs $0.003 per 1K input tokens and $0.015 per 1K output tokens. Gemini 2.0 costs $0.075 per 1M input tokens and $0.30 per 1M output tokens (with tiered pricing for longer context windows). The math is simple: if you run a 2,000-token prompt through GPT-4o and get a 500-token response, you're charged (2,000 × $0.005 / 1,000) + (500 × $0.015 / 1,000) = $0.010 + $0.0075 = $0.0175.
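That arithmetic can be sketched as a small helper. The rates below are the per-1K figures quoted above; substitute your vendor's current price sheet.

```python
def request_cost(input_tokens, output_tokens, input_rate_per_1k, output_rate_per_1k):
    """Dollar cost of one API call: input and output tokens are billed
    separately at per-1K-token rates."""
    return (input_tokens * input_rate_per_1k + output_tokens * output_rate_per_1k) / 1000

# The GPT-4o example from the text: 2,000 input tokens, 500 output tokens.
cost = request_cost(2_000, 500, input_rate_per_1k=0.005, output_rate_per_1k=0.015)
print(f"${cost:.4f}")  # → $0.0175
```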

What complicates the picture: at the rates above, output tokens cost three to five times more than input tokens (3x for GPT-4o, 5x for Claude 3.5 Sonnet, 4x for Gemini 2.0). When you're evaluating inference cost, you need to know the token count of your typical output, not just your input.

Prompts and Prompt Engineering

A prompt is the instruction plus context you send to the model. "Write a customer support response" is a prompt of roughly five tokens. "You are a financial claims adjudicator with 20 years of experience. Analyze the following insurance claim, extract the key facts, flag any irregularities, and recommend approval or denial with reasoning. Claim data: [claim details]" is a prompt of roughly 60 tokens.

The hidden cost driver in token budgeting is prompt size. If you're using the same prompt thousands of times per day — "you are a support agent, here's the system context, here are the rules, here's the customer message" — you're paying for that system prompt on every single inference. At scale, this adds up. A 500-token prompt run 10,000 times per day costs 5 million tokens, or $25/day just for the repeated context.
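The repeated-prompt math above is worth making explicit, using the GPT-4o input rate quoted earlier:

```python
def daily_repeated_prompt_cost(prompt_tokens, calls_per_day, input_rate_per_1k):
    """Daily spend on a fixed prompt that is re-sent with every request."""
    return prompt_tokens * calls_per_day * input_rate_per_1k / 1000

# 500-token system prompt, 10,000 calls/day, $0.005 per 1K input tokens.
cost_per_day = daily_repeated_prompt_cost(500, 10_000, 0.005)
print(f"${cost_per_day:.2f}/day")  # → $25.00/day
```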

This is where prompt caching becomes a cost lever. Major vendors offer prompt caching (OpenAI discounts cached input tokens; Anthropic charges a premium to write the cache and a steep discount to read it): once a prompt prefix exceeds the vendor's minimum length — typically 1,024 tokens — repeated requests that reuse that exact prefix pay the discounted cached rate instead of the full input rate. If your reusable system prompt and templated context run to 2,000 tokens and you send them 10,000 times a day, caching can cut that portion of the cost roughly in half.
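A simplified sketch of the caching economics, assuming a 50% cached-rate discount on the GPT-4o input price (real vendors add minimum prefix lengths, cache-write premiums, and cache expiry, so treat this as an illustration, not a price model):

```python
def cached_prefix_cost(prefix_tokens, calls, full_rate_per_1k, cached_rate_per_1k):
    """Illustrative cache model: the first call pays the full input rate to
    populate the cache; later calls reuse the prefix at the cached rate."""
    first_call = prefix_tokens * full_rate_per_1k / 1000
    cache_hits = prefix_tokens * (calls - 1) * cached_rate_per_1k / 1000
    return first_call + cache_hits

# 2,000-token reusable prefix, 10,000 calls/day.
uncached = 2_000 * 10_000 * 0.005 / 1000                   # no caching: $100/day
cached = cached_prefix_cost(2_000, 10_000, 0.005, 0.0025)  # assumed 50% discount
print(f"uncached ${uncached:.2f}/day, cached ${cached:.2f}/day")
```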

The principle: every character in your prompt adds to cost. Prompt engineering — stripping unnecessary instructions, templating context, caching reused sections — is a legitimate cost optimization lever.

Context Windows and the Hidden Cost of Long Documents

A context window is the maximum amount of text a model can see at once. GPT-4 Turbo's context window is 128K tokens (roughly 96,000 words — a 200-page book). Claude 3.5 Sonnet's is 200K tokens (150,000 words). Gemini 2.0's is 1 million tokens. To an engineering team, this looks like a feature. To a CFO, it's a cost disaster waiting to happen.

Why? Because you pay for every token in your context window, whether you use it or not. If you send a 50,000-token document to Claude to extract one fact, you're paying for all 50,000 tokens of input. If you've built a system that sends the entire customer account history (100K tokens) to the AI to make a decision, you're paying for that entire context on every decision.

At Klarna's scale (millions of customer service conversations per month), large context windows add up fast. Klarna reports achieving customer service resolution at $0.19 per ticket. If that system uses 50K-token context windows at scale, and input tokens dominate the per-ticket cost, even a 20% reduction in context length could cut cost per resolution from $0.19 to roughly $0.15 — a significant margin lever.
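The context-trimming arithmetic can be sketched as follows. The `input_share` parameter is an assumption I'm introducing — the Klarna figure in the text is not broken down between input and output tokens, so 1.0 (input dominates entirely) is a deliberately aggressive simplification:

```python
def trimmed_cost_per_ticket(cost_per_ticket, input_share, context_reduction):
    """Cost per ticket after shrinking the input context.
    input_share: assumed fraction of per-ticket cost driven by input tokens.
    context_reduction: fraction of the input context removed."""
    return cost_per_ticket * (1 - input_share * context_reduction)

# $0.19/ticket today, 20% shorter context, input assumed to be the whole cost.
new_cost = trimmed_cost_per_ticket(0.19, input_share=1.0, context_reduction=0.20)
print(f"${new_cost:.3f}")  # → $0.152
```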

The CFO principle: a larger context window is only valuable if you're actually using the information in it. Sending unnecessary data (old interaction history, irrelevant documents, verbose system prompts) is pure cost with no revenue benefit.

How to Read Your AI Bill

When you pull an invoice from OpenAI, Anthropic, or Google, here's what you're looking at:

Total spend = (input tokens × input rate) + (output tokens × output rate).

If your invoice says $12,450 and you have no idea where it came from, you need three things. First, ask your vendor for token breakdown: how many input tokens, how many output tokens? (Some vendors provide this in dashboard logs; some require support tickets.) Second, calculate your average cost per transaction: if you ran 100,000 inferences and spent $12,450, your cost per inference is $0.1245. Is that reasonable? Third, ask: what's my cost per outcome? If each inference produces a customer service response that has a known value (customer satisfaction, churn avoidance, operational efficiency), you can work backward to ROI.
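The second and third steps above can be sketched together. The $0.50 value per outcome below is hypothetical — the article does not assign a dollar value to an outcome, so plug in your own estimate:

```python
def unit_economics(invoice_total, inference_count, value_per_outcome=None):
    """Average cost per inference, plus margin per outcome when each
    inference has a known dollar value."""
    cost = invoice_total / inference_count
    margin = None if value_per_outcome is None else value_per_outcome - cost
    return cost, margin

# The example from the text: $12,450 invoice over 100,000 inferences,
# with a hypothetical $0.50 of value per resolved outcome.
cost, margin = unit_economics(12_450, 100_000, value_per_outcome=0.50)
print(f"cost/inference=${cost:.4f}, margin/outcome=${margin:.4f}")
```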

This is the first step toward the financial OS for AI. Most CFOs don't have this breakdown. They see "API charges $12,450" on a line item and move on. The math is simple, but the transparency is rare.

Understanding tokens, prompts, and context windows is the language that lets you negotiate with vendors, build realistic budgets, and avoid AI bill shock. Curious where your team sits on the 5-Stage AI Cost Maturity Curve? Take the 15-question self-assessment and get a personalized report on your path to work-item-level cost attribution.

