Token Economics for Finance Leaders

5 min read · Updated 2026-05-02

Your AI vendor charges by the token. Your engineers talk about tokens. Your CFO needs to understand what a token is and how token costs translate into actual budget line items. Without this translation, you are budgeting blind.

A token is a discrete unit of text that an AI model processes. One token is roughly 4 characters of English text (a rough heuristic: 1,000 tokens is approximately 750 words). Tokens are billed at two rates: input tokens (the text you send to the model) and output tokens (the text the model generates back to you). Output tokens cost more because generating them requires more compute.
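
To make the heuristic usable in a spreadsheet or script, here is a minimal Python sketch of the conversion. The function names are illustrative; real counts come from each vendor's tokenizer, and the ratios only hold roughly for typical English prose.

```python
# Back-of-the-envelope token estimates using the heuristics above:
# ~4 characters per token, ~750 words per 1,000 tokens. Real counts
# require the vendor's tokenizer; these are budgeting approximations only.

def tokens_from_words(word_count: int) -> int:
    """Estimate tokens from an English word count (1,000 tokens ≈ 750 words)."""
    return round(word_count * 1000 / 750)

def tokens_from_chars(char_count: int) -> int:
    """Estimate tokens from a character count (1 token ≈ 4 characters)."""
    return round(char_count / 4)

print(tokens_from_words(750))    # -> 1000
print(tokens_from_chars(4000))   # -> 1000
```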

How Token Pricing Works

Anthropic Claude 3.5 Sonnet (as of 2026):

  • Input: $3 per million tokens
  • Output: $15 per million tokens

In plain English: if you send Claude 1 million input tokens (about 750,000 words of English text) and get back 1 million output tokens (another 750,000 words), you pay $3 + $15 = $18.

OpenAI GPT-4:

  • Input: $10 per million tokens
  • Output: $30 per million tokens
  • Total for 1 million + 1 million: $40

Google Gemini Pro (cloud):

  • Input: $0.125 per million tokens
  • Output: $0.375 per million tokens
  • Total for 1 million + 1 million: $0.50

Gemini is dramatically cheaper. But "cheaper tokens" does not mean "cheaper AI deployment." Gemini is less capable on complex tasks, so many companies use it for simple work (summarization, classification) and Claude or GPT-4 for complex work (reasoning, code generation).
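
These per-million rates are easy to encode. Below is a minimal sketch of a per-request cost function using the prices listed above; the RATES table and model keys are illustrative placeholders rather than an official SDK, and should be updated as vendor pricing changes.

```python
# Per-request cost using the per-million-token rates listed above.
# Rates are (input $/1M tokens, output $/1M tokens).

RATES = {
    "claude-3.5-sonnet": (3.00, 15.00),
    "gpt-4":             (10.00, 30.00),
    "gemini-pro":        (0.125, 0.375),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request: tokens in each direction times the per-million rate."""
    input_rate, output_rate = RATES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# The 1 million in / 1 million out example above:
print(request_cost("claude-3.5-sonnet", 1_000_000, 1_000_000))  # 18.0
print(request_cost("gpt-4", 1_000_000, 1_000_000))              # 40.0
print(request_cost("gemini-pro", 1_000_000, 1_000_000))         # 0.5
```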

Converting Token Costs to Recognizable Budget Terms

Let us convert token costs into business metrics. A customer support conversation typically uses:

  • Input: 5,000 tokens (your customer's question, your system prompts, some context)
  • Output: 2,000 tokens (the AI's response)
  • Total: 7,000 tokens per conversation

At Claude pricing: (5,000 × $3 / 1M) + (2,000 × $15 / 1M) = $0.045 per conversation

At GPT-4 pricing: (5,000 × $10 / 1M) + (2,000 × $30 / 1M) = $0.110 per conversation

A mid-market company with 100 support tickets per day, 22 business days per month:

Claude: 100 × $0.045 × 22 = $99/month in tokens. On an annual basis, $1,188.

GPT-4: 100 × $0.110 × 22 = $242/month in tokens. On an annual basis, $2,904.

These sound like rounding errors. And they are — if you are only budgeting tokens. But at scale:

A large company with 50,000 tickets per day:

Claude: 50,000 × $0.045 × 22 = $49,500/month. Annualized: $594,000.

GPT-4: 50,000 × $0.110 × 22 = $121,000/month. Annualized: $1,452,000.
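
The support arithmetic above is the same formula at two volumes, so it is worth parameterizing. A minimal sketch, assuming 5,000 input tokens and 2,000 output tokens per conversation and a 22-business-day month:

```python
# Per-conversation cost and monthly projection, using the assumptions above:
# 5,000 input tokens and 2,000 output tokens per conversation, 22 business days/month.

def conversation_cost(input_rate: float, output_rate: float,
                      input_tokens: int = 5_000, output_tokens: int = 2_000) -> float:
    """Dollar cost of one support conversation at the given $/1M-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

def monthly_cost(per_conversation: float, tickets_per_day: int, business_days: int = 22) -> float:
    """Monthly token spend for a given daily ticket volume."""
    return per_conversation * tickets_per_day * business_days

claude = conversation_cost(3.00, 15.00)    # $0.045
gpt4   = conversation_cost(10.00, 30.00)   # $0.110

for tickets_per_day in (100, 50_000):
    print(tickets_per_day,
          round(monthly_cost(claude, tickets_per_day), 2),   # $99 and $49,500
          round(monthly_cost(gpt4, tickets_per_day), 2))     # $242 and $121,000
```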

Now token cost becomes visible. But even $594k/year in tokens for a Fortune 500 company is manageable.

The problem is that tokens are typically only 10-15% of the total cost of an AI deployment. The remaining 85-90% lives in infrastructure, human review, observability, and integration work.

Context Window and Long Documents

A model's context window is the amount of text you can feed to it at once. You do not pay for the window itself; you pay for the tokens you actually send, and a larger window simply lets a single request carry (and bill) more input. Claude has a 200k token context window (about 150,000 words). GPT-4-class models offer windows of roughly 128k-200k tokens, depending on the version.

If you are doing document analysis (reading contracts, claims, mortgage applications), you might send 100k tokens of document + 2k tokens of instructions = 102k tokens of input for a single request.

At Claude pricing: 102k × $3 / 1M ≈ $0.31 in input tokens per document analyzed (the model's output adds a little more).

For a loan origination system processing 1,000 applications per day:

Monthly token cost: 1,000 × $0.31 × 22 = $6,820.

Again, this is manageable. Until you layer in infrastructure, retries, human review, and integration costs.
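
The same arithmetic covers long-document analysis. A minimal sketch of the loan-origination figures above (document size, instruction overhead, and application volume are the assumptions from this section; only input tokens are counted, matching the $0.31 figure):

```python
# Long-document analysis cost at a $3/1M input rate. Output tokens are
# ignored here, matching the per-document figure above; they add a few cents.

INPUT_RATE = 3.00  # $ per million input tokens

def document_cost(document_tokens: int = 100_000, instruction_tokens: int = 2_000) -> float:
    """Input-token cost of analyzing one long document plus its instructions."""
    return (document_tokens + instruction_tokens) * INPUT_RATE / 1_000_000

per_doc = document_cost()                    # $0.306, rounded to $0.31 above
monthly = round(per_doc, 2) * 1_000 * 22     # 1,000 applications/day, 22 business days
print(round(per_doc, 3), round(monthly, 2))  # 0.306  6820.0
```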

The Batch Processing Economics

Some vendors offer batch processing APIs at a steep discount. The figures here assume OpenAI's Batch API works out to $0.50 per million tokens (input and output combined), more than 95% cheaper than the on-demand rates above. The trade-off is latency: batch jobs run asynchronously and complete within 24 hours, not in seconds.

For asynchronous work (data classification, document analysis, content generation), batch processing at these rates is a cost win of 50x or more:

  • On-demand: 1M tokens = $30-40
  • Batch: 1M tokens = $0.50

A document classification job that processes 500,000 tokens per day:

  • On-demand: 500k × $30/M = $15/day = $330/month
  • Batch: 500k × $0.50/M = $0.25/day = $5.50/month

The batch approach saves about $324.50/month, or roughly $3,894/year. That is real savings, but only if you do not need real-time latency.
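
A minimal sketch of the batch-versus-on-demand comparison at the rates quoted in this section (verify current vendor batch pricing before relying on the $0.50 per million figure):

```python
# Monthly cost of a 500k-token/day classification job, on-demand vs. batch,
# at the rates quoted in this section. 22 business days per month.

DAILY_TOKENS   = 500_000
ON_DEMAND_RATE = 30.00   # $ per million tokens
BATCH_RATE     = 0.50    # $ per million tokens
BUSINESS_DAYS  = 22

def monthly_spend(rate_per_million: float) -> float:
    """Monthly token spend at a given $/1M-token rate."""
    return DAILY_TOKENS * rate_per_million / 1_000_000 * BUSINESS_DAYS

on_demand = monthly_spend(ON_DEMAND_RATE)   # $330.00
batch     = monthly_spend(BATCH_RATE)       # $5.50
print(on_demand, batch, (on_demand - batch) * 12)   # 330.0  5.5  3894.0 saved per year
```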

What CFOs Should Demand From Finance Teams

  1. A token budget by use case. Not "we have a $100k/month AI budget." But "customer support uses 500M tokens/month at $8k, data classification uses 100M tokens/month at $1.5k, internal tooling uses 50M tokens/month at $1k." This lets you see which use cases are token-efficient and which are burning money (a sketch of this breakdown follows the list).

  2. Token cost per unit of business value. Not "we spent $100k on tokens last month." But "customer support costs $0.04 per conversation" or "document processing costs $0.22 per claim." This is the translation from tokens to business metrics.

  3. A model-selection logic. Why are you using GPT-4 for this task instead of Claude or Gemini? Is it because GPT-4 is actually better, or because someone defaulted to OpenAI? Running a sample of ten representative tasks through multiple models and comparing output quality against cost will often reveal that a cheaper model is just as good.

  4. A roadmap to reduce token cost. As models improve and get cheaper, your cost per query should fall. If you are using the same model and the same prompt in month 12 as month 1, you are leaving money on the table. Model optimization (better prompts, different model, batch processing) should be a continuous effort.
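
To make demands 1 and 2 concrete, here is what that kind of report can look like in a few lines of Python. The use-case names, token volumes, blended rates, and unit counts are hypothetical placeholders, not benchmarks:

```python
# Hypothetical token budget by use case (demand 1) and cost per unit of business
# value (demand 2). All volumes, blended rates, and unit counts are placeholders.

USE_CASES = {
    # name:                 (tokens/month, blended $/1M tokens, units/month, unit label)
    "customer support":     (500_000_000, 16.0, 200_000, "conversation"),
    "data classification":  (100_000_000, 15.0,   6_800, "claim"),
    "internal tooling":     ( 50_000_000, 20.0,  25_000, "query"),
}

for name, (tokens, rate, units, label) in USE_CASES.items():
    monthly = tokens * rate / 1_000_000        # token spend for the month
    per_unit = monthly / units                 # translation into a business metric
    print(f"{name}: ${monthly:,.0f}/month, ${per_unit:.2f} per {label}")
# customer support: $8,000/month, $0.04 per conversation
# data classification: $1,500/month, $0.22 per claim
# internal tooling: $1,000/month, $0.04 per query
```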

Token economics are not the full AI cost story. But they are the visible line item, and getting them right is the foundation for the hidden cost conversation to follow.

