Why AI Is More Expensive Than Software (And Why That's Permanent)

6 min read · Updated 2026-05-02

Runrate Framework

The AI Cost Iceberg

Visible API spend (10%) vs hidden inference, storage, observability, retries, human review (90%).

Read the full framework →

For 30 years, software followed a single economic rule: build once, scale infinitely. Microsoft does not pay meaningfully more to serve its ten-millionth customer. Stripe's marginal cost to process its billionth payment is the same as its first. This near-zero marginal cost made software the most valuable category in business: most of each incremental dollar of revenue drops straight to gross margin.

AI breaks that rule entirely. An AI agent handling a support ticket, adjudicating an insurance claim, or processing a loan application costs money every single time it runs. This is not temporary. It is not a problem to optimize away. It is fundamental to how large language models work.

The Core Problem: Tokens Cost Money

Every time an AI model generates a response, it consumes tokens, the discrete chunks of text the model reads and writes. Anthropic's Claude runs about $3 per million input tokens and $15 per million output tokens. OpenAI's GPT-4-class models run about $10 per million input and $30 per million output.

These numbers sound small until you run the math. A single customer support conversation — the customer's message, the AI's response, maybe a follow-up or two — uses roughly 5,000 input tokens and 2,000 output tokens. At Claude's pricing, that is about $0.05 per conversation. At GPT-4's pricing, it is about $0.11.

A mid-market contact center handling 1,000 customer conversations per day pays $45-$110 per day just for tokens. Over a month, that is roughly $1,400-$3,300. Over a year, $16,000-$40,000. Scale that across a company running 50 agents at that volume and you are looking at $800,000-$2,000,000 per year in token costs alone.
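The arithmetic above is easy to reproduce. A minimal sketch, using the per-million-token prices and the 5,000-input/2,000-output conversation shape assumed in this article (swap in your provider's current rate card):

```python
# Back-of-the-envelope token cost model. Prices and token counts are the
# article's illustrative assumptions, not a live rate card.
PRICES = {  # dollars per million tokens: (input, output)
    "claude": (3.00, 15.00),
    "gpt4": (10.00, 30.00),
}

def conversation_cost(model: str,
                      input_tokens: int = 5_000,
                      output_tokens: int = 2_000) -> float:
    """Token cost of one support conversation, in dollars."""
    in_rate, out_rate = PRICES[model]
    return input_tokens * in_rate / 1e6 + output_tokens * out_rate / 1e6

def annual_cost(model: str, conversations_per_day: int, days: int = 365) -> float:
    """Yearly token spend at a steady daily conversation volume."""
    return conversation_cost(model) * conversations_per_day * days

for model in PRICES:
    print(f"{model}: ${conversation_cost(model):.3f}/conversation, "
          f"${annual_cost(model, 1_000):,.0f}/year at 1,000 conversations/day")
```

Changing the token counts per conversation is the fastest way to see how sensitive the annual number is to verbosity: doubling output tokens roughly doubles the dominant output-side cost.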

But that is only the visible cost. The hidden cost — infrastructure, retries, human review, vector databases, observability — is much larger.

Why Software's Playbook Breaks

Software economics work because you encode human knowledge into code once, then serve that code to many users at near-zero incremental cost. The engineering work happened before anyone paid. The delivery mechanism (running the software) costs almost nothing.

AI inverts this. The "knowledge" in an AI model is not code; it is billions of parameters learned from training data. Every time someone uses the model, the provider has to run it: process the input and generate the output token by token. That step is inference, and it costs money on every single request. Caching and clever architecture can shave the bill, but they cannot eliminate it.

This is why Stripe can charge 2.9% plus 30 cents per transaction and still earn software-like gross margins: Stripe built the payment network once, and now the marginal cost of one more payment is noise. An AI vendor that quoted a flat $0.29 per transaction for AI-powered underwriting would go bankrupt, because every token it serves costs real compute, and that cost has to be passed through to customers.
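The contrast is easiest to see in average cost per unit as volume grows. A sketch with illustrative numbers (the $1M build cost and $0.045 marginal inference cost are assumptions, not benchmarks): software's average cost amortizes toward zero, while AI's amortizes toward its marginal inference cost and stops there.

```python
# Average cost per unit served, as volume grows. All numbers illustrative.
BUILD_COST = 1_000_000   # one-time engineering cost, same in both cases
AI_MARGINAL = 0.045      # dollars of inference per request (from the example above)

def avg_cost_software(units: int) -> float:
    # Marginal cost ~ 0: only the fixed cost is spread over users.
    return BUILD_COST / units

def avg_cost_ai(units: int) -> float:
    # The fixed cost amortizes, but the inference floor never does.
    return BUILD_COST / units + AI_MARGINAL

for n in (10_000, 1_000_000, 100_000_000):
    print(f"{n:>11,} units: software ${avg_cost_software(n):.4f}, "
          f"AI ${avg_cost_ai(n):.4f}")
```

At 100 million units, software's average cost is a penny and still falling; AI's has flattened just above its $0.045 floor.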

Some argue this will change. Maybe self-hosted models will get cheap enough that inference costs nothing. Maybe the laws of physics will let you run GPT-6 on a phone for free. These are nice hopes, but they misunderstand the constraint: the cost of inference is not a software problem to solve with better engineering, it is a physics problem. Running a neural network requires matrix multiplication, and matrix multiplication requires compute. Compute costs money. It costs less than it did in 2015, but it will never cost nothing.

A Worked Example: Customer Support at Two Scales

Let us walk through two customer support deployments to see where costs live.

Small company: 100 tickets per day, 22 working days per month

  • Tokens: 100 tickets/day × 22 days × ~$0.045 per ticket (5,000 input + 2,000 output tokens at Claude pricing) ≈ $100/month.
  • Vector database storage (searching past conversations for context): $1,000/month.
  • Observability and logging: $2,000/month.
  • Human review (10% of conversations reviewed for compliance): $10,000/month in labor.
  • API integrations (to Zendesk, your backend, etc.): $500/month.
  • Total: $13,600/month. Tokens are under 1% of the cost.

Large company: 50,000 tickets per day, 22 working days per month

  • Tokens: 50,000 tickets/day × 22 days × ~$0.045 per ticket ≈ $50,000/month.
  • Vector database (millions of stored conversations): $20,000/month.
  • Observability and logging (critical at scale): $25,000/month.
  • Human review (1% of conversations reviewed for compliance): $500,000/month.
  • Infrastructure (inference cluster redundancy, load balancing): $100,000/month.
  • API integrations and integrations team support: $50,000/month.
  • Total: roughly $745,000/month. Tokens grow to about 7% of the cost, but human review, even at a 1% sample rate, is now the dominant line item (review labor scales with volume while fixed infrastructure amortizes).
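The two cost stacks above can be reproduced with a short model. Every line item is this article's illustrative estimate, and the token rate assumes the 5,000-input/2,000-output ticket at Claude pricing from earlier:

```python
# Reproducing the two worked cost stacks (dollars per month).
# All figures are the article's illustrative estimates, not vendor benchmarks.
TOKEN_COST_PER_TICKET = 0.045  # 5,000 input + 2,000 output tokens at Claude pricing
WORKING_DAYS = 22

small_overheads = {
    "vector_db": 1_000,
    "observability": 2_000,
    "human_review": 10_000,   # 10% of conversations reviewed
    "integrations": 500,
}
large_overheads = {
    "vector_db": 20_000,
    "observability": 25_000,
    "human_review": 500_000,  # 1% of conversations reviewed
    "infrastructure": 100_000,
    "integrations": 50_000,
}

def summarize(tickets_per_day: int, overheads: dict) -> tuple[float, float]:
    """Return (total monthly cost, token share of that total)."""
    tokens = tickets_per_day * WORKING_DAYS * TOKEN_COST_PER_TICKET
    total = tokens + sum(overheads.values())
    return total, tokens / total

for name, tickets, stack in [("small", 100, small_overheads),
                             ("large", 50_000, large_overheads)]:
    total, share = summarize(tickets, stack)
    print(f"{name}: ${total:,.0f}/month, tokens are {share:.0%} of spend")
```

Swapping in your own overhead estimates shows the iceberg directly: the token line is the easiest number to get, and it is rarely the biggest one.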

The point: token cost scales linearly with volume. But the full-cost structure — the AI Cost Iceberg — does not simplify as you scale. Human review, compliance, observability, and infrastructure are sticky costs.

Why This Cost Structure Is Permanent

Three economic facts make high marginal cost permanent for AI:

First: inference is compute-intensive. Processing tokens requires matrix multiplications on accelerated hardware. This is not a software problem that clever engineering solves; it is a hardware problem. The cost of GPU compute has fallen steadily, but it will not fall to zero. Cloud providers price hosted models like Claude by the token for a reason: someone has to pay for the infrastructure.

Second: models are not reusable the way software code is. Software code is a one-time fixed cost that scales infinitely. A trained model is also a one-time fixed cost, but it cannot be repurposed for arbitrary tasks: a model tuned for customer support does not transfer cleanly to claims adjudication. You either fine-tune it (more training compute) or pay for a different model. The leverage does not compound the way it does in software.

Third: safety and compliance are not optional at scale. As AI does more valuable work, the cost of getting it wrong explodes. A model that is 95% accurate in customer support gives a wrong answer 5% of the time; that is tolerable for support, but in insurance claims a 5% error rate can trigger regulatory action or litigation. Pushing error rates past 99% usually requires human review of edge cases, which eats the margin. This is not new: it is why high-stakes software (banking, healthcare, defense) costs more than consumer software. AI doing high-stakes work will cost more too.

None of these constraints are going away. A startup could build a better inference engine tomorrow, but it would still have to charge per token. The cost structure is the shape of the economics, not an implementation detail to optimize.

What This Means for Your AI Investment

If you are evaluating an AI deployment (a customer support agent, a claims processor, an underwriting assistant), do not assume the cost will fall like software did. Price it at the marginal cost it has today, then build your unit economics around that.

A $0.19-per-ticket AI system only makes sense if resolving a ticket is worth more than $0.19 to your business. A $0.50-per-claim cost only works if automating the claim saves more than $0.50 in labor. AI is a cost-per-unit business, like services: the revenue has to justify the unit cost.
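That break-even logic fits in a few lines. A minimal sketch, where the per-unit costs are the article's hypothetical figures and the per-unit values are assumed for illustration:

```python
# Unit-economics check: an AI workflow pays off only when the value of one
# handled unit exceeds its fully loaded per-unit cost. Dollar values are
# hypothetical illustrations, not benchmarks.
def unit_margin(value_per_unit: float, cost_per_unit: float) -> float:
    """Dollars gained (or lost) per unit the AI handles."""
    return value_per_unit - cost_per_unit

# A $0.19-per-ticket system deflecting tickets worth $2.50 in agent labor:
print(unit_margin(2.50, 0.19))  # positive: the deployment pays for itself
# The same system on tickets worth only $0.10:
print(unit_margin(0.10, 0.19))  # negative: more volume only deepens the loss
```

The important discipline is using the fully loaded per-unit cost (tokens plus review, observability, and infrastructure), not the token line alone.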

This is the permanent shape of AI economics, and every CFO needs to build their financial model around it from day one.
