Runrate Framework
The AI Cost Iceberg
Visible API spend (10%) vs hidden inference, storage, observability, retries, human review (90%).
Read the full framework →

Most contact centers treat AI as a labor replacement play: hire fewer agents, deploy a chatbot, save money. The numbers tell a different story. Yes, Klarna runs its AI customer service at $0.19 per resolved ticket while legacy human support costs $5–$15 per ticket. But that figure hides the AI Cost Iceberg: the visible API spend is just the tip.
The work-item economics of customer service
A customer support ticket is your unit of work. A human agent resolves one in 4–8 minutes and costs your contact center approximately $5–$15 per ticket once you factor in salary, benefits, tooling, and occupancy. That $10 median hides wide variation: tier-1 offshore support runs closer to $3; tier-1 US-based closer to $20.
AI vendors price per resolution or per conversation. Klarna's internal AI handles customer inquiries at $0.19 per resolved ticket. Intercom Fin (Intercom's generative support tool) runs at roughly $0.99 per resolution. Sierra, the AI copilot for support teams, operates at around $1.50 per conversation. At face value, this looks like a 94% cost reduction. Then you look at the rest of the iceberg.
The hidden costs that most finance teams miss: human review of edge cases (15–30% of tickets), customer escalation when AI fails (another 8–15%), retry costs when the model produces a wrong answer and needs a second attempt, API calls to your third-party tools (Stripe for payment info, Twilio for SMS verification), real-time observability infrastructure to catch failure, and the salary cost of a human agent or supervisor who monitors AI quality. If Klarna's true all-in cost is $0.19, assume that $0.19 includes an embedded assumption about supervision ratios and failure rates.
For a mid-market company with 50 support agents handling 10,000 tickets per month, the delta looks like this: fully human support costs $100,000–$150,000/month. AI-assisted support with 60% deflection and human review built in runs $15,000–$25,000/month. The real gain is in the middle 60%: the routine, repetitive, low-judgment tickets that don't need a human.
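The delta above can be sketched as a simple monthly cost model. The vendor price, review overhead, and platform fee below are illustrative assumptions drawn from the ranges in this article, not quotes:

```python
# Monthly cost sketch for the 10,000-ticket/month scenario above.
# All constants are illustrative assumptions, not vendor quotes.

TICKETS_PER_MONTH = 10_000
HUMAN_COST_PER_TICKET = 12.50     # midpoint of the $10-$15 US tier-1 range
AI_COST_PER_RESOLUTION = 0.99     # Intercom Fin-style list price
REVIEW_COST_PER_AI_TICKET = 0.75  # assumed human QA overhead per AI ticket
PLATFORM_FEE = 5_000.0            # assumed monthly platform/licensing fee

def fully_human_cost() -> float:
    """Monthly cost of an all-human support queue."""
    return TICKETS_PER_MONTH * HUMAN_COST_PER_TICKET

def ai_program_cost(deflection: float) -> float:
    """Monthly cost of the AI program itself: vendor fees plus
    human review on AI-handled tickets plus platform fee."""
    ai_tickets = TICKETS_PER_MONTH * deflection
    per_ticket = AI_COST_PER_RESOLUTION + REVIEW_COST_PER_AI_TICKET
    return ai_tickets * per_ticket + PLATFORM_FEE

print(f"fully human: ${fully_human_cost():,.0f}/mo")    # $125,000/mo
print(f"AI program:  ${ai_program_cost(0.60):,.0f}/mo")  # $15,440/mo
```

At 60% deflection this lands inside the $15,000–$25,000/month band quoted above; the point of parameterizing it is that the answer is very sensitive to the assumed review cost per AI ticket.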
Where AI actually pays back
AI customer service ROI appears in three clear places: high-volume, low-complexity ticket types (password resets, billing inquiries, order status), 24/7 coverage where human staffing is expensive, and geographies where labor arbitrage is shrinking (Eastern Europe, India wages rising; US remote labor plentiful but unionizing).
The ticket complexity axis matters. Routine inquiries: AI wins decisively. "Where's my order?" costs $0.19 to resolve. A user credential reset costs $0.30 in API calls and review time. A billing dispute with three price tiers and a custom promotion code? That stays human. A customer angry about a refund who needs empathy and a judgment call? Still human.
The hidden variable is first-contact resolution rate (FCR). A human agent resolves issues on first contact 70–80% of the time. An AI agent on routine tickets (because it's restricted to low-complexity only) resolves 85–92% on first contact. But the denominator matters: if you're routing the hard 15% of tickets to human agents anyway, your average AI FCR collapses to 55–65% because the agent still handles escalations, and escalations often require a second or third touch.
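The collapse in blended FCR can be made concrete with a small sketch. The assumption here (not stated by any vendor) is that an escalated ticket never counts as a first-contact resolution and typically takes extra touches:

```python
# Sketch of the blended-FCR collapse described above. Assumed model:
# escalations are first-contact misses, and unresolved routine tickets
# take one retry.

def blended_ai_fcr(routine_share: float, routine_fcr: float) -> float:
    """FCR across all AI-routed traffic when escalated tickets
    (the non-routine remainder) count as first-contact misses."""
    return routine_share * routine_fcr

def avg_touches(routine_share: float, routine_fcr: float,
                escalation_touches: float = 2.5) -> float:
    """Average customer contacts per ticket: resolved routine tickets
    take 1 touch, unresolved routine tickets take 2, and escalations
    take ~2.5 (an assumption)."""
    routine = routine_share * (routine_fcr * 1 + (1 - routine_fcr) * 2)
    return routine + (1 - routine_share) * escalation_touches

print(round(blended_ai_fcr(0.70, 0.88), 2))  # 0.62, inside the 55-65% band
```

Even a 92%-FCR model on routine tickets produces a sub-65% blended number once only 70% of traffic qualifies as routine, which is why the denominator matters more than the headline FCR.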
Vendor pitch: "95% deflection." Reality: 95% deflection applies only to the simple tickets. The hard 15% of overall issues still route to human agents, and those cost more per ticket because they're harder.
The vendor landscape for AI customer service
Klarna (public case study, undisclosed terms but estimated $10M+ annual) uses a proprietary in-house LLM trained on 270,000+ previous chat conversations, deployed on its own infrastructure. Not a realistic model for mid-market.
Intercom Fin integrates with Intercom's full helpdesk suite; it's priced per monthly active conversation, roughly $500–$5,000/month depending on volume. Decagon sells on a per-conversation basis ($0.10–$0.25 per conversation, depending on complexity). Ada and Forethought both operate on similar per-conversation or per-resolution metrics. Sierra works as a copilot—agents use it to draft responses, so cost scales with agent productivity gains (12–18% faster resolution time reported by some customers).
The competitive axis is not just cost. It's observability: can you see why the AI failed on a specific ticket, or do you just see aggregate deflection metrics? Can you set business rules (don't offer a discount over $50, always ask for account number before refunding)? How quickly can you retrain the model on your specific product knowledge?
Most vendors lock you into their infrastructure and their pricing model. The practical shortlist: Decagon or Ada for high-volume simple tickets; Sierra if you want to augment existing agents; Intercom Fin if you're already in Intercom's ecosystem.
The cost attribution problem in customer service
Finance teams often bury cost-per-ticket metrics deep in operational dashboards, if they measure them at all. Here's the friction:
First: vendors report metrics in their own units (conversations, resolutions, sessions). Intercom reports "conversations handled," Ada reports "deflected interactions," Decagon "completed conversations." Marketing copies the highest number into the pitch deck.
Second: human team cost sits in multiple budget lines. Agent salary is COGS. Supervision and QA is overhead. Training is headcount. Your finance team can't isolate true support team cost without a forensic audit.
Third: hidden cost in the new vendor relationship. Setting up API keys to your CRM, vetting the vendor's data handling practices for GDPR/CCPA, training supervisors on new QA workflows, six weeks of shadow mode before live deployment. A $5,000/month vendor might cost $40,000 to integrate properly.
The bridge: build a cost-per-ticket baseline (total support team spend ÷ annual tickets handled). Then build a forecast that includes vendor cost, marginal human review cost (supervisors spend 15 minutes per day reviewing 20 flagged tickets), and integration overhead amortized over 18 months. That's your all-in AI cost, not the vendor's cherry-picked number.
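The bridge above reduces to two formulas. This sketch uses the supervision and amortization assumptions from this article (15 minutes of supervisor time per 20 flagged tickets, $40,000 integration amortized over 18 months); the supervisor hourly rate and flag rate are additional assumptions:

```python
# The baseline-and-forecast bridge described above, as a sketch.
# Supervisor rate, flag rate, and volumes are assumptions.

def cost_per_ticket_baseline(annual_support_spend: float,
                             annual_tickets: int) -> float:
    """Total support team spend divided by annual tickets handled."""
    return annual_support_spend / annual_tickets

def all_in_ai_cost_per_ticket(
    vendor_per_ticket: float = 0.99,    # advertised price
    supervisor_hourly: float = 40.0,    # assumed loaded supervisor rate
    review_min_per_flagged: float = 0.75,  # 15 min/day over 20 flagged tickets
    flag_rate: float = 0.20,            # assumed share of AI tickets flagged
    integration_cost: float = 40_000.0, # one-time, amortized
    amortize_months: int = 18,
    ai_tickets_per_month: int = 6_000,
) -> float:
    """Vendor price plus marginal human review plus amortized integration."""
    review = (supervisor_hourly / 60) * review_min_per_flagged * flag_rate
    integration = integration_cost / (amortize_months * ai_tickets_per_month)
    return vendor_per_ticket + review + integration

print(round(all_in_ai_cost_per_ticket(), 2))  # 1.46, not the advertised 0.99
```

Under these assumptions the all-in number is roughly 1.5x the vendor's advertised per-ticket price, which is the gap the forecast is meant to expose.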
Customer service AI cost benchmark table
| Metric | Human support (US, tier-1) | AI (Klarna-style) | AI (Intercom/Ada-style) | Deflection rate | FCR after escalation |
| --- | --- | --- | --- | --- | --- |
| Cost per resolved ticket | $10–$15 | $0.19 | $0.99–$1.50 | 50–70% | 65–75% |
| Cost per conversation | $8–$12 | $0.19 | $0.15–$0.30 | — | — |
| Integration cost (6-month amortized) | — | — | $500–$3,000/mo | — | — |
| Escalation rate | 5–10% (human misjudgment) | 15–25% (confidence-based) | 20–30% (stricter rules) | — | — |
| Avg resolution time | 6–8 min | 90 sec | 3–4 min | — | — |
| Supervision cost per agent FTE | $60k salary | — | $20k/year (QA) | — | — |
The COO playbook for AI customer service
- Establish your baseline. Take your last 12 months of support spend (all-in: agents, tools, facilities) and divide by resolved tickets. That's your $5–$15 anchor. If you can't compute it, you can't measure ROI.
- Map ticket complexity tiers. Segment your ticket backlog into three buckets: routine (password reset, FAQ, order status), moderate (billing adjustments, refunds under $50, simple account changes), and complex (disputes, escalations, custom solutions). Measure what percentage falls into each bucket. AI pays back on tier-1 (usually 40–60% of volume) and tier-2 (20–30%).
- Run a 60-day pilot. Deploy AI on tier-1 tickets only. Measure: deflection rate (tickets resolved without escalation), FCR of the tier-1 subset, human review time per ticket, escalation time to the complex queue, and customer satisfaction. Don't measure company-wide CSAT; measure CSAT on tier-1 tickets only (AI should beat human agents on the simple stuff).
- Calculate true vendor cost. Take the vendor's per-ticket cost and add: (1) your own supervision cost (assume 5–10% of resolution time for QA review), (2) API call costs to your CRM/payment system, (3) integration labor (one 6-month contract engineer, $25k), (4) training for supervisors (assume 40 hours). Add these line-by-line to the vendor's advertised cost.
- Model the staffing delta. If you deflect 50% of tier-1 tickets (your largest bucket), how many agents can you shed or reallocate? One agent handles 200–300 tickets per month, so removing 5,000 tier-1 tickets from the human queue with AI frees up roughly 17–25 FTEs. Each FTE costs your company $60k–$100k all-in. That's your payback axis: roughly $1M–$2.5M in headcount savings annually, minus vendor cost ($50k–$300k), minus integration cost.
- Lock in observability. Require the vendor to expose: failure reasons (confidence score too low, knowledge base miss, rules triggered), customer satisfaction per ticket, and escalation rate by ticket type. Weekly dashboards, not quarterly reports. This is how you catch deterioration before it becomes a board problem.
- Set a clear escalation trigger. If deflection drops below 40%, or FCR drops below 60%, or escalation rate climbs above 30%, you pause the vendor and re-baseline. AI customer service is not a set-it-and-forget-it system; it decays with product changes, customer base shift, and model drift.
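The staffing-delta step in the playbook above can be sketched as a first-year payback model. The agent throughput, FTE cost, and vendor cost below are midpoints of the ranges in this article, treated as assumptions:

```python
# First-year payback sketch for the staffing-delta step above.
# All defaults are assumed midpoints of the ranges in this article.

def staffing_delta(deflected_tickets_per_month: int = 5_000,
                   tickets_per_agent: int = 250,      # 200-300 range midpoint
                   fte_all_in_cost: float = 80_000,   # $60k-$100k midpoint
                   vendor_annual_cost: float = 150_000,
                   integration_cost: float = 40_000):
    """Return (FTEs freed, gross annual savings, net first-year savings)."""
    ftes_freed = deflected_tickets_per_month / tickets_per_agent
    gross_savings = ftes_freed * fte_all_in_cost
    net_first_year = gross_savings - vendor_annual_cost - integration_cost
    return ftes_freed, gross_savings, net_first_year

ftes, gross, net = staffing_delta()
print(f"{ftes:.0f} FTEs freed, ${gross:,.0f} gross, ${net:,.0f} net year one")
# 20 FTEs freed, $1,600,000 gross, $1,410,000 net year one
```

Note that the net figure is first-year only: integration cost drops out in year two, and vendor cost typically scales with volume, so the model should be re-run annually against actuals.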
For mid-market COOs building the business case, the honest story is: AI saves 40–60% of tier-1 support cost and 0% of tier-3 cost. If your support team is 200 people and you can shift 30–50 of them to more complex work or eliminate them, the payback is clean. If you're a small contact center where agents are already fully loaded and you're not hiring, AI is a margin story, not a headcount story—expect 8–12% net savings.
When you're ready to model this in your own stack with work-item-level accuracy, talk to Runrate to see how much of your support spend is truly attributable to AI versus human labor.
Want to see this in your stack?
Book a 30-minute walkthrough with a Runrate founder.