AI Cost Economics by Industry: The CFO's Cross-Vertical Playbook

17 min read · Updated 2026-05-02

Runrate Framework

The AI Cost Iceberg

Visible API spend (10%) vs hidden inference, storage, observability, retries, human review (90%).

Read the full framework →

Runrate Framework

AI Workforce P&L

Treat AI agents like employees: cost structure, productivity target, and retirement trigger per agent.

Read the full framework →

Runrate Framework

5-Stage AI Cost Maturity Curve

From Invisible → Tracked → Allocated → Optimized → Governed — where does your org sit?

Read the full framework →

Most AI cost conversations happen at the wrong altitude. Vendors quote per-token pricing. Operating partners quote per-headcount savings. Neither answer is wrong, but neither is the answer a CFO can defend in a board meeting. The number that matters is the cost-per-work-item economics in your specific vertical — because the unit of work, the human baseline, the failure tolerance, and the regulatory overhead are wildly different in customer service than in healthcare claims, and different again in legal review or RCM. This pillar is the cross-vertical playbook: what an AI agent actually costs to run in each major industry, where the ROI shows up, and where it quietly does not.

Why "AI cost" depends entirely on the vertical

A common mistake in 2025 was treating AI deployment as a horizontal infrastructure decision. CIOs negotiated platform contracts. CFOs benchmarked token spend. Procurement teams ran RFPs against generic feature checklists. The result: the same Claude Sonnet API call costs the same $0.003 in every vertical, but the all-in cost per business outcome ranges from $0.19 per resolved support ticket (Klarna) to $300+ per finalized commercial loan application — a 1,500x spread.

The reason is structural. Each vertical has a different unit of work, a different human baseline cost, a different failure tolerance, and a different regulatory tax. A customer service ticket can fail and be re-routed at low cost. A claim adjudication that fails costs a regulator's attention and a member's care. A contract clause hallucination costs litigation exposure. The same model, deployed in two industries, produces fundamentally different cost economics — and only one of them shows up on the AWS bill.

According to McKinsey's State of AI 2025, 88% of organizations now use AI in at least one business function, but only 39% report measurable EBIT impact, and only 5.5% qualify as high-performers. The gap isn't about model selection. It's about whether the deployment was sized to the actual cost-per-outcome economics of the vertical it was deployed into. Most pilots pencil at the API layer and quietly fail at the work-item layer.

The CFO's job in 2026 isn't to approve AI projects in the abstract. It's to ask: in this vertical, with our volume, our failure tolerance, and our regulatory profile, what is the loaded cost per finished work item, and how does that compare to the human baseline? Until that question is answered, the project shouldn't get past the budget gate.

The cross-vertical pattern: where AI wins and where it doesn't

Across the verticals Runrate's customers operate in, four patterns repeat with near-perfect regularity. Knowing them is what separates a confident AI portfolio decision from a hopeful one.

Pattern 1: Volume × repetition × low judgment = AI wins decisively. High-volume, low-complexity work items (password resets, order status, simple billing inquiries, claims status checks, basic eligibility screens, lead qualification) are the slam-dunk cases. The human baseline is $3–$15 per item; the all-in AI cost is $0.20–$1.50; deflection rates run 50–75%; payback is measured in weeks, not quarters. The ROI math is so lopsided it's nearly impossible to lose money. Customer service and SDR/lead qualification are the canonical examples (Article 98: AI for Customer Service, Article 104: AI for Sales/SDR).

Pattern 2: Medium volume × medium complexity × high regulatory tax = AI wins, but the iceberg is bigger. Healthcare claims, insurance underwriting, RCM denials, basic compliance review, and contract review at scale all fit here. Human baselines are $30–$300 per work item; AI baselines are $5–$30; deflection on the simple subset runs 40–60%; payback is 3–9 months. But the failure cost is real — a wrong denial gets a regulator's letter, a wrong eligibility check gets a member's complaint — so review rates run 15–35% of all AI-handled work, and that review time is the single largest hidden cost (Article 99: Healthcare Claims, Article 100: Insurance Claims & Underwriting, Article 102: RCM, Article 106: Compliance).

Pattern 3: Low volume × high complexity × high judgment = AI assists, doesn't replace. Commercial loan origination, M&A document review, complex litigation discovery, executive recruiting, strategic finance work. Human baselines are $200–$2,000 per work item; AI alone can't get to acceptable quality. The pattern that works is human-in-the-loop with AI accelerating the lawyer/banker/recruiter — cost-per-item drops 25–50%, but headcount doesn't (Article 103: Legal Document Review, Article 101: AI for Finance Teams).

Pattern 4: Internal-facing administrative work = AI wins quietly, no one writes the press release. HR onboarding emails, AP invoice coding, expense approval routing, compliance attestations, IT ticket triage, contract renewal tracking, payroll exception handling. Per-item cost savings are modest ($2–$10 per item), but volumes are huge and the work is genuinely repetitive. This is where mid-market companies see their largest aggregate AI savings — and where the savings most reliably persist beyond the pilot honeymoon (Article 105: AI for HR, Article 107: Back Office Operations).

The verticals that don't fit any of these patterns — strategy, M&A, litigation strategy, executive coaching, true product development — are exactly where 2025's most-publicized AI failures happened. The pattern was the same every time: a high-judgment, low-volume, high-stakes domain was given to an LLM with no human-in-the-loop, the LLM hallucinated, the work product had to be redone, and the project quietly disappeared from the next earnings call.

The work-item economics framework that travels across verticals

Despite the vertical-specific cost levels, the framework that produces a defensible number is the same in every industry. Runrate calls it the work-item economics stack, and it has five layers.

Layer 1: The visible API cost. Tokens × price-per-token. This is what shows up on your OpenAI, Anthropic, or Bedrock bill. For a customer service ticket: ~$0.015. For a healthcare claim: ~$0.10–$0.25. For a legal contract review: ~$1.00–$3.00. This is roughly 5–15% of the true cost.

Layer 2: Inference and orchestration overhead. Vector database storage, retrieval-augmented generation lookups, retries on rate limits, multi-step agent loops. Typically adds 30–80% on top of Layer 1. A customer service workflow with 3 retrieval steps and 1 retry adds ~$0.012 to the $0.015 baseline.

Layer 3: Integration tax. Stripe lookups, Twilio sends, Salesforce updates, EHR queries, claims clearinghouse calls. Every external API call your agent makes costs money the LLM bill doesn't show. In claims processing, integration tax often exceeds the LLM cost. In customer service, it's typically 10–30% on top.

Layer 4: Human review and exception handling. This is the largest hidden cost in regulated verticals. Healthcare claims: 25–35% of AI-handled claims are reviewed by a human, at $0.50–$2.00 per claim in loaded review time. Insurance underwriting: 30–45% review rate. Customer service: 12–20% review rate. Legal review: nearly 100% review rate (AI is the first pass, the lawyer is the second). For most regulated verticals, this layer is 50–150% of the API cost.

Layer 5: Failure cost amortization. When the agent gets it wrong and the wrong answer reaches the customer, member, claimant, or regulator, what does that cost in remediation, reputation, or litigation exposure? In customer service: a wrong refund is $20–$200. In claims: a wrong denial reversal is $300–$3,000 plus regulatory scrutiny. In legal: a missed clause is potentially seven figures. Amortized across all work items, this layer is small (usually 1–5% of total cost) but it's the layer that determines whether the deployment is even allowed.

Add the five layers, divide by work items completed, and you get cost-per-outcome — the single number that should be the basis for every vertical AI decision. Per-token cost is for engineers. Cost-per-outcome is for CFOs.
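
To make the stack concrete, here's a minimal sketch of the five-layer math in Python. The input figures are illustrative placeholders drawn from the customer service numbers in this article, not Runrate's actual model:

```python
from dataclasses import dataclass

@dataclass
class WorkItemEconomics:
    """Illustrative five-layer cost stack for one AI work item (USD)."""
    api_cost: float            # Layer 1: tokens x price-per-token
    orchestration_cost: float  # Layer 2: RAG lookups, retries, agent loops
    integration_cost: float    # Layer 3: external API calls (CRM, EHR, ...)
    review_rate: float         # Layer 4: fraction of items a human reviews
    review_cost: float         # Layer 4: loaded cost per human review
    failure_rate: float        # Layer 5: fraction that fails downstream
    failure_cost: float        # Layer 5: remediation cost per failure

    def loaded_cost(self) -> float:
        """Cost-per-outcome: all five layers, per finished work item."""
        return (self.api_cost
                + self.orchestration_cost
                + self.integration_cost
                + self.review_rate * self.review_cost
                + self.failure_rate * self.failure_cost)

# Assumed tier-1 customer service ticket, using the figures above.
ticket = WorkItemEconomics(
    api_cost=0.015, orchestration_cost=0.012, integration_cost=0.004,
    review_rate=0.15, review_cost=1.00, failure_rate=0.002, failure_cost=5.0)

human_baseline = 5.00  # assumed loaded cost per human-resolved ticket
print(f"loaded AI cost/ticket: ${ticket.loaded_cost():.2f}")  # ~$0.19
print(f"savings vs human:      ${human_baseline - ticket.loaded_cost():.2f}")
```

The specific numbers don't matter; the shape does. The visible $0.015 API line becomes a ~$0.19 loaded cost, which is why Layer 1 is only a small slice of the total.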

Customer service: the most-studied AI vertical

Customer service is where AI has the deepest deployment data, and the work-item economics are now well understood (Article 98 goes deep on this). The human baseline is $5–$15 per resolved ticket once you load benefits, tooling, occupancy, and supervisor overhead. The visible AI baseline runs $0.15–$0.35 per ticket on tier-1 inquiries. Klarna publicly reports $0.19 per resolved ticket; Intercom Fin runs ~$0.99 per resolution; Sierra runs ~$1.50 per conversation.

The work-item economics by tier are revealing:

| Ticket type | Human baseline | AI loaded cost | Deflection rate | Net savings/ticket |
|-------------|----------------|----------------|-----------------|--------------------|
| Order status / shipping | $4.50 | $0.22 | 75–85% | $4.28 |
| Password reset | $5.10 | $0.30 | 80–90% | $4.80 |
| Billing inquiry (simple) | $7.20 | $0.65 | 55–70% | $6.55 |
| Refund request (rules-based) | $9.40 | $1.10 | 40–55% | $8.30 |
| Subscription cancellation | $11.50 | $1.85 | 25–40% | $9.65 |
| Complex billing dispute | $14.00 | $4.20 | 10–20% | Variable |
| Empathy / complaint resolution | $18.00 | N/A | <5% | N/A — keep human |

The strategic decision a contact center CFO should make isn't "should we deploy AI?" — it's "where is the deflection threshold below which net savings turn negative?" The threshold is almost always around $4–$6 of unit savings per ticket; below that, integration cost amortization eats the margin. Above it, deploy aggressively.
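
A back-of-envelope way to test that threshold, with assumed monthly volumes and an assumed fixed integration/platform cost (the per-ticket figures come from the table above):

```python
def blended_monthly_savings(volume, deflection_rate, human_cost,
                            ai_loaded_cost, monthly_fixed_cost):
    """Net monthly savings for one ticket type, after amortizing the
    fixed integration/platform cost. All figures are illustrative."""
    deflected = volume * deflection_rate
    unit_savings = human_cost - ai_loaded_cost
    return deflected * unit_savings - monthly_fixed_cost

# Order status tickets: assumed 20,000/month, assumed $25K/month
# share of integration + platform cost. High volume, high deflection:
print(blended_monthly_savings(20_000, 0.80, 4.50, 0.22, 25_000))  # 43,480 -> deploy
# Complex billing disputes: good unit savings, but deflection is too low:
print(blended_monthly_savings(1_500, 0.15, 14.00, 4.20, 25_000))  # -22,795 -> hold
```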

Healthcare claims: where the iceberg is at its largest

Claims processing is the vertical where the iceberg metaphor most accurately describes reality. The visible per-claim AI cost can be 5–8% of the true loaded cost (Article 99 details this). A simple eligibility-check claim runs $0.18 in API cost, plus $0.25 in clearinghouse lookups, plus $0.40 in retrieval against the policy database, plus $0.65 in amortized human review (35% of claims reviewed at $1.85 per review) = ~$1.48 loaded. The human baseline for the same claim is $11–$18.
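
Spelled out as arithmetic (a back-of-envelope check on the figures above, not Runrate's pricing model):

```python
# Loaded cost of the eligibility-check claim above, layer by layer:
api       = 0.18          # Layer 1: LLM API cost
retrieval = 0.40          # Layer 2: policy-database retrieval
clearing  = 0.25          # Layer 3: clearinghouse lookups
review    = 0.35 * 1.85   # Layer 4: 35% review rate x $1.85 per review
loaded = api + retrieval + clearing + review
print(f"${loaded:.2f} loaded vs $11-$18 human baseline")  # $1.48
```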

Where it gets interesting:

  • Tier-1 claims (eligibility, status, basic adjudication): AI loaded cost ~$1.50, human baseline ~$12, net savings ~$10.50/claim. At 50,000 claims/month, that's $525,000/month of savings.
  • Tier-2 claims (denials, appeals, complex coordination of benefits): AI loaded cost ~$8.50 (higher review rate, more retrieval), human baseline ~$45, net savings ~$36/claim. At 8,000 claims/month, that's $288,000/month of savings.
  • Tier-3 claims (high-dollar adjudications, fraud-suspected, regulator-flagged): Stay human. The failure cost dominates everything else.

The regulated cost layer is what most pilots underestimate. CMS audits require explanation of every denial. State insurance departments require complaint logs. HIPAA compliance adds a documentation tax. Every AI-touched claim needs an audit trail, which means logging, retention, and the ability to reproduce the decision. Runrate's customers in healthcare regularly find that the audit trail infrastructure is 8–15% of their loaded AI cost — and it has to be built before the first claim is processed, not after.

Insurance: claims, underwriting, and the documentation problem

Insurance is similar to healthcare in cost structure but with a meaningful twist: underwriting is a forward-looking judgment problem, not a retrospective adjudication problem. That changes the AI math (Article 100 covers this in depth).

For claims: AI loaded cost is $4–$15 depending on complexity, against a human baseline of $35–$180. ROI is strong, payback is 3–6 months, but the review rate stays at 30–45% indefinitely because of regulatory exposure.

For underwriting: AI loaded cost is $8–$30, against a human baseline of $80–$400 for personal lines and $200–$2,000 for commercial. The ROI looks great on paper. In practice, the productive deployment is human-in-the-loop — AI does the document review, classification, and risk-flag identification; the underwriter reviews the AI summary and makes the bind/no-bind decision. Headcount typically doesn't drop; throughput per underwriter goes up 30–60%, and the cost per policy bound drops 18–35%.

The structural challenge in insurance is documentation density. A commercial property submission can have 300–800 pages across loss runs, financial statements, building schedules, and prior carrier correspondence. The token cost of feeding that to an LLM dominates everything else — easily $4–$12 per submission just in input tokens. Compression strategies (RAG, summarization layers, selective extraction) are the difference between an underwriting AI deployment that's profitable and one that isn't.

Finance teams: where AI assists more than replaces

The expectation in 2024 was that AI would replace junior analysts. The reality two years in is that AI accelerates junior and mid-level analysts and barely touches senior work (Article 101 examines where this lands). Where the ROI shows up:

  • Account reconciliations and variance explanations: AI-loaded cost ~$0.40 per reconciliation, human baseline ~$8.00, deflection rate 60–75%. This is the strongest AI ROI in the finance function.
  • Invoice coding and AP routing: AI-loaded cost ~$0.18 per invoice, human baseline ~$2.40, near-100% AI handling on routine SKUs. Mid-market companies routinely see $300K–$1.2M annual savings on AP automation alone.
  • Expense report review: AI-loaded cost ~$0.25, human baseline ~$3.10, AI flags exceptions and humans review only the flagged ones. 80%+ throughput gain on the human side.
  • FP&A narrative generation: AI generates the first draft of MD&A, board commentary, and variance narratives. Human edits. Time-to-deliver drops 40–60%, headcount stays.
  • Audit prep and SOX testing: AI does sample selection and exception identification; humans review. Big-4 firms have driven this to ~$1.20 per testing iteration vs ~$25 for the manual baseline.

What doesn't work: M&A models, board-facing strategic finance, treasury investment decisions, executive financial communication. These are low-volume, high-judgment, high-stakes domains where AI is at best a junior assistant.

Legal: first-pass AI, final-pass lawyer

Legal is the vertical where the work-item economics force a different deployment shape (Article 103 covers this). The human baseline for contract review is $200–$800 per contract for outside counsel, $80–$300 in-house. AI-only review costs ~$1.50–$8.00 per contract, but the failure cost is so high that no production deployment in legal is AI-only.

The pattern that works: AI runs the first-pass review, flags clauses for attention, surfaces deviations from playbook templates, and produces a summary memo. The lawyer reviews the AI output, makes the substantive judgments, and signs off. Time-per-contract drops from 60–120 minutes to 15–30 minutes. Cost-per-contract drops 35–55%. Headcount typically doesn't change in mature law firms; capacity per lawyer goes up.

The interesting CFO question in legal is throughput vs. headcount. If your firm bills hourly, AI-driven throughput improvement compresses revenue per matter unless you reprice. If your firm bills fixed-fee, AI is pure margin expansion — and that's where most of the ROI in legal AI is going.

Revenue cycle management: where unit economics are most measurable

RCM is one of the cleanest verticals to measure AI ROI because the unit of work (a claim, a denial, a posted payment) is well-defined and the human baseline is transparent (Article 102 details this). The economics:

  • Initial claim submission: AI loaded cost ~$0.50, human baseline ~$6, deflection rate 70%+, net savings ~$4 per claim.
  • Denial appeals (tier 1): AI loaded cost ~$3.50, human baseline ~$22, deflection ~40%, net savings ~$11 per appeal.
  • Payment posting and reconciliation: AI loaded cost ~$0.15 per posting, human baseline ~$1.20, deflection 85%+. This is the highest-ROI RCM workflow.
  • Patient billing inquiries: Looks like customer service economics — see Pattern 1.

Where RCM gets interesting is in the integration tax. Every claim touches multiple systems: EHR, practice management, clearinghouse, payer portal, patient billing. A typical AI agent in RCM makes 4–8 external API calls per work item. Integration cost is often 60–110% of the LLM cost. Mid-market hospitals deploying RCM AI commonly see net savings of $1.5M–$8M annually, but only after integration cost is properly accounted for.
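
A sketch of how that integration tax accumulates on a single denial appeal — the per-call fees and the LLM figure below are assumed placeholders, not quotes from any clearinghouse, payer, or vendor:

```python
# Assumed external calls for one RCM denial appeal (typically 4-8 per item):
EXTERNAL_CALLS = {
    "ehr_chart_pull": 0.30,
    "practice_mgmt_lookup": 0.10,
    "clearinghouse_status": 0.25,
    "payer_portal_query": 0.45,
    "patient_billing_update": 0.15,
}
llm_cost = 1.40  # assumed LLM spend for appeal drafting + retrieval
integration_tax = sum(EXTERNAL_CALLS.values())
print(f"integration tax ${integration_tax:.2f} = "
      f"{integration_tax / llm_cost:.0%} of LLM cost")  # $1.25 = 89% of LLM cost
```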

Sales and SDR: the cost-per-qualified-lead game

Sales development is the vertical where the unit of work changed most dramatically with AI (Article 104 goes deep). The human SDR baseline is $50K–$80K loaded, generating 10–25 qualified leads per month. Cost-per-qualified-lead is $200–$650.

AI-driven outbound, when deployed well, runs $4–$20 per qualified lead — but the qualification quality is generally lower. The pattern that works in 2026: AI runs the top-of-funnel volume (research, personalization, sequencing, initial response handling), and human SDRs handle qualification calls and discovery on the leads AI surfaces. Cost-per-meeting-booked drops from $300–$800 (human baseline) to $80–$200 (AI-assisted), and the conversion rate from meeting to opportunity stays roughly flat.

Where it goes wrong: pure-AI SDR with no human qualification step. Reply rates collapse to 0.3–0.8% (vs 2–4% with human SDRs), the brand starts to feel spammy, and cost-per-actual-customer skyrockets because the meetings that book aren't qualified. The ROI math has to include the downstream conversion, not just the cost-per-meeting.
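
The funnel math is easy to sketch. The conversion rates below are assumed for illustration; the point is that a cheaper meeting can still produce a more expensive customer:

```python
def cost_per_customer(cost_per_meeting, meeting_to_opp, opp_to_close):
    """Cost per won customer, rolling the funnel forward.
    All conversion rates here are assumed, not benchmarks."""
    return cost_per_meeting / (meeting_to_opp * opp_to_close)

# AI-assisted with human qualification: conversion holds roughly flat.
print(cost_per_customer(150, meeting_to_opp=0.40, opp_to_close=0.25))  # $1,500
# Pure-AI outbound: cheaper meetings, but unqualified ones convert worse.
print(cost_per_customer(60, meeting_to_opp=0.15, opp_to_close=0.15))   # ~$2,667
```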

HR: the quiet vertical where AI saves real money

HR is the vertical where AI ROI most reliably persists past the pilot (Article 105 covers this). The work-item economics:

  • Candidate screening: AI loaded cost ~$0.40 per resume, human baseline ~$3 per resume reviewed by a recruiter. At a fund or mid-market company hiring 200/year with 80 applicants per role, that's 16,000 resumes screened, $42,000/year saved on screening alone.
  • Onboarding email and document handling: Near-100% AI automation on routine onboarding tasks. Saves 8–15 hours of HR coordinator time per new hire.
  • Benefits questions: Look like customer service economics. Deflection 60–75%, net savings ~$4 per inquiry, ~$30K–$80K/year for mid-market HR teams.
  • Performance review draft generation: AI generates first-draft performance review based on goals and feedback inputs; manager edits. Saves 30–60 minutes per review.
  • Policy and procedure questions: Internal chatbot deflection rates of 70–85%, saving HR generalist time that scales with company size.

The pattern: HR AI rarely shows up in board-level cost discussions because the savings are spread across many small line items. But aggregate annual savings of $200K–$1.5M for a mid-market HR function are routine, with payback under 6 months in nearly every case.

Compliance and audit: AI as evidence-collection accelerator

Compliance is structurally similar to legal but with a higher tolerance for AI-led work (Article 106 examines this). The human baseline for compliance attestation review, audit evidence collection, and SOX testing is $40–$150 per item depending on complexity. AI loaded cost is $2–$15.

The pattern that works: AI does the evidence collection, anomaly detection, and first-pass review; the compliance officer reviews the exceptions. Volume of items reviewed goes up 3–5x without adding headcount; the cost per attested control drops 40–65%; audit findings drop because the AI catches anomalies the human compliance team would have sampled past.

The trap in compliance is the regulator-acceptability question. Some regulators (FDA, OCC, certain state DOIs) require documentation of who made each decision and on what basis. An AI-generated decision with no human in the loop fails this test. The deployment shape that works is always AI-recommends, human-decides, system-logs-both.

Back-office operations: aggregate savings, modest per-item gains

Back-office operations is the catch-all vertical where AI's compounding effect is largest in aggregate even though no single line item is impressive (Article 107 details this). Procurement contract analysis, vendor master data hygiene, expense management, AP, AR, treasury operations, supply chain documentation — each of these is a $1–$5 per work item savings, but volumes are tens of thousands per month. Mid-market companies routinely find $1M–$5M of annual back-office savings once they deploy AI across the full operations stack.

The CFO's mistake in this vertical is evaluating each project standalone. A $200K AP automation project doesn't excite anyone. A $300K vendor master cleanup project gets killed in budget review. Bundled into a back-office AI program with shared infrastructure, shared procurement, shared compliance overhead, and shared cost attribution, the same projects often produce 3–5x the ROI of the standalone analysis.
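
A stylized version of the bundling effect, with assumed savings and infrastructure figures:

```python
# Why bundling changes the math: shared infrastructure is paid once,
# not once per project. All figures below are assumed for illustration.
projects = {                 # annual gross savings per back-office project
    "ap_automation": 200_000,
    "vendor_master_cleanup": 300_000,
    "expense_routing": 150_000,
    "ar_followup": 250_000,
}
STANDALONE_INFRA = 180_000   # infra + compliance + procurement, per project
SHARED_INFRA = 320_000       # one shared stack for the whole program

standalone_net = sum(s - STANDALONE_INFRA for s in projects.values())
bundled_net = sum(projects.values()) - SHARED_INFRA
print(standalone_net, bundled_net)  # standalone: 180000, bundled: 580000 (~3.2x)
```

Same projects, same gross savings — the only change is paying for the infrastructure once.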

This is where the AI Workforce P&L framework matters most. Treating these projects as discrete software purchases obscures the aggregate labor displacement. Treating them as labor allocation across a back-office workforce, with cost-per-work-item attribution, makes the bundled ROI visible.

The maturity progression that travels across verticals

A pattern that's now clear across Runrate's customer base: the verticals where AI deployment matures fastest are the ones where work-item economics are easiest to measure. Customer service matured first because tickets are countable. Sales/SDR matured second because booked meetings are countable. Healthcare claims and RCM are in active maturation now. Underwriting, legal, and complex finance are 12–24 months behind because the unit of work is harder to standardize.

The maturity progression — from invisible spend to governed spend — looks the same in every vertical, just shifted in time:

Stage 1 (Invisible): AI spend is buried in IT budget. Per-work-item cost is unknown.

Stage 2 (Tracked): Token cost is being tracked. Per-work-item cost is being estimated, but only the visible layers.

Stage 3 (Allocated): All five iceberg layers are accounted for. Cost-per-work-item is reported by vertical, by team, by customer segment.

Stage 4 (Optimized): Routing decisions (cheap model vs expensive model, AI vs human) are made automatically based on work-item economics. Marginal projects are killed; high-ROI projects are scaled.

Stage 5 (Governed): Vertical AI deployments are part of standard portfolio management. Annual planning includes vertical-by-vertical cost-per-outcome targets and a formal governance review.

Most mid-market companies are at Stage 1 or 2 across all verticals. Most enterprise companies are at Stage 3 in customer service and Stage 1–2 everywhere else. PE portfolio companies that deploy AI cost attribution properly accelerate to Stage 4 in 6–9 months on the high-volume verticals (CS, RCM, AP/AR) and Stage 3 in 12–18 months on the medium-complexity verticals (claims, underwriting, compliance).

What to do next

The unifying lesson across every vertical: the question isn't whether to deploy AI. The question is what cost-per-outcome target your vertical's economics support, and whether the deployment you're planning will hit that target after all five iceberg layers are loaded in. Customer service, RCM, AP/AR, and HR onboarding are nearly always positive ROI under any reasonable assumptions. Healthcare claims, insurance underwriting, and compliance review are positive ROI when integration tax and review rates are sized correctly. Legal, complex finance, M&A, and strategic functions are AI-assisted, not AI-replaced — and the ROI is in throughput per professional, not headcount reduction.

Three actions for a CFO or operating partner reviewing the vertical AI portfolio. First, demand cost-per-work-item economics for every vertical AI deployment, not just token-level cost. Second, classify each deployment into Pattern 1–4 above so the ROI expectation matches the structural reality of the vertical. Third, find out where each vertical sits on the AI Cost Maturity Curve, because most projects don't fail in deployment — they fail in attribution, when nobody can defend the loaded cost number 18 months in.

Curious where your team sits across the verticals you operate in? Take the 15-question self-assessment for a personalized maturity report you can bring to your next portfolio or board review.

Where does your team sit on the maturity curve?

Take the 15-question self-assessment and get a personalized report.

Start the Assessment
