How to Audit an AI Vendor Invoice

7 min read · Updated 2026-05-02

The AI Cost Iceberg (Runrate framework)

Visible API spend is roughly 10% of total AI cost; hidden inference, storage, observability, retries, and human review make up the other 90%. Read the full framework →

Most CFOs review AI vendor invoices the same way they review AWS bills: check the total, approve it, and move on. This is a mistake. AI vendor pricing is opaque, hidden costs are rampant, and vendors rely on the fact that you won't audit them. A rigorous invoice audit catches errors, catches fraud, and catches vendors who are overcharging you.

This article walks through the audit process step-by-step, with sample red flags and a checklist.

The audit process: Six steps

Step 1: Baseline your expected cost

Before the invoice arrives, build a cost forecast.

Sample calculation:

  • Work items handled this month: 5,200 support tickets
  • Pre-agreed cost per ticket: $0.55
  • Expected cost: 5,200 × $0.55 = $2,860

Now add the hidden costs from the AI Cost Iceberg:

  • Retries (assume 3% of requests fail and retry once): add $85
  • Vector retrieval overhead: add $120
  • Tool calls to CRM (assume 2 calls per ticket, at $0.01 per call): add $104
  • Observability and logging: add $60
  • Prompt caching efficiency credit: subtract $200

Adjusted forecast: $2,860 + $85 + $120 + $104 + $60 - $200 = $3,029

Invoice arrives: $3,187 (5.2% higher than forecast)

This is within reasonable tolerance (set a 10% threshold), but the delta warrants investigation.
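The Step 1 forecast and tolerance check can be sketched in a few lines. All figures below are the sample numbers from this article, not real vendor rates:

```python
# Baseline forecast from the article's sample numbers (assumptions, not real rates)
TICKETS = 5_200
RATE_PER_TICKET = 0.55            # pre-agreed contract rate

base = TICKETS * RATE_PER_TICKET  # $2,860

# Hidden-cost adjustments from the AI Cost Iceberg
retries = 85                      # ~3% of requests retried once
vector_overhead = 120
tool_calls = TICKETS * 2 * 0.01   # 2 CRM calls per ticket at $0.01 each
observability = 60
caching_credit = -200             # prompt caching efficiency credit

forecast = base + retries + vector_overhead + tool_calls + observability + caching_credit
invoice = 3_187.00
delta_pct = (invoice - forecast) / forecast * 100

print(f"forecast=${forecast:,.2f} invoice=${invoice:,.2f} delta={delta_pct:+.1f}%")
print("investigate" if abs(delta_pct) > 10 else "within tolerance, but review the delta")
```

Keeping the forecast as code (or a spreadsheet formula) makes the monthly comparison mechanical instead of ad hoc.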

Step 2: Request detailed cost logs

Ask the vendor for a cost breakdown, line by line. This is non-negotiable. Your contract should have audit rights.

What to request:

  • Daily cost by work item (or aggregated by category if work-item level isn't available)
  • Cost component breakdown (tokens, API calls, retries, vector retrieval, etc.)
  • Usage metrics (total requests, error rate, retry rate, tool call volume)
  • Cost per work item class (e.g., "password reset tickets cost $0.45 on average; billing disputes cost $0.82")

Sample cost log format:

| Date | Work Item Type | Count | Token Usage | Retries | Tool Calls | Base Cost | Retry Cost | Vector Cost | Tool Call Cost | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2026-04-01 | Password reset | 156 | 12,480 | 5 | 156 | $62.40 | $2.00 | $6.24 | $1.56 | $72.20 |
| 2026-04-01 | Billing inquiry | 89 | 9,890 | 8 | 178 | $49.45 | $4.00 | $4.95 | $1.78 | $60.18 |
| 2026-04-02 | Password reset | 142 | 11,360 | 4 | 142 | $56.80 | $1.60 | $5.68 | $1.42 | $65.50 |

If the vendor gives you an invoice with just one line ("Total AI services: $3,187"), they're hiding the cost structure. Push back and request detailed logs.
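Once you have the detailed log, one cheap sanity check is verifying that each row's components sum to its total. A minimal sketch, using two rows from the sample table above (the 0.01 tolerance absorbs rounding to cents):

```python
# Rows mirror the sample cost-log table; field names are illustrative
rows = [
    {"date": "2026-04-01", "type": "Password reset", "base": 62.40,
     "retry": 2.00, "vector": 6.24, "tool": 1.56, "total": 72.20},
    {"date": "2026-04-01", "type": "Billing inquiry", "base": 49.45,
     "retry": 4.00, "vector": 4.95, "tool": 1.78, "total": 60.18},
]

for r in rows:
    components = r["base"] + r["retry"] + r["vector"] + r["tool"]
    # Flag any row whose components do not add up to the billed total
    if abs(components - r["total"]) > 0.01:
        print(f"{r['date']} {r['type']}: components ${components:.2f} != total ${r['total']:.2f}")
```

Rows that fail this check are exactly the line items to raise with the vendor first.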

Step 3: Cross-check against your observability logs

Your infrastructure should have logs of every request made to the vendor. Compare your logs to the vendor's logs.

What to compare:

  1. Request count. Did the vendor process the same number of requests you logged? Discrepancies of >2% warrant investigation.
  2. Token count. Did your observability tool (Langfuse, LangSmith, etc.) record similar token usage? Discrepancies of >5% warrant investigation.
  3. Error and retry rate. Your logs show retry rate; vendor logs should match.

Sample reconciliation:

Your observability logs for April 2026:

  • Total requests: 5,487
  • Average tokens per request: 2,156
  • Total tokens: 11,825,272
  • Estimated cost at $0.05 per 1K tokens: $591

Vendor invoice for April 2026:

  • Total requests: 5,492 (5 more than your log—within 0.1%, OK)
  • Total tokens: 11,890,000 (64,728 more than your log—0.5% higher, acceptable but watch for trend)
  • Cost: $595 (OK)

If vendor tokens are 10%+ higher, ask why. Possible explanations:

  • You undercounted retries in your logs
  • Vendor's observability includes prompt caching credit you didn't account for
  • Vendor is padding token count (unlikely but check)
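The Step 3 reconciliation reduces to comparing two sets of totals against the thresholds suggested above (2% for request count, 5% for token count). A sketch using the April 2026 sample numbers:

```python
# Your observability totals vs the vendor's invoice totals (sample data)
ours   = {"requests": 5_487, "tokens": 11_825_272}
vendor = {"requests": 5_492, "tokens": 11_890_000}
thresholds = {"requests": 0.02, "tokens": 0.05}  # 2% and 5% from the article

for metric, limit in thresholds.items():
    drift = (vendor[metric] - ours[metric]) / ours[metric]
    status = "INVESTIGATE" if abs(drift) > limit else "ok"
    print(f"{metric}: vendor {drift:+.2%} vs our logs -> {status}")
```

Run this every month and the "watch for trend" step becomes a one-line diff between months.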

Step 4: Investigate cost deltas and anomalies

Flag any line item that's significantly different from baseline.

Red flag examples:

  1. Spike in cost per ticket. Your baseline is $0.55/ticket, but on April 15 the cost jumped to $1.20/ticket. Why? Possible explanations:

    • The agent hit a rate limit and retried heavily (legitimate)
    • The agent failed on a batch of tickets and needed manual intervention (legitimate)
    • The agent processed unusually complex tickets (legitimate)
    • The vendor is incorrectly counting tokens (red flag—investigate)
  2. Spike in error rate. Your baseline retry rate is 2%, but one day it spiked to 8%. Why? Possible explanations:

    • Your API was slow and the agent timed out (your problem)
    • Vendor API was degraded (vendor problem—ask for a credit)
    • The vendor changed something in the model or infrastructure (ask for transparency)
  3. Unused service charges. You didn't use a feature (e.g., "premium logging"), but the vendor charged you for it. Red flag—push back immediately.
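Spike detection like the examples above can be automated with a simple per-day threshold. A sketch, assuming a 2x-baseline trigger (the multiplier is a judgment call, not a standard) and illustrative daily figures:

```python
BASELINE = 0.55          # contracted cost per ticket
SPIKE_MULTIPLIER = 2.0   # assumption: flag days at 2x baseline or worse

# (date, tickets handled, total cost) -- illustrative daily figures
daily = [
    ("2026-04-14", 170, 95.20),
    ("2026-04-15", 160, 192.00),  # the spike day from the example above
]

for date, tickets, cost in daily:
    per_ticket = cost / tickets
    if per_ticket > SPIKE_MULTIPLIER * BASELINE:
        print(f"{date}: ${per_ticket:.2f}/ticket vs ${BASELINE:.2f} baseline -> investigate")
```

The flagged days then get the "why?" treatment: rate limits, failed batches, complex tickets, or token miscounting.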

Step 5: Calculate cost-per-outcome

Calculate your actual cost-per-outcome and compare to baseline.

Sample calculation:

  • Total cost for April: $3,187
  • Total work items resolved: 5,200
  • Cost per outcome: $3,187 / 5,200 = $0.613

  • Baseline: $0.55
  • Actual: $0.613
  • Delta: +11.5%

Is this acceptable? Depends on your contract. If your contract allows a ±15% tolerance band, this is fine. If it allows ±10%, this is a breach and you should request a credit.
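The Step 5 check is easy to encode so the tolerance band comes from your contract, not from eyeballing. A sketch with the tolerance as a parameter (the 10% value here is one of the example bands, not a recommendation):

```python
def check_cost_per_outcome(total_cost, outcomes, baseline, tolerance=0.10):
    """Return (actual cost per outcome, delta vs baseline, breach flag)."""
    actual = total_cost / outcomes
    delta = (actual - baseline) / baseline
    breach = abs(delta) > tolerance
    return actual, delta, breach

# April sample: $3,187 invoice, 5,200 work items, $0.55 baseline
actual, delta, breach = check_cost_per_outcome(3_187, 5_200, 0.55, tolerance=0.10)
print(f"${actual:.3f}/outcome, {delta:+.1%} vs baseline, breach={breach}")
```

With a ±10% band this sample breaches; with ±15% it passes, which is exactly the contract-dependent judgment the article describes.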

Step 6: Document exceptions and request credits

If you find legitimate errors (the vendor overcharged for retries, or a service outage cost you extra), document them and request a credit.

Sample email to vendor:

"Hi [Vendor], I reviewed the April invoice and found the following discrepancies:

  1. On April 15, your retry rate spiked to 8%, which matches our logs showing your API was degraded (AWS status page shows a 2-hour outage). The extra retry cost was $180. We're requesting a credit for this amount.
  2. On April 22, a batch of 47 tickets cost $2.10 each, vs. baseline of $0.55. These are standard password reset tickets, not complex cases. Token count on these tickets is 3.5x our baseline. We're requesting you re-examine the cost and provide explanation or a credit.

Please respond with either (1) explanation for both items, or (2) credit of $XXXX. Per our contract, we have audit rights and discrepancies should be resolved within 30 days."

Most vendors will push back initially, then grant a partial credit to keep the relationship.

Red flags: What to look for

Flag invoices that have any of the following:

| Red Flag | What It Means | Action |
| --- | --- | --- |
| Single line item ("AI services: $X") | Vendor is hiding cost breakdown | Request detailed logs; escalate if refused |
| Cost per item has high variance (range: $0.30–$1.50) | Vendor might be classifying tickets inconsistently | Ask for classification logic and audit a few tickets |
| Error/retry rate >5% consistently | Agent is unstable or integration is slow | Request root cause analysis and a credit for the excess cost |
| Monthly cost up >15% from previous month without volume change | Pricing drift or model change | Ask vendor what changed and for written explanation |
| Charges for services you didn't use (e.g., "premium logging") | Vendor is charging for default features | Push back and request removal from next invoice |
| Token count doesn't match your observability logs by >5% | Possible undercounting by you, or overcounting by vendor | Investigate both your logs and vendor logs |
| Vendor refuses to provide detailed logs | Vendor is hiding something | Escalate to legal; audit rights exist for this reason |

Build a monthly audit template

Create a spreadsheet with columns for:

  • Expected cost (based on forecast)
  • Actual cost (from vendor invoice)
  • Delta (%)
  • Cost per work item
  • Error rate
  • Retry rate
  • Issues/flags

Fill this out every month when the invoice arrives. Track trends. If cost per work item is drifting upward, flag it early.
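If a spreadsheet feels heavyweight, the same template works as a CSV you append to each month. A sketch with the columns listed above; the April row uses this article's sample numbers and the field names are illustrative:

```python
import csv
import io

# Columns from the monthly audit template above
FIELDS = ["month", "expected_cost", "actual_cost", "delta_pct",
          "cost_per_item", "error_rate", "retry_rate", "flags"]

buf = io.StringIO()  # in practice, open your tracking file in append mode
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerow({
    "month": "2026-04",
    "expected_cost": 3029,
    "actual_cost": 3187,
    "delta_pct": round((3187 - 3029) / 3029 * 100, 1),
    "cost_per_item": round(3187 / 5200, 3),
    "error_rate": "2%",
    "retry_rate": "3%",
    "flags": "Apr 15 retry spike; Apr 22 token anomaly",
})
print(buf.getvalue())
```

One row per month makes the three-month trend review in the checklist a trivial scan.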

The full audit checklist

Every month:

  • [ ] Request and receive detailed cost logs from vendor (within 5 days of invoice)
  • [ ] Compare vendor logs to your observability logs (token count within 5%, request count within 2%)
  • [ ] Calculate actual cost-per-outcome
  • [ ] Compare to baseline (flag if outside tolerance band)
  • [ ] Investigate any anomalies (cost spikes, error rate spikes, unused services)
  • [ ] Document discrepancies and request credits if warranted
  • [ ] Update monthly audit spreadsheet
  • [ ] Review trend over 3 months (is cost per outcome stable, improving, or degrading?)

If you're seeing consistent cost overruns without explanation, renegotiate the contract or switch vendors. You have leverage—the vendor depends on you not auditing them.

For contract audit clauses, see "How to Negotiate AI Vendor Contracts in 2026." For the full vendor evaluation process, see "How to Buy AI: The Executive's Vendor Evaluation Guide."
