Bill shock is the FinOps term for discovering a 40% cost spike on your vendor invoice, when it's too late to do anything about it. Traditional cloud FinOps solved this by monitoring resource consumption (CPU up 2x, flag it). AI agents need the same monitoring, but at the work-item level: when cost per ticket, per claim, or per lead drifts above normal, flag it. Most mid-market companies see anomalies only after the bill arrives. The ones that prevent bill shock catch anomalies within hours using automated detection. Mavvrik made "AI bill shock" a category. Runrate makes it preventable.
The cost anomaly detection framework
An anomaly is a cost per work item that deviates from the baseline by a statistically significant amount. Baseline is usually "the historical median cost for this agent over the last 30-90 days." Deviation is measured as a percentage (e.g., "cost is 30% above baseline").
The framework has three layers:
Layer 1 (Detection): Calculate the baseline and the current deviation automatically. This should run daily or every 6 hours, comparing today's cost per unit against the baseline. No human decision needed yet. Just the math.
Layer 2 (Alert): If deviation exceeds a threshold (e.g., "15% above baseline"), send an alert. The alert goes to the team lead and the CFO. No automatic action. Just notification: "something's off."
Layer 3 (Action): Based on the alert, decide: pause the agent (if the spike is severe), investigate (if it's moderate), or acknowledge and move on (if it's expected, like a known volume spike).
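Here's a minimal sketch of how the three layers fit together in code. It assumes you already have a daily cost-per-work-item series for the agent; the function names and the 15%/50% cutoffs are illustrative starting points, not a reference implementation.

```python
from statistics import median

def deviation_pct(history: list[float], today: float) -> float:
    """Layer 1 (Detection): % deviation of today's cost per work item vs. the historical median."""
    baseline = median(history)  # e.g., daily cost per work item for the last 30-90 days
    return (today - baseline) / baseline * 100

def needs_alert(deviation: float, threshold: float = 15.0) -> bool:
    """Layer 2 (Alert): notify the team lead and CFO; no automatic action."""
    return deviation >= threshold

def recommended_action(deviation: float) -> str:
    """Layer 3 (Action): a human decides, guided by severity."""
    if deviation >= 50:
        return "pause"        # severe spike
    if deviation >= 15:
        return "investigate"  # moderate drift
    return "acknowledge"      # expected or within normal noise
```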
Common cost anomalies and their signatures
Token explosion (retry storm): The agent is making more API calls than usual. Could be a bug (infinite retry loop), a new customer with weird data (the agent can't parse it, keeps retrying), or a model change (new prompt is less confident, asks for clarification more often). Signature: cost per work item is up 40-60%, API call count is up 60-90%, success rate is flat or down.
Vector database explosion: The agent is doing more vector searches than usual, or the vector DB pricing spiked. Signature: cost per work item is up 15-25%, but API cost is flat. Vector DB cost line is spiking. This usually happens when you add a new data source without tuning the retrieval logic.
Cached prompt miss: You're using prompt caching to save money, but a change in the input (new customer, new data format) bypassed the cache. Now you're paying full price for every prompt. Signature: cost per work item up 20-35%, API token cost per call up 40-50%, cache hit rate down 60-80%.
Model change: Engineering switched from GPT-4 to GPT-4 Turbo (cheaper) but the cheaper model kept failing, so the agent now retries more often. Net cost is the same or higher. Signature: cost per API call went down, but success rate went down, and retry rate went up. Cost per work item is flat or up.
Volume spike with sublinear cost scaling: You got a big customer or ran a marketing campaign. Volume is up 50%. Cost is up 35%. This is good news (you have cost economies of scale), but it's still a 35% spike from baseline. Signature: work item volume up 50%, cost up 35%, cost per unit down 30%. This one you want to see.
Observability explosion: You turned on debug logging to troubleshoot an issue and forgot to turn it off. Your Datadog bill tripled. Signature: observability cost spiking, other costs flat. This is usually a one-time spike.
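If you want to triage automatically, these signatures can be encoded as rough rules over day-over-baseline percentage changes. A sketch below; the field names and cutoffs are illustrative and should be tuned to your own agents.

```python
def classify_anomaly(m: dict[str, float]) -> str:
    """m holds day-over-baseline % changes, e.g. {"cost_per_item": 45, "api_calls": 70, ...}.
    Field names and cutoffs are illustrative; tune them to your agents."""
    if m["cost_per_item"] > 40 and m["api_calls"] > 60 and m["success_rate"] <= 0:
        return "token explosion (retry storm)"
    if 15 <= m["cost_per_item"] <= 25 and abs(m["api_cost"]) < 5 and m["vector_db_cost"] > 30:
        return "vector database explosion"
    if m["cost_per_item"] > 20 and m["tokens_per_call"] > 40 and m["cache_hit_rate"] < -50:
        return "cached prompt miss"
    if m["cost_per_call"] < 0 and m["success_rate"] < 0 and m["retry_rate"] > 0:
        return "model change backfiring"
    if m["volume"] > 30 and m["cost_per_item"] < 0:
        return "volume spike with sublinear cost scaling"
    if m["observability_cost"] > 100 and abs(m["cost_per_item"]) < 5:
        return "observability explosion"
    return "unclassified"
```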
Setting anomaly thresholds
A threshold is the deviation percentage that triggers an alert. Too loose (30% deviation) and you miss real problems. Too tight (5% deviation) and you get alert fatigue.
Recommended starting points:
- Yellow (investigate): 15-20% above baseline for 1-2 days, or 25%+ for any single day
- Orange (escalate): 25-35% above baseline for 3+ days, or 50%+ for any single day
- Red (pause): 50%+ above baseline for 1+ day, or cost exceeds monthly budget by 20%+ and is trending worse
These should be tuned to your agent. A support agent running at 10M tokens/day has different noise than one running at 100k tokens/day. Use the first 30 days of a new agent to establish a baseline. Don't set tight SLOs until you have 30-90 days of data.
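One way to encode these starting points, assuming you keep a list of recent daily deviations per agent (most recent last). The red tier's budget-overage rule is omitted for brevity, and the numbers are the suggested defaults above, not fixed values.

```python
from dataclasses import dataclass

@dataclass
class Threshold:
    level: str
    single_day_pct: float   # deviation that triggers on any single day
    sustained_pct: float    # deviation that triggers when sustained
    sustained_days: int     # consecutive days required for the sustained rule

# Ordered most to least severe; tune per agent once you have 30-90 days of data.
THRESHOLDS = [
    Threshold("red",    single_day_pct=50, sustained_pct=50, sustained_days=1),
    Threshold("orange", single_day_pct=50, sustained_pct=25, sustained_days=3),
    Threshold("yellow", single_day_pct=25, sustained_pct=15, sustained_days=2),
]

def severity(deviations: list[float]) -> str | None:
    """Return the highest severity breached, given daily deviation percentages (most recent last)."""
    for t in THRESHOLDS:
        window = deviations[-t.sustained_days:]
        single_day = deviations[-1] >= t.single_day_pct
        sustained = len(window) == t.sustained_days and all(d >= t.sustained_pct for d in window)
        if single_day or sustained:
            return t.level
    return None
```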
How to automate anomaly detection
Step 1: Calculate daily metrics. For each agent, calculate:
- Cost per work item (average, median, 95th percentile)
- Work item volume
- API cost, infrastructure cost, overhead allocation
- Success rate (% of work items processed successfully vs. escalated or failed)
Do this every 24 hours. Store the results in a time-series database (InfluxDB, Datadog, or your cost aggregation tool).
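A sketch of the daily rollup, assuming a per-work-item log with columns like agent, date, total_cost (API + infrastructure + allocated overhead per item), and status; the column names are illustrative.

```python
import pandas as pd

def daily_metrics(items: pd.DataFrame) -> pd.DataFrame:
    """Roll a per-work-item cost log up to one row per agent per day."""
    g = items.groupby(["agent", "date"])
    return pd.DataFrame({
        "cost_per_item_avg": g["total_cost"].mean(),
        "cost_per_item_median": g["total_cost"].median(),
        "cost_per_item_p95": g["total_cost"].quantile(0.95),
        "volume": g.size(),
        "success_rate": g["status"].apply(lambda s: (s == "success").mean()),
    }).reset_index()
```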
Step 2: Calculate baseline. For each metric, calculate the 30-day rolling median. This is your baseline. It's more robust than the mean because it's not thrown off by outliers.
Step 3: Calculate deviation. Compare today's metric against the baseline. Deviation = (today - baseline) / baseline * 100%. If today is $5 and baseline is $4, deviation is +25%.
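Continuing the sketch above, the baseline and deviation can be added per agent with a rolling median. Shifting by one day keeps today's value out of its own baseline; the 7-day minimum history is an arbitrary choice.

```python
import pandas as pd

def add_baseline(daily: pd.DataFrame, window: int = 30) -> pd.DataFrame:
    """Add a rolling-median baseline and today's deviation for each agent."""
    daily = daily.sort_values(["agent", "date"]).copy()
    daily["baseline"] = (
        daily.groupby("agent")["cost_per_item_median"]
        .transform(lambda s: s.rolling(window, min_periods=7).median().shift(1))
    )
    # e.g., today at $5 against a $4 baseline -> +25%
    daily["deviation_pct"] = (daily["cost_per_item_median"] - daily["baseline"]) / daily["baseline"] * 100
    return daily
```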
Step 4: Alert. If deviation exceeds your threshold, send a Slack alert to the team lead and the CFO. Include: the metric, the baseline, today's value, the deviation percentage, and a link to the agent's cost dashboard. Example alert: "Support Agent — Cost Anomaly | Cost per ticket is $0.62 (baseline $0.40, +55%). 2,100 tickets processed. Investigate: retry rate up 80%, cache hit rate down, OpenAI API cost up. [View Dashboard]"
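A minimal version using a Slack incoming webhook; the webhook URL and dashboard link are yours to supply, and the message format simply mirrors the example above.

```python
import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/..."  # your incoming-webhook URL

def send_cost_alert(agent: str, today: float, baseline: float, volume: int, dashboard_url: str) -> None:
    deviation = (today - baseline) / baseline * 100
    text = (
        f"{agent} — Cost Anomaly | Cost per work item is ${today:.2f} "
        f"(baseline ${baseline:.2f}, {deviation:+.0f}%). {volume:,} items processed. "
        f"<{dashboard_url}|View Dashboard>"
    )
    requests.post(SLACK_WEBHOOK_URL, json={"text": text}, timeout=10)

# send_cost_alert("Support Agent", 0.62, 0.40, 2_100, "https://...")
# reproduces the example alert above (+55% vs. the $0.40 baseline).
```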
Step 5: Reduce noise. Flag anomalies only on sustained deviations (3+ days), not one-day spikes. One day is noise. Three days is a pattern. Also build a schedule of known anomalies: "every Monday cost is 15% higher because of batch processing," or "cost spikes 20% the first week of the month because of payroll reconciliation." Exclude those from alerting.
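A rough way to encode both rules, assuming you track (date, deviation) pairs per agent. The known-spike calendar below just mirrors the two examples above, and the exclusion logic is deliberately simplified.

```python
from datetime import date

def is_known_spike(d: date) -> bool:
    """Calendar of expected anomalies; extend with your own schedule."""
    if d.weekday() == 0:   # Mondays run ~15% higher because of batch processing
        return True
    if d.day <= 7:         # first week of the month: payroll reconciliation
        return True
    return False

def sustained_breach(deviations: list[tuple[date, float]],
                     threshold_pct: float = 15.0, days: int = 3) -> bool:
    """True only when the last `days` days all breach the threshold,
    ignoring days that are on the known-anomaly calendar."""
    recent = deviations[-days:]
    unexpected = [(d, dev) for d, dev in recent if not is_known_spike(d)]
    return len(recent) == days and bool(unexpected) and all(dev >= threshold_pct for _, dev in unexpected)
```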
Investigating an anomaly
When you get an alert, the investigation should follow this path:
- Is the baseline right? Did you recently add data, change the prompt, or onboard a new customer? The baseline might be outdated. Recalculate and see if the anomaly disappears. If yes, it's not really an anomaly; you just have a new normal.
- Is it a volume effect? Did work item volume increase? If volume is up 30% and cost is up 30%, cost per item is flat and there's no problem. If volume is flat and cost is up 30%, something else is wrong.
- Is it an infra cost issue? Drill into the cost breakdown. Is the API cost spiking, the vector DB, or observability? This tells you where to look.
- Is it a retry storm? Check API call count vs. success count. If you're making 2x the API calls but only getting 20% more work items done, you have a retry problem (a quick ratio check is sketched after this list). Common causes: the model being less confident, a new input type the model hasn't seen, or a downstream service timing out (the agent retries the API call hoping the service recovers).
- Is it expected? Sometimes cost goes up for good reasons: you launched a new feature, onboarded a new customer, or made a deliberate trade-off (a more expensive model for better accuracy). Document it and move on.
- What's the fix? Depending on the root cause:
- Retry storm: fix the prompt, fix the downstream service, or switch to a model that retries less.
- Cache miss: audit the prompt template and fix the input formatting.
- Observability explosion: turn off debug logging.
- Volume effect with sublinear costs: celebrate, and see if you can scale further.
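For the retry-storm check in the list above, a quick heuristic is to compare API-call growth to work-item growth; a ratio well above 1 means you're paying for calls that aren't turning into completed work. The function name and numbers below are illustrative.

```python
def retry_ratio(api_calls_today: int, api_calls_baseline: int,
                items_today: int, items_baseline: int) -> float:
    """API-call growth divided by work-item growth; values well above 1 suggest a retry storm."""
    call_growth = api_calls_today / api_calls_baseline
    item_growth = items_today / items_baseline
    return call_growth / item_growth

# The 2x-calls / +20%-items case from the checklist:
# retry_ratio(200_000, 100_000, 1_200, 1_000) -> ~1.67, worth investigating.
```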
Building the anomaly detection dashboard
Your dashboard should show:
- Cost per unit over time (last 90 days, with baseline marked as a line). This is the simplest view of anomalies.
- Current deviation (is today's cost above or below baseline, by how much?).
- Active alerts (any agents breaching thresholds right now).
- Anomaly history (what anomalies did we see this month, how were they resolved?).
- Cost breakdown (which cost components are spiking: API, infra, overhead?).
Update it in real time or daily. Share it with your finance and engineering teams. Review it in weekly standup meetings.
Why anomaly detection prevents bill shock
Without anomaly detection, you see the spike in your monthly invoice. With anomaly detection, you see it Tuesday morning, before the agent runs for 25 more days. You can stop it, fix it, or accept it—your choice. That's the difference between responding to bill shock and preventing it. Anomaly detection moves you from reactive to proactive cost management.
Ready to automate anomaly detection for your AI agents? Book a demo to see how Runrate catches cost anomalies in real time.