The CFO's hardest question isn't "how do we lower AI cost?" It's "how do we let teams experiment with AI without losing control of the budget?" Traditional budgeting (an annual allocation to each team, live or die by that number) kills innovation: teams either hoard budget or spend it all in Q1. Outcome-based budgeting (cost per unit of work, with approval workflows and SLOs) creates space for experimentation while maintaining guardrails. This framework gives teams the resources to innovate and gives the CFO the control to protect margins.
The innovation trap: Why traditional budgets fail for AI
You allocate $120,000/year to product for AI experiments. The product team runs three experiments in Q1, spends $95,000, and has $25,000 left for the rest of the year. They slow down. Or they rush experiments to burn through what's left before year-end. Or they build something that works in Q2 but run out of budget to scale it in Q3. None of these outcomes is good for the business.
AI doesn't map to traditional annual budgets because the outcomes are unknown. A prompt-tuning experiment might cost $500/month and return $200k/month in margin (say, an SDR agent that closes more deals). Or it might cost $500/month and return zero (it doesn't work). Traditional budgeting assumes cost and benefit are correlated and predictable. AI outcomes are lumpy and binary: either the agent works at acceptable cost, or it doesn't. Budget discipline shouldn't mean killing the experiment that works. But it should mean not keeping the one that doesn't.
The guardrails framework: Three tiers
Set up a three-tier approval framework based on estimated monthly cost impact.
Tier 1: Small experiments ($0-$10,000/month). Auto-approved. Any engineer or product manager can spin up an agent, fine-tune a model, or run a small pilot. No CFO approval needed. The bar is: "document the experiment (what are we testing?), estimate the cost (when do we expect to hit $10k?), and commit to a decision point (when will we know if it works?)." That's it. These are learning expenses. They should be plentiful. If your org is only running Tier 1 experiments, you're not taking enough risk.
Tier 2: Medium experiments ($10,000-$50,000/month). Requires product/engineering manager approval + finance sign-off. The experiment needs a hypothesis, a success metric, a cost target, and a timeline. Example: "We're testing a claims-classification agent. Hypothesis: it can pre-filter 20% of simple claims at $2 per claim, freeing up manual reviewers for complex work. Cost estimate: $35k/month at current volume. Success metric: achieves 20% filter rate at <= $2.50/claim cost. Timeline: 8-week pilot. Decision point: week 8." Finance approves the cost allocation and the timeline. If it doesn't hit the metric by week 8, you kill it.
Tier 3: Large initiatives ($50,000+/month). Requires CFO approval. These are bets: a new product line, a major feature rollout, a vendor partnership. They go to the board budget process. They have quarterly targets and are tracked against KPIs. Example: "We're deploying an AI contact-center agent for customer escalation. Estimated cost: $80k/month. Expected impact: reduce escalation rate by 40%, save 2 FTEs, net margin of $120k/month." This is part of your operating plan, not a side experiment.
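A minimal sketch of the tier routing, in Python. The thresholds come straight from the tiers above; the `ApprovalTier` labels summarizing who signs off are illustrative, not a prescribed workflow:

```python
from enum import Enum

class ApprovalTier(Enum):
    TIER_1 = "auto-approved experiment"
    TIER_2 = "manager approval + finance sign-off"
    TIER_3 = "CFO approval, board budget process"

def approval_tier(estimated_monthly_cost: float) -> ApprovalTier:
    """Route an AI initiative to an approval tier by estimated monthly cost."""
    if estimated_monthly_cost < 10_000:
        return ApprovalTier.TIER_1
    if estimated_monthly_cost < 50_000:
        return ApprovalTier.TIER_2
    return ApprovalTier.TIER_3

# Example: the claims-classification pilot from the Tier 2 example above.
print(approval_tier(35_000))  # ApprovalTier.TIER_2
```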
Cost per unit of work as the guardrail
Once an experiment is approved, you need a target. That's your cost per unit of work SLO. Example: "Our support escalation agent targets $0.40 per ticket resolved, with a range of $0.34-$0.46 (±15%)."
Every week, track actual cost per unit. If you hit the target, green. If you miss, yellow (flag it for investigation). If you miss by 20% or more, red (escalate per the policy below). This is margin discipline, not headcount discipline. You're saying: "this agent has a business model, and it has to hit that model."
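A sketch of the weekly check, assuming your cost aggregation already reports total spend and completed units per agent per week. The $0.40 target and 15% tolerance are the example SLO above:

```python
def weekly_slo_status(actual_cost: float, units_of_work: int,
                      target_cost_per_unit: float,
                      tolerance: float = 0.15) -> str:
    """Classify a week's cost per unit against the SLO.

    Mirrors the policy above: within tolerance is green, a miss is
    yellow, a miss of 20%+ is red. The 15% default tolerance is the
    example range from the support-escalation SLO.
    """
    cost_per_unit = actual_cost / units_of_work
    overrun = (cost_per_unit - target_cost_per_unit) / target_cost_per_unit
    if overrun <= tolerance:
        return "green"
    if overrun < 0.20:
        return "yellow"
    return "red"

# A week of the support escalation agent: $4,600 spend, 10,000 tickets.
print(weekly_slo_status(4_600, 10_000, target_cost_per_unit=0.40))  # "green"
```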
What makes this work is transparency. Your team sees the cost target. They see actuals. They see the gap. They know what to optimize (and it's usually not tokens—it's model selection, work-item routing, or human-review overhead).
Escalation policy: What happens when an agent runs hot
Define what triggers escalation and what the escalation path is. Example:
Yellow alert: Agent cost per unit exceeds SLO by 10-15%. Finance investigates. Usually the answer is: "we had a retry storm Tuesday," or "we onboarded a new customer with a weird document type." Acknowledge, document, move on.
Red alert: Agent cost per unit exceeds SLO by 20%+ for two consecutive weeks. Escalate to the product lead and the CFO. Diagnostics required: Is this a prompt problem? A model problem? A task definition problem? Is the baseline SLO itself wrong (do we need to retarget it)? This should trigger a 48-hour root-cause analysis and a decision: tune and continue, or pause and redesign.
Critical alert: Agent cost per unit exceeds SLO by 50%+, or predicted monthly cost exceeds the Tier 2 limit ($50k/month). Immediate pause. The agent doesn't run until you fix it. This protects the broader team's budget.
This escalation policy should be documented, automated (your cost aggregation tool should flag SLO violations automatically), and reviewed monthly. It's not about blame. It's about fast feedback and course correction.
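A sketch of how that automation might classify a weekly reading. The thresholds are the ones above; the `WeeklyReading` fields are assumptions about what your cost tool exports, and a first week at 20%+ over is treated as yellow until a second consecutive week confirms red:

```python
from dataclasses import dataclass

TIER_2_MONTHLY_LIMIT = 50_000  # Tier 2 ceiling from the framework above

@dataclass
class WeeklyReading:
    cost_per_unit: float
    predicted_monthly_cost: float

def escalation_level(slo: float, this_week: WeeklyReading,
                     last_week: WeeklyReading | None) -> str:
    """Apply the yellow/red/critical policy to the latest weekly reading."""
    overrun = (this_week.cost_per_unit - slo) / slo
    # Critical: 50%+ over SLO, or forecast blows past the Tier 2 limit.
    if overrun >= 0.50 or this_week.predicted_monthly_cost > TIER_2_MONTHLY_LIMIT:
        return "critical: pause the agent"
    # Red: 20%+ over for two consecutive weeks.
    last_overrun = ((last_week.cost_per_unit - slo) / slo) if last_week else 0.0
    if overrun >= 0.20 and last_overrun >= 0.20:
        return "red: escalate to product lead and CFO, 48-hour RCA"
    # Yellow: any meaningful overrun that hasn't yet confirmed red.
    if overrun >= 0.10:
        return "yellow: finance investigates"
    return "ok"

slo = 0.40
w1 = WeeklyReading(cost_per_unit=0.50, predicted_monthly_cost=42_000)  # 25% over
w2 = WeeklyReading(cost_per_unit=0.49, predicted_monthly_cost=41_000)  # 22.5% over
print(escalation_level(slo, w1, None))  # yellow: finance investigates
print(escalation_level(slo, w2, w1))    # red: escalate to product lead and CFO...
```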
Budget reallocation: The other side of the guardrail
If an experiment works and hits its cost per unit target, you have margin to reinvest. Real-world example:
You run a lead-qualification agent for sales on a budget of $25k/month. The target was a cost per qualified lead of $12. It hits $11. You save $1 per lead. On 2,000 leads per month, that's $2,000/month of margin. Do you pocket it? Usually not. You reinvest: "we saved $2k this month. Let's use it to test a second agent for SDR follow-up." Healthy organizations redeploy savings into adjacent opportunities.
CFOs should actively manage this reallocation. Your Tier 2 and Tier 3 budgets should have a "reallocation pool" where savings from successful experiments go. This drives innovation culture: teams know that hitting their targets unlocks budget for the next bet.
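A sketch of the pool bookkeeping, reproducing the lead-qualification numbers above; the helper and its signature are illustrative, not a prescribed tool:

```python
def monthly_savings(target_cpu: float, actual_cpu: float, units: int) -> float:
    """Margin freed when an agent beats its cost-per-unit target."""
    return max(0.0, (target_cpu - actual_cpu) * units)

# Lead-qualification agent: target $12/lead, actual $11, 2,000 leads/month.
pool = 0.0
pool += monthly_savings(12.0, 11.0, 2_000)
print(f"reallocation pool: ${pool:,.0f}")  # $2,000 available for the next bet
```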
Headcount considerations
If an AI agent is replacing headcount, the guardrail should include the headcount offset. Example: "Our claims processor costs $200k/year in AI spend ($16.7k/month). It replaces 2.5 FTEs, which would have cost $400k/year ($33.3k/month). Net monthly savings: $33.3k − $16.7k = $16.7k, or about $200k/year." This is the margin story. The AI spend is real, but so are the savings.
The tricky part: what happens to those 2.5 FTEs? If you keep them on the payroll, you haven't realized the savings, and the AI spend is a pure cost add. If you redeploy them, you need to account for retraining, productivity ramp-up, and the new role's cost. If you right-size the team, you realize the savings but face retention and morale issues.
The guardrail here is transparency. Make it explicit: "the AI agent costs $200k/year. The headcount savings target is $X. The actual realization is $Y. The gap is $Y − $X, and here's why." This keeps the team honest, and the CFO keeps the business honest.
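A sketch of that gap report; the `realization_rate` parameter is an illustrative assumption (how much of the avoided headcount cost actually comes off payroll), not a figure from the framework:

```python
def realized_headcount_savings(ai_monthly_cost: float,
                               fte_monthly_cost_avoided: float,
                               realization_rate: float = 1.0) -> dict:
    """Net monthly savings from a headcount-replacing agent.

    realization_rate hedges the tricky part above: 1.0 if the FTEs
    actually come off payroll, lower if they are redeployed and the
    savings are only partially realized.
    """
    target = fte_monthly_cost_avoided - ai_monthly_cost
    actual = fte_monthly_cost_avoided * realization_rate - ai_monthly_cost
    return {"target": target, "actual": actual, "gap": actual - target}

# Claims processor: $16.7k/month AI spend vs 2.5 FTEs at $33.3k/month,
# with only 60% of the headcount cost actually coming off payroll.
print(realized_headcount_savings(16_700, 33_300, realization_rate=0.6))
# {'target': 16600, 'actual': 3280.0, 'gap': -13320.0}
```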
Three rules for guardrails that work
Rule 1: The guardrail has to be simple. If your budget rule is "Tier 1 is <$10k/month, Tier 2 is $10-50k/month with product approval, Tier 3 is $50k+ with CFO approval," everyone understands it. If it's 27 rules with exceptions and waiver processes, teams will circumvent it.
Rule 2: The guardrail has to be fair. If you approve big bets from one team and deny them from another, you kill culture. The bar should be the same. Different teams may have different risk profiles (product can take more risk than finance operations), but the approval process should be transparent and consistent.
Rule 3: The guardrail has to allow failure. Tier 1 experiments should have a kill rate of 30-40%. If all your small experiments succeed, you're not taking enough risk. If all of them fail, you're not learning. Guardrails exist to prevent catastrophic failure, not to prevent any failure.
Explore the full FinOps for AI framework in the pillar article.
Go deeper with the field guide: a step-by-step PDF for implementing AI cost attribution.