The New CFO Playbook for AI Labor

7 min read · Updated 2026-05-02


Most CFOs inherited their playbook for managing IT spend from decades of legacy software buying. That playbook breaks down the moment you deploy AI agents at scale. Here's the five-step playbook that works for the new agentic enterprise.

Step 1: Establish visibility on AI labor cost by work item

Before you can manage AI cost, you have to see it.

The mistake is thinking visibility means "check your OpenAI bill." That tells you the API spend (an industry average of $85,521/month, up 36% year over year, according to CloudZero). It doesn't tell you what that spend is actually buying you.

Real visibility means: I can tell you the fully-loaded cost of every work item my AI agents touched this month. Support ticket #42521? Cost: $1.37 (includes Claude API, human review, vector search, retry overhead). Insurance claim #78203? Cost: $2.14. Loan application #5401? Cost: $0.98.

This requires one piece of infrastructure: a cost attribution system that ingests API logs and matches them to business events (tickets, claims, applications, trades, underwriting decisions). Most CFOs try to build this in-house with SQL queries against their API gateway logs. After six months and $180K in engineering time, they give up because the complexity is massive: associating logs to work items, tracking human review time, allocating infrastructure overhead.
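To make the shape of that system concrete, here's a minimal Python sketch of the core join: API call logs tagged with a work-item ID, rolled up with human-review time and an infrastructure allocation. Every field name, rate, and figure is illustrative, not a real schema or real pricing.

```python
from collections import defaultdict

# Hypothetical exported API log: one row per model call, tagged at request
# time with the work item it served (the tagging is the hard part).
api_logs = [
    {"work_item_id": "ticket-42521", "input_tokens": 1800, "output_tokens": 420},
    {"work_item_id": "ticket-42521", "input_tokens": 950, "output_tokens": 180},  # retry
    {"work_item_id": "claim-78203", "input_tokens": 3100, "output_tokens": 760},
]

# Hypothetical human-review log: minutes of reviewer time per work item.
review_minutes = {"ticket-42521": 1.5}

INPUT_RATE = 3.00 / 1_000_000    # $ per input token (assumed model pricing)
OUTPUT_RATE = 15.00 / 1_000_000  # $ per output token (assumed)
REVIEW_RATE = 0.75               # $ per reviewer-minute (assumed loaded rate)
OVERHEAD_PER_CALL = 0.02         # vector search + infra allocation (assumed)

def fully_loaded_costs(logs, reviews):
    """Roll API, human-review, and overhead cost up to the work-item level."""
    costs = defaultdict(float)
    for call in logs:
        item = call["work_item_id"]
        costs[item] += call["input_tokens"] * INPUT_RATE
        costs[item] += call["output_tokens"] * OUTPUT_RATE
        costs[item] += OVERHEAD_PER_CALL
    for item, minutes in reviews.items():
        costs[item] += minutes * REVIEW_RATE
    return dict(costs)

print(fully_loaded_costs(api_logs, review_minutes))
# {'ticket-42521': ~1.18, 'claim-78203': ~0.04} with these assumed rates
```

A production version also has to handle untagged calls, streaming responses, and shared infrastructure, which is where the six months of engineering time goes.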

The faster path is a purpose-built tool. Once you have visibility at the work-item level, two things happen:

  1. You can calculate the true cost per outcome for each agent (e.g., $1.27 per ticket for your main CSR agent).
  2. You can allocate that cost to the P&L line that benefited from it (Customer Success department, Claims Operations, Lending).

Without this step, everything that follows is guessing.

Step 2: Set cost-per-outcome targets and track them monthly

Once you know what an agent costs per work item, the next step is obvious: decide if it's good enough.

If your CSR agent costs $1.42 per ticket and you're paying humans $4.80 per ticket, the 70% cost reduction is worth celebrating. But is $1.42 the right target going forward? Or should it be $1.15 after you optimize prompts?

Set explicit targets using three inputs:

First: benchmark against peers. Klarna runs customer service agents at $0.19 per ticket. Intercom Fin is at $0.99. Sierra is at $1.50. Your baseline should depend on your complexity. If you're a SaaS company with routine support, target $0.80–$1.20. If you're processing complex insurance claims, target $1.50–$2.20. If you're running highly regulated financial services work, budget $2.50–$4.00 per unit.

Second: align with your human-labor cost baseline. If a fully loaded human-handled ticket costs $5.83 and you're running agents at $1.42, you're winning. The ROI is clear. If your agent cost is trending toward $5.50, something's broken: you're over-reviewing, the model's accuracy is degrading, or you're overpaying for infrastructure.

Third: build in margin. Don't target the theoretical minimum. Build in 20–30% margin for underestimated infrastructure cost, human review overages, and unexpected model API price increases.

Once you have targets, track them every month. Most of Runrate's customers review cost per outcome monthly, the same way they review SG&A or gross margin. It becomes a standard KPI. If an agent drifts above target for two consecutive months, the operations team investigates: Did model accuracy slip? Did the share of edge cases requiring human review go up? Is the prompt getting stale? What changed?
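The tracking can start as a spreadsheet; the drift check itself is a few lines either way. A minimal sketch, assuming you already have a monthly cost-per-unit series per agent (targets and history below are made up):

```python
# Illustrative monthly targets and cost-per-unit history, most recent month last.
targets = {"csr_agent": 1.42, "claims_agent": 2.20}

history = {
    "csr_agent":    [1.38, 1.41, 1.47, 1.52],
    "claims_agent": [2.10, 2.15, 2.12, 2.18],
}

def flag_drift(history, targets, months=2):
    """Return agents above target for the last `months` consecutive months."""
    flagged = []
    for agent, series in history.items():
        recent = series[-months:]
        if len(recent) == months and all(c > targets[agent] for c in recent):
            flagged.append(agent)
    return flagged

print(flag_drift(history, targets))  # ['csr_agent']: above $1.42 two months running
```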

Step 3: Build the chargeback model for AI labor cost

Now that you're tracking cost per outcome, attribute it to the business unit or customer that benefited.

This is where the payroll analogy becomes real. You don't pay payroll from one corporate bucket and hope it works out. You charge salaries back to projects and business units. The VP of Engineering's team absorbs the cost of her engineers. The Sales VP absorbs the cost of his SDRs. That way, every leader has a clear cost model for the work they're doing.

The same applies to AI labor.

Your customer service team runs 4 AI agents handling 120,000 tickets per month at an average cost of $1.50 per ticket = $180,000/month in agent labor cost. That cost should show up on the Customer Service P&L, not in "miscellaneous cloud spend."

Your Underwriting team runs 2 AI agents processing 8,000 claims per month at $2.30 per claim = $18,400/month in agent labor cost. That's a line item on the Underwriting P&L.

Your Lending team runs 1 agent originating 2,000 applications per month at $1.80 per application = $3,600/month in agent labor cost. That's a line item on the Lending P&L.

Total: $202,000/month in AI labor cost, fully attributable, fully governed, fully auditable.
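The roll-up itself is simple once Step 1's attribution exists. A minimal sketch, assuming each agent's work-item costs are already tagged with the consuming business unit (figures mirror the illustrative numbers above):

```python
# Illustrative per-agent volumes and unit costs, tagged by business unit.
agents = [
    {"bu": "Customer Service",    "units": 120_000, "cost_per_unit": 1.50},
    {"bu": "Claims Underwriting", "units": 8_000,   "cost_per_unit": 2.30},
    {"bu": "Loan Origination",    "units": 2_000,   "cost_per_unit": 1.80},
]

def chargeback(agents):
    """Produce one AI-labor line item per business unit P&L."""
    lines = {}
    for a in agents:
        lines[a["bu"]] = lines.get(a["bu"], 0.0) + a["units"] * a["cost_per_unit"]
    return lines

lines = chargeback(agents)
for bu, cost in lines.items():
    print(f"{bu}: ${cost:,.0f}/month")
print(f"Total AI labor: ${sum(lines.values()):,.0f}/month")  # $202,000/month
```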

The benefit of this model is profound. It creates accountability. If the Customer Service VP watches her AI agent cost climb from $1.45 to $1.72 per ticket over two months, she's incentivized to dig into why. Is she being more conservative and requesting more human review? Is the model degrading? Should we migrate to a newer version? Without the chargeback, that cost is invisible, and no one cares.

Step 4: Implement SLOs and automated anomaly detection

Once you have visible, attributed, tracked AI labor cost, the next step is governance at scale.

Set SLOs (service-level objectives) for each agent. For your CSR agent: "Cost per ticket will not exceed $1.60, accuracy will remain above 92%, and human review rate will not exceed 15%." For your claims agent: "Cost per claim stays between $2.00 and $2.50, and accuracy must remain above 94%."

Then implement automated anomaly detection. If cost per ticket trends more than 5% above target, an alert fires. If accuracy drops below 90%, the agent is flagged for retraining. If the human review rate balloons from 12% to 22%, something's wrong.
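A minimal sketch of what one of those checks looks like, assuming daily cost, accuracy, and review-rate metrics are already flowing from your observability pipeline. The thresholds mirror the example SLOs above; the alert delivery hook (Slack, email, BI tool) is deliberately left out.

```python
from dataclasses import dataclass

@dataclass
class SLO:
    max_cost_per_unit: float
    min_accuracy: float
    max_review_rate: float
    cost_drift_alert_pct: float = 0.05  # alert when >5% above cost target

# Example SLO for the CSR agent, matching the thresholds in the text.
csr_slo = SLO(max_cost_per_unit=1.60, min_accuracy=0.92, max_review_rate=0.15)

def check_slo(slo, cost, accuracy, review_rate):
    """Return human-readable violations for the morning report."""
    violations = []
    if cost > slo.max_cost_per_unit * (1 + slo.cost_drift_alert_pct):
        violations.append(f"cost ${cost:.2f} is >5% above ${slo.max_cost_per_unit:.2f} target")
    if accuracy < slo.min_accuracy:
        violations.append(f"accuracy {accuracy:.0%} below {slo.min_accuracy:.0%} floor")
    if review_rate > slo.max_review_rate:
        violations.append(f"review rate {review_rate:.0%} above {slo.max_review_rate:.0%} cap")
    return violations

print(check_slo(csr_slo, cost=1.73, accuracy=0.91, review_rate=0.22))
```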

Most mature organizations (stage 4, Optimized, or stage 5, Governed, on the AI Cost Maturity Curve) implement this in their BI tool or in Slack. Every morning, the ops team gets a report: "CSR agent drifted 8% above cost target yesterday; Underwriting agent accuracy slipped 2 points. Review required."

This is the difference between management and governance. Management is reactive ("why was our bill so high?"). Governance is proactive ("here's the early warning; address it before it becomes a problem").

Step 5: Plan for model transitions and agent retirement

This is the part most CFOs miss entirely, and it's critical.

Every AI model becomes obsolete or dramatically improves. When OpenAI dropped GPT-4 Turbo in late 2023, teams running GPT-4 had to decide: Do we stay on the old model for stability, or migrate to the new one? When Claude 3 Opus shipped and Claude 2 was sunset, teams running Claude 2 agents had to migrate. When GPT-4o became 50% cheaper than GPT-4 Turbo, cost-per-outcome targets changed overnight.

These aren't engineering decisions. They're CFO decisions. The question is: "If we migrate this agent from GPT-4 Turbo ($10 per million input tokens) to GPT-4o ($5 per million input tokens), we cut model cost by 50%, but we pay $40,000 in migration and testing cost. Payback period is two months. Do we do it?"

That's a CFO-level capital allocation decision.
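The arithmetic behind it fits in a few lines. A sketch, assuming the price cut applies to the agent's whole monthly model spend and that spend is about $40,000/month (an illustrative figure):

```python
def payback_months(monthly_model_spend, cost_cut_pct, migration_cost):
    """Months until cumulative savings cover the one-time migration cost."""
    monthly_savings = monthly_model_spend * cost_cut_pct
    return migration_cost / monthly_savings

# $40,000/month model spend, 50% cheaper model, $40,000 migration and testing
print(payback_months(40_000, 0.50, 40_000))  # 2.0 months
```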

Build a quarterly model-transition calendar. Look at what agents you're running, on which models, and flag when a new model version is available that could meaningfully change cost or accuracy. Estimate migration cost. Calculate payback period. Prioritize which agents to migrate first. Build it into your capex or opex budget.

Similarly, plan for agent retirement. When do you sunset the older version of an agent? A team might run the old and new agents in parallel for a quarter, then shut down the old one once the new one's stability is proven. That's a planned expense, not a surprise.

Bringing it together: the AI labor budget

Once you've run through all five steps, your AI labor budget looks like this:

Monthly P&L impact by business unit:

| P&L Line | Agents | Avg cost per unit | Volume | Monthly cost |
|----------|--------|-------------------|--------|--------------|
| Customer Service | 4 | $1.50/ticket | 120,000 | $180,000 |
| Claims Underwriting | 2 | $2.30/claim | 8,000 | $18,400 |
| Loan Origination | 1 | $1.80/app | 2,000 | $3,600 |
| Total AI labor | 7 | | 130,000 | $202,000 |

Annual budget drivers:

  • Agent labor cost (variable): $2,424,000
  • Model migrations and testing: $120,000
  • Infrastructure, observability, and vector search: $96,000
  • Prompt engineering and retraining: $84,000
  • Total annual AI labor budget: $2,724,000
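The roll-up is one line of arithmetic. A sketch with the illustrative figures above:

```python
# Annual AI labor budget: 12 months of variable agent labor plus fixed items.
monthly_agent_labor = 202_000  # variable agent labor cost per month

fixed_annual = {
    "model migrations and testing": 120_000,
    "infrastructure, observability, vector search": 96_000,
    "prompt engineering and retraining": 84_000,
}

annual_budget = monthly_agent_labor * 12 + sum(fixed_annual.values())
print(f"${annual_budget:,}")  # $2,724,000
```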

This is now a budgetable, governable, auditable line item. It's not "AI spend is rising 36% YoY and we don't know why." It's "we're deploying 7 agents handling 130K work items per month, at a blended cost of $1.55 per unit, and we're tracking cost and accuracy against explicit SLOs."

What to do next

Start with one agent. Calculate its fully-loaded cost per outcome for the last month. Set a target cost per outcome based on the peer benchmarks above. Implement a monthly tracking mechanism (even if it's a spreadsheet). Once you have that baseline, the next four steps follow naturally.

When you're ready to see what work-item-level AI cost attribution looks like in your stack, talk to Runrate — 15-minute demo.
