AI agents come in four flavors, each with a radically different cost profile. Visible API spend is often only about 10% of the total; the other 90% hides in inference, storage, observability, retries, tool calls, and human review. The architecture you choose determines not just the visible API cost but the hidden cost burden buried beneath it.
Single-Step Agents: Cheap and Limited
A single-step agent takes one input, runs one model inference, and returns one output. No loops, no retries, no thinking steps. This is a chatbot, a classification engine, or a simple routing system.
Cost profile: $0.01–$0.05 per outcome.
Example: An email classifier that reads an incoming support email, decides whether it's a billing issue or a technical issue, and routes it to the right queue. The model sees the email once, makes a decision, and stops.
The cost is low because the process is linear: one API call, minimal integrations, no failure-retry loops. But single-step agents can't handle complexity. They can't research information, integrate multiple systems, or reason through ambiguity. They're useful for high-volume, low-complexity decisions only.
Multi-Step Agents: Where Complexity Costs Explode
A multi-step agent takes one request and executes multiple actions to satisfy it. It might fetch data from a database, call an API, transform the response, check a rule, and then take an action. Each step is a decision point, and each decision point is a model inference.
Cost profile: $0.20–$1.50 per outcome, depending on the number of steps and failure rates.
Example: A claims adjudication agent that reads a claim, fetches the patient's history from the EHR, checks coverage rules against the policy database, requests medical necessity review from an external service, and then decides to approve, deny, or escalate. That's five decision points and three external integrations: the EHR, the policy database, and the medical necessity service.
Multi-step agents are expensive because each step is a potential failure point. If the EHR API times out, the agent retries. If the coverage rules engine returns an ambiguous result, the agent loops. A 5% failure rate per step compounds to roughly a 26% chance (1 − 0.95⁶) that at least one of six steps fails, and the resulting retries can turn a $0.30 base cost into $0.50 by the time they're done. Add integrations and you're at $0.80–$1.20.
The iceberg effect is severe here: the visible API cost (maybe $0.25 per claim) hides the integration costs ($0.30–$0.45) and human review time ($0.50–$1.50, depending on domain).
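The retry arithmetic above can be sketched as a tiny expected-cost model. This is an illustrative sketch, not a measured figure: it assumes independent step failures, unlimited retries, and that any step failure forces a full re-run of the chain (real agents often retry only the failed step, which compounds more slowly).

```python
def expected_chain_cost(base_cost, steps, per_step_failure):
    """Expected per-outcome cost when any step failure forces a full re-run.

    Assumes independent failures and unlimited retries, so the number of
    full runs follows a geometric distribution.
    """
    # Chance the chain fails at least once somewhere across its steps.
    p_fail = 1 - (1 - per_step_failure) ** steps
    # Expected number of full runs until one succeeds.
    expected_runs = 1 / (1 - p_fail)
    return base_cost * expected_runs

# Six steps, $0.30 base cost, 5% failure rate per step.
cost = expected_chain_cost(0.30, steps=6, per_step_failure=0.05)
print(f"${cost:.2f}")  # ≈ $0.41 before integration and review costs
```

Under these assumptions retries alone push $0.30 to about $0.41; partial re-runs, escalations, and integration overhead make up the rest of the gap to $0.50 and beyond.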
Orchestrators: Expensive and Powerful
An orchestrator is a meta-agent that manages other agents. It decides which specialized agent to call, chains multiple agents together, and decides when to hand off to a human.
Cost profile: $0.50–$2.50+ per outcome.
Example: A loan origination orchestrator that:
- Starts with an AI underwriter (agent 1) who reads the application and extracts financial data
- Passes that to an AI compliance checker (agent 2) who screens for regulatory issues
- Calls an AI fraud detector (agent 3)
- If any agent flags a risk, escalates to human underwriting
The orchestrator itself is a model call. Each sub-agent is another model call. Each handoff is logged and stored. If any sub-agent fails, the orchestrator decides whether to retry that agent or escalate. The cost compounds because failures in sub-agents trigger orchestrator retries.
Orchestrators are expensive because they're meta-work: the orchestrator itself consumes tokens and compute just to decide which agent to use. But they're powerful because they allow specialization. One agent is optimized for compliance, one for fraud, one for underwriting. Each can be smaller, cheaper, and more accurate than a single monolithic agent.
The tradeoff: 3–4x higher per-outcome cost, but higher accuracy and better auditability. For high-stakes decisions, orchestrators are often worth the premium.
Human-in-the-Loop Agents: The Hidden Cost Multiplier
A human-in-the-loop (HITL) agent runs some decisions automatically but escalates judgment calls to a human. The human reviews the agent's recommendation and either approves it, modifies it, or overrides it.
Cost profile: $0.50–$3.00+ per outcome, depending on escalation rate and human time.
Example: An insurance claims agent that:
- Automatically approves straightforward claims (clear diagnosis, standard treatment, in-network provider)
- Escalates edge cases (unusual combinations, high cost, network status unclear) to a claims adjudicator for review
If 70% of claims auto-approve and 30% escalate, the per-claim cost is:
- 70% × $0.10 (API cost for auto-approve) = $0.07
- 30% × ($0.15 API + $1.50 human review time) = $0.50
- Total: $0.57 per claim
The human review cost dominates. A claims adjudicator taking 3 minutes per escalated claim at $30/hour adds $1.50 per escalation. Multiply by the 30% escalation rate and the roughly $0.45 of human time per claim is about four times the infrastructure cost.
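The blended-cost arithmetic above generalizes to a one-line formula. A minimal sketch, using the worked example's figures (the function name and parameters are illustrative):

```python
def blended_hitl_cost(auto_rate, auto_api_cost, esc_api_cost,
                      review_minutes, hourly_rate):
    """Expected per-outcome cost of a human-in-the-loop agent."""
    escalation_rate = 1 - auto_rate
    # Human review time converted to dollars per escalated outcome.
    human_cost = (review_minutes / 60) * hourly_rate
    return (auto_rate * auto_api_cost
            + escalation_rate * (esc_api_cost + human_cost))

# The worked example above: 70% auto-approve, 3-minute reviews at $30/hour.
cost = blended_hitl_cost(0.70, 0.10, 0.15, review_minutes=3, hourly_rate=30)
print(f"${cost:.3f} per claim")  # ≈ $0.57 after rounding
```

A formula like this makes the sensitivity obvious: per-claim cost moves almost entirely with the escalation rate and review minutes, barely with API price.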
Human-in-the-loop is essential in regulated domains (healthcare, insurance, legal, financial services) but it's the most expensive architecture because humans are slow and costly. The goal is to push the escalation rate down as low as possible while maintaining accuracy.
Comparing the Four Architectures
Single-step agents are cheap but limited. Multi-step agents cost more but handle real complexity. Orchestrators are the most expensive but the most specialized and auditable. Human-in-the-loop agents can be cheaper or more expensive depending on escalation rates.
The cost tradeoff is real: you can either invest in a smarter, more capable agent (higher API cost, lower human cost) or a simpler agent with more human review (lower API cost, higher human cost). Most finance teams don't have a clear framework for choosing.
The rule of thumb: if human review time costs more than $0.50 per outcome, upgrade the agent. If the agent is error-prone enough that human review catches important mistakes, keep the human in the loop. Most teams find the sweet spot is a multi-step agent with HITL for 20–40% of decisions.
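One way to encode the rule of thumb above as a triage function. The thresholds come from the text; the function shape and names are an illustrative sketch, not a prescribed implementation:

```python
def recommend_architecture(human_cost_per_outcome, human_catches_critical_errors):
    """Rough triage based on the rule of thumb in the text."""
    if human_catches_critical_errors:
        # Error-prone agent: the human is earning their keep.
        return "keep the human in the loop"
    if human_cost_per_outcome > 0.50:
        # Review time exceeds the $0.50/outcome threshold.
        return "upgrade the agent"
    return "current mix is likely fine"

print(recommend_architecture(1.50, human_catches_critical_errors=False))
print(recommend_architecture(0.30, human_catches_critical_errors=True))
```

In practice both inputs need measurement: reviewed minutes per outcome times loaded hourly rate, and an audit of how often reviewers actually change the agent's decision.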
Failure Rates Multiply Cost
Here's a critical hidden variable: agent failure rates. If your agent has a 5% failure rate—meaning 5% of requests require a retry or escalation—your per-outcome cost is roughly 5% higher than you budgeted (assuming a retry costs about as much as the original attempt).
But compound failure rates across multi-step agents, and the multiplier is severe. A six-step agent with a 2% failure rate per step has a cumulative failure rate of about 11% (1 − 0.98⁶ ≈ 0.114). That 11% of runs triggers retries, which roughly double the cost of each failed run, so your per-outcome cost jumps by about 11%.
Orchestrators make this worse because each sub-agent can fail independently. A three-sub-agent orchestrator with a 2% failure rate per sub-agent has roughly a 6% chance (1 − 0.98³) that at least one sub-agent fails, and each sub-agent failure can also trigger an orchestrator-level retry on top, compounding the problem.
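The compounding formula used in both examples is the same. A minimal sketch, assuming independent failures per step (or per sub-agent):

```python
def cumulative_failure_rate(per_step_rate, steps):
    """Probability that at least one of `steps` independent steps fails."""
    return 1 - (1 - per_step_rate) ** steps

print(f"6-step agent @ 2%/step:  {cumulative_failure_rate(0.02, 6):.1%}")  # 11.4%
print(f"3 sub-agents @ 2% each:  {cumulative_failure_rate(0.02, 3):.1%}")  # 5.9%
```

The takeaway is the shape of the curve: cumulative failure grows nearly linearly in step count for small per-step rates, so every step you can remove or harden pays off directly.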
Cutting failure rates by 1% across all your agents is often worth 10x more investment in prompt tuning or model upgrades.
Vertical Variation: Healthcare vs. Finance vs. Customer Service
The same agent architecture costs different amounts in different verticals.
A customer service agent that resolves billing inquiries (low stakes, high volume, few integrations) runs at $0.20–$0.50 per ticket. A healthcare claims agent (high stakes, moderate volume, many integrations, mandatory human review) runs at $0.80–$2.50 per claim. A legal contract review agent (highest stakes, lowest volume, most integrations, lengthy human review) runs at $5–$15 per document.
The difference isn't architecture; it's regulatory burden and integration complexity. Healthcare and legal require more human review, more logging, more audit trails. That drives the hidden cost.
What to Do Next
When evaluating an AI agent vendor or building one internally, classify your use case by agent type first, then estimate cost per architecture. A single-step agent is fast and cheap but limited. A multi-step agent with partial HITL is the sweet spot for most enterprises. An orchestrator is overkill unless you have complex, multi-stage decision logic.
For more on how to measure and allocate these costs across your business units, see the pillar article on AI agent cost.
Go deeper with the field guide.
A step-by-step PDF for implementing AI cost attribution.