The Economics of Multi-Step Agents (When One Query Becomes 47 API Calls)

7 min read · Updated 2026-05-02


A customer asks an AI agent a simple question: "What's the status of my loan application?" The user sees one interaction. Behind the scenes, the agent fetches your borrower database (API call 1), queries the loan origination system (call 2), checks the underwriting queue (call 3), calls the credit bureau API (call 4), cross-references with the compliance screening database (call 5), retrieves the last 30 days of communication logs (call 6), and runs three separate inference steps to synthesize a confident answer and surface any red flags (calls 7–9). One customer query just cost you nine billable events: two database lookups, three internal system queries, one external credit bureau call, and three LLM inferences. That's not a simple transaction cost; that's workflow cost amplification.

This is the hidden scourge of multi-step agents. Every agent that does real work—not just pattern-matching, but orchestration, reasoning, and integration across systems—multiplies its base cost by the number of steps required to confidently answer the question.

Why Multi-Step Agents Are Economically Different

Single-step agents are cheap. A classification agent that reads a support ticket and routes it ("escalate to billing" or "resolve with FAQ #7") might make one inference call and cost $0.001 per ticket. A retrieval agent that looks up a customer's account balance might make one inference call and one database call, total cost $0.002 per query. The math is linear: inference + one action = low cost.

Multi-step agents multiply that math. A comprehensive customer service agent in insurance might need to:

  1. Query the policy database (retrieve active policy).
  2. Query the claims system (retrieve claims history).
  3. Query the payments system (retrieve payment status).
  4. Call the underwriting system (check if policy is still under review).
  5. Run a reasoning step to synthesize (is the customer eligible for a specific benefit?).
  6. Run a safety check (does this answer comply with regulations?).
  7. Format and send the response.

Each step requires one or more API calls, database queries, or LLM inference steps. A "simple" customer question now triggers 4–5 external service calls, 2 LLM inferences, and 1 safety check. If your agent handles 100 customer questions per day, that's 400–500 external API calls per day before you even count the inference token cost. If you're paying Anthropic for Claude inference, your vector database provider for embeddings, and your CRM vendor for API access, the unit economics balloon.
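The seven steps above can be tallied in a few lines. The per-event prices here are hypothetical placeholders, not vendor quotes:

```python
# Hypothetical per-event prices; substitute your actual vendor rates.
PRICE_PER_API_CALL = 0.01   # internal API or database call, in dollars
PRICE_PER_INFERENCE = 0.02  # one LLM inference step (~100 tokens)

# The insurance workflow's steps as (api_calls, inference_steps).
STEPS = {
    "policy_lookup":      (1, 0),
    "claims_history":     (1, 0),
    "payment_status":     (1, 0),
    "underwriting_check": (1, 0),
    "synthesize":         (0, 1),
    "safety_check":       (0, 1),
    "format_response":    (0, 1),
}

def cost_per_query() -> float:
    """Sum per-step spend for a single customer question."""
    return sum(
        calls * PRICE_PER_API_CALL + steps * PRICE_PER_INFERENCE
        for calls, steps in STEPS.values()
    )

def daily_cost(queries_per_day: int) -> float:
    """Linear scaling: every query pays for the full workflow."""
    return queries_per_day * cost_per_query()
```

At these placeholder rates, 100 questions per day is a ten-dollar daily bill before retries. The point is structural: adding one entry to STEPS raises the price of every query, not just the complicated ones.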

The problem compounds when agents make mistakes. If the agent's first attempt to answer a question fails (the API returned an error, the data was stale, the reasoning was uncertain), the agent retries. Now you've paid for the original attempt and the retry, but the customer still hasn't gotten an answer.

The Anatomy of a Realistic Multi-Step Workflow

Let's trace a loan origination agent handling a "what's the status of my application" query. This is a real use case, not a toy example:

  1. Identify the applicant. Vector database embedding lookup to find applicant ID (1 embedding call, 1 DB query).
  2. Fetch application metadata. Loan origination system API call (1 LLM inference to parse the response).
  3. Fetch underwriting status. Query underwriting queue (1 API call).
  4. Fetch credit report. Call external credit bureau (1 API call, often charged per call).
  5. Check compliance status. Query sanctions screening (1 API call).
  6. Fetch recent communications. Query CRM for email/SMS history (1 API call).
  7. Synthesize findings. Run multi-step reasoning ("is the applicant at risk of declining?" "what's the next action required?"). This might be 2–3 separate LLM inference calls if the agent uses chain-of-thought reasoning.
  8. Safety check. Verify the answer doesn't violate regulations (1 inference call, or sometimes a specialized compliance model).
  9. Format response. Structure and send the answer (1 final inference call for natural language generation).

Total: 7–8 API/DB calls (including the embedding lookup) + 5–6 LLM inferences per customer query. If the agent is confident and doesn't retry, that's 12–14 billable events per query. If there are retries due to API timeouts or uncertainty, you're looking at 20–30 billable events per query.

At $0.01 per API call (a typical SaaS rate) and $0.0002 per LLM token (assuming Claude at ~100 tokens per inference step), the cost per query might be:

  • 8 API calls × $0.01 = $0.08
  • 6 inferences × 100 tokens × $0.0002 = $0.12
  • Total per query: $0.20

That's not "cheap AI." For a loan origination workflow at 200 applications per month, that's $40/month just for status checks—$0.20 per applicant before you count the initial application processing, fraud review, or decision trees. Multiply across your book of business and multi-step agent cost becomes material.
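Written out as code, the same back-of-the-envelope math (the rates are the illustrative figures above, not actual vendor pricing):

```python
API_CALLS = 8
API_RATE = 0.01            # dollars per SaaS API call (illustrative)
INFERENCES = 6
TOKENS_PER_INFERENCE = 100
TOKEN_RATE = 0.0002        # dollars per LLM token (illustrative)

api_cost = API_CALLS * API_RATE                                  # $0.08
inference_cost = INFERENCES * TOKENS_PER_INFERENCE * TOKEN_RATE  # $0.12
total_per_query = api_cost + inference_cost                      # $0.20

monthly_status_checks = 200
monthly_cost = monthly_status_checks * total_per_query           # $40/month
```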

The Invisible Cost Multipliers

Raw API call count is only the surface. Multi-step workflows hide several cost multipliers:

Retries and backpressure. If your underwriting API times out 2% of the time, the agent retries. Each retry is another API call, another set of LLM inferences, another embedding lookup. An agent with 1 retry per 50 queries now costs 2% more than a retry-free baseline. An agent with 3 retries per 50 queries (due to flaky integrations or uncertain reasoning) costs 6% more.
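The retry penalty is a simple expected-value multiplier on the whole workflow cost. A minimal sketch, assuming each retry repeats the full workflow once:

```python
def expected_cost(base_cost: float, retry_rate: float) -> float:
    """Expected per-query cost when a fraction `retry_rate` of
    queries is retried once, repeating the full workflow."""
    return base_cost * (1 + retry_rate)

# 1 retry per 50 queries adds 2%; 3 per 50 adds 6%.
one_in_fifty = expected_cost(0.20, 1 / 50)    # $0.204 per query
three_in_fifty = expected_cost(0.20, 3 / 50)  # $0.212 per query
```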

Hallucination avoidance. Careful agents use extra verification steps. Instead of asking only the loan system whether the application is approved, a careful agent also cross-checks the compliance database and the underwriting queue. That's 3 API calls instead of 1, with the extra spend buying a lower hallucination rate. The economics trade raw speed for accuracy.

Chain-of-thought reasoning. Modern agents use multi-step reasoning to increase accuracy: break the problem into sub-steps, reason through each one, then synthesize. That's 3–4 inference steps instead of 1. The accuracy improvement is real, but so is the cost multiplier (3–4x).

Embedding refreshes. If your agent uses vector search to find relevant documents or past interactions, each query needs an embedding, and embedding APIs charge per token. A 1,000-word document runs roughly 1,300 tokens to embed. A high-volume agent running 100 queries per hour, at around 1,000 tokens of context per query, is embedding 100,000+ tokens per hour just for search. At $0.0001 per 1K tokens that's about $0.01/hour, a small line item on its own, but one that scales linearly with query volume and runs around the clock.
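Under an assumed rate of $0.0001 per 1K embedded tokens (real provider rates vary widely), the hourly search-embedding bill works out to:

```python
QUERIES_PER_HOUR = 100
TOKENS_PER_QUERY = 1_000     # assumed context embedded per search
RATE_PER_1K_TOKENS = 0.0001  # dollars per 1K embedded tokens (assumed)

tokens_per_hour = QUERIES_PER_HOUR * TOKENS_PER_QUERY             # 100,000
hourly_embed_cost = tokens_per_hour / 1_000 * RATE_PER_1K_TOKENS  # $0.01
```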

Error-recovery loops. When the agent is uncertain, it may run additional verification steps. An agent unsure if the customer is eligible might check 3 different eligibility rules, costing 3x the inference steps of a confident agent. This turns "uncertain requests" into disproportionately expensive requests—the exact opposite of what you want economically.

Modeling Multi-Step Cost

To forecast agent cost accurately, stop thinking about "cost per interaction" and start tracking "cost per workflow step":

  • Base inference cost: N tokens × $0.0001 per token
  • API/database calls: M calls × average $0.005 per call (some are free, some cost $0.01)
  • Embedding calls: E embeddings × 1,000 tokens per embedding × $0.0001 per 1K tokens
  • External service calls: S calls × average cost (credit bureaus: $2–5 per call; SMS APIs: $0.01; Stripe: free)
  • Retry multiplier: 1 + (expected retry rate)

For the loan status agent above:

  • Inference cost: 6 inferences × 100 tokens × $0.0001 = $0.06
  • API calls: 7 calls × $0.005 = $0.035
  • Embedding (applicant lookup): 1 × 1,000 tokens × $0.0001 = $0.0001
  • External (credit bureau): 1 × $2 = $2.00
  • Retry multiplier: 1.02 (assume 2% retry rate)
  • Total per query: ($0.06 + $0.035 + $0.0001 + $2.00) × 1.02 ≈ $2.14 per query

The credit bureau call dominates the cost. But if you run 200 loan status checks per month, that's roughly $428/month just in status checks. If applicants ask 4 times over the course of processing, it's roughly $1,712/month in agent cost for status inquiries alone—before counting the cost of the initial application processing, fraud review, or underwriting.
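The five terms of the model collapse into one function. The rates below are the illustrative figures from this section, and the function name is mine:

```python
def workflow_cost(
    inference_tokens: int,    # total LLM tokens across all inference steps
    api_calls: int,           # internal API and database calls
    embeddings: int,          # embedding lookups, ~1K tokens each
    external_fees: float,     # external service charges (credit bureau, etc.)
    retry_rate: float = 0.0,  # expected fraction of queries retried
) -> float:
    TOKEN_RATE = 0.0001  # dollars per inference token (illustrative)
    API_RATE = 0.005     # average dollars per API/DB call (illustrative)
    EMBED_RATE = 0.0001  # dollars per embedding of ~1K tokens (illustrative)
    base = (
        inference_tokens * TOKEN_RATE
        + api_calls * API_RATE
        + embeddings * EMBED_RATE
        + external_fees
    )
    return base * (1 + retry_rate)

# Loan status agent: 6 inferences x 100 tokens, 7 API calls,
# 1 embedding lookup, a $2 credit bureau pull, 2% retry rate.
per_query = workflow_cost(600, 7, 1, 2.00, 0.02)
```

That lands at about $2.14 per query, and it makes the dominance of the external fee obvious: zeroing out every other term still leaves a $2.04 query.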

Optimizing Multi-Step Economics

The CFO's job is to ensure the agent engineering team understands the cost of extra steps. A few levers:

  1. Cache results. If you query the same applicant's status twice in one day, cache the result instead of re-querying. This is especially valuable for embedding lookups and external API calls.

  2. Batch operations. Instead of one agent answering one question, use batch inference for non-urgent queries. Process 100 status requests in a single batch at 40% lower cost per inference.

  3. Segment by complexity. Route simple queries (order status, account balance) to single-step agents. Route complex queries (loan eligibility, claim adjudication) to multi-step agents. This prevents cost amplification on easy cases.

  4. Reduce external service calls. Many agents call external APIs (credit bureaus, compliance screeners) out of habit, not necessity. Push back: does this integration reduce error rate by enough to justify the cost?

  5. Use reasoning models sparingly. Chain-of-thought reasoning improves accuracy for complex tasks but costs 3–4x more. Use it only for high-stakes decisions, not for every query.
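Lever 1 is usually the cheapest to implement. A minimal time-to-live cache sketch (the class and method names are mine; any memoization layer would do):

```python
import time
from typing import Any, Callable

class TTLCache:
    """Cache expensive lookups (status queries, embeddings, bureau
    pulls) so repeat queries inside the window cost nothing."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store: dict[Any, tuple[float, Any]] = {}

    def get_or_fetch(self, key: Any, fetch: Callable[[], Any]) -> Any:
        now = time.monotonic()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]                # cache hit: no billable call
        value = fetch()                  # cache miss: pay for the call once
        self._store[key] = (now, value)
        return value
```

Fronting the credit bureau call with a one-day TTL would turn the four status checks per applicant in the example above into a single billable pull.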

The economics of multi-step agents are fundamentally different from single-step classifiers. Every step you add costs money. Every retry or verification loop costs more money. The agent that seamlessly orchestrates 12 backend systems looks impressive but may be running 47 API calls per customer interaction—each of which costs money and fails independently.

For a deeper understanding of how to track and attribute these costs to business outcomes, see the AI Cost Iceberg, which breaks down the infrastructure, integrations, and failure costs that multi-step agents hide.

