When a CFO or finance team starts looking for AI cost attribution solutions, they often encounter tools like Helicone, Langfuse, or LangSmith and think: "This platform logs LLM API calls and calculates cost. Isn't this cost attribution?" The answer is: not quite. These are observability tools, not cost attribution tools. They solve different problems.
The distinction matters because the wrong tool leads to false confidence: you think you have cost attribution, but you actually have only partial visibility into the cost iceberg.
What Observability Tools Actually Do
Helicone, Langfuse, LangSmith, and similar platforms are built for developer observability. They log LLM API calls, track latency, monitor quality, debug errors, and calculate token consumption. Their primary customer is an engineer or product manager who wants to know: "Is my LLM pipeline working correctly? Why did that request fail? How many tokens did it consume?"
They excel at answering:
- "What's the latency of each API call?"
- "How many input vs. output tokens did each call consume?"
- "Which prompts have the highest cost per token?"
- "Did the LLM's response match our quality criteria?"
- "Which requests failed and why?"
They log every API call, every token, every latency measurement. The cost calculation is straightforward: multiply tokens by the provider's pricing table.
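As a concrete sketch, the entire calculation fits in a few lines. The prices below are assumed placeholders, not any provider's actual rate card:

```python
# Illustrative per-1M-token prices: (input, output) in USD. Assumed values,
# not a real rate card -- check your provider's current pricing.
PRICE_PER_1M_TOKENS = {
    "claude-3-5-sonnet": (3.00, 15.00),
}

def api_call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one LLM API call: token counts times the pricing table."""
    price_in, price_out = PRICE_PER_1M_TOKENS[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

print(api_call_cost("claude-3-5-sonnet", 1200, 450))  # 0.01035
```

That simplicity is exactly why these tools are good at per-call cost, and exactly why per-call cost is all they can see.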
What Cost Attribution Tools Actually Do
Cost attribution tools (like Runrate) answer a different set of questions. Their primary customer is a CFO or finance team who wants to know: "What did this resolved ticket actually cost? Is this agent profitable? Which customer is expensive to serve?"
They must answer:
- "What's the total cost per work item (including all hidden costs)?"
- "Which business unit, team, or customer is consuming the most AI spend?"
- "Is this agent saving us money vs. the human baseline?"
- "Can we optimize cost per outcome without sacrificing quality?"
- "What's the ROI of this AI investment?"
These questions require work-item-level granularity, not API-call-level granularity. You need to know the cost of the entire journey (from ticket arrival to resolution), not just the cost of one API call.
The Critical Differences
Granularity: Observability tools measure cost at the API-call level. Attribution tools measure cost at the work-item level. If a support ticket goes through 5 API calls, embeddings, human review, and third-party integrations, observability tools show the cost of each API call ($0.03, $0.02, $0.05, etc.). Attribution tools aggregate all of those into a single line: "This ticket cost $0.47 total."
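A minimal sketch of that rollup, with illustrative event records: the same raw events produce per-call lines for engineering and a single per-ticket line for finance:

```python
# Same raw events, two rollups. Event shapes and costs are illustrative.
from collections import defaultdict

events = [
    {"work_item": "SUP-12345", "kind": "api_call",     "cost": 0.0256},
    {"work_item": "SUP-12345", "kind": "api_call",     "cost": 0.0160},
    {"work_item": "SUP-12345", "kind": "embedding",    "cost": 0.0800},
    {"work_item": "SUP-12345", "kind": "human_review", "cost": 0.0800},
]

# Observability view: one cost per API call.
print([e["cost"] for e in events if e["kind"] == "api_call"])  # [0.0256, 0.016]

# Attribution view: one total per work item, across every cost kind.
totals = defaultdict(float)
for e in events:
    totals[e["work_item"]] += e["cost"]
print(dict(totals))  # {'SUP-12345': 0.2016}
```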
Hidden costs: Observability tools track API tokens. They don't include embeddings, vector database storage, human review time, third-party integrations, retries, observability overhead itself, or gateway costs. The AI Cost Iceberg shows that these hidden costs are about 90% of the true cost. Observability tools are blind to them.
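One simple way to surface shared hidden costs is amortization: divide a monthly bill across the work items it served. A sketch with assumed numbers (chosen so they land on the $0.02 storage allocation used in the example ticket below):

```python
# Amortize a shared monthly bill across work items. Both figures are
# hypothetical; a real allocation might weight by usage instead of volume.
monthly_vector_db_bill = 2_000.00  # USD, assumed
tickets_resolved = 100_000         # assumed monthly volume

print(f"${monthly_vector_db_bill / tickets_resolved:.3f} per ticket")  # $0.020 per ticket
```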
Business outcome context: Observability tools don't know which ticket, claim, or application the API call is serving. They log "Claude API call consumed 1,500 tokens at 2:15 PM" but they don't log "that API call was part of support ticket SUP-12345 for customer Acme, which was resolved." Attribution tools require this context to connect cost to outcome.
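A sketch of the record an attribution layer needs for every call; the field names are illustrative, not any specific tool's schema:

```python
# What observability logs today, plus the business context attribution needs.
call_record = {
    # Observability fields: already captured by Helicone/Langfuse-style tools
    "model": "claude-3-5-sonnet",
    "input_tokens": 1_500,
    "timestamp": "2025-05-02T14:15:00Z",
    # Attribution fields: the context that connects cost to outcome
    "work_item_id": "SUP-12345",  # which ticket this call served
    "customer": "Acme Corp",      # who the work was done for
    "agent": "support-triage",    # which AI agent made the call (hypothetical name)
    "outcome": "resolved",        # what the work item ultimately produced
}
```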
Ownership: Observability tools are owned by engineering teams (or DevOps and MLOps teams). They're part of the engineering stack. Attribution tools are owned by finance teams. The output of an observability tool is a Slack alert ("API latency is high!") sent to engineers. The output of an attribution tool is a board report ("Cost per ticket is up 8% this month, driving 3% margin erosion") sent to the CFO.
An Example: Two Views of the Same Ticket
Observability view of a support ticket:
```
Timestamp: 2025-05-02T10:15:32Z

API Call 1: Claude 3.5 Sonnet
  Input:  1,200 tokens
  Output:   450 tokens
  Cost:   $0.0256

API Call 2: Claude 3.5 Sonnet
  Input:    800 tokens
  Output:   200 tokens
  Cost:   $0.0160

Embedding lookup: 5,000 tokens
  Cost:   $0.0001

Langfuse logging (overhead)
  Cost:   $0.0005

Total observed cost: $0.0422
```
Attribution view of the same support ticket:
```
Ticket ID: SUP-12345
Customer:  Acme Corp
Work item: Support ticket, resolved

Cost stack:
  API (Claude):               $0.0416
  Embedding lookups:          $0.08
  Vector storage (allocated): $0.02
  Observability (allocated):  $0.01
  Third-party integrations:   $0.03
  Human review (5 min):       $0.08
  Infrastructure (allocated): $0.12

Total attributed cost:        $0.3816
```
The observability view shows an API cost of $0.0422. The attribution view shows the true cost of the ticket was $0.3816, roughly 9x higher. Why? Because the observability tool is blind to embeddings, storage, human review, integrations, and infrastructure.
Why You Need Both (But Not Just Observability)
A strong engineering organization uses observability tools to monitor the health and performance of LLM systems. An engineer watching a Helicone or Langfuse dashboard sees a spike in API latency and investigates. That's valuable.
But observability tools alone will not tell you whether your AI is profitable or whether you should scale it to more use cases. For that, you need cost attribution.
Many companies try to "repurpose" observability tools for cost attribution. They take Langfuse or Helicone, add some instrumentation, build a dashboard, and call it "cost attribution." But they're only seeing 10% of the cost—the API cost—and missing the 90% (human review, embeddings, infrastructure, integrations). They think the ticket costs $0.04 to resolve when it actually costs $0.40.
This is dangerous because it leads to wrong decisions. A CFO who sees $0.04 per ticket thinks: "Wow, AI is cheap, let's deploy it everywhere." But when the full cost is $0.40, and the customer value is only $0.30, the agent is destroying margin and the CFO makes a bad investment decision.
The Integration Opportunity
While observability and attribution tools have different jobs, they do need to talk to each other. An ideal architecture has four layers (sketched in code after the list):

1. Observability layer (Langfuse, Helicone, custom CloudWatch implementation) captures every API call, every token, every latency measurement. This is the raw data.
2. Instrumentation layer (logging, tracing) adds context to those API calls: which work item, which agent, which customer. This turns raw data into actionable data.
3. Attribution layer (Runrate) aggregates the instrumented data into work-item-level costs and business outcomes.
4. Operational layer (your dashboards, BI tools, finance systems) consumes the attribution data and drives decisions.
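A minimal end-to-end sketch of that four-layer flow; every function is an illustrative stub standing in for a real integration:

```python
# Four layers: observe -> instrument -> attribute -> report. Stubs only.

def observe(raw_call: dict) -> dict:
    """1. Observability layer: capture tokens, latency, model -- raw data."""
    return {**raw_call, "latency_ms": 420}  # latency value is a stand-in

def instrument(event: dict, work_item_id: str, customer: str) -> dict:
    """2. Instrumentation layer: attach business context to the event."""
    return {**event, "work_item_id": work_item_id, "customer": customer}

def attribute(events: list[dict]) -> dict:
    """3. Attribution layer: roll instrumented events up per work item."""
    totals: dict[str, float] = {}
    for e in events:
        totals[e["work_item_id"]] = totals.get(e["work_item_id"], 0.0) + e["cost"]
    return totals

def report(totals: dict) -> None:
    """4. Operational layer: feed dashboards, BI tools, finance systems."""
    for item, cost in totals.items():
        print(f"{item}: ${cost:.2f}")

events = [
    instrument(observe({"model": "claude-3-5-sonnet", "cost": 0.0256}),
               "SUP-12345", "Acme Corp"),
    instrument(observe({"model": "claude-3-5-sonnet", "cost": 0.0160}),
               "SUP-12345", "Acme Corp"),
]
report(attribute(events))  # SUP-12345: $0.04
```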
Companies that get this right don't need to choose between observability and attribution. They use observability for engineering (are the API calls fast and healthy?) and attribution for finance (what does each work item actually cost?).
Positions to Avoid
"Observability tools can do cost attribution if you engineer them right." No. There's a fundamental mismatch between what observability tools measure (API calls) and what cost attribution requires (work items, hidden costs, business outcomes). You can add instrumentation on top, but you're still missing the hidden costs and the business outcome context. You're optimizing the wrong metric.
"You don't need a separate cost attribution tool; just use your cloud cost tool (CloudZero, Apptio)." Cloud cost tools measure infrastructure (compute, storage, data transfer). They can't measure work-item-level AI cost because that requires application-level instrumentation. They're orthogonal problems. Observability tools measure API calls. Cloud tools measure infrastructure. Neither measures work items.
"Roll your own cost attribution using SQL queries against your observability logs." You can try. You'll get stuck when you try to measure the hidden costs (human review time, storage, integrations, infrastructure) because observability tools don't log those. You'll end up building exactly the kind of tool Runrate is—a purpose-built cost attribution layer.
The right approach: use observability tools for engineering (monitoring, debugging, alerting). Use cloud cost tools for infrastructure cost tracking. Use attribution tools for financial visibility and cost optimization. They're complementary, not substitutes.
For more on how to build a complete cost attribution stack, return to the pillar article on AI cost attribution.
Want to see this in your stack?
Book a 30-minute walkthrough with a Runrate founder.