What you'll pay for AI agents will be wildly variable and unpredictable

2026-05-05 · Source: News and Advice on the World's Latest Innovations | ZDNET · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Emerging Technologies & Innovation · Depth: Intermediate, medium

Summary

A new study by the University of Michigan and collaborators, including Stanford University, Google DeepMind, Microsoft, and MIT, finds that AI agents incur significantly higher and less predictable token costs than simple prompt-based chats. The study, titled "How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks," found that agents can consume up to 3,500 times more tokens for a task than a prompt-based chat. Token consumption also varies widely between models, and even for the same model on identical tasks, with some runs using twice as many tokens as others. Agents consistently underestimate their own token needs, particularly for input tokens, which dominate costs because of repeated context feeding and cache reads. This unpredictability, together with the lack of correlation between token usage and performance, makes cost estimation difficult and complicates enterprise adoption.

Key takeaway

CTOs and VPs of Engineering evaluating AI agent deployments should recognize that current vendor pricing models do not reflect the true operating costs, which are highly variable and often excessive. Demand greater price transparency and performance guarantees from AI providers to limit budget overruns and ensure tasks are completed; otherwise, implementations risk being unstable and costly.

Key insights

AI agents incur vastly higher and unpredictable token costs, primarily driven by input tokens and cache reads, without guaranteeing improved performance.
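
To see why input tokens can dominate, here is a minimal arithmetic sketch. It assumes every agent step re-sends the full conversation history and that each step appends a fixed number of tokens (tool output plus model reply) to that history. The function name and all numbers are hypothetical illustrations, not figures from the study.

```python
def cumulative_input_tokens(base_context: int, tokens_added_per_step: int, steps: int) -> int:
    """Total input tokens consumed when each step re-sends the whole history.

    Assumption (not from the study): the context grows by a fixed amount per step.
    """
    total = 0
    context = base_context
    for _ in range(steps):
        total += context                  # the full context is fed in again on this step
        context += tokens_added_per_step  # tool output and model reply join the history
    return total


# Hypothetical numbers chosen only to illustrate the growth pattern.
single_chat = cumulative_input_tokens(base_context=2_000, tokens_added_per_step=0, steps=1)
agent_run = cumulative_input_tokens(base_context=2_000, tokens_added_per_step=1_500, steps=40)

print(f"single chat: {single_chat} input tokens")
print(f"agent run:   {agent_run} input tokens")
print(f"ratio:       {agent_run / single_chat:.0f}x")
```

Under these assumptions the total grows roughly with the square of the step count, so a longer run costs disproportionately more than a shorter one, even when each individual step adds little new text.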

Principles

Agentic tasks are uniquely expensive.
Scaling token usage does not guarantee higher performance.
Models systematically underestimate token needs.

Method

The study built agents with the OpenHands framework and analyzed their token consumption on the SWE-bench coding benchmark, whose tasks are derived from GitHub issues.

In practice

Set hard limits on agentic computer use.
Control prompt size and context window size.
Minimize tool calls by agents to reduce input tokens.
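
One way to apply the practices above is a hard budget that stops a run once it crosses a token or tool-call ceiling. The sketch below is a hypothetical, framework-independent guard; `RunBudget`, `BudgetExceeded`, and the simulated step data are illustrative names and values, not part of OpenHands or any vendor API.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when an agent run crosses one of its hard limits."""


@dataclass
class RunBudget:
    max_total_tokens: int
    max_tool_calls: int
    input_tokens: int = 0
    output_tokens: int = 0
    tool_calls: int = 0

    def record_step(self, input_tokens: int, output_tokens: int, tool_calls: int = 0) -> None:
        """Add one step's usage, then raise if any limit is now exceeded."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        self.tool_calls += tool_calls
        used = self.input_tokens + self.output_tokens
        if used > self.max_total_tokens:
            raise BudgetExceeded(f"token limit exceeded: {used} > {self.max_total_tokens}")
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded(f"tool-call limit exceeded: {self.tool_calls} > {self.max_tool_calls}")


# Simulated (input_tokens, output_tokens, tool_calls) per step; values are made up.
simulated_steps = [(8_000, 400, 1), (12_000, 500, 1), (18_000, 600, 1), (25_000, 700, 1)]

budget = RunBudget(max_total_tokens=50_000, max_tool_calls=10)
completed = 0
try:
    for step_input, step_output, step_calls in simulated_steps:
        budget.record_step(step_input, step_output, step_calls)
        completed += 1
except BudgetExceeded as err:
    print(f"stopped after {completed} completed steps: {err}")
```

In a real deployment the per-step counts would come from the usage figures the model provider reports for each call. The check runs after usage is recorded, so the step that crosses the limit is still paid for; a stricter guard would estimate the next step's cost before sending it.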

Topics

AI Agent Costs
Token Consumption
Cost Variability
Token Estimation
AI Pricing Transparency

Code references

OpenHands/OpenHands

Best for: CTO, VP of Engineering/Data, AI Architect, AI Engineer, Director of AI/ML, Consultant

Related on AIssential

Counsel's verdict on this

AIssential's Counsel cites this article in its editorial verdict on the decision it informs:

Pay for the 'agentic' tier upgrade — or wait for proof? — Agentic workflows consume 3,500x more tokens, and vendors are abandoning flat-rate subscriptions for usage-based billing, making unit economics critical as costs surge.

Editorial summary, takeaway, and curation by AIssential. Original article published by News and Advice on the World's Latest Innovations | ZDNET.