The LLM Gamble
Summary
Using Large Language Models (LLMs) often resembles playing a slot machine: outputs are unpredictable, and an occasional success delivers a "dopamine hit" despite frequent failures. This "nondeterminism" extends to cost, because users pay per token for both their input prompts and the model's responses. For instance, Anthropic's Opus 4.6 charges $5 per million input tokens and $25 per million output tokens, while OpenAI's GPT 5.4 charges $2.50 and $15 respectively. A key issue is that users have limited control over how many output tokens a model generates, and output tokens cost five to six times as much as input tokens at these prices, so users pay for a response whether or not it is useful. Subscriptions offer a flat rate, but their usage limits are often opaque, leading to unexpected cut-offs. This pay-per-unpredictable-outcome model poses a significant challenge to the generative AI industry's long-term business sustainability.
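To make the per-token arithmetic concrete, here is a minimal sketch that prices a single request using the per-million-token rates quoted in the summary. The function name `request_cost_usd`, the dictionary keys, and the example token counts (2,000 in, 1,000 out) are illustrative assumptions, not figures from the original article.

```python
# Prices in USD per million tokens, as quoted in the summary above.
# The dictionary keys are illustrative labels, not official model identifiers.
PRICES_PER_MTOK = {
    "opus-4.6": {"input": 5.00, "output": 25.00},
    "gpt-5.4": {"input": 2.50, "output": 15.00},
}


def request_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one request at the quoted per-token prices."""
    price = PRICES_PER_MTOK[model]
    total_micro_usd = input_tokens * price["input"] + output_tokens * price["output"]
    return total_micro_usd / 1_000_000


def output_share(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the fraction of the request cost that comes from output tokens."""
    price = PRICES_PER_MTOK[model]
    out_part = output_tokens * price["output"]
    return out_part / (input_tokens * price["input"] + out_part)


if __name__ == "__main__":
    # Hypothetical request: 2,000 prompt tokens, 1,000 generated tokens.
    for model in PRICES_PER_MTOK:
        cost = request_cost_usd(model, 2_000, 1_000)
        share = output_share(model, 2_000, 1_000)
        print(f"{model}: ${cost:.4f} per request, {share:.0%} from output tokens")
```

In this hypothetical request the prompt is twice as long as the response, yet output tokens still account for most of the bill, which is why the length of the response matters so much for cost.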
Key takeaway
For CTOs and VPs of Engineering evaluating LLM integration: recognize that current pricing, whether pay-per-token or subscription, introduces significant cost unpredictability and the risk of paying for unusable outputs. Your teams should prioritize robust cost monitoring and explore strategies to constrain output token generation, because the "slot machine" nature of LLM billing can lead to unexpected budget overruns and diminished ROI, particularly with agentic AI applications.
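One way to act on this takeaway is to track spend per call and refuse a call whose worst-case cost would exceed a budget. The sketch below is a hypothetical helper, not part of any provider SDK: the class `CostMonitor`, its method names, and the budget figure are all assumptions for illustration. It assumes the caller can cap output length for each request (many provider APIs expose such a limit, though the parameter name varies and is not shown here) and that actual token counts are available after each call.

```python
from dataclasses import dataclass


@dataclass
class CostMonitor:
    """Hypothetical spend tracker; prices are USD per million tokens."""

    price_in_per_mtok: float
    price_out_per_mtok: float
    budget_usd: float
    spent_usd: float = 0.0

    def _cost(self, input_tokens: int, output_tokens: int) -> float:
        total = (input_tokens * self.price_in_per_mtok
                 + output_tokens * self.price_out_per_mtok)
        return total / 1_000_000

    def worst_case_usd(self, input_tokens: int, max_output_tokens: int) -> float:
        """Upper bound on a call's cost if the model uses its full output cap."""
        return self._cost(input_tokens, max_output_tokens)

    def can_afford(self, input_tokens: int, max_output_tokens: int) -> bool:
        """True if the worst-case cost of the next call still fits the budget."""
        projected = self.spent_usd + self.worst_case_usd(input_tokens, max_output_tokens)
        return projected <= self.budget_usd

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Add the actual cost of a completed call to the running total."""
        cost = self._cost(input_tokens, output_tokens)
        self.spent_usd += cost
        return cost


if __name__ == "__main__":
    # Opus 4.6 prices from the summary; the $0.05 budget is an arbitrary example.
    monitor = CostMonitor(price_in_per_mtok=5.0, price_out_per_mtok=25.0, budget_usd=0.05)
    if monitor.can_afford(input_tokens=2_000, max_output_tokens=1_000):
        # Suppose the model actually produced 400 output tokens.
        monitor.record(input_tokens=2_000, output_tokens=400)
    print(f"spent so far: ${monitor.spent_usd:.4f}")
    print("next call affordable:", monitor.can_afford(2_000, 1_000))
```

Checking the worst case before the call, rather than the actual cost after it, is a deliberately conservative choice: it trades some unused budget for a guarantee that a single long response cannot push spend over the limit.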
Key insights
LLM usage costs are unpredictable, akin to a slot machine, challenging sustainable business models.
Principles
- Nondeterminism impacts LLM utility and cost.
- Output tokens cost roughly five to six times as much as input tokens at the quoted prices.
In practice
- Prompt engineering can reduce input token costs.
- Agentic AI increases prompt complexity and cost unpredictability.
Topics
- LLM Nondeterminism
- Token-based Pricing
- Generative AI Costs
- Subscription Models
- AI Business Sustainability
Best for: CTO, VP of Engineering/Data, AI Architect, Director of AI/ML, AI Product Manager, Investor
Editorial summary, takeaway, and curation by AIssential. Original article published by Towards Data Science.