Claude Sonnet 5 continues Anthropic's pattern of hiding price increases behind unchanged token rates
Summary
Claude Sonnet 5, released July 1, 2026, placed fifth in Artificial Analysis's Intelligence Index v4.1 with 53 points, tying GPT-5.5 (high) and surpassing Opus 4.8 on some agent-based tasks. Although token prices remain at \$3 per million input and \$15 per million output, the actual cost per task has nearly doubled, from Sonnet 4.6's \$1.20 to \$2.29, making Sonnet 5 more expensive per task than Opus 4.8 (\$1.97). The increase stems from the model consuming 40 percent more output tokens and running three times as many agent loops in benchmarks like AA-Briefcase and GDPval-AA. While Sonnet 5 shows solid gains on Terminal-Bench v2.1 (9 points), Humanity's Last Exam (10 points), and SciCode (7 points), it scored only 17 percent on the CritPt physics reasoning test, falling short of larger models. Anthropic has a history of such hidden price increases, most recently with Opus 4.7's tokenizer changes.
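The per-task figures above can be compared directly. The following is a minimal sketch that uses only the three dollar amounts quoted in this summary; the dictionary and the `ratio` helper are illustrative names, not part of any benchmark tooling.

```python
# Per-task costs in USD, as quoted in the summary above.
cost_per_task = {
    "Sonnet 4.6": 1.20,
    "Sonnet 5": 2.29,
    "Opus 4.8": 1.97,
}


def ratio(model: str, baseline: str) -> float:
    """Return how many times more expensive `model` is than `baseline` per task."""
    return cost_per_task[model] / cost_per_task[baseline]


vs_predecessor = ratio("Sonnet 5", "Sonnet 4.6")
vs_opus = ratio("Sonnet 5", "Opus 4.8")

print(f"Sonnet 5 vs Sonnet 4.6: {vs_predecessor:.2f}x")  # 1.91x
print(f"Sonnet 5 vs Opus 4.8:   {vs_opus:.2f}x")         # 1.16x
```

On these numbers Sonnet 5 costs roughly 1.9 times as much per task as its predecessor and about 16 percent more than Opus 4.8, even though the per-token rates did not change.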
Key takeaway
AI product managers evaluating new LLMs must look beyond stated token prices. Sonnet 5's higher per-task cost, despite flat token rates, shows why a "cost per standardized task" metric is needed. Prioritize models with transparent pricing and predictable operating expenses. Benchmark actual task-completion costs, especially for agentic workflows, to avoid budget overruns and to keep total cost of ownership competitive with alternatives like DeepSeek V4 Pro.
Key insights
Anthropic's Sonnet 5 offers improved performance but significantly higher real-world task costs due to increased token consumption.
Principles
- Token prices alone do not reflect true model operational costs.
- Agentic model behavior can dramatically inflate token usage.
- Performance gains may come with hidden cost escalations.
In practice
- Evaluate LLM costs based on "cost per task", not just token rates.
- Monitor token consumption for agent-based workflows closely.
- Compare total task costs against competing models like DeepSeek V4 Pro.
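One way to put these practices into effect is to log token usage per agent step and price the whole task rather than a single call. The sketch below assumes the \$3 and \$15 per million token rates quoted in the summary; the `Step` class, the `task_cost` function, and all token counts are hypothetical and do not reproduce any benchmark result.

```python
from dataclasses import dataclass

# Token rates quoted in the summary, in USD per million tokens.
INPUT_USD_PER_MTOK = 3.0
OUTPUT_USD_PER_MTOK = 15.0


@dataclass
class Step:
    """Token usage for one model call inside an agent loop (hypothetical record)."""
    input_tokens: int
    output_tokens: int


def task_cost(steps: list[Step]) -> float:
    """Sum token usage over every step of a task and price it at the flat rates."""
    total_in = sum(s.input_tokens for s in steps)
    total_out = sum(s.output_tokens for s in steps)
    return (total_in * INPUT_USD_PER_MTOK + total_out * OUTPUT_USD_PER_MTOK) / 1_000_000


# Invented workloads: the same task solved in 5 steps versus 10 steps,
# with the longer run also emitting more output tokens per step.
short_run = [Step(10_000, 1_000) for _ in range(5)]
long_run = [Step(10_000, 1_500) for _ in range(10)]

short_cost = task_cost(short_run)
long_cost = task_cost(long_run)
print(f"short run: ${short_cost:.3f} per task")
print(f"long run:  ${long_cost:.3f} per task")
```

The token rates are identical in both runs; only the number of steps and the output per step differ, which is enough to more than double the cost of the task in this made-up example. Tracking this per-task figure across model versions is what surfaces changes that the rate card does not show.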
Topics
- Claude Sonnet 5
- LLM Pricing Models
- Token Consumption
- Agentic AI
- Performance Benchmarking
- Cost Transparency
Best for: CTO, VP of Engineering/Data, MLOps Engineer, Director of AI/ML, AI Product Manager, Consultant
Editorial summary, takeaway, and curation by AIssential. Original article published by The Decoder.