What's new in Claude Sonnet 5
Summary
Anthropic released Claude Sonnet 5 on June 30, 2026, positioning it with performance near Opus 4.8 but at lower stated prices. The model features a 1 million token context window and 128,000 maximum output tokens, retaining the same tools and platform features as Sonnet 4.6. Notably, sampling parameters like `temperature`, `top_p`, and `top_k` are no longer supported, and "adaptive thinking" is now on by default. While the nominal pricing remains \$3/million input and \$15/million output (with an introductory \$2/\$10 discount until August 31), a new tokenizer effectively increases costs. Tests show the new tokenizer generates approximately 30% more tokens for the same input compared to Sonnet 4.6, translating to a 1.42x price increase for English, 1.33x for Spanish, 1.27x for Python code, and 1.01x for Simplified Mandarin. Its safeguards are similar to Opus 4.7/4.8, making it less capable at cyber tasks than Mythos 5.
Key takeaway
For AI Engineers evaluating new LLM deployments, be aware that Claude Sonnet 5's effective cost is higher than its stated price. You should immediately re-evaluate your token consumption for existing prompts, as the new tokenizer increases token counts by 30% or more for English and Spanish. Factor this into your budget and performance estimates, especially given the removal of `temperature`, `top_p`, and `top_k` parameters, which may require prompt engineering adjustments.
Key insights
Sonnet 5 offers Opus 4.8-like performance at lower nominal prices, but a new tokenizer significantly increases effective token costs.
Principles
- Model pricing can be deceptive due to tokenizer changes.
- Regulatory compliance influences model release capabilities.
- Default settings impact model behavior and user experience.
Method
The article describes a method for evaluating tokenizer efficiency by comparing token counts for identical documents across different model versions using a custom tool.
In practice
- Verify effective token costs when new models are released.
- Adjust API calls for removed sampling parameters.
- Explicitly disable "adaptive thinking" if not desired.
Topics
- Claude Sonnet 5
- LLM Pricing Models
- Tokenization Efficiency
- API Parameter Changes
- Large Language Models
- AI Model Governance
Code references
Best for: CTO, VP of Engineering/Data, AI Architect, AI Engineer, Machine Learning Engineer, Director of AI/ML
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Simon Willison's Weblog.