What's new in Claude Sonnet 5

2026-06-30 · Source: Simon Willison's Weblog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Data Science & Analytics · Depth: Intermediate, quick

Summary

Anthropic released Claude Sonnet 5 on June 30, 2026, positioning it with performance near Opus 4.8 but at lower stated prices. The model features a 1 million token context window and 128,000 maximum output tokens, retaining the same tools and platform features as Sonnet 4.6. Notably, sampling parameters like `temperature`, `top_p`, and `top_k` are no longer supported, and "adaptive thinking" is now on by default. While the nominal pricing remains \$3/million input and \$15/million output (with an introductory \$2/\$10 discount until August 31), a new tokenizer effectively increases costs. Tests show the new tokenizer generates approximately 30% more tokens for the same input compared to Sonnet 4.6, translating to a 1.42x price increase for English, 1.33x for Spanish, 1.27x for Python code, and 1.01x for Simplified Mandarin. Its safeguards are similar to Opus 4.7/4.8, making it less capable at cyber tasks than Mythos 5.

Key takeaway

For AI Engineers evaluating new LLM deployments, be aware that Claude Sonnet 5's effective cost is higher than its stated price. You should immediately re-evaluate your token consumption for existing prompts, as the new tokenizer increases token counts by 30% or more for English and Spanish. Factor this into your budget and performance estimates, especially given the removal of `temperature`, `top_p`, and `top_k` parameters, which may require prompt engineering adjustments.

Key insights

Sonnet 5 offers Opus 4.8-like performance at lower nominal prices, but a new tokenizer significantly increases effective token costs.

Principles

Model pricing can be deceptive due to tokenizer changes.
Regulatory compliance influences model release capabilities.
Default settings impact model behavior and user experience.

Method

The article describes a method for evaluating tokenizer efficiency by comparing token counts for identical documents across different model versions using a custom tool.

In practice

Verify effective token costs when new models are released.
Adjust API calls for removed sampling parameters.
Explicitly disable "adaptive thinking" if not desired.

Topics

Claude Sonnet 5
LLM Pricing Models
Tokenization Efficiency
API Parameter Changes
Large Language Models
AI Model Governance

Code references

Best for: CTO, VP of Engineering/Data, AI Architect, AI Engineer, Machine Learning Engineer, Director of AI/ML

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Simon Willison's Weblog.