Claude Opus 4.8: "a modest but tangible improvement"

2026-05-28 · Source: Simon Willison's Weblog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, short

Summary

Anthropic shipped Claude Opus 4.8 on May 28, 2026, describing it as a "modest but tangible improvement" over its predecessor, with a primary focus on enhanced "honesty." Evaluations indicate Opus 4.8 is four times less likely to overlook code flaws and exhibits the lowest incorrect-rate across benchmarks, largely by abstaining when uncertain rather than answering more questions correctly. Pricing remains \$5/million input and \$25/million output, but "Fast mode" for 4.8 is now \$10/million input and \$50/million output, a significant reduction from the \$30/\$150 for previous models. The model retains a January 2026 knowledge cutoff, a 1,000,000 token context window, and 128,000 max output tokens. New features include mid-conversation system messages for dynamic instruction updates and a reduced prompt cache minimum from 4,096 to 1,024 tokens, improving efficiency for agentic loops.

Key takeaway

For AI Engineers building agentic systems, Claude Opus 4.8 offers tangible improvements in reliability and cost-efficiency. Its enhanced "honesty" and ability to flag uncertainties mean you can trust its outputs more, especially for code generation. Utilize mid-conversation system messages to dynamically steer agents and gain from the lower prompt cache minimum, reducing input costs on long-running tasks. Consider "Fast mode" for applicable use cases to further optimize expenses.

Key insights

Claude Opus 4.8 prioritizes "honesty" and uncertainty flagging, improving reliability by abstaining rather than hallucinating.

Principles

Honesty reduces unsupported claims.
Abstaining improves factual accuracy.
Dynamic prompts enhance agentic loops.

Method

Mid-conversation system messages allow appending updated instructions after a user turn, preserving prompt cache hits and reducing input costs in long-running agentic conversations.

In practice

Utilize mid-conversation system messages.
Benefit from lower prompt cache minimum.
Test "Fast mode" for cost savings.

Topics

Claude Opus 4.8
LLM Honesty
Prompt Engineering
Agentic AI
Cost Optimization
Anthropic

Code references

Best for: CTO, VP of Engineering/Data, AI Architect, AI Engineer, Machine Learning Engineer, Director of AI/ML

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Simon Willison's Weblog.