Anthropic ships Claude Opus 4.8 as a "modest but tangible improvement" that tops GPT-5.5 in most benchmarks
Summary
Anthropic has launched Claude Opus 4.8, its latest flagship AI language model, which the company describes as a "modest but tangible improvement" over its predecessor. Opus 4.8 reportedly surpasses OpenAI's GPT-5.5 and Google's Gemini 3.1 Pro on most benchmarks, scoring 69.2 percent on agentic coding (SWE-Bench Pro) and 57.9 percent on multidisciplinary reasoning (Humanity's Last Exam) with tools. A key upgrade is improved honesty: the model is reportedly four times less likely than Opus 4.7 to let bugs slip without comment. New features include "dynamic workflows," which run parallel sub-agents and enable codebase-wide migrations, and an "effort control" that lets users adjust how much processing the model spends on a response. API pricing remains \$5 per million input tokens and \$25 per million output tokens, while Fast Mode costs \$10 per million input tokens and \$50 per million output tokens, a third of the previous Fast Mode rates. Initial analysis suggests Opus 4.8 could reduce operational costs by requiring 15 percent fewer passes and 35 percent fewer output tokens than Opus 4.7 on real-world tasks.
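As a rough illustration of the pricing above, the following Python sketch estimates per-request cost from the quoted per-million-token rates. The token counts are hypothetical, the helper name `request_cost` is invented for this example, and only the reported 35 percent reduction in output tokens is modeled; the 15 percent figure for passes is left out because the summary does not define it.

```python
# Prices in USD per million tokens, as quoted in the summary above.
PRICES = {
    "standard": {"input": 5.00, "output": 25.00},
    "fast": {"input": 10.00, "output": 50.00},
}


def request_cost(input_tokens: int, output_tokens: int, mode: str = "standard") -> float:
    """Return the USD cost of one request at the quoted rates."""
    rates = PRICES[mode]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


# Hypothetical workload: 200k input tokens and 40k output tokens.
INPUT_TOKENS = 200_000
OUTPUT_TOKENS = 40_000

baseline = request_cost(INPUT_TOKENS, OUTPUT_TOKENS)
# Same workload with 35 percent fewer output tokens (assumed to apply uniformly).
reduced = request_cost(INPUT_TOKENS, round(OUTPUT_TOKENS * 0.65))
fast = request_cost(INPUT_TOKENS, OUTPUT_TOKENS, mode="fast")

if __name__ == "__main__":
    print(f"standard: {baseline:.2f} USD")
    print(f"35% fewer output tokens: {reduced:.2f} USD")
    print(f"fast mode: {fast:.2f} USD")
```

Because output tokens cost five times as much as input tokens at these rates, a cut in output length moves the total more than the same percentage cut in input would. Real savings depend on the actual input-to-output ratio of the workload.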
Key takeaway
Machine learning engineers evaluating new LLMs for complex agentic tasks should consider testing Anthropic's Claude Opus 4.8. Its improved benchmark performance, particularly in agentic coding and multidisciplinary reasoning, combined with dynamic workflows for parallel sub-agents, could streamline large-scale operations such as codebase migrations. The new effort control and the potential for lower operational costs than Opus 4.7 also offer ways to optimize resource usage and project budgets.
Key insights
Claude Opus 4.8 improves benchmark scores and honesty, and introduces dynamic workflows and user-adjustable effort controls.
Principles
- AI models should flag uncertainties.
- Parallel sub-agents enhance task execution.
- User control over AI effort optimizes cost/quality.
Method
Dynamic workflows enable models to plan tasks and launch hundreds of parallel sub-agents. Effort control allows users to select processing intensity (high, extra, max) for responses.
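The plan-then-fan-out pattern described here can be sketched generically in Python with `asyncio`. This is not Anthropic's implementation or API: `plan_migration`, `run_subagent`, and `run_workflow` are hypothetical names, and the sub-agent is a stub that returns a string where a real one would call a model.

```python
import asyncio


def plan_migration(files: list[str]) -> list[str]:
    """Hypothetical planning step: one independent task per file."""
    return [f"migrate {path}" for path in files]


async def run_subagent(task: str) -> str:
    """Hypothetical stand-in for one sub-agent; a real one would call a model."""
    await asyncio.sleep(0)  # yield control, simulating asynchronous work
    return f"done: {task}"


async def run_workflow(tasks: list[str], max_parallel: int = 8) -> list[str]:
    """Run all tasks concurrently, with at most max_parallel in flight."""
    semaphore = asyncio.Semaphore(max_parallel)

    async def bounded(task: str) -> str:
        async with semaphore:
            return await run_subagent(task)

    # gather returns results in the same order as the submitted tasks.
    return await asyncio.gather(*(bounded(task) for task in tasks))


if __name__ == "__main__":
    tasks = plan_migration(["a.py", "b.py", "c.py"])
    for line in asyncio.run(run_workflow(tasks)):
        print(line)
```

The semaphore caps concurrency so that a plan with hundreds of tasks does not open hundreds of simultaneous requests. The cap of 8 is an arbitrary choice for illustration.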
In practice
- Adjust effort control for speed or depth.
- Utilize dynamic workflows for large codebase migrations.
- Evaluate Opus 4.8 for potential cost reductions.
Topics
- Claude Opus 4.8
- Anthropic
- Large Language Models
- AI Benchmarks
- Dynamic Workflows
- Agentic AI
- API Pricing
Best for: AI Engineer, NLP Engineer, CTO, AI Scientist, Machine Learning Engineer, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by The Decoder.