Anthropic ships Claude Opus 4.8 as a "modest but tangible improvement" that tops GPT-5.5 in most benchmarks

· Source: The Decoder · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, short

Summary

Anthropic has launched Claude Opus 4.8, its latest flagship AI language model, which the company describes as a "modest but tangible improvement" over its predecessor and competitors. Opus 4.8 reportedly surpasses OpenAI's GPT-5.5 and Google's Gemini 3.1 Pro on most benchmarks, scoring 69.2 percent on agentic coding (SWE-Bench Pro) and 57.9 percent on multidisciplinary reasoning (Humanity's Last Exam) with tools. A key upgrade is improved honesty: the model is four times less likely than Opus 4.7 to let bugs slip without comment. New features include "dynamic workflows", which run parallel sub-agents and enable codebase-wide migrations, and an "effort control" that lets users adjust how much processing the model applies. API pricing remains $5 per million input tokens and $25 per million output tokens, while Fast Mode costs $10 per million input tokens and $50 per million output tokens, a third of its previous rates. Initial analysis suggests Opus 4.8 could reduce operational costs by requiring 15 percent fewer passes and 35 percent fewer output tokens than Opus 4.7 on real-world tasks.
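To make the pricing arithmetic concrete, here is a minimal Python sketch that turns the quoted per-million-token rates into a per-request cost. The function name `request_cost` and the workload numbers (200,000 input tokens, 40,000 output tokens) are hypothetical, chosen only for illustration. The sketch also assumes the reported 35 percent reduction applies uniformly to output tokens, which real workloads may not match.

```python
# Prices in USD per million tokens, as quoted in the summary.
PRICES = {
    "standard": {"input": 5.00, "output": 25.00},
    "fast": {"input": 10.00, "output": 50.00},
}


def request_cost(input_tokens: int, output_tokens: int, mode: str = "standard") -> float:
    """Return the USD cost of one request at the quoted rates."""
    rates = PRICES[mode]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload, not taken from the article.
    input_tokens, output_tokens = 200_000, 40_000

    baseline = request_cost(input_tokens, output_tokens)
    # Assumption: the same task needs 35 percent fewer output tokens.
    reduced = request_cost(input_tokens, round(output_tokens * 0.65))
    fast = request_cost(input_tokens, output_tokens, mode="fast")

    print(f"baseline: ${baseline:.2f}")
    print(f"with 35% fewer output tokens: ${reduced:.2f}")
    print(f"Fast Mode, same token counts: ${fast:.2f}")
```

Because output tokens cost five times as much as input tokens at these rates, a cut in output volume affects the bill more than an equal percentage cut in input volume would.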

Key takeaway

Machine Learning Engineers evaluating new LLMs for complex agentic tasks should consider testing Anthropic's Claude Opus 4.8. Its improved benchmark results, particularly in agentic coding and multidisciplinary reasoning, combined with dynamic workflows for parallel sub-agents, could streamline large-scale operations such as codebase migrations. The new effort control and the potential for lower operational costs than Opus 4.7 also offer ways to optimize resource usage and project budgets.

Key insights

Claude Opus 4.8 improves on benchmark scores and honesty, and introduces dynamic workflows and user effort controls.

Method

Dynamic workflows enable models to plan tasks and launch hundreds of parallel sub-agents. Effort control allows users to select processing intensity (high, extra, max) for responses.
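The plan-then-fan-out pattern described here can be sketched in plain Python with `asyncio`. This is an illustration of the general idea only, not Anthropic's implementation or API. `run_subagent`, `plan_migration`, and `run_workflow` are hypothetical names, the sub-agent is a stub that makes no model call, and the `max_parallel` cap is an assumption added to keep concurrency bounded. Only the effort level names (high, extra, max) come from the summary.

```python
import asyncio

# Effort levels named in the summary; how they map to a real API is not specified.
EFFORT_LEVELS = ("high", "extra", "max")


async def run_subagent(task: str, effort: str) -> str:
    """Hypothetical stand-in for one sub-agent; a real one would call a model."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"[{effort}] done: {task}"


def plan_migration(files: list[str]) -> list[str]:
    """Hypothetical planner: one independent task per file."""
    return [f"migrate {name}" for name in files]


async def run_workflow(tasks: list[str], effort: str = "high", max_parallel: int = 8) -> list[str]:
    """Run all tasks concurrently, with at most max_parallel in flight."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}")
    limit = asyncio.Semaphore(max_parallel)

    async def bounded(task: str) -> str:
        async with limit:
            return await run_subagent(task, effort)

    # gather returns results in the same order as the submitted tasks.
    return await asyncio.gather(*(bounded(task) for task in tasks))


if __name__ == "__main__":
    tasks = plan_migration(["a.py", "b.py", "c.py"])
    for line in asyncio.run(run_workflow(tasks, effort="extra")):
        print(line)
```

Bounding concurrency with a semaphore is a design choice of this sketch: launching hundreds of sub-agents at once would otherwise depend entirely on whatever rate limits the backing service enforces.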

Best for: AI Engineer, NLP Engineer, CTO, AI Scientist, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by The Decoder.