Claude Opus 4.8 is here. Is it as good as they say?

2026-05-27 · Source: Lenny's Newsletter · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Advanced, long

Summary

Anthropic has released Opus 4.8, its latest coding model, claiming significant performance improvements for agents. Benchmarks show Opus 4.8 achieving 69.2% on SWE-Bench Pro, nearly five points higher than Opus 4.7, almost 10 points higher than GPT 5.5, and 15 points higher than Gemini 3.1. The model is priced at $5 per million input tokens and $25 per million output tokens. Early testing indicates Opus 4.8 excels at one-shot greenfield coding prototypes, delivering functional code that follows architectural specifications. However, it struggles with the "last 10 percent" of complex tasks and with existing codebases, and it hallucinates, making up facts based on hypotheses rather than data. In business strategy, Opus 4.8 was less data-anchored and more "hand-wavy" than Opus 4.7. On the positive side, its voice, token efficiency, and speed offer good ergonomics. Overall, the model appears over-tuned, which leads to narrow vision and to overconfident claims it has not actually validated.
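To make the quoted pricing concrete, here is a minimal sketch of the cost arithmetic at $5 per million input tokens and $25 per million output tokens. The function name and the token counts in the example are hypothetical, chosen only for illustration; they do not come from the original article.

```python
# Prices quoted in the summary above, in USD per one million tokens.
INPUT_USD_PER_MILLION = 5.00
OUTPUT_USD_PER_MILLION = 25.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts.

    Hypothetical helper: it applies the two list prices linearly and
    ignores any discounts, caching, or batch pricing.
    """
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


# Illustrative request: 200,000 input tokens and 20,000 output tokens.
print(estimate_cost_usd(200_000, 20_000))  # → 1.5
```

Because output tokens cost five times as much as input tokens, the token efficiency noted in the summary matters most on the output side.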

Key takeaway

If you are an AI scientist or machine learning engineer evaluating new models for development, consider Opus 4.8 for rapid greenfield prototyping, where its strong one-shot coding performance and good ergonomics pay off. Be cautious when integrating it into existing complex codebases or using it for strategy work that requires deep data validation: its tendency to hallucinate and its overconfidence in edge cases could introduce significant debugging overhead. Thoroughly validate its outputs in critical applications.

Key insights

Opus 4.8 excels in one-shot coding but struggles with complex edge cases, existing codebases, and factual grounding.

Principles

Over-tuning can lead to narrow vision and overconfidence.
Efficiency may come at the cost of accuracy in complex tasks.
Hallucinations can occur when models prioritize hypothesis over data.

In practice

Use Opus 4.8 for greenfield prototypes and one-shot coding tasks.
Exercise caution with Opus 4.8 on existing codebases or complex strategy.
Double-check Opus 4.8's confident assertions for factual grounding.

Topics

Claude Opus 4.8
Large Language Models
Agentic Coding
Model Performance
Hallucinations
Business Strategy

Best for: AI Engineer, CTO, VP of Engineering/Data, AI Scientist, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by Lenny's Newsletter.