Claude Opus 4.8 is here. Is it as good as they say?

· Source: Lenny's Newsletter · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Advanced, long

Summary

Anthropic has released Opus 4.8, its latest coding model, claiming significant performance improvements for agents. Benchmarks show Opus 4.8 achieving 69.2% on SWE-Bench Pro, nearly five points higher than Opus 4.7, almost 10 points higher than GPT 5.5, and 15 points higher than Gemini 3.1. The model is priced at \$5 per million input tokens and \$25 per million output tokens. Early testing indicates that Opus 4.8 excels at one-shot greenfield coding prototypes, delivering functional code and following architectural specifications. However, it struggles with the "last 10 percent" of complex tasks and with existing codebases, and it hallucinates, making up facts based on hypotheses rather than data. In business strategy work, Opus 4.8 was less data-anchored and more "hand-wavy" than Opus 4.7. On the positive side, its voice, token efficiency, and speed offer good ergonomics. The model appears over-tuned, which leads to narrow vision and to overconfidence without true validation.
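To make the quoted pricing concrete, here is a minimal sketch of a per-request cost estimate. It uses only the two rates stated in the summary; the function name and the example token counts are hypothetical, chosen for illustration, and it ignores any discounts, caching, or tiering that the actual pricing may include.

```python
# Rates quoted in the summary, in USD per million tokens.
INPUT_PRICE_PER_MILLION = 5.00
OUTPUT_PRICE_PER_MILLION = 25.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of one request at the quoted rates.

    Hypothetical helper for illustration; not part of any official SDK.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


# Hypothetical workload: 200,000 input tokens and 40,000 output tokens.
# Input:  0.2 million  * 5  = 1.00 USD
# Output: 0.04 million * 25 = 1.00 USD
example_cost = estimate_cost_usd(200_000, 40_000)
print(f"{example_cost:.2f} USD")
```

Because output tokens cost five times as much as input tokens at these rates, the model's token efficiency on the output side matters more to the bill than prompt length does.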

Key takeaway

AI scientists and machine learning engineers evaluating new models for development should consider Opus 4.8 for rapid greenfield prototyping, given its strong one-shot coding performance and good ergonomics. Exercise caution, however, when integrating it into existing complex codebases or using it for strategy work that requires deep data validation: its tendency toward hallucination and overconfidence in edge cases could introduce significant debugging overhead. Validate its outputs thoroughly in critical applications.

Key insights

Opus 4.8 excels in one-shot coding but struggles with complex edge cases, existing codebases, and factual grounding.

Best for: AI Engineer, CTO, VP of Engineering/Data, AI Scientist, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by Lenny's Newsletter.