No hype Claude Opus 4.8 review—my real experience

2026-05-28 · Source: How I AI · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Robotics & Autonomous Systems · Depth: Intermediate, long

Summary

Anthropic has released Claude Opus 4.8 and positions it as a significant advance for AI agents, claiming improved honesty, longer autonomous operation, and enterprise readiness. On SWE-bench Pro it scores 69.2%, nearly five points above Opus 4.7, almost 10 points above GPT-5.5, and 15 points above Gemini 3.1. It is priced at $5 per million input tokens and $25 per million output tokens, and it offers a high-effort default alongside a faster mode.

The reviewer's early hands-on experience, however, was mixed. Opus 4.8 excels at one-shot greenfield prototyping, autonomously coding a functional tool in 20 minutes, but it struggles with the "last 10%" of a task: it hallucinates and has difficulty integrating into existing codebases. On business strategy work it was less effective than Opus 4.7, over-rotating on minor data points and lacking contextual understanding. The reviewer praises its voice and ergonomics as efficient and user-friendly, but theorizes that the model is overtuned, which makes it overconfident without sufficiently validating its own work.
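The quoted prices translate directly into a per-request cost estimate. The sketch below is illustrative only: it assumes flat list pricing (no prompt caching, batch discounts, or separate pricing for the faster mode), and the function name and token counts in the usage line are hypothetical.

```python
# List prices quoted in the review, in USD per million tokens.
INPUT_PRICE_PER_MTOK = 5.00
OUTPUT_PRICE_PER_MTOK = 25.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one request at the quoted list prices.

    Hypothetical helper: assumes flat pricing and ignores caching,
    batch discounts, or any mode-specific pricing.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000


if __name__ == "__main__":
    # Hypothetical agent run: 200k tokens read, 40k tokens written.
    print(f"${estimate_cost_usd(200_000, 40_000):.2f}")  # prints "$2.00"
```

Because output tokens cost five times as much as input tokens, the 40k output tokens in this example cost as much as the 200k input tokens, so long, output-heavy agent runs are where the bill grows fastest.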

Key takeaway

AI engineers evaluating new large language models for development should consider Claude Opus 4.8 for initial greenfield prototyping, given its strong one-shot coding. Be cautious when integrating it into existing codebases or using it for tasks that require deep contextual understanding and data validation, where it may miss edge cases and hallucinate. Test and validate its outputs thoroughly, especially where accuracy is critical, and consider Opus 4.7 for complex strategy analysis.

Key insights

Claude Opus 4.8 excels at greenfield prototyping but struggles with edge cases and grounding in data, and often hallucinates.

Principles

New models may trade accuracy for efficiency.
Over-tuning can lead to overconfidence.
Contextual understanding is crucial for complex tasks.

In practice

Use Opus 4.8 for one-shot greenfield prototypes.
Verify outputs rather than trusting the model's confidence.
Prefer Opus 4.7 for data-anchored strategy.

Topics

Claude Opus 4.8
Large Language Models
AI Agents
Code Generation
Prototyping
Model Hallucinations
SWE-bench Pro

Best for: CTO, VP of Engineering/Data, AI Architect, AI Engineer, Machine Learning Engineer, Director of AI/ML


Editorial summary, takeaway, and curation by AIssential. Original article published by How I AI.