Opus 4.7 Part 2: Capabilities and Reactions
Summary
Anthropic's Claude Opus 4.7 is presented as a significant advancement over its predecessor, Claude Opus 4.6, particularly in advanced software engineering and handling complex, long-running tasks with rigor and consistency. It features improved vision with higher image resolution and enhanced creativity for professional tasks. While generally considered the most intelligent model in its class for many purposes, it exhibits "jaggedness," including strange refusals, verbosity, and issues with adaptive thinking. The model is available across Claude products, API, Amazon Bedrock, Google Cloud's Vertex AI, and Microsoft Foundry, maintaining the same pricing as Opus 4.6 at $5 per million input tokens and $25 per million output tokens. Benchmarks show mixed results, with gains in areas like visual reasoning and economically valuable tasks (GDPVal-AA), but regressions in others such as OpenAI MRCR v2 and OSWorld, often attributed to changes in its adaptive thinking implementation.
Key takeaway
For AI engineers and CTOs evaluating large language models, Claude Opus 4.7 represents a powerful, albeit idiosyncratic, tool. Its superior coding and agentic capabilities make it a strong contender for demanding development tasks, but its "non-sycophantic" nature and adaptive thinking quirks necessitate a shift in prompting strategy. Be prepared to "treat the model well" and justify tasks, or consider retaining Opus 4.6 for use cases requiring more predictable instruction following or less complex reasoning.
Key insights
Claude Opus 4.7 offers enhanced intelligence and autonomy but requires careful handling due to its distinct behavioral quirks.
Principles
- Model performance varies significantly across benchmarks and use cases.
- Effective interaction requires treating the model like a capable coworker.
- Adaptive thinking can lead to inconsistent performance in non-coding tasks.
Method
To optimize Claude Opus 4.7, specify tasks upfront, minimize user interactions, use auto mode, and consider adjusting custom instructions to align with its less sycophantic and more autonomous nature.
In practice
- Use Claude Opus 4.7 for complex coding and long-running agentic workflows.
- Retain GPT-5.4 for web searches and factual checks.
- Experiment with system_prompt="." for potentially better results.
Topics
- Claude Opus 4.7
- AI Model Benchmarking
- Adaptive Thinking
- Software Engineering Performance
- Multimodal AI
Best for: AI Engineer, CTO, VP of Engineering/Data, AI Scientist, Machine Learning Engineer, Director of AI/ML
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Don't Worry About the Vase.