Mystery solved: Anthropic reveals changes to Claude's harnesses and operating instructions likely caused degradation
Summary
Anthropic has addressed user reports of "AI shrinkflation" in its Claude models, which suggested a degradation in reasoning capability, increased hallucinations, and token waste. The company published a technical post-mortem identifying three product-layer changes, not underlying model weight regressions, as the cause. These included a default reasoning effort change from `high` to `medium` for Claude Code on March 4, a caching logic bug introduced on March 26 that cleared thinking history too frequently, and system prompt verbosity limits added on April 16. These issues affected Claude Code CLI, Claude Agent SDK, and Claude Cowork, but not the Claude API. Anthropic has since reverted the reasoning effort and verbosity prompt changes, fixed the caching bug in v2.1.116, and reset subscriber usage limits.
Key takeaway
For AI architects and engineering leads deploying large language models, this incident highlights the critical impact of "harness" or product-layer changes on perceived model performance. Ensure your teams implement robust internal testing, comprehensive evaluation suites for prompt modifications, and strict gating for model-specific adjustments to prevent unintended regressions and maintain user trust. Proactive communication and transparency are key to managing community expectations.
Key insights
Product-layer changes, not model degradation, caused Claude's perceived performance issues.
Principles
- Harness changes impact model behavior.
- Caching logic can degrade model memory.
- Prompt changes alter model quality.
Method
Anthropic identified three specific product-layer changes: default reasoning effort, a caching logic bug, and system prompt verbosity limits. They resolved issues by reverting changes and fixing the bug.
In practice
- Implement internal dogfooding for public builds.
- Run broad evaluation suites for prompt changes.
- Gate model-specific changes strictly.
Topics
- Claude AI Degradation
- Anthropic Post-Mortem
- AI Performance Benchmarks
- System Prompt Engineering
- Caching Logic Bugs
Code references
Best for: CTO, VP of Engineering/Data, AI Architect, Machine Learning Engineer, AI Engineer, Director of AI/ML
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by VentureBeat.