Anthropic confirms Claude Code problems and promises stricter quality controls

· Source: The Decoder · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, short

Summary

Anthropic confirmed and fixed three distinct bugs that caused a perceived decline in Claude Code's quality, affecting reasoning depth, caching, and text length restrictions. The issues, which included lowering default reasoning effort from "high" to "medium" on March 4, a caching optimization bug introduced March 26 that wiped reasoning history, and a system prompt instruction on April 16 limiting text to ≤25 words between tool calls and ≤100 words for final responses, were all resolved by April 20 with version 2.1.116. Anthropic is implementing stricter internal testing, including broader eval suites and gradual rollouts for intelligence-impacting changes, and has reset usage limits for all subscribers as compensation.

Key takeaway

For CTOs and VPs of Engineering evaluating AI tool reliability, Anthropic's experience highlights the critical need for robust testing and transparent communication around infrastructure changes. You should prioritize vendors with clear post-mortem processes and commit to gradual rollouts for updates that could impact model performance, especially given the industry's compute crunch and rising API prices.

Key insights

Perceived AI quality drops often stem from infrastructure or tooling changes, not core model regressions.

Principles

Method

Anthropic's post-mortem process involved identifying three independent issues: a reasoning effort reduction, a caching bug, and a prompt restriction, then rolling back or fixing each to restore performance.

In practice

Topics

Best for: Machine Learning Engineer, CTO, VP of Engineering/Data, AI Engineer, MLOps Engineer, AI Product Manager

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by The Decoder.