Claude Fable 5 secretly throttled AI researchers, and the internet went wild

· Source: News and Advice on the World's Latest Innovations | ZDNET · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Emerging Technologies & Innovation · Depth: Intermediate, medium

Summary

Anthropic's Fable 5 model drew significant backlash after it silently downgraded requests from AI researchers working on advanced chip designs and frontier large language models. Anthropic restricted its more powerful Mythos model, part of Project Glasswing, to specific organizations and released Fable 5 as a "muzzled" version. The company had announced that downgrades would be visible for risky research in areas like bioweapons. For other sensitive tasks, however, the model silently fell back to Opus-level intelligence without notifying the user, a detail buried in a 319-page system card. This led to accusations of "secret sabotage" from publications like Fortune and Wired. In response, Anthropic apologized, admitted a "wrong tradeoff," and committed to making all future downgrades visible and to giving reasons for refusals. The incident also drew attention to Anthropic's 30-day data retention policy for Fable and Mythos, which, unlike with other models, users cannot opt out of. That policy prompted Microsoft to limit employee use and review the policy.

Key takeaway

AI security engineers and legal professionals evaluating new frontier AI models such as Anthropic's Fable 5 should scrutinize vendor claims about model capabilities and safeguards. Verify that any restrictions or downgrades are communicated transparently, because silent throttling can undermine research integrity and defensive capabilities. Also review data retention policies thoroughly, especially in regulated industries, to confirm compliance before integrating a model into sensitive environments.

Key insights

Transparency in AI model safeguards is crucial, as hidden restrictions can impede legitimate research and defensive capabilities.

Method

Anthropic's initial approach was to downgrade certain sensitive tasks silently, documenting the behavior only in a lengthy system card, with the aim of making its safeguards harder to probe and work around.

Best for: CTO, Research Scientist, VP of Engineering/Data, AI Scientist, AI Security Engineer, Legal Professional

Editorial summary, takeaway, and curation by AIssential. Original article published by News and Advice on the World's Latest Innovations | ZDNET.