Why the US government shut down Anthropic’s latest Claude AI model
Summary
On June 12, AI lab Anthropic suspended public access to its recently released Claude models, Fable 5 and Mythos 5, just three days after launch. The suspension followed a US government "export control directive" restricting use of the models to US nationals. Anthropic believes the directive stemmed from the government's awareness of a "jailbreak" for Fable 5 that could bypass safeguards designed to prevent its use in cyberattacks. Mythos 5, Anthropic's most powerful "frontier" model, was initially withheld because of its hacking capabilities and later provided to US tech corporations for system patching. The incident highlights how difficult large language models are to secure: perfect jailbreak resistance is unachievable, and an "Undersphere" community actively works to circumvent guardrails. It also comes amid escalating conflict between Anthropic and the Trump administration over AI regulation and military use.
Key takeaway
For policymakers developing AI governance frameworks, this incident underscores the critical need for independent model evaluation. You cannot rely solely on developer-provided safeguards or on rapid regulation, because models are opaque and jailbreaks are inevitable. Your framework must be global, participatory, and built on reciprocal trust so that it can predict and address potential failures, moving beyond a hands-off approach to proactive, independent review of frontier AI capabilities before public release.
Key insights
AI model safeguards are inherently vulnerable to circumvention, posing significant governance challenges.
Principles
- Perfect jailbreak resistance is unachievable for current AI models.
- LLM behavior remains partly opaque even to the models' builders.
- AI development is moving faster than regulation and guardrails can keep up.
In practice
- Implement multi-layered security beyond model-based guardrails.
- Monitor "Undersphere" communities for emerging jailbreak techniques.
Topics
- AI Governance
- Export Controls
- Large Language Models
- AI Safety
- Model Jailbreaks
- Anthropic Claude
Best for: CTO, VP of Engineering/Data, Director of AI/ML, Policy Maker, AI Security Engineer, Tech Journalist
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial intelligence (AI) – The Conversation.