Breaking: Trump asks the impossible of Anthropic
Summary
In January 2024, Gary Marcus predicted that the politics and inadequacy of guardrails would become a central issue for generative AI. That prediction materialized when White House officials reportedly demanded that Anthropic make its Fable 5 model's guardrails entirely circumvention-proof before rerelease. Security experts contend, however, that uncircumventable guardrails for large language models (LLMs) are not achievable. The core issue is that the next-token predictors underlying LLMs are not inherently designed for safety, which makes it difficult to thread the needle between overly restrictive and overly permissive controls. The article identifies this as a fundamental problem for generative AI as a whole rather than one specific to Anthropic.
Key takeaway
Policymakers considering AI regulation should recognize that demanding uncircumventable guardrails for current large language models is technically infeasible. The focus should shift from absolute prevention of jailbreaks to managing the consequences of their inevitability, potentially by exploring alternative AI architectures or implementing robust monitoring and response systems. This understanding is crucial for developing realistic and effective AI safety policies.
Key insights
Fully preventing jailbreaks is not possible for LLMs, security experts contend, because the models are built on next-token prediction.
Principles
- LLM guardrails struggle to balance being too restrictive against being too permissive.
- Next-token predictors are not inherently designed for safety.
- Guardrail inadequacy is a systemic problem for generative AI, not one specific to Anthropic.
Topics
- Generative AI
- Large Language Models
- AI Safety
- AI Guardrails
- Jailbreaking
- AI Regulation
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Security Engineer, Policy Maker, AI Ethicist
Editorial summary, takeaway, and curation by AIssential. Original article published by Marcus on AI.