Breaking: Trump asks the impossible of Anthropic

· Source: Marcus on AI · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, AI Governance & Policy · Depth: Intermediate, quick

Summary

In January 2024, Gary Marcus predicted that the politics and inadequacy of guardrails would become a central issue for generative AI. That prediction has now materialized: White House officials reportedly demanded that Anthropic make the guardrails on its Fable 5 model impossible to circumvent before the model is rereleased. Security experts contend that no such guarantee is possible for large language models (LLMs). The core issue is that LLMs are next-token predictors, which are not inherently designed for safety, so it is difficult to thread the needle between controls that are too restrictive and controls that are too permissive. The article presents this as a fundamental problem for generative AI as a whole rather than one specific to Anthropic.

Key takeaway

Policymakers considering AI regulation should recognize that demanding uncircumventable guardrails for current large language models is technically infeasible. The focus should shift from preventing every jailbreak to managing the consequences of jailbreaks that will inevitably occur, potentially by exploring alternative AI architectures or implementing robust monitoring and response systems. This understanding is crucial for developing realistic and effective AI safety policies.

Key insights

Fully preventing LLM jailbreaks is not possible, according to security experts, because LLMs are built on next-token prediction, which is not inherently designed for safety.

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Security Engineer, Policy Maker, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by Marcus on AI.