Cybersecurity researchers aren’t happy about the guardrails on Anthropic’s Fable

· Source: AI News & Artificial Intelligence | TechCrunch · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Intermediate, quick

Summary

Anthropic recently released Fable, a limited public version of its advanced cybersecurity model, Mythos. The model's strict guardrails have drawn significant criticism from cybersecurity researchers and professionals. Fable frequently rejects requests deemed "tangentially cyber related," including basic tasks such as reading a blog post or reviewing code, often falling back to Claude Opus 4.8 instead. The guardrails are intended to prevent misuse such as malware development or work on biological weapons, but critics describe them as haphazard and keyword-based. Anthropic initially restricted Mythos to Project Glasswing and later expanded access to hundreds of organizations across 15 countries. The company also offers a Cyber Verification Program that lets professionals bypass some limitations, similar to OpenAI's Trusted Access for Cyber. Experts expect the guardrails to evolve and relax as collaboration with the cybersecurity industry increases.

Key takeaway

AI security engineers evaluating large language models for cybersecurity applications should be aware that initial public releases such as Anthropic's Fable may ship with overly restrictive, keyword-based guardrails. These limitations can impede legitimate tasks such as secure code review or threat analysis, and a fallback to another model, such as Claude Opus 4.8 in Fable's case, may reduce the capabilities available to you. Consider applying for specialized access programs, such as Anthropic's Cyber Verification Program or OpenAI's Trusted Access for Cyber, to get the functionality your work requires. Expect that guardrails will likely evolve and relax as models mature.

Key insights

AI safety guardrails can inadvertently impede legitimate professional use in specialized fields like cybersecurity.

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Security Engineer, Security Engineer, Tech Journalist

Editorial summary, takeaway, and curation by AIssential. Original article published by AI News & Artificial Intelligence | TechCrunch.