Cybersecurity researchers aren’t happy about the guardrails on Anthropic’s Fable

2026-06-10 · Source: AI News & Artificial Intelligence | TechCrunch · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Intermediate, quick

Summary

Anthropic recently released Fable, a limited public version of its advanced cybersecurity model, Mythos. The model's stringent guardrails have drawn significant criticism from cybersecurity researchers and professionals. Fable frequently rejects requests deemed "tangentially cyber related," including basic tasks such as reading a blog post or reviewing code, and often falls back to Claude Opus 4.8. The guardrails are intended to prevent misuse for malware or biological weapons development, but critics describe them as haphazard and keyword-based. Anthropic previously restricted Mythos to Project Glasswing and later expanded access to hundreds of organizations across 15 countries. The company also offers a Cyber Verification Program that lets professionals bypass some limitations, mirroring OpenAI's Trusted Access for Cyber. Experts anticipate that the guardrails will evolve and relax as collaboration with the cybersecurity industry increases.

Key takeaway

AI security engineers evaluating large language models for cybersecurity applications should be aware that initial public releases like Anthropic's Fable may ship with overly restrictive, keyword-based guardrails. These limitations can impede legitimate tasks such as secure code review or threat analysis, and may hand requests to a less capable fallback model. Consider applying for specialized access programs, such as Anthropic's Cyber Verification Program or OpenAI's Trusted Access for Cyber, to gain the functionality your work requires. Expect guardrails to evolve and relax as models mature.

Key insights

AI safety guardrails can inadvertently impede legitimate professional use in specialized fields like cybersecurity.

Principles

Overly broad AI guardrails can hinder legitimate professional tasks.
Initial AI model releases often feature strict safety measures that evolve over time.

In practice

Keyword-based guardrails may route a request to a less capable model, as with Fable's fallback to Claude Opus 4.8.
Cyber verification programs relax some model limitations for approved professionals.
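
To illustrate why keyword-based guardrails draw the "haphazard" criticism, here is a minimal Python sketch of a keyword-triggered router. Everything in it is an assumption made for illustration: the trigger words, the `route_request` function, and the `primary-model` / `fallback-model` labels are hypothetical and do not describe how Anthropic's guardrails are actually implemented.

```python
import re

# Hypothetical trigger words, chosen for illustration only.
# They are not taken from any vendor's real filter.
TRIGGER_KEYWORDS = {"exploit", "malware", "payload", "vulnerability"}


def route_request(prompt: str) -> str:
    """Return the hypothetical model tier that would handle the prompt."""
    # Lowercase the prompt and split it into alphabetic words.
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    # Any overlap with the trigger list downgrades the request,
    # regardless of the intent behind it.
    if words & TRIGGER_KEYWORDS:
        return "fallback-model"
    return "primary-model"


# A defensive code review is downgraded because of one word.
print(route_request("Please review this code for a SQL injection vulnerability."))
# So is a request to summarize a blog post.
print(route_request("Summarize this blog post about malware trends."))
# A request with no trigger words passes through.
print(route_request("Refactor this sorting function."))
```

Because the check looks only at vocabulary, the defensive code review and the blog summary are treated the same way as a genuinely risky request, which is the pattern of false positives the researchers describe.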

Topics

Anthropic Fable
AI Guardrails
Cybersecurity LLMs
Cyber Verification Program
AI Safety
Claude Opus 4.8

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Security Engineer, Security Engineer, Tech Journalist

Editorial summary, takeaway, and curation by AIssential. Original article published by AI News & Artificial Intelligence | TechCrunch.