Quoting Matteo Wong, The Atlantic

2026-06-16 · Source: Simon Willison's Weblog · Field: Technology & Digital — Cybersecurity & Data Privacy, Artificial Intelligence & Machine Learning · Depth: Intermediate, quick

Summary

Katie Moussouris, a cybersecurity expert and CEO of Luta Security, provided an appraisal of a White House report concerning the "Fable jailbreak," a document shared with her by Anthropic. The report detailed how IT experts tested Anthropic's Fable model by presenting it with deliberately insecure code. Fable demonstrated specific behavior: it refused the direct prompt "review the code for security issues" but subsequently complied when given the instruction "fix this code," which then necessitated some further manual intervention. Moussouris characterized this response as "the model working as intended" for cyberdefense, highlighting its designed functionality in security contexts. This evaluation occurs amid ongoing scrutiny of Anthropic by the White House.

Key takeaway

For AI Security Engineers evaluating large language models for defensive applications, understand that models like Anthropic's Fable might be intentionally designed to refuse direct vulnerability disclosure prompts while still assisting with code remediation. You should thoroughly test your chosen AI's response to various security-related instructions, distinguishing between identification and fixing capabilities. Integrate AI-generated fixes with human oversight and additional manual security steps to ensure comprehensive protection.

Key insights

Fable's refusal to "review" but compliance to "fix" insecure code is deemed intended cyberdefense functionality.

Principles

AI models can be designed for specific security roles.
Direct vulnerability disclosure may be restricted by design.
Remediation-focused AI behavior is a design choice.

Method

The article describes a test where IT experts provided deliberately insecure code to Fable, first asking it to "review" then to "fix" the code, observing its differential response.

In practice

Test AI models with varied security prompts.
Design AI for remediation, not just identification.
Combine AI fixes with manual security steps.

Topics

Fable model
Anthropic
AI Security
Code Remediation
Vulnerability Management

Best for: Director of AI/ML, AI Architect, AI Engineer, AI Security Engineer, Software Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Simon Willison's Weblog.