UK gov's Mythos AI tests help separate cybersecurity threat from hype
Summary
Anthropic's Mythos Preview model, a new AI system designed for cybersecurity tasks, has undergone an initial evaluation by the UK government's AI Security Institute (AISI). On individual cybersecurity tasks, Mythos Preview performs comparably to other frontier models such as OpenAI's GPT-5.4 and Codex 5.3 and Anthropic's own Opus 4.6. Its key differentiator is the ability to chain multiple tasks into complex, multistep infiltration attacks. Mythos Preview became the first AI model to complete AISI's "The Last Ones" (TLO) challenge, a 32-step data extraction simulation on a corporate network, succeeding in 3 of 10 attempts and averaging 22 of the 32 steps per run, well ahead of the 16-step average for Anthropic's Opus 4.6. However, the model still struggles with more complex tests like "Cooling Tower," which simulates power plant disruption.
Key takeaway
For cybersecurity leaders evaluating advanced AI models, Mythos Preview's ability to execute multistep infiltration attacks on weakly defended systems marks a shift from AI that handles isolated tasks to AI that can carry out a full attack chain. Prioritize assessing your enterprise's exposure to chained AI attacks, particularly in less robust network segments. Integrate AI-driven defensive tooling now, because future models will likely surpass Mythos Preview's capabilities and make AI-augmented defense a necessity.
Key insights
Mythos Preview is the first AI model to complete AISI's 32-step "The Last Ones" (TLO) infiltration challenge from start to finish.
Principles
- AI models can chain tasks for complex attacks.
- Simulated environments lack real-world defenses.
Method
AISI uses Capture the Flag (CTF) challenges, including multistep simulations like "The Last Ones" (TLO), to evaluate AI cyberattack capabilities, measuring task completion and infiltration steps.
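The two measurements named above, full-task completion and steps reached, can be sketched as a small scoring routine. This is an illustrative sketch only, not AISI's scoring code: the `Run` type, the `summarize` function, and the per-run step counts are all hypothetical. The step counts were chosen solely so that they reproduce the aggregates reported in the summary (3 full completions in 10 attempts, 22 steps on average); AISI's actual per-run results are not given here.

```python
from dataclasses import dataclass
from statistics import mean

TOTAL_STEPS = 32  # length of the TLO simulation, as reported in the summary


@dataclass
class Run:
    """One attempt at a multistep challenge (hypothetical record type)."""
    steps_completed: int  # number of steps finished before the run ended


def summarize(runs, total_steps=TOTAL_STEPS):
    """Return completion and progress metrics for a list of runs."""
    if not runs:
        raise ValueError("at least one run is required")
    # A run counts as a full completion only if every step was finished.
    full = sum(1 for r in runs if r.steps_completed >= total_steps)
    avg_steps = mean(r.steps_completed for r in runs)
    return {
        "attempts": len(runs),
        "full_completions": full,
        "completion_rate": full / len(runs),
        "avg_steps": avg_steps,
        "avg_progress": avg_steps / total_steps,  # fraction of the chain reached
    }


# Hypothetical per-run step counts, picked only to match the reported
# aggregates (3 of 10 full completions, 22-step average). Not real AISI data.
runs = [Run(n) for n in (32, 32, 32, 20, 19, 18, 18, 17, 16, 16)]
summary = summarize(runs)
print(summary)
```

Reporting both numbers matters because they answer different questions: the completion rate shows how often the whole attack chain succeeds, while the average step count shows how far a typical attempt gets before stalling.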
In practice
- Test AI on multistep infiltration scenarios.
- Simulate corporate network data extraction.
- Use AI for defensive hardening.
Topics
- Anthropic Mythos Preview
- AI Security Institute
- Cybersecurity AI
- Multistep Infiltration
- Capture the Flag Challenges
Best for: CTO, VP of Engineering/Data, Executive, AI Security Engineer, AI Scientist, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by AI - Ars Technica.