GPT-5.5 matches heavily hyped Mythos Preview in new cybersecurity tests
Summary
The UK's AI Security Institute (AISI) recently evaluated OpenAI's GPT-5.5 and found that it performs at a level similar to Anthropic's Mythos Preview model on cybersecurity tasks. Anthropic had previously restricted Mythos Preview's release due to perceived outsize cybersecurity threats. AISI's evaluations, conducted since 2023, involve 95 Capture the Flag challenges covering reverse engineering, web exploitation, and cryptography. GPT-5.5 passed 71.4 percent of "Expert" tasks, slightly outperforming Mythos Preview's 68.6 percent. Notably, GPT-5.5 solved a complex Rust binary disassembler task in 10 minutes and 22 seconds with no human assistance, at a cost of $1.73. Both models also showed progress on the network attack simulation "The Last Ones" (TLO), with GPT-5.5 succeeding in 3 of 10 attempts, though neither could solve the more difficult "Cooling Tower" power plant disruption simulation.
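To put "slightly outperforming" in perspective, the reported figures can be compared with a little arithmetic. The sketch below is illustrative only: the function names are made up for this example, and the Expert-tier sizes it loops over are hypothetical, since this summary does not say how many of the 95 challenges are rated "Expert."

```python
# Figures quoted in the summary above.
GPT55_EXPERT_PASS = 71.4       # percent of "Expert" tasks passed
MYTHOS_EXPERT_PASS = 68.6      # percent of "Expert" tasks passed
TLO_SUCCESSES, TLO_ATTEMPTS = 3, 10  # GPT-5.5 on "The Last Ones"


def gap_in_points(a: float, b: float) -> float:
    """Difference between two percentages, in percentage points."""
    return round(a - b, 1)


def gap_in_tasks(gap_points: float, n_tasks: int) -> float:
    """Convert a percentage-point gap into a task count for a tier of n_tasks.

    n_tasks is an assumption: the summary does not give the Expert-tier size.
    """
    return round(gap_points / 100 * n_tasks, 2)


expert_gap = gap_in_points(GPT55_EXPERT_PASS, MYTHOS_EXPERT_PASS)
tlo_rate = TLO_SUCCESSES * 100 / TLO_ATTEMPTS

print(f"Expert-tier gap: {expert_gap} percentage points")
print(f"TLO success rate: {tlo_rate:.0f}%")
for n in (20, 50):  # hypothetical tier sizes, for illustration only
    print(f"With {n} Expert tasks, the gap is about {gap_in_tasks(expert_gap, n)} tasks")
```

Under either hypothetical tier size, the gap amounts to roughly one task, which fits the article's framing of the two models as comparable rather than one clearly leading.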
Key takeaway
For CTOs and VPs of Engineering evaluating AI models for cybersecurity applications: in AISI's tests, GPT-5.5 performed on par with Mythos Preview, so no single frontier model should be assumed to hold a unique advantage. Teams doing legitimate defensive work should explore OpenAI's Trusted Access for Cyber program to put these capabilities to use.
Key insights
Advanced AI models like GPT-5.5 and Mythos Preview exhibit similar, significant cybersecurity capabilities.
Principles
- AI progress in autonomy and coding drives cyber capabilities.
- Strong cyber capabilities are not a breakthrough unique to any one model.
Method
AISI evaluates frontier AI models using 95 Capture the Flag challenges, including "Expert" tasks and network attack simulations like "The Last Ones" and "Cooling Tower."
In practice
- Evaluate GPT-5.5 for reverse engineering and web exploitation work in authorized security testing.
- Consider AI for complex Rust binary decoding tasks.
Topics
- GPT-5.5
- Mythos Preview
- Cybersecurity Testing
- AI Security Institute
- Capture the Flag
Best for: CTO, VP of Engineering/Data, Executive, AI Security Engineer, Machine Learning Engineer, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by AI - Ars Technica.