GPT-5.5 matches heavily hyped Mythos Preview in new cybersecurity tests

2026-05-01 · Source: AI - Ars Technica · Field: Technology & Digital — Cybersecurity & Data Privacy, Artificial Intelligence & Machine Learning · Depth: Advanced, quick

Summary

The UK's AI Security Institute (AISI) recently evaluated OpenAI's GPT-5.5 and found that it performs at a level similar to Anthropic's Mythos Preview on cybersecurity tasks. Anthropic had previously restricted Mythos Preview's release due to perceived outsize cybersecurity threats. AISI's evaluations, which it has run since 2023, use 95 Capture the Flag challenges covering reverse engineering, web exploitation, and cryptography. GPT-5.5 passed 71.4 percent of "Expert" tasks, slightly ahead of Mythos Preview's 68.6 percent. Notably, GPT-5.5 solved a complex Rust binary disassembler task in 10 minutes and 22 seconds with no human assistance, at a cost of $1.73. Both models also made progress on the "The Last Ones" (TLO) network attack simulation, where GPT-5.5 succeeded in 3 of 10 attempts, though neither could solve the more difficult "Cooling Tower" power plant disruption simulation.

Key takeaway

CTOs and VPs of Engineering evaluating AI models for cybersecurity applications should recognize that GPT-5.5 performed comparably to Mythos Preview in AISI's cybersecurity tests. Rather than assuming a single model holds a unique advantage in threat mitigation, have your teams explore OpenAI's Trusted Access for Cyber program to use these models for legitimate defensive work.

Key insights

Advanced AI models like GPT-5.5 and Mythos Preview exhibit similar, significant cybersecurity capabilities.

Principles

AI progress in autonomy and coding drives cyber capabilities.
Strong cybersecurity capability is not a breakthrough unique to any one model.

Method

AISI evaluates frontier AI models using 95 Capture the Flag challenges, including "Expert" tasks and network attack simulations like "The Last Ones" and "Cooling Tower."
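
Small attempt counts, such as the 3 successes in 10 TLO attempts reported in the summary, carry wide statistical uncertainty. The sketch below is illustrative only and is not part of AISI's published method: it assumes each attempt is an independent trial and applies the standard Wilson score interval. The function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% confidence.
    Assumes independent attempts, which is an assumption made here,
    not something the source states.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")

    p = successes / trials
    denom = 1 + z ** 2 / trials
    center = (p + z ** 2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return center - half_width, center + half_width


# The 3-of-10 figure comes from the summary; everything else is illustrative.
low, high = wilson_interval(3, 10)
print(f"3/10 successes: plausible success rate {low:.1%} to {high:.1%}")
```

Under these assumptions, the interval for 3 of 10 runs from roughly 11 percent to 60 percent, so a handful of attempts gives only a coarse estimate of how reliably a model completes the simulation.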

In practice

Use GPT-5.5 for reverse engineering and web exploitation.
Consider AI for complex Rust binary decoding tasks.

Topics

GPT-5.5
Mythos Preview
Cybersecurity Testing
AI Security Institute
Capture the Flag

Best for: CTO, VP of Engineering/Data, Executive, AI Security Engineer, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by AI - Ars Technica.