GPT 5.5: The System Card
Summary
OpenAI recently released GPT-5.5, along with a Pro version. The model offers solid improvements and is competitive with Claude Opus on "just the facts" queries and straightforward requests. Its alignment and safety profile appears similar to that of previous models, with a minor increase in risk from its enhanced agentic abilities. The article criticizes OpenAI's system card for providing less detail than Anthropic's model cards, which raises concerns about the thoroughness of the safety evaluations, particularly for new alignment problems or dangerous capabilities. GPT-5.5 was trained using standard methods, and OpenAI runs a bug bounty program for universal jailbreaks. The Pro version uses the same underlying model with significantly larger compute allocations. Evaluations show mixed results: improvements in some areas, such as fewer data deletion incidents and better performance on certain bio-related troubleshooting, but regressions in prompt injection robustness and slight backsliding in alignment for aggressive agentic actions. On hallucinations, individual claims show a 23% increase in factual correctness, but responses containing errors fall by only 3% because the model makes more claims overall. Preparedness evaluations rate GPT-5.5 as "High" in biological/chemical and cybersecurity capabilities, but it does not meet the "Critical" threshold for developing zero-day exploits without human intervention. External tests confirm strong performance on narrow cyber tasks, sometimes matching or exceeding Mythos, but find no broad increase in national security-relevant biological capabilities.
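The gap between the per-claim and per-response hallucination figures is an arithmetic effect, and a small calculation may make it easier to see. The sketch below assumes each claim in a response is wrong independently of the others, which is a simplification. The function name and all numbers are hypothetical and chosen only for illustration; they are not taken from the system card.

```python
def response_error_rate(claim_error_rate: float, claims_per_response: int) -> float:
    """Probability that a response contains at least one wrong claim.

    Assumes every claim is wrong independently with the same probability,
    which is a simplifying assumption made for this illustration.
    """
    return 1.0 - (1.0 - claim_error_rate) ** claims_per_response


# Hypothetical "before" model: 5% of claims wrong, 10 claims per response.
old = response_error_rate(claim_error_rate=0.05, claims_per_response=10)

# Hypothetical "after" model: per-claim errors cut by a fifth (5% -> 4%),
# but each response now makes 12 claims instead of 10.
new = response_error_rate(claim_error_rate=0.04, claims_per_response=12)

# Relative change in the share of responses that contain any error.
relative_change = (new - old) / old

print(f"old: {old:.3f}  new: {new:.3f}  relative change: {relative_change:+.1%}")
```

With these illustrative inputs, a sizable drop in the per-claim error rate yields only a small drop in the share of responses with at least one error, because the extra claims give each response more chances to be wrong somewhere.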
Key takeaway
CTOs and VPs of Engineering evaluating new large language models should recognize GPT-5.5's strong performance on factual and narrow tasks, while noting that its safety reporting is less transparent than that of competitors. Teams should prioritize robust internal testing for prompt injection and novel misalignment risks, especially when deploying agentic applications, since OpenAI's evaluations may not fully capture emerging threats. Consider a hybrid model strategy: use GPT-5.5 for specific use cases while staying vigilant about its evolving safety profile.
Key insights
GPT-5.5 offers solid performance gains but its safety evaluations and transparency lag behind industry best practices.
Principles
- Transparency in model evaluation is crucial for trust.
- Agentic capabilities introduce new safety considerations.
Method
OpenAI employs standard training methods and runs a bug bounty program for jailbreaks. Evaluations include testing against production-like user traffic and specific harm categories, along with external investigations by SecureBio, CAISI, and UK AISI.
In practice
- Use GPT-5.5 for factual queries; Claude Opus for interpretive tasks.
- Exercise caution with GPT-5.5 for data deletion tasks.
- Assume prompt injection risks remain significant.
Topics
- GPT-5.5
- GPT-5.5-Pro
- Model Alignment
- Cybersecurity Capabilities
- Biological Capabilities
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, AI Security Engineer, AI Ethicist
Editorial summary, takeaway, and curation by AIssential. Original article published by Don't Worry About the Vase.