GPT 5.5: The System Card

2023-08-29 · Source: Don't Worry About the Vase · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Emerging Technologies & Innovation · Depth: Advanced, long

Summary

OpenAI recently released GPT-5.5, including a Pro version, which offers solid improvements and is competitive with Claude Opus for "just the facts" queries and straightforward requests. While its alignment and safety profile appear similar to previous models, there's a minor increased risk due to enhanced agentic abilities. A critique of OpenAI's system card reveals it provides less detailed information compared to Anthropic's model cards, raising concerns about the thoroughness of safety evaluations, particularly for new alignment problems or dangerous capabilities. GPT-5.5 was trained using standard methods and features a bug bounty program for universal jailbreaks. The Pro version utilizes the same underlying model but with significantly larger compute allocations. Evaluations show mixed results, with improvements in some areas like reducing data deletion incidents and certain bio-related troubleshooting, but regressions in prompt injection robustness and slight backsliding in alignment for aggressive agentic actions. Hallucination rates show a 23% increase in factual correctness for individual claims, but only a 3% reduction in responses containing errors, as the model makes more claims overall. Preparedness evaluations indicate GPT-5.5 is "High" in biological/chemical and cybersecurity capabilities, but does not meet the "Critical" threshold for developing zero-day exploits without human intervention. External tests confirm strong performance in narrow cyber tasks, sometimes matching or exceeding Mythos, but not a broad increase in national security-relevant biological capabilities.

Key takeaway

For CTOs and VPs of Engineering evaluating new large language models, you should recognize GPT-5.5's strong performance in factual and narrow tasks, but be aware of the less transparent safety reporting compared to competitors. Your teams should prioritize robust internal testing for prompt injection and novel misalignment risks, especially when deploying agentic applications, as OpenAI's evaluations may not fully capture emerging threats. Consider a hybrid model strategy, leveraging GPT-5.5 for specific use cases while maintaining vigilance on its evolving safety profile.

Key insights

GPT-5.5 offers solid performance gains but its safety evaluations and transparency lag behind industry best practices.

Principles

Transparency in model evaluation is crucial for trust.
Agentic capabilities introduce new safety considerations.

Method

OpenAI employs standard training methods and a bug bounty program for jailbreaks. Evaluations include testing against production-like user traffic, specific harm categories, and external investigations by SecureBio, CAISI, and UK AISI.

In practice

Use GPT-5.5 for factual queries; Claude Opus for interpretive tasks.
Exercise caution with GPT-5.5 for data deletion tasks.
Assume prompt injection risks remain significant.

Topics

GPT-5.5
GPT-5.5-Pro
Model Alignment
Cybersecurity Capabilities
Biological Capabilities

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, AI Security Engineer, AI Ethicist

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Don't Worry About the Vase.