GPT-5.5 tops benchmarks but still hallucinates frequently and costs 20 percent more over the API

Source: The Decoder · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Intermediate, short

Summary

OpenAI's GPT-5.5, released April 24, 2026, leads the Artificial Analysis Intelligence Index with 60 points, ahead of Claude Opus 4.7 and Gemini 3.1 Pro Preview. Its API price nominally doubled to $5 per million input tokens and $30 per million output tokens, but because it consumes 40 percent fewer tokens than GPT-5.4, the net price increase is approximately 20 percent. On the AA Omniscience benchmark for factual recall, GPT-5.5 achieves the highest accuracy at 57 percent, yet its hallucination rate of 86 percent is far above Claude Opus 4.7's 36 percent. At medium compute, the model shows strong price-performance, matching Claude Opus 4.7's maximum score for a quarter of the cost.
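The roughly 20 percent figure follows from combining the doubled per-token price with the 40 percent drop in token consumption. The short Python sketch below reproduces that arithmetic. It assumes the price doubling and the token reduction apply uniformly to all billed tokens, which the summary does not state explicitly; the function name `net_price_change` is illustrative only.

```python
def net_price_change(price_multiplier: float, token_reduction: float) -> float:
    """Return the fractional change in cost per task.

    Assumption (not stated in the source): the price multiplier and the
    token reduction apply uniformly to all billed tokens.
    """
    # Cost per task scales with (price per token) * (tokens used).
    return price_multiplier * (1.0 - token_reduction) - 1.0


# Figures from the summary: price doubled, 40 percent fewer tokens.
change = net_price_change(price_multiplier=2.0, token_reduction=0.40)
print(f"Net cost change per task: {change:+.0%}")  # prints "+20%"
```

Real workloads have different input-to-output token ratios, so the effective increase for a given application may differ from this uniform estimate.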

Key takeaway

For NLP engineers and CTOs evaluating large language models for production, GPT-5.5 offers leading benchmark performance, and its improved token efficiency limits the net price increase over GPT-5.4 to about 20 percent. However, its 86 percent hallucination rate is a concern for applications requiring high factual accuracy, which may need robust fact-checking layers or an alternative model such as Claude Opus 4.7 for critical tasks.

Key insights

GPT-5.5 leads benchmarks and offers strong price-performance at medium compute, but its 86 percent hallucination rate remains a significant weakness.

Best for: NLP Engineer, CTO, VP of Engineering/Data, AI Engineer, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by The Decoder.