OpenAI's new GPT-5.4 clobbers humans on pro-level work in tests - by 83%

· Source: News and Advice on the World's Latest Innovations | ZDNET · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Fundamental Awareness, medium

Summary

OpenAI has released GPT-5.4, its latest "Thinking" model, less than three months after GPT-5.2. The new model scores 83% on OpenAI's GDPval evaluation, which measures how often a model matches or outperforms human professionals on tasks from 44 real-world occupations in nine industries. Compared with GPT-5.2, its responses are 18% less likely to contain errors and 33% less likely to include false claims. The model is rolling out across ChatGPT paid tiers, the API, and the Codex programming tool. Key enhancements include improved tool use, better computer vision for interpreting complex images and documents, native computer-use capabilities for operating software via screenshots and commands, and stronger coding that builds on the strengths of GPT-5.3-Codex.

Key takeaway

For machine learning engineers evaluating frontier models for enterprise deployment, GPT-5.4's 83% score against human professionals and its lower error rates relative to GPT-5.2 point to a meaningful gain in reliability and capability. Prioritize testing its improved tool use, computer vision, and native computer-use features on your own complex, multi-step workflows to find the automation opportunities that matter most to your organization.

Key insights

According to OpenAI's own evaluation, GPT-5.4 matches or outperforms human professionals on most of the tested tasks across a wide range of occupations.

Method

OpenAI's GDPval test assesses AI performance on complex, day-to-day tasks that human professionals create and grade, covering 44 occupations in nine industries. Automated grading systems built from human expert input supplement the human grading.
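
To make the headline number concrete, here is a minimal sketch of how a "matches or outperforms" rate could be computed from per-task gradings. It assumes each task ends up with a single label of "win", "tie", or "loss" for the model relative to the human professional. The function name, the label names, and the sample data are all hypothetical, and this is not OpenAI's actual grading code.

```python
from collections import Counter

# Hypothetical label set: the model's output versus the human professional's.
VALID_LABELS = {"win", "tie", "loss"}


def win_or_tie_rate(judgments):
    """Return the share of tasks where the model was graded as good as or better than the human."""
    if not judgments:
        raise ValueError("need at least one judgment")
    counts = Counter(judgments)
    unknown = set(counts) - VALID_LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    # Ties count in the model's favor, matching the "matches or outperforms" wording.
    return (counts["win"] + counts["tie"]) / len(judgments)


# Made-up gradings for 100 tasks, chosen only to illustrate the arithmetic.
sample = ["win"] * 50 + ["tie"] * 33 + ["loss"] * 17
print(f"{win_or_tie_rate(sample):.0%}")
```

Counting ties alongside wins is a design choice: a score built this way says how often the model was at least as good as the professional, not how often it was strictly better.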

Best for: Executive, Machine Learning Engineer, NLP Engineer, AI Engineer, AI Product Manager, Tech Journalist

Editorial summary, takeaway, and curation by AIssential. Original article published by News and Advice on the World's Latest Innovations | ZDNET.