GPT 5.4 "we see no wall"

2026-03-06 · Source: Wes Roth · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Robotics & Autonomous Systems · Depth: Intermediate, long

Summary

OpenAI has released GPT 5.4, with gains in performance and new capabilities including native computer use and enhanced vision. The model achieved an 82-83% win/tie rate against human experts on the GDPval benchmark, which assesses performance on industry-specific tasks, and surpassed human performance on the OSWorld-Verified benchmark with a 75% success rate in desktop navigation. OpenAI is also expanding its offerings with a suite of financial services tools and adopting strategies from Anthropic, such as supporting skills and facilitating migration. Separately, Anthropic has been officially designated a supply chain risk, though the designation's scope is limited to Department of War contracts. Anthropic also published a report on AI's labor market impacts, noting a slowdown in early-career hiring, and a prominent OpenAI researcher, Max Schwarzer, has moved to Anthropic.

Key takeaway

For CTOs and VPs of Engineering evaluating AI integration, GPT 5.4's native computer use and its results on benchmarks like GDPval and OSWorld-Verified mark a significant shift in what AI agents can do. Your teams should explore its potential for automating complex workflows, particularly in financial services and general desktop operations, where it has matched or surpassed human expert performance on specific tasks. Be mindful of the evolving competitive landscape and regulatory challenges, such as Anthropic's supply chain risk designation.

Key insights

GPT 5.4 demonstrates human-surpassing performance in desktop navigation and expert-level task completion, signaling a new era for AI agents.

Principles

AI models are achieving parity with human experts in complex tasks.
Native computer use capabilities enhance AI agent autonomy.

Method

The OSWorld-Verified benchmark measures a model's ability to navigate a desktop environment via screenshots and keyboard/mouse actions, providing a quantifiable success rate for computer interaction.
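As a rough illustration of how a benchmark of this kind can be scored, the sketch below runs a toy observe-act loop and reports the fraction of tasks completed. Everything in it is hypothetical: `ToyDesktop`, `scripted_policy`, and the step budget are placeholders and not part of the benchmark itself or of any OpenAI API. A real harness would return actual screenshots and ask a model for each action.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# An action is a (kind, payload) pair, e.g. ("type", "hel") or ("click", "submit").
Action = Tuple[str, str]
Observation = Dict[str, str]


@dataclass
class ToyDesktop:
    """Hypothetical stand-in for a desktop environment.

    The task is solved once the agent has typed `goal` into the field
    and then clicked the element named "submit".
    """
    goal: str
    typed: str = ""
    done: bool = False

    def observe(self) -> Observation:
        # A real environment would return a screenshot; this returns a dict.
        return {"typed": self.typed, "goal": self.goal}

    def step(self, action: Action) -> None:
        kind, payload = action
        if kind == "type":
            self.typed += payload
        elif kind == "click" and payload == "submit":
            self.done = self.typed == self.goal


def scripted_policy(obs: Observation) -> Action:
    """Placeholder for a model call: types up to 3 characters per step, then submits."""
    remaining = obs["goal"][len(obs["typed"]):]
    if remaining:
        return ("type", remaining[:3])
    return ("click", "submit")


def run_episode(env: ToyDesktop, policy: Callable[[Observation], Action],
                max_steps: int = 5) -> bool:
    """Run one task until it is solved or the step budget is exhausted."""
    for _ in range(max_steps):
        if env.done:
            break
        env.step(policy(env.observe()))
    return env.done


def success_rate(goals: List[str], policy: Callable[[Observation], Action],
                 max_steps: int = 5) -> float:
    """Fraction of tasks solved within the step budget."""
    solved = sum(run_episode(ToyDesktop(goal=g), policy, max_steps) for g in goals)
    return solved / len(goals)


if __name__ == "__main__":
    tasks = ["hi", "hello", "a much longer string"]
    print(f"success rate: {success_rate(tasks, scripted_policy):.0%}")
```

The step budget matters in this toy setup: the longest task fails only because the policy runs out of steps, which is one reason a single success-rate figure should be read alongside the harness settings that produced it.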

In practice

Develop AI agents for web and software automation using Playwright.
Utilize GPT 5.4's vision for troubleshooting visual applications like games.
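For the browser automation item above, one possible starting point is a small dispatcher that turns model-chosen actions into Playwright calls. This is a minimal sketch under stated assumptions: the action dictionary format, `apply_action`, and `capture_and_act` are hypothetical names invented for illustration, and the model call that would choose the actions is omitted. Only `page.goto`, `page.mouse.click`, `page.keyboard.type`, `page.keyboard.press`, and `page.screenshot` come from Playwright's Python sync API.

```python
from typing import Any, Dict, List


def apply_action(page: Any, action: Dict[str, Any]) -> None:
    """Execute one action (hypothetical dict format) on a Playwright page."""
    kind = action["kind"]
    if kind == "click":
        page.mouse.click(action["x"], action["y"])
    elif kind == "type":
        page.keyboard.type(action["text"])
    elif kind == "press":
        page.keyboard.press(action["key"])
    else:
        raise ValueError(f"unsupported action kind: {kind!r}")


def capture_and_act(url: str, actions: List[Dict[str, Any]]) -> bytes:
    """Open `url`, apply the actions in order, and return a PNG screenshot.

    Requires the `playwright` package and an installed Chromium build;
    the import is deferred so the dispatcher above can be used without them.
    """
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        for action in actions:
            apply_action(page, action)
        png = page.screenshot()  # bytes; could be sent to a vision model
        browser.close()
    return png
```

In an agent loop, the screenshot returned after each batch of actions would be passed back to the model, which would then propose the next actions.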

Topics

GPT 5.4
Native Computer Use
AI Benchmarks
Labor Market Impact
Anthropic Supply Chain Risk

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, AI Product Manager, Tech Journalist

Editorial summary, takeaway, and curation by AIssential. Original article published by Wes Roth.