NEW GPT-5.4 Reasoning TEST

· Source: Discover AI · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, medium

Summary

OpenAI's new GPT-5.4 model, released on March 5, 2026, was put through a "causal reasoning test" designed for scientific work. The test was an "elevator problem": find the shortest path from floor 0 to floor 50 in fewer than 20 button presses, where any move that leaves the 0-50 range is illegal. The standard GPT-5.4 model, priced at $2.50 per million input tokens and $15 per million output tokens (far cheaper than the Pro version's $180 output price), repeatedly failed this task. Across multiple attempts, including self-verification runs, the model failed to reach floor 50, landed on an incorrect floor (e.g., 46), or proposed illegal moves (e.g., to floor 53). It ultimately concluded that, without further clarification, no valid path exists under the given rules, even when prompted to accept its own default rule sets. The analyst plans to test the "high" version of GPT-5.4 next.

Key takeaway

AI engineers evaluating new large language models for scientific or constraint-based problem-solving should rigorously test base versions such as GPT-5.4 on specific, non-trivial reasoning tasks. Do not assume that a base model can handle complex logical constraints or pathfinding without explicit rule clarification or a move to a higher-tier, more expensive version. Include edge cases and checks for implicit rule adherence in the initial assessment to avoid deployment failures.
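One way to make such an assessment repeatable is to check every model answer mechanically instead of reading it by eye. The sketch below is a minimal illustration, not the analyst's tooling: `check_path` and its parameters are hypothetical names, and the button rules are passed in as `allowed_steps` because this summary does not state them. It flags the failure modes reported in the summary: a wrong final floor, a floor outside 0-50, and too many presses.

```python
def check_path(floors, start=0, goal=50, lo=0, hi=50,
               max_presses=19, allowed_steps=None):
    """Return a list of rule violations for a proposed sequence of floors.

    floors        -- floors visited in order, including the starting floor
    max_presses   -- 19, because the task asks for fewer than 20 presses
    allowed_steps -- optional set of legal floor differences (hypothetical;
                     the actual button rules are not given in this summary)
    """
    if not floors:
        return ["empty path"]

    problems = []
    if floors[0] != start:
        problems.append(f"starts on floor {floors[0]}, not {start}")

    out_of_range = [f for f in floors if not lo <= f <= hi]
    if out_of_range:
        problems.append(f"illegal floors outside {lo}-{hi}: {out_of_range}")

    if floors[-1] != goal:
        problems.append(f"ends on floor {floors[-1]}, not {goal}")

    presses = len(floors) - 1
    if presses > max_presses:
        problems.append(f"uses {presses} presses, limit is {max_presses}")

    if allowed_steps is not None:
        bad = [(a, b) for a, b in zip(floors, floors[1:])
               if b - a not in allowed_steps]
        if bad:
            problems.append(f"moves not matching any button: {bad}")

    return problems
```

An empty list means the proposed path satisfies every rule that was checked; anything else lists the specific violations, which can be logged per attempt.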

Key insights

The standard GPT-5.4 model struggled with causal reasoning and constraint satisfaction, even in a simple mathematical puzzle.

Method

The evaluation used a "causal reasoning test": an elevator pathfinding problem with explicit floor-range and button-press constraints.
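A puzzle of this shape has a ground truth that can be computed directly, which gives a reference answer to grade model output against. The sketch below uses breadth-first search over floors, a standard way to find a shortest sequence of moves. It is an illustration under assumptions: `shortest_path` is a hypothetical helper, and the two buttons in the usage example (+7 and -3) are invented for demonstration, since this summary does not state the actual button rules.

```python
from collections import deque


def shortest_path(buttons, start=0, goal=50, lo=0, hi=50):
    """Breadth-first search for the fewest button presses from start to goal.

    buttons -- list of functions mapping the current floor to the next floor
               (hypothetical; substitute the real rules of the puzzle)
    Returns the list of floors visited, or None if the goal is unreachable.
    """
    parent = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        floor = queue.popleft()
        if floor == goal:
            path = []
            while floor is not None:
                path.append(floor)
                floor = parent[floor]
            return path[::-1]
        for press in buttons:
            nxt = press(floor)
            # Moves that leave the lo-hi range are illegal and are skipped.
            if lo <= nxt <= hi and nxt not in parent:
                parent[nxt] = floor
                queue.append(nxt)
    return None


# Illustrative buttons only: "+7" and "-3" are assumptions, not the test's rules.
demo_buttons = [lambda f: f + 7, lambda f: f - 3]
demo_path = shortest_path(demo_buttons)
print(len(demo_path) - 1, "presses:", demo_path)
```

Because breadth-first search explores floors in order of press count, the first time it reaches the goal the path is a shortest one, and a `None` result means that no legal path exists under the supplied rules.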

Best for: AI Engineer, NLP Engineer, Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Discover AI.