5 frontier AI models were asked to code bots to navigate a foggy maze with teleportals. 1st to the exit wins. Over 500 steps and you're eliminated. Gemini, ChatGPT, and Mimo bots never made it past round 8. Here's Claude's and Grok's bots playing Round 93.

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Advanced, quick

Summary

Five frontier AI models (Gemini, ChatGPT, Mimo, Claude, and Grok) were each asked to code a bot that navigates a foggy maze containing teleportals. The first bot to reach the exit wins the round, and any bot that takes more than 500 steps is eliminated. Each bot sees only a 5x5 window around its current position, so it has to build an internal "mental map" from partial observations. The Gemini, ChatGPT, and Mimo bots never made it past Round 8, while Claude's and Grok's bots were still competing in Round 93. The setup compares models on a task that demands more than standard, fully observable maze solving.

Key takeaway

AI scientists evaluating model capabilities should consider designing competitive coding challenges that combine partial observability with environmental twists such as teleportals. Compared with standard, fully observable problems, such challenges give a more robust assessment of a model's ability to infer, plan, and adapt under uncertainty, which matters for building more sophisticated autonomous agents.

Key insights

AI models can be compared by tasking them to code bots for complex, partially observable navigation challenges.

Principles

Method

Each bot navigates a foggy maze with teleportals, builds a mental map from its 5x5 observation window, and must reach the exit within 500 steps.
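One way a bot of this kind might be structured is sketched below. It is not any competitor's actual code. It merges each 5x5 window into a dictionary of known cells, then uses breadth-first search to head for the exit if it has been seen, or otherwise for the nearest known cell that borders unexplored territory. The tile symbols, the MapBot class, its method names, and the assumption that every cell in the window is visible are all hypothetical, as is the rule that a teleportal is learned only after the bot has stepped on it.

```python
from collections import deque

WALL, EXIT = "#", "E"  # assumed tile symbols, not taken from the contest
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


class MapBot:
    """Hypothetical bot that keeps a mental map built from 5x5 windows."""

    def __init__(self):
        self.known = {}    # (row, col) -> tile seen so far
        self.portals = {}  # portal cell -> cell it was observed to lead to

    def observe(self, pos, window):
        """Merge a square window centred on pos into the map."""
        half = len(window) // 2
        for dr, row in enumerate(window):
            for dc, tile in enumerate(row):
                self.known[(pos[0] + dr - half, pos[1] + dc - half)] = tile

    def record_teleport(self, src, dst):
        """Remember that stepping onto src moved the bot to dst."""
        self.portals[src] = dst

    def _is_exit(self, cell):
        return self.known.get(cell) == EXIT

    def _is_frontier(self, cell):
        # A frontier cell has at least one neighbour the bot has never seen.
        r, c = cell
        return any((r + dr, c + dc) not in self.known for dr, dc in MOVES.values())

    def _bfs(self, pos, is_target):
        """Return the first move on a shortest known path to a target cell."""
        queue = deque([(pos, None)])
        seen = {pos}
        while queue:
            cell, first = queue.popleft()
            if first is not None and is_target(cell):
                return first
            for name, (dr, dc) in MOVES.items():
                nxt = (cell[0] + dr, cell[1] + dc)
                if self.known.get(nxt, WALL) == WALL:
                    continue  # unknown cells are treated as blocked
                nxt = self.portals.get(nxt, nxt)  # follow known teleportals
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, first or name))
        return None

    def next_move(self, pos):
        """Prefer a known route to the exit, otherwise explore."""
        return self._bfs(pos, self._is_exit) or self._bfs(pos, self._is_frontier)
```

A driver loop would call observe with the current window, ask next_move for a direction, apply it, and stop once the exit is reached or the 500-step budget runs out. Exploring the nearest frontier first is only one reasonable policy; it keeps the sketch short but does nothing clever about when to gamble on an unexplored teleportal.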

Best for: Machine Learning Engineer, AI Scientist, Research Scientist, AI Researcher, AI Engineer, Prompt Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.