Teaching AI agents to ask better questions by playing “Battleship”

Source: MIT News - Artificial intelligence · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems) · Depth: Intermediate, medium

Summary

MIT researchers from CSAIL and SEAS developed "Collaborative Battleship" to test how well AI agents ask questions, and found that smaller models can outperform larger ones at 1 percent of the cost. They built the "BattleshipQA" dataset from human play. At baseline, top language models (LMs) like GPT-5 beat humans, while smaller systems like Llama 4 Scout struggled. With a Monte Carlo inference strategy added, Llama 4 Scout's win rate against humans jumped from 8 percent to 82 percent, surpassing GPT-5's performance. Separately, converting questions into code boosted answer accuracy by 15 percent on average, with GPT-4o-mini seeing a nearly 30 percent bump. The approach also improved performance in "Guess Who?", where Llama 4 Scout reached 72 percent success and GPT-4o reached 90 percent.

Key takeaway

Machine learning engineers building AI agents for information-seeking tasks should consider integrating explicit inference strategies and code-based verification. Doing so can yield better performance at lower cost, as demonstrated by Llama 4 Scout outperforming GPT-5 at 1 percent of the cost. Focus on equipping models with "world models" and question-to-code conversion to improve their exploration and information gathering in uncertain environments.

Key insights

AI agents ask better questions and make discoveries more efficiently when given access to a "world model" and explicit verification methods.
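To make "explicit verification" concrete, here is a minimal sketch of answering a question by running code against the board instead of having the model read the board directly. The board encoding, the `GENERATED_CODE` string (a stand-in for what a model's question-to-code step might emit), and the `run_generated` helper are all illustrative assumptions, not the researchers' implementation.

```python
# Hypothetical 4x4 board: "R" = red ship, "B" = blue ship, "." = water.
BOARD = [
    "..R.",
    "..R.",
    "....",
    "BB..",
]

# Assumed output of a model's question-to-code step for the question
# "Is the red ship vertical?". In a real system this string would come
# from the language model; here it is hard-coded for illustration.
GENERATED_CODE = '''
def answer(board):
    cells = [(r, c)
             for r, row in enumerate(board)
             for c, ch in enumerate(row)
             if ch == "R"]
    # Vertical means: more than one cell, all sharing a single column.
    return len(cells) > 1 and len({c for _, c in cells}) == 1
'''


def run_generated(code, board):
    """Execute the generated program and return its yes/no answer."""
    namespace = {}
    # exec runs arbitrary code; a real system would need sandboxing.
    exec(code, namespace)
    return bool(namespace["answer"](board))


if __name__ == "__main__":
    print(run_generated(GENERATED_CODE, BOARD))
```

The design point is that the answer comes from executing a program over the actual board state, so it can be checked and reproduced, rather than from the model's free-text reading of the grid.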

Principles

Method

Use a Monte Carlo inference strategy to weigh candidate questions before asking, and convert natural-language questions into executable code so that answers can be verified.
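One way to picture the Monte Carlo step is as sampling many boards consistent with what is known, then scoring each candidate question by how evenly it splits those samples. The sketch below assumes a single two-cell ship on a 4x4 grid, noiseless yes/no answers, and questions already expressed as code predicates; the names `sample_board`, `QUESTIONS`, and `best_question` are hypothetical, and the scoring rule (binary entropy of the sampled answer) is a standard information-gain estimate rather than a confirmed detail of the researchers' method.

```python
import math
import random

GRID = 4       # assumed board size
SHIP_LEN = 2   # assumed single ship of length 2


def sample_board(rng, misses):
    """Rejection-sample one ship placement that avoids known misses.

    Loops forever if no placement is consistent; fine for a sketch.
    """
    while True:
        if rng.random() < 0.5:  # horizontal placement
            r, c = rng.randrange(GRID), rng.randrange(GRID - SHIP_LEN + 1)
            cells = {(r, c + i) for i in range(SHIP_LEN)}
        else:                   # vertical placement
            r, c = rng.randrange(GRID - SHIP_LEN + 1), rng.randrange(GRID)
            cells = {(r + i, c) for i in range(SHIP_LEN)}
        if not cells & misses:
            return frozenset(cells)


def binary_entropy(p):
    """Entropy in bits of a yes/no answer with P(yes) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


# Candidate questions, each paired with a code predicate over a board.
QUESTIONS = {
    "Is the ship horizontal?": lambda b: len({r for r, _ in b}) == 1,
    "Is any part of the ship in row 0?": lambda b: any(r == 0 for r, _ in b),
    "Is the ship on the board?": lambda b: len(b) > 0,
}


def best_question(misses, n_samples=2000, seed=0):
    """Pick the question whose sampled answer is most uncertain."""
    rng = random.Random(seed)
    boards = [sample_board(rng, misses) for _ in range(n_samples)]
    scores = {}
    for text, predicate in QUESTIONS.items():
        p_yes = sum(predicate(b) for b in boards) / len(boards)
        scores[text] = binary_entropy(p_yes)
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    question, scores = best_question(misses=set())
    print(question)
    for text, bits in scores.items():
        print(f"{bits:.3f} bits  {text}")
```

A question whose answer is already certain, such as "Is the ship on the board?", scores zero bits, while a question that splits the sampled boards close to evenly scores near one bit. That is the sense in which sampling lets an agent weigh options before spending a turn.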

In practice

Topics

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by MIT News - Artificial intelligence.