Extracting Search Trees from LLM Reasoning Traces Reveals Myopic Planning

2025-08-07 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

A new study introduces a method to characterize Large Language Model (LLM) planning by extracting and quantifying search trees from chain-of-thought (CoT) reasoning traces in the board game four-in-a-row. The researchers found that LLM search is significantly shallower than human planning: average maximum depth ranged from 1.00 to 3.48 plies, compared with 4 to 6 plies for humans. LLM performance is predicted by search breadth (the number of candidate moves considered) rather than by depth. Crucially, although LLMs generate deep nodes in their reasoning, their move choices are best explained by a myopic model that ignores those deeper nodes. A causal intervention study, in which CoT paragraphs were selectively pruned, further confirmed that move selection is driven predominantly by shallow rather than deep search. This contrasts with human planning, where expertise correlates with deeper search.

Key takeaway

If you develop or evaluate LLMs for strategic tasks, note that lengthening reasoning traces or encouraging deeper search may not improve performance on its own. A model can generate deep deliberation without using it to choose a move. Consider training signals that explicitly reward acting on deep lookahead, which would bring LLM planning closer to human expertise, where stronger play goes with deeper search.

Key insights

LLMs generate deep reasoning traces but their decisions are driven by shallow, myopic planning, unlike humans.

Principles

LLM planning prioritizes breadth over depth.
Reasoning traces do not always reflect decision-making mechanisms.

Method

Search trees are extracted from LLM reasoning traces using an LLM judge (GPT-5) and then analyzed with computational cognitive models to predict move decisions in four-in-a-row.
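
As a rough illustration of the two quantities the summary refers to, the sketch below computes maximum depth (in plies) and root breadth for a tree that has already been extracted. The `Node` schema, the integer move encoding, and the toy tree are assumptions made for this example; they are not the study's data format or code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One move mentioned in a reasoning trace (hypothetical schema)."""
    move: int  # illustrative move index; the root uses -1 as a placeholder
    children: list["Node"] = field(default_factory=list)


def max_depth(node: Node) -> int:
    """Depth in plies of the deepest descendant; the root position is depth 0."""
    if not node.children:
        return 0
    return 1 + max(max_depth(child) for child in node.children)


def breadth(root: Node) -> int:
    """Number of distinct candidate moves considered at the root."""
    return len({child.move for child in root.children})


# Toy tree: three candidate moves, one of them explored three plies deep.
tree = Node(-1, [
    Node(3, [Node(4, [Node(3)])]),
    Node(2),
    Node(5, [Node(1)]),
])

print(max_depth(tree), breadth(tree))  # prints: 3 3
```

Averaging `max_depth` over many positions would give a figure comparable in kind to the "average maximum depth" quoted in the summary, under the assumption that the root represents the current board position.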

In practice

Focus LLM training on acting on deep lookahead.
Use mechanistic analysis beyond behavioral benchmarks.
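
One way to picture what "myopic" means here is a depth-limited minimax backup: the same tree can recommend different moves depending on how many plies are allowed to influence the choice. The sketch below is a generic illustration, not the cognitive model used in the study; `EvalNode`, the heuristic values, and the toy tree are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class EvalNode:
    """Tree node with a heuristic value (hypothetical schema)."""
    move: int
    value: float  # assumed to be scored from the root player's perspective
    children: list["EvalNode"] = field(default_factory=list)


def backup(node: EvalNode, depth_limit: int, maximizing: bool) -> float:
    """Minimax value of node, ignoring anything deeper than depth_limit plies below it."""
    if depth_limit == 0 or not node.children:
        return node.value
    values = [backup(c, depth_limit - 1, not maximizing) for c in node.children]
    return max(values) if maximizing else min(values)


def choose(root: EvalNode, depth_limit: int) -> int:
    """Move selected at the root when search is cut off at depth_limit plies (>= 1)."""
    # Root children sit at ply 1; the opponent (minimizer) moves next.
    best = max(root.children,
               key=lambda c: backup(c, depth_limit - 1, maximizing=False))
    return best.move


# Toy tree: move 3 looks best at ply 1 but has a strong opponent reply at ply 2.
root = EvalNode(-1, 0.0, [
    EvalNode(3, 0.6, [EvalNode(4, -1.0)]),
    EvalNode(2, 0.2, [EvalNode(5, 0.3)]),
])

print(choose(root, depth_limit=1))  # myopic choice: 3
print(choose(root, depth_limit=2))  # choice using the deeper node: 2
```

Comparing which depth limit best predicts a model's actual moves is the kind of mechanistic check that a win-rate benchmark alone would not provide.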

Topics

LLM Planning
Search Tree Extraction
Chain-of-Thought Reasoning
Myopic Planning
Computational Cognitive Modeling

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.