Looking Is Not Picking: An Attention-Segment Account of Tool-Selection Failures in LLM Agents

2026-06-15 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

LLM agents frequently mis-call tools. The usual explanation is that the model fails to find the correct tool in a crowded tool list. New research points the other way. On real BFCL failures, the model attends most to the correct (gold) tool 80% of the time, against a 21% chance baseline, and the gold tool is under-attended in only 10% of cases. This refutes the "crowded-harness" theory and locates the failure at the decision readout. Two lines of evidence support this. First, prompt-side repairs recover at most 23% of failures, while readout-side interventions recover 59-91%. Second, two gold-pointed interventions acting on different representations (an attention-logit bias and residual-stream steering) recover largely the same failures (pooled Jaccard 0.865). A training-free selector built on per-segment attention closes most of the gap between gold-free and oracle selection on BFCL (+11.9 pts) and adds +14.9 pts on Seal-Tools. The causal dose-response of the attention bias was tested on 10 models (3B to 32B parameters), and the deployable selector on 5 single-turn models.
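The attention-bias intervention can be pictured with a toy calculation. The sketch below assumes the intervention adds a constant scalar bias to the pre-softmax attention logits of every token inside the gold tool's segment, for a single query position; the helper names, the toy logits, and the segment span are illustrative, not the paper's code. It shows only the first link of the dose-response chain, namely how a larger bias pushes more attention mass onto the gold segment. Measuring the effect on the final tool call requires running the real model.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def biased_segment_mass(logits, segment, bias):
    """Attention mass on `segment` (start, end) after adding `bias` to its logits.

    Hypothetical helper: models an additive attention-logit bias applied to
    every key position in [start, end) for one query position.
    """
    start, end = segment
    shifted = [x + bias if start <= i < end else x for i, x in enumerate(logits)]
    probs = softmax(shifted)
    return sum(probs[start:end])

# Toy pre-softmax logits for one query over 8 key positions (illustrative only).
logits = [0.5, 1.0, 0.2, 0.8, 0.3, 0.9, 0.1, 0.4]
gold_segment = (2, 5)  # positions 2, 3, 4 hold the gold tool's description

for bias in [0.0, 0.5, 1.0, 2.0, 4.0]:
    mass = biased_segment_mass(logits, gold_segment, bias)
    print(f"bias={bias:.1f}  gold mass={mass:.3f}")
```

Because softmax is monotone in each logit, raising the bias can only increase the gold segment's share, which is what makes the bias a usable "dose".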

Key takeaway

For AI engineers debugging tool-selection failures in LLM agents, shift your focus from prompt engineering to the readout mechanism. The research indicates that models usually attend to ("see") the correct tool; the failure happens when that attention is turned into a decision. Prioritize readout-side interventions, which recover 59-91% of failures, over prompt repairs such as reordering or duplicating tools, which recover at most 23%. Consider adding an attention-based selector, which in the study gained +11.9 pts on BFCL and +14.9 pts on Seal-Tools.
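The recovery percentages compare how many baseline failures an intervention turns into correct calls. A minimal sketch, assuming "recovery" means the fraction of baseline-failing examples that become correct after the intervention; the function name, example IDs, and outcomes are hypothetical.

```python
def recovery_rate(baseline_correct, intervened_correct):
    """Fraction of baseline failures that an intervention fixes.

    Both arguments map example_id -> bool (True if the tool call was correct).
    Assumed definition: only examples the baseline got wrong are counted.
    """
    failures = [ex for ex, ok in baseline_correct.items() if not ok]
    if not failures:
        return 0.0
    fixed = sum(1 for ex in failures if intervened_correct.get(ex, False))
    return fixed / len(failures)

# Hypothetical outcomes: "a" already succeeds; "b" through "e" are baseline failures.
baseline = {"a": True, "b": False, "c": False, "d": False, "e": False}
prompt_repair = {"a": True, "b": True, "c": False, "d": False, "e": False}
readout_fix = {"a": True, "b": True, "c": True, "d": True, "e": False}

print(recovery_rate(baseline, prompt_repair))  # 0.25
print(recovery_rate(baseline, readout_fix))    # 0.75
```

Restricting the denominator to baseline failures keeps the metric from rewarding an intervention for examples that were already correct.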

Key insights

LLM agent tool-selection failures stem from decision readout, not insufficient attention to the correct tool.
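The "looking" half of that claim rests on a per-segment attention measurement. A minimal sketch under stated assumptions: each candidate tool occupies a contiguous span of prompt tokens, the attention vector is already aggregated (for example, averaged over layers and heads) at the position where the tool name is decoded, and chance is 1/k for k candidate tools. The aggregation choice, data layout, and function names are hypothetical and may differ from the study's procedure.

```python
def segment_masses(attn, segments):
    """Sum attention over each tool's token span.

    attn: attention weights over prompt positions (one query position).
    segments: dict tool_name -> (start, end) half-open span.
    """
    return {tool: sum(attn[s:e]) for tool, (s, e) in segments.items()}

def looked_at_gold(attn, segments, gold):
    # "Looking": the gold tool receives the most attention of any candidate.
    masses = segment_masses(attn, segments)
    return max(masses, key=masses.get) == gold

def looking_rate(examples):
    """Return (observed rate, chance baseline) over a list of failures.

    Each example is a dict with keys "attn", "segments", "gold".
    Chance is the mean of 1/k, assuming a uniform guess among k tools.
    """
    hits = sum(looked_at_gold(ex["attn"], ex["segments"], ex["gold"]) for ex in examples)
    chance = sum(1 / len(ex["segments"]) for ex in examples) / len(examples)
    return hits / len(examples), chance

spans = {"search": (0, 1), "weather": (1, 3), "calc": (3, 5)}
failures = [
    {"attn": [0.1, 0.4, 0.3, 0.1, 0.1], "segments": spans, "gold": "weather"},  # looked
    {"attn": [0.5, 0.1, 0.1, 0.2, 0.1], "segments": spans, "gold": "calc"},     # did not look
]
rate, chance = looking_rate(failures)
print(f"looked at gold: {rate:.2f}, chance: {chance:.2f}")
```

A failure where `looked_at_gold` is True is a "looked but did not pick" case, which is the pattern the study reports as dominant.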

Principles

LLM agents often attend to the correct tool but fail at decision readout.
Readout-side interventions are more effective than prompt repairs.
Attention-logit bias and residual-stream steering recover largely the same failures.
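The overlap claim (pooled Jaccard 0.865 in the summary) compares the sets of failures each intervention recovers. A minimal sketch, where intervention A stands for the attention-logit bias and B for residual-stream steering. It assumes "pooled" means the recovered-failure sets from all models are unioned before computing one Jaccard index, with example IDs tagged by model so they do not collide; that pooling rule, the model names, and the IDs are assumptions.

```python
def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two sets (1.0 if both are empty)."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)

def pooled_jaccard(per_model):
    """Pool recovered-failure sets across models, then compute one Jaccard index.

    per_model: dict model_name -> (set recovered by A, set recovered by B).
    Example IDs are tagged with the model name so they stay distinct.
    """
    pooled_a, pooled_b = set(), set()
    for model, (rec_a, rec_b) in per_model.items():
        pooled_a |= {(model, ex) for ex in rec_a}
        pooled_b |= {(model, ex) for ex in rec_b}
    return jaccard(pooled_a, pooled_b)

# Hypothetical recovered-failure IDs for two models.
per_model = {
    "model-7b": ({1, 2, 3, 4}, {1, 2, 3, 5}),
    "model-14b": ({10, 11}, {10, 11}),
}
print(f"pooled Jaccard: {pooled_jaccard(per_model):.3f}")
```

Pooling weights each model by how many failures it contributes; averaging per-model Jaccard indices instead would weight every model equally and can give a different number.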

Method

A training-free, gold-free selector (one that never uses the correct-tool label) picks tools from per-segment attention, closing most of the gap between gold-free and oracle performance.
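One way to picture a gold-free, attention-based selector: rather than trusting the tool name the model decodes, pick the candidate whose description segment receives the most attention, and keep the model's own choice when attention is not decisive. This is a hypothetical sketch, not the paper's selector. The margin-based fallback, the `margin` parameter, the tool names, and the assumption that attention is pre-aggregated over layers and heads are all illustrative choices.

```python
def select_tool(attn, segments, decoded_tool, margin=0.05):
    """Gold-free tool selector driven by per-segment attention.

    attn: attention weights over prompt positions at the decision step
          (assumed already averaged over layers and heads).
    segments: dict tool_name -> (start, end) half-open token span.
    decoded_tool: the tool name the model emitted on its own.
    margin: hypothetical threshold; override the model only when the
            top segment leads the runner-up by at least this much.
    """
    masses = {tool: sum(attn[s:e]) for tool, (s, e) in segments.items()}
    ranked = sorted(masses, key=masses.get, reverse=True)
    top = ranked[0]
    runner_up = masses[ranked[1]] if len(ranked) > 1 else 0.0
    if masses[top] - runner_up >= margin:
        return top  # attention is decisive: trust where the model looked
    return decoded_tool  # too close to call: keep the model's own readout

segments = {"search": (0, 2), "get_weather": (2, 5), "calculator": (5, 7)}
attn = [0.05, 0.05, 0.20, 0.25, 0.15, 0.20, 0.10]
print(select_tool(attn, segments, decoded_tool="calculator"))  # get_weather
```

The selector needs no training and no gold label: it only re-reads attention the model already computed, which is why it can be bolted onto an existing agent loop.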

In practice

Implement readout-side interventions for LLM tool-selection issues.
Use attention-based selectors to improve tool-calling accuracy.

Topics

LLM Agents
Tool Selection
Attention Mechanisms
Decision Readout
Model Debugging
BFCL Benchmark
Seal-Tools

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.