To Call or Not to Call: A Framework to Assess and Optimize LLM Tool Calling
Summary
A new framework, inspired by decision-making theory, has been introduced to assess and optimize Large Language Model (LLM) tool-calling decisions, particularly for web search tools. This framework evaluates tool use along three factors: necessity, utility, and affordability. Researchers analyzed six open-source LLMs, ranging from 3 to 120 billion parameters, across three question-answering tasks using Google Search (SerpApi) and Perplexity Search. The study found a consistent misalignment between models' self-perceived need and utility for tool calls and their true need and utility, leading to suboptimal performance. To address this, lightweight estimators of need and utility were trained using models' hidden states, enabling simple controllers that significantly improve decision quality and task performance compared to the models' self-perceived decisions.
Key takeaway
NLP engineers and research scientists developing agentic AI architectures should prioritize external control mechanisms for LLM tool calling. Relying solely on an LLM's self-perceived need and utility for tools such as web search leads to suboptimal performance and inefficient resource allocation. Lightweight latent estimators trained on the model's hidden states align tool calls more closely with true necessity and utility, which significantly improves decision quality and task performance, especially under budget constraints.
Key insights
LLMs often misjudge when to use external tools, leading to suboptimal performance and inefficient resource use.
Principles
- Tool use is not always beneficial.
- Optimal tool calling maximizes utility gain under cost constraints.
- Perceived need and utility often misalign with true need and utility.
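One way to make the second principle concrete is to write it as a budgeted selection problem. The notation below is an illustrative assumption and is not taken from the original paper: a_i in {0, 1} marks whether the tool is called on instance i, \Delta u_i is the utility gain from calling it, c_i is the cost of the call, and B is the total budget.

```latex
\max_{a_1,\dots,a_n \in \{0,1\}} \; \sum_{i=1}^{n} a_i \,\Delta u_i
\quad \text{subject to} \quad \sum_{i=1}^{n} a_i \, c_i \le B,
\qquad \Delta u_i = u_i^{\text{tool}} - u_i^{\text{no tool}}
```

Under this reading, an instance with \Delta u_i \le 0 is never worth a call, which is the first principle, and the third principle says that a model's own estimate of \Delta u_i often differs from the true value.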
Method
The framework assesses tool-calling decisions through normative (optimal), descriptive (actual behavior), and prescriptive (improvement via latent estimators) lenses, focusing on necessity, utility, and affordability.
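The descriptive lens can be illustrated by comparing a model's own tool-call decisions against outcomes measured with and without the tool. The sketch below is a minimal illustration under stated assumptions: the `Instance` fields, the `tool_helps` definition (the tool turns a wrong answer into a right one), and the metric names are hypothetical and may differ from the definitions used in the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Instance:
    correct_without_tool: bool  # did the model answer correctly with no search?
    correct_with_tool: bool     # did it answer correctly after a search call?
    model_called_tool: bool     # the model's own (self-perceived) decision


def tool_helps(x: Instance) -> bool:
    # Assumed proxy for "true utility": the call fixes an otherwise wrong answer.
    return x.correct_with_tool and not x.correct_without_tool


def misalignment(instances: List[Instance]) -> Dict[str, float]:
    # Over-call: the model called the tool although the call brought no gain.
    # Under-call: the model skipped the tool although the call would have helped.
    n = len(instances)
    over = sum(1 for x in instances if x.model_called_tool and not tool_helps(x))
    under = sum(1 for x in instances if not x.model_called_tool and tool_helps(x))
    return {
        "over_call_rate": over / n,
        "under_call_rate": under / n,
        "agreement": 1 - (over + under) / n,
    }


# Toy data, invented for illustration only.
sample = [
    Instance(False, True, True),   # tool helps, model called it
    Instance(True, True, True),    # tool not needed, model called anyway
    Instance(False, True, False),  # tool would help, model skipped it
    Instance(True, True, False),   # tool not needed, model skipped it
]
report = misalignment(sample)
```

Separating over-calls from under-calls matters because they cost different things: the first wastes budget, the second loses accuracy.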
In practice
- Train latent estimators on hidden states to predict true need.
- Rank instances by utility estimator confidence for budget-aware allocation.
- Use external controllers to guide tool-call decisions.
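The three steps above can be sketched as a small probe plus a budget-aware controller. This is a minimal illustration, not the paper's implementation: the logistic-regression probe, the toy two-dimensional "hidden states", the function names, and the 0.5 threshold are all assumptions. In a real system the vectors would be activations extracted from the LLM.

```python
import math
from typing import List, Sequence, Tuple

Probe = Tuple[List[float], float]  # (weights, bias)


def sigmoid(z: float) -> float:
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(probe: Probe, h: Sequence[float]) -> float:
    # Estimated probability that a tool call is useful for hidden state h.
    w, b = probe
    return sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)


def train_probe(states: Sequence[Sequence[float]], labels: Sequence[int],
                lr: float = 0.5, epochs: int = 200) -> Probe:
    # Plain stochastic gradient descent on the logistic loss.
    w, b = [0.0] * len(states[0]), 0.0
    for _ in range(epochs):
        for h, y in zip(states, labels):
            g = score((w, b), h) - y  # gradient of the loss w.r.t. the logit
            w = [wi - lr * g * hi for wi, hi in zip(w, h)]
            b -= lr * g
    return w, b


def budgeted_calls(scores: Sequence[float], budget: int,
                   threshold: float = 0.5) -> List[int]:
    # Rank instances by estimated utility, keep at most `budget` of them,
    # and drop any whose estimate falls below the (assumed) threshold.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(i for i in ranked[:budget] if scores[i] >= threshold)


# Toy training set: label 1 means "a tool call truly helped" (invented data).
train_states = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_labels = [1, 1, 0, 0]
probe = train_probe(train_states, train_labels)

# Hidden states for three new queries; allow at most two search calls.
queries = [[0.8, 0.2], [0.2, 0.8], [1.0, 0.0]]
scores = [score(probe, h) for h in queries]
calls = budgeted_calls(scores, budget=2)
```

The controller sits outside the model: the LLM's own inclination to call the tool is ignored, and the probe's ranking decides where a limited search budget is spent.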
Topics
- LLM Tool Calling
- Agentic AI
- Decision-Making Framework
- Web Search Integration
- Latent State Estimators
Best for: NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.