The Tool-Overuse Illusion: Why Does LLM Prefer External Tools over Internal Knowledge?

Source: cs.AI updates on arXiv.org · Field: Technology & Digital / Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

The paper identifies "tool overuse" in Large Language Models (LLMs): models invoke external tools even when their internal knowledge suffices, which wastes resources and degrades performance. The behavior is pervasive across frontier models, open-source models, and models trained with Reinforcement Learning with Verifiable Rewards (RLVR), averaging 0.93 unnecessary tool calls per query and costing 3.29% to 14.48% in accuracy on internally solvable questions. The authors attribute it to two mechanisms: a "knowledge epistemic illusion," in which LLMs misjudge the boundaries of their own knowledge, and an "outcome-only reward trap" in RLVR training, which rewards final correctness and ignores tool efficiency. Two mitigations are proposed: knowledge-aware direct preference optimization (K-DPO), which reduces tool calls by 82.8%, and a balanced outcome-efficiency reward, which reduces them by 66.7% (7B model). Both improve or maintain accuracy.
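The two headline numbers above can be made concrete with a small measurement sketch. It assumes, as a working definition not confirmed by this summary, that a question is "internally solvable" when the model answers it correctly with tools disabled, so every tool call on such a question counts as unnecessary. The `Record` fields and the `overuse_stats` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical per-question log entry; field names are assumptions.
    solvable_without_tools: bool  # correct answer with tools disabled
    tool_calls: int               # calls made once tools are enabled
    correct_with_tools: bool      # final correctness with tools enabled


def overuse_stats(records):
    """Summarize tool overuse on the internally solvable subset."""
    solvable = [r for r in records if r.solvable_without_tools]
    if not solvable:
        raise ValueError("no internally solvable questions in the log")
    n = len(solvable)
    # Every call on this subset is unnecessary under the assumed definition.
    unnecessary_calls = sum(r.tool_calls for r in solvable) / n
    accuracy_with_tools = sum(r.correct_with_tools for r in solvable) / n
    # Tool-free accuracy on this subset is 1.0 by construction, so the
    # drop is simply the shortfall once tools are switched on.
    return {
        "unnecessary_calls_per_query": unnecessary_calls,
        "accuracy_drop": 1.0 - accuracy_with_tools,
    }


log = [
    Record(solvable_without_tools=True, tool_calls=2, correct_with_tools=True),
    Record(solvable_without_tools=True, tool_calls=0, correct_with_tools=False),
    Record(solvable_without_tools=False, tool_calls=3, correct_with_tools=True),
]
stats = overuse_stats(log)
```

The third record is excluded because the model needs a tool for that question, so its calls are not overuse.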

Key takeaway

For NLP Engineers and Research Scientists developing tool-augmented LLMs, understanding and mitigating tool overuse is critical. Your models may be inefficiently calling external tools due to miscalibrated internal knowledge and reward structures that prioritize correctness over efficiency. Implement knowledge-aware optimization and balanced reward functions to reduce unnecessary tool invocations, thereby improving performance and resource efficiency without sacrificing accuracy.

Key insights

LLMs frequently overuse external tools for two reasons: they misjudge the limits of their internal knowledge, and outcome-only training rewards favor final correctness over tool efficiency.

Principles

Method

Knowledge-aware Direct Preference Optimization (K-DPO) aligns perceived knowledge with actual capacity. A balanced outcome-efficiency reward penalizes tool calls during RLVR training.
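The two ideas in this section can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's formulation: the summary does not give the exact reward shape or the way K-DPO preference pairs are built. The linear per-call penalty, the cap, the default coefficients, and the names `balanced_reward` and `build_preference_pair` are all hypothetical.

```python
def balanced_reward(correct, tool_calls, penalty=0.1, max_penalty=0.5):
    """Outcome reward minus a capped per-call cost (assumed form).

    An outcome-only reward would return just `outcome`; the extra term
    makes a correct answer with fewer tool calls score higher.
    """
    outcome = 1.0 if correct else 0.0
    # The cap keeps any correct answer above any incorrect one.
    efficiency_cost = min(penalty * tool_calls, max_penalty)
    return outcome - efficiency_cost


def build_preference_pair(prompt, direct_response, tool_response, knows_answer):
    """Assumed pairing rule for knowledge-aware preference data.

    If the model can answer from internal knowledge, the direct answer is
    preferred; otherwise the tool-using answer is preferred.
    """
    if knows_answer:
        chosen, rejected = direct_response, tool_response
    else:
        chosen, rejected = tool_response, direct_response
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


r_direct = balanced_reward(correct=True, tool_calls=0)
r_two_calls = balanced_reward(correct=True, tool_calls=2)
r_wrong = balanced_reward(correct=False, tool_calls=10)
pair = build_preference_pair(
    "What is the capital of France?",
    direct_response="Paris.",
    tool_response="[search: capital of France] Paris.",
    knows_answer=True,
)
```

The `prompt` / `chosen` / `rejected` layout follows the record shape commonly used for preference-optimization datasets. How `knows_answer` is determined is left open here, for example by checking tool-free correctness.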

Best for: NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.