Quantifying the Accuracy and Cost Impact of Design Decisions in Budget-Constrained Agentic LLM Search
Summary
A controlled measurement study introduces Budget-Constrained Agentic Search (BCAS), an evaluation harness that quantifies how design decisions affect accuracy and cost in agentic Retrieval-Augmented Generation (RAG) systems under explicit budget constraints. The study compares six Large Language Models (LLMs) across three question-answering benchmarks (TriviaQA, HotpotQA, 2WikiMultihopQA) and analyzes how search depth, retrieval strategy, and completion token budget affect performance. Three findings stand out: accuracy generally improves with up to three additional searches; hybrid lexical and dense retrieval with lightweight re-ranking delivers the largest average gains; and larger completion budgets help most on synthesis-heavy tasks such as HotpotQA. The BCAS framework is model-agnostic, uses commodity prompts, and logs per-question resource consumption for cost analysis.
Key takeaway
For AI Architects designing cost-effective agentic RAG systems: allocate budget first to iterative search depth and to retrieval quality (hybrid search with re-ranking). Ask "How many searches can we afford?" before asking "How large a completion window should we buy?" This ordering captures most of the accuracy gains while bounding compute costs. It matters most for smaller models, which, given optimized search and retrieval, can match the single-search performance of larger models.
Key insights
In agentic RAG, spending budget on search depth and retrieval quality yields larger accuracy gains than expanding token limits.
Principles
- Accuracy improves reliably up to ~3 searches.
- Hybrid retrieval with re-ranking yields consistent gains.
- Larger token budgets primarily aid multi-hop synthesis.
Method
BCAS uses a stateful loop, providing LLMs with explicit search and token allowances, dynamically gating tool use, and tracking resource consumption to evaluate agentic RAG performance under budget.
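The loop described above can be sketched roughly as follows. This is a minimal illustration, not the study's harness: the names `Budget`, `Step`, `run_episode`, and `demo_policy` are hypothetical, the policy stands in for an LLM call, and the per-step token cost is an invented placeholder.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Budget:
    """Explicit allowances plus running consumption (hypothetical structure)."""
    max_searches: int
    max_completion_tokens: int
    searches_used: int = 0
    tokens_used: int = 0

    def can_search(self) -> bool:
        return self.searches_used < self.max_searches

    def tokens_left(self) -> int:
        return max(0, self.max_completion_tokens - self.tokens_used)


@dataclass
class Step:
    action: str  # "search" or "answer"
    text: str    # search query or final answer
    tokens: int  # completion tokens spent producing this step


Policy = Callable[[str, List[str], bool, int], Step]


def run_episode(question: str, policy: Policy,
                search: Callable[[str], str], budget: Budget) -> dict:
    """Run one question under budget and return a per-question resource log."""
    evidence: List[str] = []
    answer: Optional[str] = None
    # At most max_searches search steps plus one answering step.
    for _ in range(budget.max_searches + 1):
        if budget.tokens_left() == 0:
            break
        allowed = budget.can_search()  # gate tool use on the remaining allowance
        step = policy(question, evidence, allowed, budget.tokens_left())
        budget.tokens_used += step.tokens
        if step.action == "search" and allowed:
            budget.searches_used += 1
            evidence.append(search(step.text))
        else:
            # A search request made while the tool is gated off yields no answer.
            answer = step.text if step.action == "answer" else None
            break
    return {"answer": answer,
            "searches": budget.searches_used,
            "tokens": budget.tokens_used}


def demo_policy(question: str, evidence: List[str],
                search_allowed: bool, tokens_left: int) -> Step:
    """Stand-in for an LLM: search until two documents are held, then answer."""
    if search_allowed and len(evidence) < 2:
        return Step("search", question, tokens=10)
    return Step("answer", f"answer from {len(evidence)} docs", tokens=10)


log = run_episode("who wrote X?", demo_policy,
                  lambda q: f"doc for {q}", Budget(3, 100))
```

Passing the remaining allowances into the policy on every turn is what makes the loop stateful: the model sees how much budget is left, and the harness, not the model, decides whether the search tool is available.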
In practice
- Prioritize iterative search depth (up to 3 steps).
- Implement hybrid retrieval with re-ranking for evidence quality.
- Reserve larger token limits for multi-hop, synthesis-heavy tasks.
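One common way to combine lexical and dense results before re-ranking is reciprocal rank fusion (RRF). The summary does not say which fusion method or re-ranker the study uses, so the sketch below is an assumption chosen for illustration: `rrf_fuse`, `hybrid_search`, and the toy re-rank scores are all hypothetical, and the two input rankings stand in for the output of a lexical retriever and a dense retriever.

```python
from typing import Callable, Dict, List


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of document ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hybrid_search(lexical: List[str], dense: List[str],
                  rerank_score: Callable[[str], float],
                  top_k: int = 3) -> List[str]:
    """Fuse both rankings, keep the top_k candidates, then re-rank only those."""
    candidates = rrf_fuse([lexical, dense])[:top_k]
    return sorted(candidates, key=lambda d: (-rerank_score(d), d))


# Toy rankings (best first) and a toy re-ranker with made-up relevance scores.
lexical_ranking = ["a", "b", "c"]
dense_ranking = ["b", "d", "a"]
toy_rerank = {"a": 0.9, "b": 0.2, "d": 0.5}

fused = rrf_fuse([lexical_ranking, dense_ranking])
final = hybrid_search(lexical_ranking, dense_ranking,
                      lambda d: toy_rerank.get(d, 0.0))
```

Re-ranking only the fused top-k keeps the extra scoring pass lightweight, since the more expensive scorer sees a handful of candidates rather than the full result lists.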
Topics
- Retrieval-Augmented Generation
- Agentic LLMs
- Budget-Constrained AI
- Hybrid Retrieval
- Question Answering
Best for: AI Scientist, Research Scientist, AI Architect, AI Researcher, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.