Aligning Tree-Search Policies with Fixed Token Budgets in Test-Time Scaling of LLMs
Summary
Budget-Guided MCTS (BG-MCTS) is a novel tree-search decoding algorithm designed for Large Language Models (LLMs) operating under fixed per-query token budgets. Unlike traditional budget-agnostic policies that often lead to inefficient search patterns, BG-MCTS dynamically aligns its search strategy with the remaining budget. It initiates with broad exploration and progressively shifts towards refining and completing promising candidates as the budget depletes, effectively reducing late-stage over-branching. Evaluated on mathematical reasoning benchmarks MATH500 and AIME24/25, using open-weight LLMs like Llama-3.1-8B-Instruct and Qwen-2.5-7B-Instruct, BG-MCTS consistently outperformed budget-agnostic baselines across budgets of 10k, 20k, and 30k tokens, achieving superior accuracy and precision by optimizing budget utilization for higher-quality answers.
Key takeaway
For MLOps Engineers deploying LLMs with fixed per-query token budgets, you should consider implementing budget-aware tree-search decoding like BG-MCTS. This approach optimizes answer quality by dynamically shifting from broad exploration to focused refinement as your budget depletes, preventing wasteful late-stage branching. It ensures more reliable solutions within your cost constraints, offering a predictable accuracy-cost trade-off.
Key insights
Budget-Guided MCTS dynamically adjusts LLM tree-search from broad exploration to focused refinement based on remaining token budget.
Principles
- Align search policy with remaining token budget.
- Suppress late-stage widening from shallow nodes.
- Prioritize depth-first search as budget depletes.
Method
BG-MCTS modifies PUCT selection by annealing exploration and adding a depth-biased completion bonus, alongside a budget-guided widening mechanism, all conditioned on the budget sufficiency ratio.
In practice
- Use BG-MCTS for LLM inference in resource-constrained settings.
- Apply budget-conditioned policies for high-quality data synthesis.
Topics
- Large Language Models
- Tree-Search Decoding
- Monte Carlo Tree Search
- Test-Time Scaling
- Token Budget Optimization
- Mathematical Reasoning Benchmarks
Code references
Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, MLOps Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.