Aligning Tree-Search Policies with Fixed Token Budgets in Test-Time Scaling of LLMs

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

Budget-Guided MCTS (BG-MCTS) is a novel tree-search decoding algorithm designed for Large Language Models (LLMs) operating under fixed per-query token budgets. Unlike traditional budget-agnostic policies that often lead to inefficient search patterns, BG-MCTS dynamically aligns its search strategy with the remaining budget. It initiates with broad exploration and progressively shifts towards refining and completing promising candidates as the budget depletes, effectively reducing late-stage over-branching. Evaluated on mathematical reasoning benchmarks MATH500 and AIME24/25, using open-weight LLMs like Llama-3.1-8B-Instruct and Qwen-2.5-7B-Instruct, BG-MCTS consistently outperformed budget-agnostic baselines across budgets of 10k, 20k, and 30k tokens, achieving superior accuracy and precision by optimizing budget utilization for higher-quality answers.

Key takeaway

For MLOps Engineers deploying LLMs with fixed per-query token budgets, you should consider implementing budget-aware tree-search decoding like BG-MCTS. This approach optimizes answer quality by dynamically shifting from broad exploration to focused refinement as your budget depletes, preventing wasteful late-stage branching. It ensures more reliable solutions within your cost constraints, offering a predictable accuracy-cost trade-off.

Key insights

Budget-Guided MCTS dynamically adjusts LLM tree-search from broad exploration to focused refinement based on remaining token budget.

Principles

Method

BG-MCTS modifies PUCT selection by annealing exploration and adding a depth-biased completion bonus, alongside a budget-guided widening mechanism, all conditioned on the budget sufficiency ratio.

In practice

Topics

Code references

Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, MLOps Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.