Aligning Tree-Search Policies with Fixed Token Budgets in Test-Time Scaling of LLMs

2026-01-02 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

Budget-Guided MCTS (BG-MCTS) is a novel tree-search decoding algorithm designed for Large Language Models (LLMs) operating under fixed per-query token budgets. Unlike traditional budget-agnostic policies that often lead to inefficient search patterns, BG-MCTS dynamically aligns its search strategy with the remaining budget. It initiates with broad exploration and progressively shifts towards refining and completing promising candidates as the budget depletes, effectively reducing late-stage over-branching. Evaluated on mathematical reasoning benchmarks MATH500 and AIME24/25, using open-weight LLMs like Llama-3.1-8B-Instruct and Qwen-2.5-7B-Instruct, BG-MCTS consistently outperformed budget-agnostic baselines across budgets of 10k, 20k, and 30k tokens, achieving superior accuracy and precision by optimizing budget utilization for higher-quality answers.

Key takeaway

For MLOps Engineers deploying LLMs with fixed per-query token budgets, you should consider implementing budget-aware tree-search decoding like BG-MCTS. This approach optimizes answer quality by dynamically shifting from broad exploration to focused refinement as your budget depletes, preventing wasteful late-stage branching. It ensures more reliable solutions within your cost constraints, offering a predictable accuracy-cost trade-off.

Key insights

Budget-Guided MCTS dynamically adjusts LLM tree-search from broad exploration to focused refinement based on remaining token budget.

Principles

Align search policy with remaining token budget.
Suppress late-stage widening from shallow nodes.
Prioritize depth-first search as budget depletes.

Method

BG-MCTS modifies PUCT selection by annealing exploration and adding a depth-biased completion bonus, alongside a budget-guided widening mechanism, all conditioned on the budget sufficiency ratio.

In practice

Use BG-MCTS for LLM inference in resource-constrained settings.
Apply budget-conditioned policies for high-quality data synthesis.

Topics

Large Language Models
Tree-Search Decoding
Monte Carlo Tree Search
Test-Time Scaling
Token Budget Optimization
Mathematical Reasoning Benchmarks

Code references

Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, MLOps Engineer

Related on AIssential

See Counsel's argued verdicts on the open AI decisions leaders are weighing →

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.