Adaptive Test-Time Compute Allocation for Reasoning LLMs via Constrained Policy Optimization

· Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A new method addresses how to allocate test-time compute for large language models (LLMs) under a finite inference budget. It formalizes the task as a constrained optimization problem: maximize expected accuracy subject to an average compute budget. The proposed two-stage "Solve-then-Learn" pipeline first uses Lagrangian relaxation to decompose the global constraint into per-instance sub-problems, each with a closed-form oracle action that trades accuracy against cost. Because the induced cost is monotone in the dual variable, binary search over that variable hits the budget target exactly. The second stage trains a lightweight classifier to predict these oracle actions from inexpensive input features, enabling real-time deployment. Experiments on the MATH and GSM8K datasets with DeepSeek-V3, GPT-4o-mini, and Qwen2.5-7B show up to a 12.8% relative accuracy improvement on MATH over uniform and heuristic baselines, with the learned classifier imitating the Lagrangian oracle at over 91% accuracy.
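The first stage described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the accuracy table `acc`, the cost vector `cost`, the function names, and the toy numbers are all assumptions made for the example. It shows the per-instance oracle (pick the action with the best accuracy minus priced cost) and a bisection on the dual variable that relies on the average cost being non-increasing in that variable.

```python
import numpy as np

def oracle_actions(acc, cost, lam):
    """Per-instance oracle: maximize estimated accuracy minus priced cost.

    acc  : (n_instances, n_actions) estimated accuracy per action (assumed given)
    cost : (n_actions,) compute cost per action, sorted ascending
    lam  : dual variable, i.e. the price of one unit of compute
    """
    # np.argmax returns the first maximizer, so ties go to the cheaper action.
    return np.argmax(acc - lam * cost[None, :], axis=1)

def mean_cost(acc, cost, lam):
    """Average compute spent when every instance takes its oracle action."""
    return cost[oracle_actions(acc, cost, lam)].mean()

def solve_lambda(acc, cost, budget, iters=60):
    """Bisect on lam until the oracle's average cost fits the budget."""
    if budget < cost.min():
        raise ValueError("budget is below the cheapest action's cost")
    if mean_cost(acc, cost, 0.0) <= budget:
        return 0.0  # the budget constraint is slack
    lo, hi = 0.0, 1.0
    while mean_cost(acc, cost, hi) > budget:  # grow hi until it is feasible
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_cost(acc, cost, mid) > budget:
            lo = mid  # compute is still too cheap: raise the price
        else:
            hi = mid  # feasible: try a lower price
    return hi  # hi is always a price that was checked to be feasible

# Toy data (hypothetical): 3 instances, 3 actions costing 1, 2 and 4 units.
acc = np.array([[0.20, 0.50, 0.90],
                [0.80, 0.85, 0.90],
                [0.10, 0.10, 0.20]])
cost = np.array([1.0, 2.0, 4.0])
lam = solve_lambda(acc, cost, budget=2.0)
actions = oracle_actions(acc, cost, lam)
```

With a discrete action set the average cost is a step function of `lam`, so this sketch returns the smallest price it finds that stays within budget. In a real setting the per-action accuracy estimates would have to be obtained offline, for example by sampling the model at each compute level.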

Key takeaway

For MLOps engineers deploying LLMs with test-time compute scaling, this method offers a principled way to allocate inference compute. Consider a "Solve-then-Learn" pipeline that assigns compute to each input according to its complexity; the reported gain is up to a 12.8% relative accuracy improvement on MATH over uniform and heuristic allocation, without exceeding the existing compute budget.

Key insights

Allocating LLM test-time compute is a constrained policy problem: maximize expected accuracy while keeping average compute cost within a budget.
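One way to write this trade-off down is shown below. The notation is an assumption made for illustration and is not taken from the paper: x is an input, a is a compute action from a finite set A, p(x, a) is the expected accuracy of action a on x, c(a) is its cost, B is the average budget, and lambda is the dual variable.

```latex
\begin{align}
\max_{\pi}\quad & \mathbb{E}_{x}\big[\,p(x, \pi(x))\,\big]
  \quad \text{s.t.} \quad \mathbb{E}_{x}\big[\,c(\pi(x))\,\big] \le B, \\
\mathcal{L}(\pi, \lambda) &= \mathbb{E}_{x}\big[\,p(x, \pi(x)) - \lambda\, c(\pi(x))\,\big] + \lambda B,
  \qquad \lambda \ge 0, \\
a^{*}_{\lambda}(x) &= \arg\max_{a \in \mathcal{A}} \; \big(p(x, a) - \lambda\, c(a)\big).
\end{align}
```

For a fixed lambda the Lagrangian separates across inputs, which is why each instance can be solved on its own. The average cost of the per-instance maximizers is non-increasing in lambda, and that monotonicity is what allows a binary search to find the price matching the budget.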

Method

A two-stage "Solve-then-Learn" pipeline first uses Lagrangian relaxation to put a price on compute and derive the optimal action for each instance, then trains a lightweight classifier to predict those oracle actions from cheap input features for real-time deployment.
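The "learn" stage can be illustrated with a small imitation classifier. The sketch below is an assumption-laden stand-in, not the paper's model: it fits a plain softmax regression in NumPy, the single input feature (a scaled difficulty proxy) is hypothetical, and the oracle labels are made up for the example rather than produced by a real Lagrangian solve.

```python
import numpy as np

def fit_softmax(X, y, n_actions, lr=0.5, epochs=500):
    """Fit a softmax-regression classifier by batch gradient descent.

    X : (n, d) cheap input features (hypothetical, e.g. a scaled prompt length)
    y : (n,) oracle action index for each training instance
    """
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])      # append a bias column
    W = np.zeros((d + 1, n_actions))
    Y = np.eye(n_actions)[y]                  # one-hot oracle labels
    for _ in range(epochs):
        logits = Xb @ W
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        W -= lr * Xb.T @ (P - Y) / n          # cross-entropy gradient step
    return W

def predict_actions(W, X):
    """Predict a compute action for each input from its cheap features."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.argmax(Xb @ W, axis=1)

# Toy data (hypothetical): easy inputs get action 0, hard inputs get action 1.
X_train = np.array([[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]])
y_oracle = np.array([0, 0, 0, 1, 1, 1])
W = fit_softmax(X_train, y_oracle, n_actions=2)
pred = predict_actions(W, X_train)
imitation_accuracy = (pred == y_oracle).mean()
```

At deployment only `predict_actions` runs per request, so the cost of choosing a compute level is one small matrix product. The imitation accuracy measured this way is the same kind of quantity as the oracle-agreement figure quoted in the summary, though the numbers here come from toy data.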

Best for: Research Scientist, MLOps Engineer, AI Engineer, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.