Granularity-Regulated Adaptive Computational Efficiency for Optimal Verification in Test-Time Scaling

Source: cs.CL updates on arXiv.org · Field: Technology & Digital, Artificial Intelligence & Machine Learning · Depth: Expert, long

Summary

Granularity-Regulated Adaptive Computational Efficiency (GRACE) is a unified theoretical framework for choosing the optimal verification granularity in large language model (LLM) test-time scaling (TTS). It characterizes the ideal granularity, which ranges from coarse-grained outcome reward models (ORMs) to fine-grained process reward models (PRMs), as an explicit function of problem difficulty, verifier accuracy, and compute budget. The framework establishes a phase transition theorem: fine-grained verification dominates when compute budgets are large or problems are hard, while coarse-grained verification is preferred for low budgets or easy problems. GRACE also unifies existing TTS methods such as Best-of-N, beam search, and Monte Carlo tree search (MCTS). Motivated by this theory, the authors propose GRACE-Adapt, an adaptive strategy that selects the optimal granularity for each problem instance. On the MATH-500, GSM8K, and AIME benchmarks, GRACE-Adapt outperforms fixed-granularity baselines by up to 3.1% accuracy at matched compute, and the experiments are reported to confirm all four of the paper's theoretical claims.
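To make the coarse-to-fine range concrete, here is a minimal sketch. It rests on an assumption that is not taken from the paper: granularity is modelled as the number of verifier calls spread evenly over a solution's reasoning steps. The function name `checkpoint_indices` and its parameters are hypothetical.

```python
from typing import List


def checkpoint_indices(num_steps: int, granularity: int) -> List[int]:
    """Return the 1-based step indices after which a verifier is called.

    Illustrative assumption (not the paper's definition):
    granularity=1 means one check on the finished solution (ORM-like),
    granularity=num_steps means a check after every step (PRM-like).
    """
    if num_steps < 1:
        raise ValueError("num_steps must be >= 1")
    # Clamp so there is at least one check and at most one per step.
    g = max(1, min(granularity, num_steps))
    # Ceiling division spreads g checkpoints evenly over the steps;
    # the last checkpoint always lands on the final step.
    return [-(-k * num_steps // g) for k in range(1, g + 1)]


if __name__ == "__main__":
    for g in (1, 4, 8):
        print(g, checkpoint_indices(8, g))
```

Under this toy model the verification cost of one candidate grows linearly with the granularity, which is the trade-off the phase transition result weighs against problem difficulty and budget.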

Key takeaway

Machine learning engineers optimizing LLM reasoning performance under compute constraints should adjust verification granularity dynamically rather than fix it in advance. The choice between coarse-grained (ORM) and fine-grained (PRM) verification should depend on problem difficulty and the available compute budget. An adaptive strategy such as GRACE-Adapt is reported to gain up to 3.1% accuracy over fixed-granularity baselines at matched compute, with the aim of compute-performance Pareto optimality.

Key insights

Optimal LLM verification granularity depends on problem difficulty and compute budget, exhibiting a phase transition.

Method

GRACE-Adapt first estimates the difficulty of a problem, then computes the optimal granularity g* and candidate count N*, and finally generates N* candidate solutions and verifies them at granularity g*.
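The three stages can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the summary does not give the closed forms for g* and N*, so `choose_plan` uses a toy threshold rule that only mirrors the direction of the phase transition. The difficulty proxy (disagreement among pilot answers), the thresholds, the convention that budget counts verifier calls only, and all function names are assumptions.

```python
from collections import Counter
from typing import Callable, List, Tuple


def estimate_difficulty(pilot_answers: List[str]) -> float:
    """Hypothetical proxy: 1 minus the share of the most common pilot answer."""
    if not pilot_answers:
        raise ValueError("need at least one pilot answer")
    top_count = Counter(pilot_answers).most_common(1)[0][1]
    return 1.0 - top_count / len(pilot_answers)


def choose_plan(
    difficulty: float,
    budget: int,
    num_steps: int,
    hard_threshold: float = 0.5,  # assumed cut-off, not from the paper
    large_budget: int = 64,       # assumed cut-off, not from the paper
) -> Tuple[int, int]:
    """Toy stand-in for (g*, N*); NOT the paper's closed-form solution.

    Step-level checks when the problem looks hard or the budget is large,
    a single outcome check otherwise. Budget is counted in verifier calls.
    """
    fine = difficulty >= hard_threshold or budget >= large_budget
    g = num_steps if fine else 1
    n = max(1, budget // g)  # spend the remaining budget on more candidates
    return g, n


def grace_adapt_sketch(
    generate: Callable[[], str],
    verify: Callable[[str, int], float],
    pilot_answers: List[str],
    budget: int,
    num_steps: int,
) -> Tuple[str, int, int]:
    """Estimate difficulty, pick (g, n), then return the best-scored candidate."""
    difficulty = estimate_difficulty(pilot_answers)
    g, n = choose_plan(difficulty, budget, num_steps)
    candidates = [generate() for _ in range(n)]
    best = max(candidates, key=lambda c: verify(c, g))
    return best, g, n
```

In a real system `generate` would sample from the LLM and `verify` would call an ORM or PRM; here they are left as caller-supplied callables so the control flow stays visible.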

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.