MARS: Margin-Adversarial Risk-controlled Stopping for Parallel LLM Test-time Scaling

Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Expert, extended

Summary

MARS (Margin-Adversarial Risk-controlled Stopping) is an early-stopping rule for parallel LLM test-time scaling, where many reasoning traces are sampled and their final answers are majority-voted. Running every trace to completion is expensive, so MARS probes partial traces at intermediate checkpoints, estimates how likely each still-active trace is to change its answer, and stops generation once the leading answer's margin is robust to the vote shifts that could still occur. It separates the uncertainty into two parts: trace-level switch probabilities, learned by a five-feature logistic model, and adversarial bounds on where a switching trace's vote could go, calibrated from warmup traces. Across three reasoning models and three competition-math benchmarks, MARS saves 25–47% of self-consistency tokens and a further 14–29% on top of DeepConf Online. Accuracy stays within 0.6 percentage points of the full-budget result and in some cases improves by up to 0.8 points.
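The self-consistency baseline and the vote margin that MARS monitors can be sketched as follows. This is a minimal illustration and not the paper's code: the function name `majority_vote` and the definition of the margin as leader votes minus runner-up votes are assumptions made for this example.

```python
from collections import Counter

def majority_vote(answers):
    """Return (leading answer, margin) for a list of trace answers.

    Assumption for this sketch: the margin is the leader's vote count
    minus the runner-up's vote count (0 votes if there is no runner-up).
    """
    if not answers:
        raise ValueError("need at least one answer")
    ranked = Counter(answers).most_common(2)
    leader, lead_votes = ranked[0]
    runner_votes = ranked[1][1] if len(ranked) > 1 else 0
    return leader, lead_votes - runner_votes

# Example: 8 traces, 5 agree on "42", 2 on "17", 1 on "9".
leader, margin = majority_vote(["42", "42", "17", "42", "9", "42", "17", "42"])
```

Here the leader is "42" with a margin of 3. An early-stopping rule has to decide whether a margin like this can still be overturned by traces that have not finished.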

Key takeaway

MLOps engineers deploying parallel LLM reasoning systems should consider MARS as a way to reduce inference cost and latency. The reported savings are 25–47% of tokens on self-consistency and a further 14–29% on DeepConf Online, with accuracy within 0.6 percentage points of the full-budget baseline. To evaluate it, run a small set of warmup traces to calibrate the switch-probability model and the adversarial bounds before relying on the stopping rule.

Key insights

MARS uses margin-adversarial risk control to safely stop parallel LLM inference early, preserving accuracy while reducing token usage.

Method

At each checkpoint, MARS probes the active traces for their current answers, estimates each trace's switch probability with a five-feature logistic model, and computes the maximum adversarial margin loss. Generation stops when the leader's margin exceeds this loss plus a concentration correction.
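The stopping test described above can be sketched as follows. The summary does not give the paper's exact formulas, so everything specific here is an assumption: the function name `should_stop`, the worst-case weights (a switching leader vote costs 2 margin points, any other switching vote costs 1), the Hoeffding-style concentration term, and the risk level `delta`. Switch probabilities are passed in directly rather than produced by the five-feature logistic model.

```python
import math
from collections import Counter

def should_stop(answers, active, switch_probs, delta=0.05):
    """Hypothetical margin-versus-adversarial-loss stopping check.

    answers[i]      current (probed or final) answer of trace i
    active[i]       True if trace i is still generating
    switch_probs[i] estimated probability that trace i changes its answer
    delta           assumed risk level for the concentration correction
    """
    ranked = Counter(answers).most_common(2)
    leader, lead_votes = ranked[0]
    runner_votes = ranked[1][1] if len(ranked) > 1 else 0
    margin = lead_votes - runner_votes

    expected_loss = 0.0  # expected worst-case margin loss
    sq_range = 0.0       # sum of squared per-trace loss ranges
    for ans, is_active, p in zip(answers, active, switch_probs):
        if not is_active:
            continue  # finished traces cannot change their vote
        # Assumed adversary: every switch goes to the strongest rival.
        weight = 2.0 if ans == leader else 1.0
        expected_loss += p * weight
        sq_range += weight * weight

    # Hoeffding-style bound on the deviation above the expected loss
    # (an assumption; the paper's correction may differ).
    correction = math.sqrt(0.5 * math.log(1.0 / delta) * sq_range)
    return margin > expected_loss + correction, leader

answers = ["42"] * 7 + ["17"]
probs = [0.02] * 8
# Only two traces still running: the margin of 6 is safe.
stop_few, leader = should_stop(answers, [True, True] + [False] * 6, probs)
# All eight traces still running: too much could still change.
stop_all, _ = should_stop(answers, [True] * 8, probs)
```

With two active traces the check returns True, and with all eight active it returns False. The second case shows how conservative a worst-case bound of this kind is when many votes are still open, which may be why the method calibrates its bounds from warmup traces.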

Best for: Research Scientist, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.