CAPS: Cascaded Adaptive Pairwise Selection for Efficient Parallel Reasoning

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, extended

Summary

CAPS (Cascaded Adaptive Pairwise Selection) is an inference-only framework that makes parallel reasoning in large language models cheaper by spending pairwise self-verification compute more selectively. It targets the high computational cost of existing methods such as V1-Infer, which run many full-evidence comparisons. CAPS uses a four-stage cascade that allocates verifier compute non-uniformly along two axes: an "evidence axis" (how much of each candidate the judge sees) and a "distribution axis" (how comparisons are spread). An optional rescue subroutine recovers mis-eliminated candidates. Evaluated on four self-verifying models (Qwen3-14B, GPT-OSS-20B, Qwen3-4B-Instruct/Thinking) and five reasoning benchmarks (LiveCodeBench-v5/v6, CodeContests, AIME 2025, HMMT 2025), for 20 model-benchmark suites in total, CAPS outperforms V1-Infer on 14 of the 20 suites while using only 25.4% of its verifier-token budget on code, and surpasses pointwise self-verification on all 20. The efficiency gains are structural: a closed-form cost analysis shows that the per-candidate marginal cost is roughly halved.

Key takeaway

For research scientists optimizing LLM inference costs in parallel reasoning tasks, CAPS offers a compelling alternative to existing pairwise verification methods. By strategically reducing verifier-token expenditure through cascaded adaptive selection, you can achieve comparable or superior Pass@1 accuracy with significantly lower computational overhead. Before deployment, assess your verifier's accuracy at partial versus full evidence on a small dataset; if the accuracy drop is minimal (e.g., <5 percentage points), CAPS is likely to yield substantial efficiency gains without sacrificing performance.
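The pre-deployment check described above could be sketched roughly as follows. This is a minimal illustration, not the paper's procedure: the `Judge` signature, the `partial=0.5` evidence fraction, and the function names are all assumptions made for the example. Each labeled pair is scored in both presentation orders so that a position-biased judge does not inflate the estimate.

```python
from typing import Callable, List, Tuple

# Hypothetical judge signature (an assumption, not from the paper):
# returns True if the first candidate is preferred over the second when the
# judge sees only the given fraction of each candidate (1.0 = full evidence).
Judge = Callable[[str, str, float], bool]


def judge_accuracy(pairs: List[Tuple[str, str]], judge: Judge, evidence: float) -> float:
    """Fraction of correct verdicts on (better, worse) pairs with known labels.

    Each pair is queried in both orders to reduce the effect of position bias.
    """
    if not pairs:
        raise ValueError("need at least one labeled pair")
    correct = 0
    for better, worse in pairs:
        if judge(better, worse, evidence):      # better shown first
            correct += 1
        if not judge(worse, better, evidence):  # better shown second
            correct += 1
    return correct / (2 * len(pairs))


def accuracy_drop_pp(pairs: List[Tuple[str, str]], judge: Judge, partial: float = 0.5) -> float:
    """Accuracy lost, in percentage points, when moving from full to partial evidence."""
    full = judge_accuracy(pairs, judge, 1.0)
    part = judge_accuracy(pairs, judge, partial)
    return 100.0 * (full - part)
```

If `accuracy_drop_pp` on a small labeled sample comes out below the roughly 5-percentage-point guideline mentioned above, the partial-evidence rounds are unlikely to eliminate many good candidates.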

Key insights

CAPS cuts verifier-token cost in parallel reasoning, to 25.4% of V1-Infer's budget on code, by adaptively allocating compute for pairwise self-verification.

Method

CAPS employs a four-stage cascade: deduplication, two halving rounds at increasing evidence levels (partial then full), and a round-robin among finalists, with an optional rescue mechanism.
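The four stages above can be illustrated with a small sketch. It is an approximation under stated assumptions rather than the authors' implementation: the exact-match deduplication, the neighbour pairing with a bye for an odd candidate, the `partial=0.5` evidence fraction, and every function name here are hypothetical, and the optional rescue step is left out.

```python
from itertools import combinations
from typing import Callable, List

# Hypothetical judge signature (an assumption): True if the first candidate
# beats the second when the judge sees the given fraction of each candidate.
Judge = Callable[[str, str, float], bool]


def dedupe(cands: List[str]) -> List[str]:
    # Stage 1: drop exact duplicates, keeping first-seen order.
    return list(dict.fromkeys(cands))


def halve(cands: List[str], judge: Judge, evidence: float) -> List[str]:
    # Stages 2 and 3: pair neighbours and advance each winner.
    # An unpaired last candidate advances with a bye (an assumption).
    winners = []
    for i in range(0, len(cands) - 1, 2):
        a, b = cands[i], cands[i + 1]
        winners.append(a if judge(a, b, evidence) else b)
    if len(cands) % 2 == 1:
        winners.append(cands[-1])
    return winners


def round_robin(cands: List[str], judge: Judge, evidence: float) -> str:
    # Stage 4: every finalist meets every other; most wins is selected.
    wins = {c: 0 for c in cands}
    for a, b in combinations(cands, 2):
        wins[a if judge(a, b, evidence) else b] += 1
    return max(cands, key=lambda c: wins[c])


def caps_select(cands: List[str], judge: Judge, partial: float = 0.5) -> str:
    if not cands:
        raise ValueError("need at least one candidate")
    pool = dedupe(cands)
    pool = halve(pool, judge, partial)  # halving round at partial evidence
    pool = halve(pool, judge, 1.0)      # halving round at full evidence
    return round_robin(pool, judge, 1.0)


def toy_judge(a: str, b: str, evidence: float) -> bool:
    # Stand-in for an LLM verifier: prefers the higher numeric suffix.
    return int(a[1:]) > int(b[1:])
```

With the stand-in judge, `caps_select(["c1", "c3", "c3", "c2", "c8", "c5"], toy_judge)` selects `"c8"`. The cheaper partial-evidence round handles the largest pool, so full-evidence comparisons are only spent on the survivors.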

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.