CAPS: Cascaded Adaptive Pairwise Selection for Efficient Parallel Reasoning
Summary
CAPS (Cascaded Adaptive Pairwise Selection) is an inference-only framework that makes parallel reasoning in large language models cheaper by optimizing pairwise self-verification. It targets the high computational cost of existing methods such as V1-Infer, which perform many full-evidence comparisons. CAPS introduces a four-stage cascade that allocates verifier compute non-uniformly along two axes: an "evidence axis" (how much of each candidate the judge sees) and a "distribution axis" (how comparisons are spread across candidates). The framework also includes an optional rescue subroutine for mis-eliminated candidates. Evaluated on four self-verifying models (Qwen3-14B, GPT-OSS-20B, Qwen3-4B-Instruct, Qwen3-4B-Thinking) and five reasoning benchmarks (LiveCodeBench-v5, LiveCodeBench-v6, CodeContests, AIME 2025, HMMT 2025), for 20 model-benchmark suites in total, CAPS outperforms V1-Infer on 14 of the 20 suites while using only 25.4% of its verifier-token budget on code, and surpasses pointwise self-verification on all 20. The efficiency gains are structural: a closed-form cost analysis shows that the per-candidate marginal cost is roughly halved.
Key takeaway
For research scientists optimizing LLM inference costs in parallel reasoning tasks, CAPS offers a compelling alternative to existing pairwise verification methods. By reducing verifier-token expenditure through cascaded adaptive selection, you can achieve comparable or superior Pass@1 accuracy at a fraction of the verifier cost. Before deployment, measure your verifier's accuracy at partial versus full evidence on a small labeled dataset; if the accuracy drop is small (e.g., under 5 percentage points), CAPS is likely to yield substantial efficiency gains without sacrificing performance.
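The pre-deployment check described above could be sketched as follows. This is a minimal illustration under stated assumptions, not code from CAPS: the names `judge_accuracy` and `partial_evidence_is_safe`, the `(verifier_pick, correct_pick)` record format, and the default 5-point threshold are all hypothetical choices made for this example.

```python
from typing import Iterable, List, Tuple

# One record per labeled candidate pair: (verifier_pick, correct_pick).
# This record format is an assumption made for illustration.
Record = Tuple[str, str]


def judge_accuracy(records: Iterable[Record]) -> float:
    """Fraction of pairs where the verifier picked the correct candidate."""
    rows: List[Record] = list(records)
    if not rows:
        raise ValueError("need at least one labeled pair")
    hits = sum(1 for pick, gold in rows if pick == gold)
    return hits / len(rows)


def partial_evidence_is_safe(
    partial: Iterable[Record],
    full: Iterable[Record],
    max_drop_pp: float = 5.0,  # hypothetical threshold, in percentage points
) -> Tuple[bool, float]:
    """Return (ok, drop) where drop is full-minus-partial accuracy in points."""
    drop_pp = 100.0 * (judge_accuracy(full) - judge_accuracy(partial))
    return drop_pp < max_drop_pp, drop_pp


# Toy data: the same four pairs judged at full and at partial evidence.
full_runs = [("A", "A"), ("B", "B"), ("A", "A"), ("B", "A")]
partial_runs = [("A", "A"), ("B", "B"), ("B", "A"), ("B", "A")]
ok, drop = partial_evidence_is_safe(partial_runs, full_runs)
```

In practice the two record lists would come from running the same verifier on the same labeled pairs twice, once per evidence level, so that the drop isolates the effect of truncating the evidence.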
Key insights
CAPS cuts verifier-token cost in parallel reasoning (to 25.4% of V1-Infer's budget on code benchmarks) by adaptively allocating compute for pairwise self-verification.
Principles
- Non-uniform allocation of verifier compute improves efficiency.
- Partial evidence can reliably discriminate many candidate pairs.
- Full-evidence comparisons should be focused on the strongest candidates.
Method
CAPS employs a four-stage cascade: deduplication, two halving rounds at increasing evidence levels (partial then full), and a round-robin among finalists, with an optional rescue mechanism.
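The four stages named above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the `judge(a, b, evidence)` interface, the adjacent-pair matching inside `halve`, the win-count tiebreak, and all function names are assumptions, and the optional rescue step is omitted.

```python
from itertools import combinations
from typing import Callable, Dict, List

# Hypothetical judge interface: given two candidates and an evidence level
# ("partial" or "full"), it must return one of the two candidates.
Judge = Callable[[str, str, str], str]


def dedupe(cands: List[str]) -> List[str]:
    """Stage 1: drop exact duplicates, keeping first occurrences in order."""
    return list(dict.fromkeys(cands))


def halve(cands: List[str], judge: Judge, evidence: str) -> List[str]:
    """Stages 2 and 3: compare adjacent pairs and keep each winner."""
    survivors = [
        judge(cands[i], cands[i + 1], evidence)
        for i in range(0, len(cands) - 1, 2)
    ]
    if len(cands) % 2 == 1:
        survivors.append(cands[-1])  # an unpaired candidate advances
    return survivors


def round_robin(cands: List[str], judge: Judge) -> str:
    """Stage 4: every finalist meets every other at full evidence."""
    wins: Dict[str, int] = {c: 0 for c in cands}
    for a, b in combinations(cands, 2):
        wins[judge(a, b, "full")] += 1
    return max(cands, key=lambda c: wins[c])  # ties go to the earliest finalist


def cascade_select(cands: List[str], judge: Judge) -> str:
    if not cands:
        raise ValueError("need at least one candidate")
    pool = dedupe(cands)
    pool = halve(pool, judge, "partial")  # cheap, low-evidence round
    pool = halve(pool, judge, "full")     # costlier round on the survivors
    return round_robin(pool, judge)
```

With eight distinct candidates, this sketch makes four partial-evidence and three full-evidence comparisons, versus 28 for an all-pairs full-evidence tournament, which illustrates where the savings come from even though the paper's actual pairing and scoring rules may differ.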
In practice
- Use partial views for initial candidate elimination.
- Reserve full-evidence comparisons for top contenders.
- Implement a rescue mechanism for noisy low-evidence judgments.
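One way the last item in this list might look in code is sketched below. The summary does not say how CAPS decides when to rescue a candidate, so the trigger used here (a judge confidence score below a `min_conf` threshold) is purely an assumption, as are the `halve_with_rescue` name and the judge interface that returns a `(winner, confidence)` pair.

```python
from typing import Callable, List, Tuple

# Hypothetical judge interface: returns (winner, confidence in [0, 1]).
ScoredJudge = Callable[[str, str, str], Tuple[str, float]]


def halve_with_rescue(
    cands: List[str],
    judge: ScoredJudge,
    min_conf: float = 0.7,  # assumed threshold, not taken from the paper
) -> List[str]:
    """Partial-evidence halving that re-judges shaky pairs at full evidence."""
    survivors: List[str] = []
    for i in range(0, len(cands) - 1, 2):
        a, b = cands[i], cands[i + 1]
        winner, conf = judge(a, b, "partial")
        if conf < min_conf:
            # Low-confidence verdict: spend a full-evidence comparison so a
            # strong candidate is not eliminated by a noisy partial view.
            winner, _ = judge(a, b, "full")
        survivors.append(winner)
    if len(cands) % 2 == 1:
        survivors.append(cands[-1])  # an unpaired candidate advances
    return survivors
```

The design trade-off is that every rescue costs a full-evidence comparison, so the threshold controls how much of the cascade's savings is given back in exchange for fewer mis-eliminations.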
Topics
- Cascaded Adaptive Pairwise Selection
- LLM Parallel Reasoning
- Pairwise Self-Verification
- Verifier-Token Cost Efficiency
- Evidence Cascade
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.