CAPS: Cascaded Adaptive Pairwise Selection for Efficient Parallel Reasoning

2026-05-18 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, extended

Summary

CAPS (Cascaded Adaptive Pairwise Selection) is an inference-only framework that makes parallel reasoning in large language models cheaper by optimizing pairwise self-verification. It targets the high computational cost of existing methods such as V1-Infer, which perform many full-evidence comparisons. CAPS introduces a four-stage cascade that allocates verifier compute non-uniformly along two axes: an "evidence axis" (how much of each candidate the judge sees) and a "distribution axis" (how comparisons are spread across candidates). It also includes an optional rescue subroutine for mis-eliminated candidates. Evaluated on four self-verifying models (Qwen3-14B, GPT-OSS-20B, Qwen3-4B-Instruct, Qwen3-4B-Thinking) and five reasoning benchmarks (LiveCodeBench-v5, LiveCodeBench-v6, CodeContests, AIME 2025, HMMT 2025), for 20 model-benchmark suites in total, CAPS outperforms V1-Infer on 14 of 20 suites while using only 25.4% of its verifier-token budget on code, and surpasses pointwise self-verification on all 20 suites. The efficiency gains are structural: a closed-form cost analysis shows that the per-candidate marginal cost is roughly halved.

Key takeaway

For research scientists optimizing LLM inference costs in parallel reasoning tasks, CAPS offers a compelling alternative to existing pairwise verification methods. By strategically reducing verifier-token expenditure through cascaded adaptive selection, you can achieve comparable or superior Pass@1 accuracy with significantly lower computational overhead. Before deployment, assess your verifier's accuracy at partial versus full evidence on a small dataset; if the accuracy drop is minimal (e.g., <5 percentage points), CAPS is likely to yield substantial efficiency gains without sacrificing performance.
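The pre-deployment check described above can be sketched as follows. This is a minimal illustration, not the paper's evaluation protocol: the `judge` callable, the labeled-pair format, and the use of a leading-character prefix as "partial evidence" are all assumptions made for the example.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical data format: (candidate_a, candidate_b, a_is_better).
LabeledPair = Tuple[str, str, bool]
# Hypothetical judge: returns True when it prefers the first candidate.
Judge = Callable[[str, str], bool]


def truncate(text: str, fraction: float) -> str:
    """Keep the leading fraction of a candidate (one assumed notion of partial evidence)."""
    keep = max(1, int(len(text) * fraction))
    return text[:keep]


def judge_accuracy(pairs: Sequence[LabeledPair], judge: Judge, fraction: float) -> float:
    """Share of labeled pairs on which the judge picks the known-better candidate."""
    correct = sum(
        judge(truncate(a, fraction), truncate(b, fraction)) == a_is_better
        for a, b, a_is_better in pairs
    )
    return correct / len(pairs)


def partial_evidence_gap(
    pairs: Sequence[LabeledPair], judge: Judge, fraction: float = 0.5
) -> float:
    """Accuracy drop, in percentage points, when moving from full to partial evidence."""
    full = judge_accuracy(pairs, judge, 1.0)
    partial = judge_accuracy(pairs, judge, fraction)
    return 100.0 * (full - partial)
```

A gap below the threshold mentioned above (about 5 percentage points) would suggest the verifier tolerates partial views; a larger gap would argue for a higher evidence fraction in the early rounds.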

Key insights

CAPS cuts verifier-token cost in parallel reasoning by allocating pairwise self-verification compute adaptively: on code benchmarks it uses 25.4% of V1-Infer's verifier-token budget.

Principles

Non-uniform compute allocation improves efficiency.
Partial evidence can reliably discriminate many candidate pairs.
Focus full-evidence comparisons on the strongest candidates.
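To make these principles concrete, here is a rough token-accounting sketch. It does not reproduce the paper's closed-form analysis or V1-Infer's cost. It assumes every candidate has the same length, that a comparison costs the sum of the two views shown to the judge, and that prompt overhead, judge output tokens, and deduplication are ignored. The function name and parameters are hypothetical.

```python
def cascade_verifier_tokens(
    n: int, tokens_per_candidate: int, partial_fraction: float
) -> float:
    """Verifier input tokens for: one partial-evidence halving round,
    one full-evidence halving round, then a full-evidence round-robin."""
    pair_cost_full = 2 * tokens_per_candidate          # judge reads both candidates
    pair_cost_partial = partial_fraction * pair_cost_full

    round1 = n // 2                  # comparisons at partial evidence
    after1 = n - round1              # winners plus any unpaired candidate
    round2 = after1 // 2             # comparisons at full evidence
    finalists = after1 - round2
    round_robin = finalists * (finalists - 1) // 2     # all pairs of finalists

    return round1 * pair_cost_partial + (round2 + round_robin) * pair_cost_full


# Same comparison schedule, with and without partial views in the first round.
adaptive = cascade_verifier_tokens(16, 1000, 0.25)
uniform = cascade_verifier_tokens(16, 1000, 1.0)
```

Under these assumptions the first round holds the most comparisons, so shrinking the evidence shown there removes a large share of the total even though the later rounds still run at full evidence.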

Method

CAPS employs a four-stage cascade: deduplication, two halving rounds at increasing evidence levels (partial then full), and a round-robin among finalists, with an optional rescue mechanism.
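The four stages can be sketched as below. This is an illustrative outline under stated assumptions rather than the authors' implementation: the `judge(a, b, fraction)` signature, adjacent pairing within each halving round, exact-match deduplication, and win-count scoring in the round-robin are all hypothetical choices, and the rescue step and the "distribution axis" are left out.

```python
from itertools import combinations
from typing import Callable, List

# Hypothetical judge: returns True when it prefers `a` over `b` after seeing
# only the leading `fraction` of each candidate.
Judge = Callable[[str, str, float], bool]


def dedupe(candidates: List[str]) -> List[str]:
    """Stage 1: drop exact duplicates, keeping first occurrences in order."""
    return list(dict.fromkeys(candidates))


def halving_round(pool: List[str], judge: Judge, fraction: float) -> List[str]:
    """Pair candidates off and keep each winner; an unpaired candidate advances."""
    winners = []
    for i in range(0, len(pool) - 1, 2):
        a, b = pool[i], pool[i + 1]
        winners.append(a if judge(a, b, fraction) else b)
    if len(pool) % 2 == 1:
        winners.append(pool[-1])
    return winners


def round_robin(finalists: List[str], judge: Judge) -> str:
    """Stage 4: every finalist meets every other at full evidence; most wins is selected."""
    wins = {c: 0 for c in finalists}
    for a, b in combinations(finalists, 2):
        wins[a if judge(a, b, 1.0) else b] += 1
    return max(finalists, key=lambda c: wins[c])


def cascade_select(
    candidates: List[str], judge: Judge, partial_fraction: float = 0.5
) -> str:
    if not candidates:
        raise ValueError("need at least one candidate")
    pool = dedupe(candidates)
    pool = halving_round(pool, judge, partial_fraction)  # stage 2: partial evidence
    pool = halving_round(pool, judge, 1.0)               # stage 3: full evidence
    return round_robin(pool, judge)
```

In use, `judge` would wrap a call to the self-verifying model; here it is left abstract so the control flow of the cascade stays visible.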

In practice

Use partial views for initial candidate elimination.
Reserve full-evidence comparisons for top contenders.
Implement a rescue mechanism for noisy low-evidence judgments.
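One way to act on the last point is sketched below. The summary does not describe how CAPS's rescue subroutine decides which candidates to revisit, so this example substitutes a simple stand-in: the hypothetical judge returns a signed confidence margin, and any partial-evidence call whose margin falls under an assumed `margin_threshold` is replayed at full evidence before a candidate is eliminated.

```python
from typing import Callable, List

# Hypothetical judge: returns a signed margin, positive when it prefers `a`,
# after seeing the leading `fraction` of each candidate.
MarginJudge = Callable[[str, str, float], float]


def halving_with_rescue(
    pool: List[str],
    judge: MarginJudge,
    partial_fraction: float = 0.5,
    margin_threshold: float = 0.2,
) -> List[str]:
    """One halving round at partial evidence with a full-evidence rematch
    for low-confidence calls (an assumed rescue criterion)."""
    survivors = []
    for i in range(0, len(pool) - 1, 2):
        a, b = pool[i], pool[i + 1]
        margin = judge(a, b, partial_fraction)
        if abs(margin) < margin_threshold:
            # The cheap view was inconclusive: spend full evidence on this pair only.
            margin = judge(a, b, 1.0)
        survivors.append(a if margin > 0 else b)
    if len(pool) % 2 == 1:
        survivors.append(pool[-1])  # unpaired candidate advances unchallenged
    return survivors
```

The extra full-evidence cost is paid only on pairs the partial view could not separate, which keeps the rescue step consistent with the idea of reserving full evidence for cases that need it.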

Topics

Cascaded Adaptive Pairwise Selection
LLM Parallel Reasoning
Pairwise Self-Verification
Verifier-Token Cost Efficiency
Evidence Cascade

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.