When Does Delegation Beat Majority? A Delegation-Based Aggregator for Multi-Sample LLM Inference

2026-06-12 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

Propagational Proxy Voting (PPV), a new unsupervised aggregator, outperforms majority voting for multi-sample LLM inference. On the MMLU-Pro benchmark with 128 Qwen3-1.7B samples per question, PPV reached 42.2% accuracy against 40.7% for majority voting overall. On the 8,099 non-trivial questions it gained 2.24 percentage points (30.2% vs 28.0%), with a McNemar p-value of approximately 1.0×10^-14. PPV uses two signals that are normally discarded: letter-level semantic entropy, which determines a group's self-weight ("When"), and reasoning embedding geometry, which guides peer delegation ("Whom"). The method partitions the 128 samples into 16 groups, computes each group's semantic entropy and centered reasoning embedding centroid, and feeds both into a stochastic delegation matrix. Per-question centering of the embeddings is essential for the geometry to be discriminative. Entropy drives most of the gain, while geometry is what makes propagation non-trivial.
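For readers unfamiliar with the McNemar test quoted above, here is a minimal sketch of its exact (binomial) form. It compares two methods on the same questions using only the discordant pairs, that is, the questions one method answers correctly and the other does not. The summary does not report the discordant counts or which variant of the test was used, so the function name and the arguments `b` and `c` are illustrative assumptions.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: questions only method A answered correctly (hypothetical count)
    c: questions only method B answered correctly (hypothetical count)
    Under the null hypothesis each discordant question is a fair coin flip.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    # Probability of a split at least as lopsided as the observed one (one tail).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2.0 * tail)
```

Questions that both methods get right, or both get wrong, do not enter the statistic, which is why a modest accuracy gain can still yield a very small p-value when the discordant pairs lean strongly one way.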

Key takeaway

Machine learning engineers optimizing LLM inference should consider Propagational Proxy Voting (PPV) in place of simple majority voting. By incorporating letter-level semantic entropy and reasoning embedding geometry, multi-sample aggregation can reach higher accuracy, especially on challenging questions. The approach is unsupervised and closes 38% of the gap between majority and oracle performance.

Key insights

Delegation-based aggregation using semantic entropy and reasoning geometry significantly improves LLM multi-sample inference over majority voting.

Principles

LLM samples carry unused confidence and reasoning signals.
Geometric coherence of reasoning can overturn numerical majorities.
Per-question centering of embeddings is crucial for discriminative geometry.
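The centering principle can be illustrated with a small synthetic sketch. It assumes, as an illustration rather than a finding from the paper, that all samples for one question share a large common embedding component (the question's topic), which pushes raw cosine similarities toward 1 and hides the differences between reasoning paths. The helper names and the synthetic data are hypothetical.

```python
import numpy as np

def cosine_matrix(x):
    """Pairwise cosine similarity between the rows of x."""
    unit = x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)
    return unit @ unit.T

def center_per_question(embeddings):
    """Subtract the mean embedding computed over one question's samples."""
    return embeddings - embeddings.mean(axis=0, keepdims=True)

# Synthetic stand-in for 16 group centroids of a single question.
rng = np.random.default_rng(0)
shared = 10.0 * rng.normal(size=32)          # component common to every sample
raw = shared + rng.normal(size=(16, 32))     # plus small sample-specific variation

off_diag = ~np.eye(16, dtype=bool)
raw_spread = cosine_matrix(raw)[off_diag].std()
centered_spread = cosine_matrix(center_per_question(raw))[off_diag].std()
```

In this toy setup the raw similarities are nearly identical across pairs, while the centered similarities spread out, which is the property a similarity-based delegation rule needs.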

Method

Partition 128 LLM samples into 16 groups. Compute each group's letter entropy and centered reasoning embedding centroid. Construct a Propagational Proxy Voting (PPV) matrix using these signals, then propagate to find consensus.
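The matrix construction and propagation steps might look like the sketch below. The summary does not give the paper's formulas, so everything here is an assumption: each group keeps its self-weight on the diagonal, delegates the remainder to peers through a softmax over centroid cosine similarity, and propagation is solved in closed form as an absorbing flow. The function name `ppv_propagate` and the parameters `temperature` and `min_self` are hypothetical.

```python
import numpy as np

def ppv_propagate(votes, self_weight, centroids, temperature=1.0, min_self=0.05):
    """Sketch of delegation-based aggregation over groups (not the paper's exact rule).

    votes:       (n_groups, n_options) per-group answer-letter distributions
    self_weight: (n_groups,) share of its vote each group keeps ("When")
    centroids:   (n_groups, d) centered reasoning-embedding centroids ("Whom")
    """
    votes = np.asarray(votes, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    n = votes.shape[0]
    # Assumed floor on self-weight so every delegated vote is eventually absorbed.
    keep = np.clip(np.asarray(self_weight, dtype=float), min_self, 1.0)

    # "Whom": softmax over cosine similarity to the other groups' centroids.
    norms = np.linalg.norm(centroids, axis=1, keepdims=True)
    unit = centroids / np.maximum(norms, 1e-12)
    logits = unit @ unit.T / temperature
    np.fill_diagonal(logits, -np.inf)  # no self-delegation in the peer term
    logits -= logits.max(axis=1, keepdims=True)
    peers = np.exp(logits)
    peers /= peers.sum(axis=1, keepdims=True)

    # Row-stochastic matrix: diagonal = kept share, off-diagonal = delegated share.
    delegated = (1.0 - keep)[:, None] * peers
    matrix = delegated + np.diag(keep)

    # Each group starts with one vote. Solve for the total vote mass that
    # passes through each group, then take the share it keeps.
    visits = np.linalg.solve((np.eye(n) - delegated).T, np.ones(n))
    final_weight = visits * keep

    scores = final_weight @ votes
    return int(np.argmax(scores)), matrix, final_weight
```

With every self-weight equal to 1 nothing is delegated, the matrix is the identity, and for equal-sized groups the result reduces to a plurality vote over the pooled samples. Uncertain groups hand more of their vote to peers whose reasoning centroids point in a similar direction.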

In practice

Use letter entropy to weight LLM sample group confidence.
Employ centered reasoning embeddings for peer delegation.
Consider PPV for unsupervised LLM aggregation tasks.
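As a starting point for the first item, the sketch below computes a per-group confidence weight from answer letters. The mapping from entropy to weight (one minus normalized entropy) is an assumption, as are the function names, the consecutive grouping, and the default of ten answer options; the summary only states that letter entropy determines a group's self-weight.

```python
import math
from collections import Counter

def letter_entropy(letters, n_options=10):
    """Shannon entropy of the answer letters, normalized by log(n_options)."""
    total = len(letters)
    probs = [count / total for count in Counter(letters).values()]
    return -sum(p * math.log(p) for p in probs) / math.log(n_options)

def group_self_weights(letters, n_groups=16, n_options=10):
    """Split samples into consecutive equal-sized groups and score each one.

    Assumed mapping: weight = 1 - normalized entropy, so a unanimous group
    gets weight 1 and a group that spreads its answers gets a lower weight.
    """
    size = len(letters) // n_groups
    groups = [letters[i * size:(i + 1) * size] for i in range(n_groups)]
    return [1.0 - letter_entropy(group, n_options) for group in groups]
```

For 128 samples and 16 groups, each group holds 8 answer letters. A group that answers "A" eight times keeps its full weight, while a group split evenly across four letters keeps much less and would delegate more of its vote.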

Topics

LLM Inference Aggregation
Propagational Proxy Voting
Semantic Entropy
Reasoning Embeddings
MMLU-Pro Benchmark
Unsupervised Learning

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.