When Does Delegation Beat Majority? A Delegation-Based Aggregator for Multi-Sample LLM Inference
Summary
Propagational Proxy Voting (PPV), an unsupervised aggregator, outperforms majority voting for multi-sample LLM inference. On the MMLU-Pro benchmark with 128 Qwen3-1.7B samples per question, PPV reaches 42.2% accuracy versus 40.7% for majority voting overall. On the 8,099 non-trivial questions the gain is +2.24 percentage points (30.2% vs 28.0%), with a McNemar p-value of approximately 1.0×10^-14. PPV uses two signals that majority voting discards: letter-level semantic entropy, which sets each group's self-weight ("When" to delegate), and reasoning embedding geometry, which selects the peers to delegate to ("Whom"). The method partitions the 128 samples into 16 groups, computes each group's semantic entropy and centered reasoning embedding centroid, and feeds both into a stochastic delegation matrix. Per-question centering of the embeddings is essential for the geometry to be discriminative. Entropy is the primary driver of the gains, while geometry enables non-trivial propagation.
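The "When" signal in the summary above can be illustrated with a minimal sketch of letter-level entropy per group. The function names, the use of natural-log Shannon entropy, and the contiguous equal-sized grouping are assumptions made for illustration; the summary does not say how the paper forms its groups or which entropy estimator it uses.

```python
import math
from collections import Counter


def letter_entropy(answers):
    """Shannon entropy (in nats) of the answer letters within one group."""
    counts = Counter(answers)
    total = len(answers)
    # p * log(1/p) is the usual entropy term, written to avoid a negative zero.
    return sum((c / total) * math.log(total / c) for c in counts.values())


def group_entropies(samples, n_groups=16):
    """Split samples into n_groups contiguous, equal-sized groups (an assumed
    partitioning scheme) and return the letter entropy of each group."""
    size = len(samples) // n_groups
    groups = [samples[i * size:(i + 1) * size] for i in range(n_groups)]
    return [letter_entropy(g) for g in groups]
```

Under this definition a unanimous group scores 0, and a group split evenly between two letters scores ln 2 (about 0.693), so lower values indicate a group that could plausibly keep more of its own vote.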
Key takeaway
Machine Learning Engineers optimizing LLM inference should consider Propagational Proxy Voting (PPV) in place of simple majority voting. Incorporating letter-level semantic entropy and reasoning embedding geometry can raise the accuracy of multi-sample aggregation, especially on challenging questions. The approach is unsupervised and closes 38% of the gap between majority and oracle performance without additional supervision.
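A gap-closure figure of this kind is conventionally computed as the aggregator's gain over majority voting divided by the oracle's gain over majority voting. The expression below is that conventional definition, stated as an assumption; this summary gives neither the oracle accuracy nor the question subset the 38% refers to, so the symbols are left unfilled.

```latex
\[
\text{gap closed}
  = \frac{A_{\mathrm{PPV}} - A_{\mathrm{maj}}}{A_{\mathrm{oracle}} - A_{\mathrm{maj}}}
  \approx 0.38
\]
```

Here $A_{\mathrm{PPV}}$, $A_{\mathrm{maj}}$, and $A_{\mathrm{oracle}}$ denote the accuracies of PPV, majority voting, and the oracle on the same set of questions.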
Key insights
Delegation-based aggregation using semantic entropy and reasoning geometry significantly improves LLM multi-sample inference over majority voting.
Principles
- LLM samples carry unused confidence and reasoning signals.
- Geometric coherence of reasoning can overturn numerical majorities.
- Per-question centering of embeddings is crucial for discriminative geometry.
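The centering principle can be illustrated with synthetic data. The sketch below assumes that all reasoning embeddings for one question share a large common component (a modeling assumption, not a claim from the paper) and compares pairwise cosine similarity before and after subtracting the per-question mean. The helper names are hypothetical.

```python
import numpy as np


def cosine_matrix(x):
    """Pairwise cosine similarity between the rows of x."""
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    return unit @ unit.T


def center_per_question(embeddings):
    """Subtract the mean embedding computed over one question's vectors."""
    return embeddings - embeddings.mean(axis=0, keepdims=True)


# Synthetic stand-in for 16 group embeddings of a single question:
# a large shared direction plus small group-specific variation.
rng = np.random.default_rng(0)
shared = rng.normal(size=64) * 10.0
embeddings = shared + rng.normal(size=(16, 64))

raw = cosine_matrix(embeddings)
centered = cosine_matrix(center_per_question(embeddings))

off_diag = ~np.eye(16, dtype=bool)
print("raw off-diagonal spread:     ", raw[off_diag].std())
print("centered off-diagonal spread:", centered[off_diag].std())
```

Without centering, the shared component dominates and every pair looks almost identical; after centering, the similarities spread out and differences between groups become visible.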
Method
Partition 128 LLM samples into 16 groups. Compute each group's letter entropy and centered reasoning embedding centroid. Construct a Propagational Proxy Voting (PPV) matrix using these signals, then propagate to find consensus.
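The steps above can be sketched end to end as follows. This is an illustrative reading, not the paper's formulation: the mapping from entropy to self-weight, the softmax over cosine similarity with temperature `tau`, the fixed number of propagation `steps`, the contiguous grouping, and the name `ppv_aggregate` are all assumptions introduced here.

```python
import numpy as np


def ppv_aggregate(letters, embeddings, n_groups=16, n_options=10,
                  tau=0.1, steps=50):
    """Hypothetical PPV-style aggregator for one question.

    letters:    (N,) integer answer indices, one per sample
    embeddings: (N, D) reasoning embeddings, one per sample
    Returns (chosen answer index, final per-group voting weight).
    """
    letters = np.asarray(letters)
    embeddings = np.asarray(embeddings, dtype=float)
    size = len(letters) // n_groups
    letters = letters[: size * n_groups].reshape(n_groups, size)
    embeddings = embeddings[: size * n_groups]

    # "When": per-group letter distribution and its entropy.
    dist = np.stack([np.bincount(g, minlength=n_options) / size
                     for g in letters])
    safe = np.where(dist > 0, dist, 1.0)  # log(1) = 0 for empty bins
    entropy = -(dist * np.log(safe)).sum(axis=1)
    max_entropy = np.log(min(size, n_options))
    # Assumed mapping: a low-entropy group keeps more of its own vote.
    self_weight = np.clip(1.0 - entropy / max_entropy, 0.0, 1.0)

    # "Whom": per-question centering, then group centroids.
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    centroids = centered.reshape(n_groups, size, -1).mean(axis=1)
    norms = np.linalg.norm(centroids, axis=1, keepdims=True)
    unit = centroids / np.maximum(norms, 1e-12)
    sim = unit @ unit.T

    # Softmax over the other groups (assumed delegation rule).
    logits = sim / tau
    np.fill_diagonal(logits, -np.inf)
    logits -= logits.max(axis=1, keepdims=True)
    peer = np.exp(logits)
    peer /= peer.sum(axis=1, keepdims=True)

    # Row-stochastic delegation matrix: keep self_weight, delegate the rest.
    idx = np.arange(n_groups)
    P = (1.0 - self_weight)[:, None] * peer
    P[idx, idx] = self_weight

    # Propagate an initially uniform voting weight through the matrix.
    weight = np.full(n_groups, 1.0 / n_groups)
    for _ in range(steps):
        weight = weight @ P

    scores = weight @ dist
    return int(scores.argmax()), weight
```

In this sketch a unanimous group has self-weight 1 and never gives its weight away, so voting weight drifts from uncertain groups toward confident ones that lie nearby in the centered embedding space. That drift is what allows the result to differ from a plain majority count.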
In practice
- Use letter entropy to weight LLM sample group confidence.
- Employ centered reasoning embeddings for peer delegation.
- Consider PPV for unsupervised LLM aggregation tasks.
Topics
- LLM Inference Aggregation
- Propagational Proxy Voting
- Semantic Entropy
- Reasoning Embeddings
- MMLU-Pro Benchmark
- Unsupervised Learning
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.