When Does Delegation Beat Majority? A Delegation-Based Aggregator for Multi-Sample LLM Inference
Summary
Propagational Proxy Voting (PPV) is an unsupervised aggregator for multi-sample LLM inference that outperforms majority voting. On MMLU-Pro it improves accuracy by +1.5 percentage points (pp) overall and by +2.24 pp on the non-trivial subset, with a paired McNemar p ≈ 1.0e-14 (n = 8,099). PPV uses two signals that majority voting discards: within-group letter entropy and between-group reasoning geometry. These drive two levers: "WHEN" (self-weight), set by letter entropy, and "WHOM" (peer delegation), set by per-question-centered embedding cosine.
The method needs no gold labels or auxiliary training. For each question it takes 128 sampled generations partitioned into 16 groups, computes each group's semantic entropy and reasoning-embedding centroid, and feeds these into a stochastic delegation matrix that determines the consensus answer. In one example, PPV overturns a 10-6 majority because the majority cluster is geometrically incoherent (mean within-cluster cosine -0.02) while the minority cluster is tight (+0.26).
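The coherence comparison in that example (mean within-cluster cosine of -0.02 for the majority versus +0.26 for the minority) can be illustrated with a minimal sketch. It assumes that "per-question-centered" means subtracting the mean embedding over all items for the question before taking cosines; the function name `mean_within_cluster_cosine` and its inputs are hypothetical, and the embedding model is not specified in this summary.

```python
import numpy as np


def mean_within_cluster_cosine(embeddings, labels):
    """Mean pairwise cosine among items that share an answer label.

    Assumption: embeddings are centered per question (the mean over all
    items is subtracted) before cosines are computed.
    """
    X = np.asarray(embeddings, dtype=float)
    X = X - X.mean(axis=0, keepdims=True)            # per-question centering
    X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    labels = np.asarray(labels)

    coherence = {}
    for lab in np.unique(labels):
        idx = np.flatnonzero(labels == lab)
        if len(idx) < 2:
            coherence[str(lab)] = float("nan")       # no pairs to average
            continue
        sim = X[idx] @ X[idx].T
        upper = np.triu_indices(len(idx), k=1)       # each pair counted once
        coherence[str(lab)] = float(sim[upper].mean())
    return coherence
```

A cluster whose members point in unrelated directions after centering scores near zero, while a cluster whose members agree in direction scores closer to one.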
Key takeaway
For machine learning engineers optimizing multi-sample LLM inference: consider Propagational Proxy Voting (PPV) in place of simple majority voting. By using within-group letter entropy and between-group reasoning geometry, it improved accuracy by +1.5 pp on MMLU-Pro. The method is unsupervised and requires no auxiliary training. Evaluate PPV wherever the reliability of aggregated LLM answers matters.
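One way to run such an evaluation is a paired McNemar test, the kind of test reported in the summary. The sketch below uses the exact binomial form of the test; whether the original work used the exact or the chi-square variant is not stated here, so treat that choice as an assumption. The function name `mcnemar_exact` and the argument names are hypothetical.

```python
from math import comb


def mcnemar_exact(only_a_correct, only_b_correct):
    """Two-sided exact McNemar p-value from the two discordant counts.

    only_a_correct: questions aggregator A got right and B got wrong.
    only_b_correct: questions aggregator B got right and A got wrong.
    Questions where both agree in correctness do not enter the test.
    """
    n = only_a_correct + only_b_correct
    if n == 0:
        return 1.0                                   # no discordant pairs
    k = min(only_a_correct, only_b_correct)
    # Binomial(n, 0.5) tail probability up to the smaller discordant count.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

Only the discordant questions carry information here, which is why a paired test can be far more sensitive than comparing two accuracy figures directly.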
Key insights
PPV beats majority voting in LLM inference by using delegation based on entropy and reasoning geometry.
Principles
- Majority voting discards valuable LLM signals.
- Delegation can improve LLM consensus accuracy.
- Geometric coherence indicates answer reliability.
Method
Partition the 128 generations sampled per question into 16 groups, compute each group's letter entropy and reasoning-embedding centroid, then feed both into a stochastic delegation matrix to obtain the consensus answer.
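The steps above can be sketched roughly as follows. This summary does not give the exact formulas, so several pieces are assumptions made for illustration: the self-weight is taken as one minus the normalized letter entropy, peer weights are a softmax over centered-centroid cosines, votes are propagated for a fixed number of steps, and the final tally sums propagated mass per answer letter. The names `normalized_letter_entropy` and `ppv_consensus`, and the parameters `n_options`, `temperature`, and `n_steps`, are hypothetical.

```python
from collections import Counter

import numpy as np


def normalized_letter_entropy(letters, n_options):
    """Shannon entropy of one group's answer letters, divided by log(n_options)."""
    counts = np.array(list(Counter(letters).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum() / np.log(n_options))


def ppv_consensus(group_letters, group_embeddings, n_options=10,
                  temperature=0.1, n_steps=50):
    """Delegation-based consensus over G >= 2 groups (illustrative sketch only).

    group_letters:    G lists of answer letters (e.g. 16 groups of 8 samples).
    group_embeddings: array-like of shape (G, S, D) with reasoning embeddings.
    """
    emb = np.asarray(group_embeddings, dtype=float)
    n_groups = len(group_letters)

    # Each group's own pick is its most frequent letter.
    picks = [Counter(g).most_common(1)[0][0] for g in group_letters]

    # WHEN (assumed form): low entropy -> keep the vote, high entropy -> delegate.
    entropy = np.array([normalized_letter_entropy(g, n_options)
                        for g in group_letters])
    self_w = np.clip(1.0 - entropy, 0.0, 1.0)

    # WHOM (assumed form): softmax over cosines of per-question-centered centroids.
    centroids = emb.mean(axis=1)
    centered = centroids - centroids.mean(axis=0, keepdims=True)
    unit = centered / (np.linalg.norm(centered, axis=1, keepdims=True) + 1e-12)
    cos = unit @ unit.T
    np.fill_diagonal(cos, -np.inf)                   # no delegation to oneself
    logits = cos / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    peer = np.exp(logits)
    peer = peer / peer.sum(axis=1, keepdims=True)

    # Row-stochastic delegation matrix: keep self_w, pass the rest to peers.
    delegation = np.diag(self_w) + (1.0 - self_w)[:, None] * peer

    # Propagate one unit of voting mass, split evenly across groups.
    mass = np.full(n_groups, 1.0 / n_groups)
    for _ in range(n_steps):
        mass = mass @ delegation

    # Tally the propagated mass per answer letter.
    scores = {}
    for m, pick in zip(mass, picks):
        scores[pick] = scores.get(pick, 0.0) + float(m)
    return max(scores, key=scores.get), scores
```

Under these assumptions, when every group is unanimous the delegation matrix is the identity and the result reduces to a plurality vote over group picks. Delegation only changes the outcome when uncertain groups pass their mass to geometrically similar, more confident peers.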
In practice
- Apply PPV for unsupervised LLM aggregation.
- Use letter entropy to weigh self-picks.
- Employ embedding cosine for peer delegation.
Topics
- LLM Inference
- Multi-Sample Aggregation
- Propagational Proxy Voting
- Majority Voting
- MMLU-Pro Benchmark
- Reasoning Geometry
- Letter Entropy
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.