DICE: Entropy-Regularized Equilibrium Selection for Stable Multi-Agent LLM Coordination
Summary
DICE is a framework for entropy-regularized equilibrium selection that targets instability in multi-agent large language model (LLM) systems, which often underperform a single strong model using best-of-N sampling. This instability stems from ill-posed equilibrium selection, which produces oscillations and drift and, in turn, unstable learning and linear Bayesian regret. DICE introduces the Heterogeneous Quantal Response Equilibrium (HQRE), an entropy-regularized equilibrium concept with agent- and state-dependent temperatures. Under a monotonicity condition, HQRE is unique, admits linearly convergent mirror updates, and guarantees bounded Bayesian regret. The framework is instantiated in two algorithms: DICE-PC, which coordinates frozen models through prompt-control actions, and DICE-FT, which performs parameter-efficient mirror fine-tuning. Across eleven benchmarks in four domains, DICE improves accuracy-cost trade-offs, with DICE-PC showing a 4.3 percentage point average improvement and DICE-FT an 8.5 point improvement on reasoning and planning tasks.
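For readers unfamiliar with quantal response equilibria, the standard logit form with an agent- and state-dependent temperature can be written as below. This is a sketch of the textbook definition, not necessarily the paper's exact formulation: the symbols Q_i (agent i's action value given the other agents' policies), \tau_i(s) (temperature of agent i in state s), and H (Shannon entropy) are notation assumed here for illustration.

```latex
\pi_i(a \mid s)
  = \frac{\exp\!\big(Q_i(s,a)/\tau_i(s)\big)}
         {\sum_{a'} \exp\!\big(Q_i(s,a')/\tau_i(s)\big)},
\qquad
\pi_i(\cdot \mid s)
  = \arg\max_{p \in \Delta(\mathcal{A}_i)}
    \Big\{ \textstyle\sum_{a} p(a)\, Q_i(s,a) + \tau_i(s)\, H(p) \Big\}.
```

The two expressions are equivalent: the softmax on the left is the unique maximizer of the entropy-regularized objective on the right. A larger \tau_i(s) pushes agent i toward uniform play in state s, and a smaller one pushes it toward a greedy best response.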
Key takeaway
If you are a machine learning engineer building multi-agent LLM systems and are seeing instability or suboptimal coordination, DICE is worth investigating. The framework offers a principled route to stable multi-agent performance by making equilibrium selection well-posed. Implementing DICE-PC for prompt control or DICE-FT for fine-tuning can improve your accuracy-cost trade-offs, with reported gains of 4.3 to 8.5 percentage points on reasoning and planning tasks.
Key insights
Multi-agent LLM instability can be resolved by well-posed, entropy-regularized equilibrium selection using HQRE.
Principles
- Ill-posed equilibrium selection causes multi-agent LLM instability.
- HQRE ensures unique, stable coordination with bounded Bayesian regret.
- Monotonicity enables linearly convergent updates and stability diagnostics.
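To make the principles above concrete, here is a minimal sketch of an entropy-regularized mirror update on a toy two-player zero-sum matrix game, which is a standard example of a monotone game. It is not the DICE-PC or DICE-FT algorithm. The payoff matrix, the per-agent temperatures, the step size, and all function names are assumptions chosen for illustration, and the temperatures here vary by agent only, since the toy game has a single state.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def mirror_step(policy, payoffs, temperature, step_size):
    """One entropy-regularized mirror (multiplicative-weights) update.

    New policy is proportional to policy**(1 - step*temp) * exp(step * payoff),
    whose fixed point is the logit response softmax(payoffs / temperature).
    """
    shrink = 1.0 - step_size * temperature
    logits = [shrink * math.log(p) + step_size * q
              for p, q in zip(policy, payoffs)]
    return softmax(logits)


def row_payoffs(A, y):
    """Expected payoff of each row action against column policy y."""
    return [sum(a * p for a, p in zip(row, y)) for row in A]


def col_payoffs(A, x):
    """Expected payoff of each column action (zero-sum: negated)."""
    return [-sum(A[i][j] * x[i] for i in range(len(A)))
            for j in range(len(A[0]))]


def solve_toy_qre(A, tau_row, tau_col, step_size=0.05, iterations=5000):
    """Run simultaneous mirror updates from uniform policies."""
    x = [1.0 / len(A)] * len(A)
    y = [1.0 / len(A[0])] * len(A[0])
    for _ in range(iterations):
        q_row, q_col = row_payoffs(A, y), col_payoffs(A, x)
        x, y = (mirror_step(x, q_row, tau_row, step_size),
                mirror_step(y, q_col, tau_col, step_size))
    return x, y


def qre_residual(A, x, y, tau_row, tau_col):
    """Largest gap between each policy and its own logit response."""
    x_target = softmax([q / tau_row for q in row_payoffs(A, y)])
    y_target = softmax([q / tau_col for q in col_payoffs(A, x)])
    gaps = [abs(a - b) for a, b in zip(x + y, x_target + y_target)]
    return max(gaps)


if __name__ == "__main__":
    A = [[2.0, -1.0], [-1.0, 1.0]]  # hypothetical payoff matrix
    x, y = solve_toy_qre(A, tau_row=0.5, tau_col=1.0)
    print("row policy:", x)
    print("col policy:", y)
    print("fixed-point residual:", qre_residual(A, x, y, 0.5, 1.0))
```

The residual function doubles as a simple stability diagnostic: if it fails to shrink as iterations increase, the step size is too large for the chosen temperatures or the game is not monotone.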
Method
DICE-PC coordinates frozen models via prompt-control; DICE-FT uses parameter-efficient mirror fine-tuning to achieve HQRE.
In practice
- DICE-PC improves reasoning/planning by 4.3 percentage points.
- DICE-FT improves reasoning/planning by 8.5 percentage points.
- Improves accuracy-cost trade-offs across diverse benchmarks.
Topics
- Multi-Agent LLMs
- Equilibrium Selection
- HQRE
- Prompt-Control
- Parameter-Efficient Fine-Tuning
- Markov Games
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.