The Price of Paranoia: Robust Risk-Sensitive Cooperation in Non-Stationary Multi-Agent Reinforcement Learning
Summary
Cooperative equilibria in multi-agent reinforcement learning (MARL) are inherently unstable under co-learning noise: each agent's gradient step alters its partner's action distribution. As a result, cooperative equilibria, even Pareto-dominant ones, collapse exponentially once partner noise exceeds a critical cooperation threshold. Applying traditional distributional robustness to hedge against partner uncertainty makes the problem worse, because risk-averse objectives penalize high-variance cooperative actions and so expand the instability region. The proposed approach instead targets the variance of policy gradient updates rather than the return distribution, modulating gradient updates according to an online measure of partner unpredictability. This method provably expands the cooperation basin in symmetric coordination games. The authors introduce the "Price of Paranoia" and a "Cooperation Window" to characterize welfare recovery under partner noise, and they define optimal robustness as a balance between equilibrium stability and sample efficiency.
Key takeaway
If you develop multi-agent reinforcement learning systems, recognize that both standard risk-neutral learning and traditional risk-averse robustness undermine cooperation. Focus instead on reducing the policy gradient update variance induced by partner uncertainty. This approach, guided by concepts like the "Price of Paranoia," offers a path to more stable and welfare-optimal cooperative behaviors in non-stationary environments.
Key insights
Co-learning noise destabilizes cooperation in MARL, requiring targeted robustness for policy gradient updates.
Principles
- Cooperative equilibria are exponentially unstable under risk-neutral learning.
- Traditional risk-averse robustness worsens cooperative instability.
- Robustness should target policy gradient update variance.
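A small worked example may make the first two principles concrete. The sketch below is not taken from the paper: it assumes a hypothetical stag-hunt game (cooperating pays 4 if the partner also cooperates and 0 otherwise, while the safe action always pays 3) and uses a mean-variance penalty as a stand-in for a generic risk-averse objective. It computes the smallest partner cooperation probability at which cooperating is still worthwhile.

```python
def cooperate_value(p, risk_aversion=0.0, stag=4.0):
    """Mean-variance value of cooperating when the partner cooperates with prob p.

    Payoffs and the mean-variance form are illustrative assumptions,
    not the paper's formalism.
    """
    mean = stag * p                      # payoff is `stag` w.p. p, else 0
    variance = stag ** 2 * p * (1.0 - p)
    return mean - risk_aversion * variance


def cooperation_threshold(risk_aversion=0.0, stag=4.0, hare=3.0, grid=10001):
    """Smallest p on a uniform grid where cooperating is at least as good as
    the safe action, which pays `hare` with zero variance."""
    for i in range(grid):
        p = i / (grid - 1)
        if cooperate_value(p, risk_aversion, stag) >= hare:
            return p
    return None


risk_neutral = cooperation_threshold(risk_aversion=0.0)
risk_averse = cooperation_threshold(risk_aversion=0.25)
print(f"risk-neutral threshold: {risk_neutral:.3f}")
print(f"risk-averse threshold:  {risk_averse:.3f}")
```

Under these assumed payoffs, a risk-neutral agent keeps cooperating only while its partner cooperates at least about 75% of the time, so noise above roughly 25% breaks cooperation. Adding the variance penalty raises the threshold to about 87%, which shrinks the tolerable noise and illustrates why hedging on the return distribution can enlarge the instability region.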
Method
Modulate policy gradient updates using an online measure of partner unpredictability to expand the cooperation basin in symmetric coordination games.
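The summary does not give the exact unpredictability measure or modulation rule, so the following is only a minimal sketch of the general idea under stated assumptions. The class `PartnerUnpredictability`, the function `modulated_step`, the exponential-moving-average estimator, and the `1 / (1 + sensitivity * u)` scaling are all hypothetical choices made for illustration.

```python
class PartnerUnpredictability:
    """Hypothetical online estimate of how unpredictable the partner is.

    Tracks an exponential moving average (EMA) of the squared error between
    the partner's observed action (1 = cooperate, 0 = defect) and a running
    estimate of its cooperation probability.
    """

    def __init__(self, decay=0.9):
        self.decay = decay
        self.p_hat = 0.5  # running estimate of partner's P(cooperate)
        self.u = 0.0      # running unpredictability estimate

    def update(self, partner_cooperated):
        x = 1.0 if partner_cooperated else 0.0
        err = (x - self.p_hat) ** 2  # surprise relative to current estimate
        self.u = self.decay * self.u + (1.0 - self.decay) * err
        self.p_hat = self.decay * self.p_hat + (1.0 - self.decay) * x
        return self.u


def modulated_step(theta, grad, u, base_lr=0.1, sensitivity=4.0):
    """Gradient-ascent step whose size shrinks as unpredictability u grows.

    The scaling rule is an assumed example, not the paper's update.
    """
    scale = 1.0 / (1.0 + sensitivity * u)
    return theta + base_lr * scale * grad


# Usage: a steady partner leaves the step near full size,
# while an erratic partner damps it.
steady, erratic = PartnerUnpredictability(), PartnerUnpredictability()
for t in range(200):
    steady.update(True)
    erratic.update(t % 2 == 0)
print(modulated_step(0.0, 1.0, steady.u), modulated_step(0.0, 1.0, erratic.u))
```

The design choice here is that the return objective stays risk-neutral and only the update magnitude reacts to partner noise, which matches the summary's distinction between targeting gradient update variance and targeting the return distribution.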
In practice
- Implement online partner unpredictability measures.
- Balance equilibrium stability with sample efficiency.
Topics
- Multi-Agent Reinforcement Learning
- Cooperative Equilibria
- Non-Stationary Environments
- Risk-Sensitive Learning
- Policy Gradient Variance
Best for: AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.