EMAgnet: Parameter-Space EMA Regularization for Policy Gradient Self-Play in Large Games
Summary
EMAgnet is a parameter-space exponential moving average (EMA) regularization method for policy gradient self-play in large two-player zero-sum imperfect-information games. Uniform-magnet regularization pulls the policy toward the uniform distribution, treating all actions alike; EMAgnet instead regularizes toward an EMA of the last-iterate policy's parameters, so the regularization target evolves as the agent's strategy improves. Compared with PPO self-play using uniform-magnet regularization under both linear and power-law annealing schedules, EMAgnet achieved lower exploitability in most tested environments. The gains were most consistent in games with strictly dominated strategies and exploration challenges.
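Exploitability, the metric reported in the evaluation above, measures how much a best-responding opponent could gain against a given strategy profile. The sketch below computes it for a small two-player zero-sum matrix game. It is an illustration only: the function name `exploitability`, the rock-paper-scissors payoff matrix, and the convention of averaging the two players' best-response gains are assumptions for this example, not details from the original article, and conventions for the normalizing factor differ between sources.

```python
def exploitability(payoff, x, y):
    """Average best-response gain in a two-player zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player receives
    its negative. x and y are the row and column mixed strategies.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Best value the row player can reach against the column strategy y.
    row_br = max(
        sum(payoff[i][j] * y[j] for j in range(n_cols)) for i in range(n_rows)
    )
    # Best value the column player can reach against the row strategy x.
    col_br = max(
        -sum(x[i] * payoff[i][j] for i in range(n_rows)) for j in range(n_cols)
    )
    # In a zero-sum game the players' current values cancel, so the two
    # best-response values already sum to the total gain from deviating.
    return (row_br + col_br) / 2.0


# Rock-paper-scissors, payoffs from the row player's point of view.
RPS = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]
UNIFORM = [1 / 3, 1 / 3, 1 / 3]
ALWAYS_ROCK = [1.0, 0.0, 0.0]
```

Under these assumptions, the uniform profile in rock-paper-scissors is an equilibrium and has exploitability zero, while a profile in which both players always play rock is exploitable by playing paper.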
Key takeaway
Machine learning engineers developing self-play algorithms for large imperfect-information games should consider EMAgnet's parameter-space EMA regularization. Its regularization target tracks the agent's evolving strategy, and it showed lower exploitability than uniform regularization in most tested environments, especially those with strictly dominated strategies. It is worth evaluating as a way to improve the robustness of policy gradient self-play in complex game-theoretic settings.
Key insights
EMAgnet uses adaptive parameter-space EMA regularization for policy gradient self-play, achieving lower exploitability than uniform regularization in most of the large games tested.
Principles
- Adaptive regularization improves game-theoretic learning.
- Parameter-space EMA targets evolving strategies.
- Self-play with regularization can exceed specialized algorithms.
Method
EMAgnet regularizes policy gradient methods by targeting an exponential moving average (EMA) of the last-iterate policy's parameters, allowing the regularization target to adapt as the agent's strategy improves.
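The idea can be sketched for a tabular softmax policy at a single decision point. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not specify the penalty form, so the KL penalty and its direction, the loss shape, and the names `ema_update`, `regularized_loss`, `decay`, and `alpha` are all hypothetical choices made for this example.

```python
import math


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ema_update(ema_params, params, decay):
    """Move the EMA (magnet) parameters a small step toward the current ones."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with full support in q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def regularized_loss(params, ema_params, advantages, alpha):
    """Policy-gradient surrogate plus a penalty toward the EMA policy.

    Assumption: the penalty is KL(policy || magnet) scaled by alpha.
    With uniform-magnet regularization, the magnet would instead be
    the fixed uniform distribution.
    """
    policy = softmax(params)
    magnet = softmax(ema_params)
    pg_term = -sum(pi * a for pi, a in zip(policy, advantages))
    return pg_term + alpha * kl_divergence(policy, magnet)
```

In a training loop built on this sketch, `ema_update` would be called after each optimizer step, so the magnet trails the last-iterate parameters rather than staying fixed. The coefficient `alpha` would follow whichever annealing schedule is in use.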
In practice
- Apply EMAgnet to two-player zero-sum games.
- Improve self-play in games with dominated strategies.
- Enhance exploration in complex game environments.
Topics
- Policy Gradient
- Self-Play
- EMA Regularization
- Game Theory
- Multiagent Systems
- Exploitability
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.