EMAgnet: Parameter-Space EMA Regularization for Policy Gradient Self-Play in Large Games

2026-06-22 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Gaming & Interactive Media · Depth: Expert, quick

Summary

EMAgnet is a parameter-space exponential moving average (EMA) regularization method for policy gradient self-play in large, two-player zero-sum imperfect-information games. The usual alternative regularizes toward the uniform distribution, which pulls the policy toward every action equally. EMAgnet instead regularizes toward an EMA of the last-iterate policy's parameters, so the regularization target evolves as the agent's strategy improves. Evaluated against PPO self-play with uniform-magnet regularization, under both linear and power-law annealing schedules, EMAgnet achieved lower exploitability in most tested environments. The gains were most notable in games featuring strictly dominated strategies and exploration challenges.

Key takeaway

For machine learning engineers developing self-play algorithms for large imperfect-information games, EMAgnet's parameter-space EMA regularization is worth evaluating as an alternative to uniform regularization. Its regularization target tracks the agent's evolving strategy, and in the reported experiments it reached lower exploitability than uniform regularization in most tested environments, especially those with strictly dominated strategies.

Key insights

EMAgnet uses adaptive parameter-space EMA regularization for policy gradient self-play, achieving lower exploitability than uniform-magnet regularization in most tested environments.

Principles

Adaptive regularization improves game-theoretic learning.
Parameter-space EMA targets evolving strategies.
Self-play with regularization can exceed specialized algorithms.

Method

EMAgnet regularizes policy gradient methods by targeting an exponential moving average (EMA) of the last-iterate policy's parameters, allowing the regularization target to adapt as the agent's strategy improves.
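
The idea can be illustrated with a minimal sketch, assuming a single-state softmax policy whose parameters are its logits. The summary does not specify the form of the penalty, so the KL term, its direction, and the names `ema_update`, `regularized_loss`, `decay`, and `coeff` are illustrative assumptions rather than the authors' implementation. In a neural policy the same EMA would be taken over the network weights.

```python
import math

def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with full support on q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def ema_update(magnet_params, policy_params, decay):
    """Move the magnet's parameters toward the current policy's parameters.

    This is the parameter-space EMA: the average is taken over parameters,
    not over the action probabilities they produce.
    """
    return [decay * m + (1.0 - decay) * p
            for m, p in zip(magnet_params, policy_params)]

def regularized_loss(policy_loss, policy_params, magnet_params, coeff):
    """Policy loss plus a penalty pulling the policy toward the EMA magnet.

    Assumption: the penalty is KL(policy || magnet) scaled by coeff.
    """
    pi = softmax(policy_params)
    magnet = softmax(magnet_params)
    return policy_loss + coeff * kl_divergence(pi, magnet)

# Hypothetical values for illustration only.
policy_params = [2.0, 0.0, -1.0]
magnet_params = [0.0, 0.0, 0.0]  # zero logits give a uniform magnet
magnet_params = ema_update(magnet_params, policy_params, decay=0.9)
loss = regularized_loss(1.0, policy_params, magnet_params, coeff=0.1)
```

With a uniform-magnet baseline, `magnet_params` would stay fixed at equal logits; here it drifts toward the policy after each update, which is what lets the target follow the improving strategy.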

In practice

Apply EMAgnet to two-player zero-sum games.
Improve self-play in games with dominated strategies.
Enhance exploration in complex game environments.
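
Results of this kind are reported as exploitability, which is easiest to see in a small two-player zero-sum matrix game. The sketch below assumes one common convention: the sum of both players' best-response gains, divided by two. The function name and the rock-paper-scissors payoff matrix are illustrative and are not taken from the article's benchmarks.

```python
def exploitability(payoff, row_strategy, col_strategy):
    """Average best-response gain in a two-player zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player receives
    its negation. A value of 0 means the strategy pair is a Nash equilibrium.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Row player's expected payoff for each pure action against col_strategy.
    row_values = [sum(payoff[i][j] * col_strategy[j] for j in range(n_cols))
                  for i in range(n_rows)]
    # Row player's expected payoff when the column player picks each pure action.
    col_values = [sum(row_strategy[i] * payoff[i][j] for i in range(n_rows))
                  for j in range(n_cols)]
    # The row player maximizes and the column player minimizes; the current
    # joint value cancels when the two best-response gains are summed.
    gap = max(row_values) - min(col_values)
    return gap / 2.0

# Rock-paper-scissors from the row player's perspective.
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]
uniform = [1 / 3, 1 / 3, 1 / 3]
always_rock = [1.0, 0.0, 0.0]

equilibrium_gap = exploitability(RPS, uniform, uniform)
rock_gap = exploitability(RPS, always_rock, uniform)
```

Uniform play is the equilibrium of rock-paper-scissors, so `equilibrium_gap` is zero, while always playing rock can be beaten by always playing paper, so `rock_gap` is positive. Large imperfect-information games need approximate best responses rather than this exact enumeration.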

Topics

Policy Gradient
Self-Play
EMA Regularization
Game Theory
Multiagent Systems
Exploitability

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.