Maximum Entropy Inverse Reinforcement Learning for Mean-Field Games with Average Reward

Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

This research introduces a Maximum Entropy Inverse Reinforcement Learning (IRL) framework for discrete-time, infinite-horizon mean-field games (MFGs) under an average-reward criterion. The method recovers an unknown reward function from expert demonstrations, assuming they arise from a stationary mean-field equilibrium. It formulates the inverse problem by enforcing consistency with the expert mean-field term and long-run feature expectations, and treats two reward classes within a unified occupation-measure framework. For finite-dimensional linear rewards, a convex dual reformulation with an explicit log-partition objective supports constant-step-size gradient descent. For infinite-dimensional RKHS rewards, a Lagrangian relaxation yields a policy defined by a soft Bellman equation. Because the average-reward setting has no discount factor to supply a contraction, a minorisation-based sub-stochastic kernel is used to make the soft Bellman operator a strict contraction. The framework establishes Fréchet differentiability and Lipschitz smoothness for the log-likelihood score, enabling a gradient ascent algorithm with convergence guarantees. Numerical examples, including a malware-spread MFG, indicate that the recovered policies match expert behavior.
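
The linear-reward case can be illustrated with a generic maximum-entropy dual. The sketch below is not the paper's objective: it assumes a toy setting with a finite list of state-action feature vectors and ignores the mean-field and occupation-measure constraints entirely. The names `gibbs`, `dual_objective`, `fit_dual`, and the toy feature table are hypothetical. It only shows why a log-partition objective admits a constant step size: its Hessian is a feature covariance, so its gradient is Lipschitz with a constant no larger than the largest squared feature norm.

```python
import numpy as np

def gibbs(theta, features):
    """Gibbs weights p_i proportional to exp(<theta, f_i>), computed stably."""
    scores = features @ theta
    w = np.exp(scores - scores.max())
    return w / w.sum()

def dual_objective(theta, features, expert_mean):
    """Generic log-partition dual: log sum_i exp(<theta, f_i>) - <theta, expert_mean>."""
    scores = features @ theta
    m = scores.max()
    return m + np.log(np.exp(scores - m).sum()) - theta @ expert_mean

def fit_dual(features, expert_mean, steps=2000):
    """Constant-step gradient descent on the convex dual (illustrative only)."""
    # The Hessian is the feature covariance under the Gibbs weights, so
    # max_i ||f_i||^2 bounds the smoothness constant and 1/L is a safe step.
    smoothness = (np.linalg.norm(features, axis=1) ** 2).max()
    theta = np.zeros(features.shape[1])
    for _ in range(steps):
        # Gradient = model feature expectation minus expert feature expectation.
        grad = gibbs(theta, features) @ features - expert_mean
        theta = theta - grad / smoothness
    return theta

# Toy data (assumed): six state-action pairs with two features each.
features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
                     [0.0, 0.0], [1.0, -1.0], [-1.0, 1.0]])
true_theta = np.array([0.8, -0.4])
expert_mean = gibbs(true_theta, features) @ features  # stands in for expert data
theta_hat = fit_dual(features, expert_mean)
```

At a minimiser the gradient vanishes, which is exactly the feature-matching condition: the model's feature expectation equals the expert's.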

Key takeaway

For machine learning engineers building multi-agent systems, this research offers a principled way to infer reward structures from observed collective behavior. The Maximum Entropy IRL framework applies to mean-field games under the average-reward criterion, where no discount factor is available to guarantee contraction. The proposed gradient ascent algorithm, which comes with convergence guarantees, is worth considering for systems such as malware propagation or consumer choice models when the goal is to recover policies consistent with expert behavior.

Key insights

Recovering unknown reward functions in average-reward MFGs is feasible using the maximum causal entropy principle together with a minorisation-based contraction argument.

Method

The method formulates an inverse problem that enforces consistency with the expert mean-field term and long-run feature expectations. For linear rewards, it uses a convex dual solved by constant-step-size gradient descent. For RKHS rewards, it uses a Lagrangian relaxation with a minorisation-based sub-stochastic kernel, followed by gradient ascent on the log-likelihood.
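
The contraction idea can be sketched on a finite toy model. This is a minimal illustration under stated assumptions, not the paper's construction: it assumes a finite state and action space, a fixed mean-field term already folded into the transition array `P` and reward `r`, and a minorisation condition of the form P(y | x, a) >= eps * nu(y). Taking eps * nu as the column-wise minimum of `P` is a choice made here for convenience, and `soft_bellman_fixed_point` is a hypothetical name. Subtracting the minorising part leaves a sub-stochastic kernel of total mass 1 - eps, so the log-sum-exp operator contracts in the sup norm at rate 1 - eps without any discount factor.

```python
import numpy as np

def logsumexp(z, axis):
    """Numerically stable log-sum-exp along one axis."""
    m = z.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(z - m).sum(axis=axis))

def soft_bellman_fixed_point(P, r, tol=1e-10, max_iter=10_000):
    """Iterate h <- log sum_a exp(r + Q h) with Q = P - eps * nu.

    P has shape (S, A, S), r has shape (S, A). Returns the fixed point h,
    the soft-optimal policy, the implied average reward (gain), and eps.
    """
    minor = P.min(axis=(0, 1))      # eps * nu(y): assumed minorising measure
    eps = minor.sum()
    if eps <= 0.0:
        raise ValueError("minorisation fails: some state is unreachable from some (x, a)")
    Q = P - minor                   # sub-stochastic: every row sums to 1 - eps
    h = np.zeros(P.shape[0])
    for _ in range(max_iter):
        h_new = logsumexp(r + Q @ h, axis=1)
        converged = np.max(np.abs(h_new - h)) < tol
        h = h_new
        if converged:
            break
    q = r + Q @ h
    policy = np.exp(q - logsumexp(q, axis=1)[:, None])  # softmax over actions
    gain = minor @ h                # eps * <nu, h> plays the role of the average reward
    return h, policy, gain, eps

# Toy instance (assumed): 4 states, 3 actions, strictly positive transitions.
rng = np.random.default_rng(0)
P = 0.7 * rng.dirichlet(np.ones(4), size=(4, 3)) + 0.3 / 4
r = rng.normal(size=(4, 3))
h, policy, gain, eps = soft_bellman_fixed_point(P, r)
```

Since Q h = P h - eps * <nu, h>, the fixed point satisfies h + gain = log sum_a exp(r + P h), the usual soft average-reward Bellman equation, which is why the constant eps * <nu, h> can be read as the gain.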

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.