Provably Convergent Actor-Critic in Risk-averse MARL

2026-02-16 · Source: cs.MA updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, quick

Summary

The paper proposes a two-timescale Actor-Critic algorithm for Multi-Agent Reinforcement Learning (MARL) that learns stationary policies in infinite-horizon general-sum Markov games (MGs). It does not target classic equilibria, whose stationary forms are computationally hard to find. Instead it targets Risk-averse Quantal response Equilibria (RQE), a solution concept from behavioral game theory that accounts for risk aversion and bounded rationality. RQE satisfies strong regularity conditions that make it amenable to learning in MGs. The algorithm pairs a fast-timescale actor with a slow-timescale critic and is proven to converge globally with finite-sample guarantees. Experiments in several environments show better convergence than risk-neutral baselines.

Key takeaway

For research scientists developing MARL systems, the main lesson is that the regularity of Risk-averse Quantal response Equilibria (RQE) is what makes a provably convergent Actor-Critic algorithm possible. This gives a principled route to learning stationary policies in general-sum multi-agent settings. If your application calls for risk-averse agents and convergence and stability matter, RQE-based methods such as this two-timescale Actor-Critic are worth evaluating.

Key insights

Risk-averse Quantal response Equilibria enable provably convergent Actor-Critic learning in general-sum Markov games.

Principles

Stationary policies are practical, but computing stationary equilibria in general-sum MGs is computationally hard.
RQE incorporates risk aversion and bounded rationality.
RQE regularity facilitates learning in Markov games.
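To make the principles above concrete, here is a minimal sketch of a risk-averse logit quantal response equilibrium in a two-player matrix game. It is an illustrative instantiation, not the paper's definition. Risk aversion is modeled with an entropic risk measure over the opponent's mixed strategy (parameter `gamma`). Bounded rationality is modeled with a logit (softmax) response at temperature `tau`. The equilibrium is found by damped fixed-point iteration. The function names and parameters are hypothetical, and the paper's RQE may use a different risk measure and regularizer.

```python
import numpy as np

# Illustrative assumptions (not necessarily the paper's RQE): risk aversion via an
# entropic risk measure over the opponent's mixed strategy, bounded rationality via
# a logit (softmax) response. All names and parameters here are hypothetical.

def entropic_value(payoff, opp_strategy, gamma):
    """Risk-adjusted value of each own action a against the opponent's mix.

    payoff[a, b] is this player's reward for own action a vs opponent action b.
    Returns -(1/gamma) * log E_{b~opp}[exp(-gamma * payoff[a, b])], which never
    exceeds the expected payoff (risk aversion) and tends to it as gamma -> 0.
    """
    z = -gamma * payoff                              # shape (A, B)
    m = z.max(axis=1, keepdims=True)                 # row-wise shift for stability
    log_mean_exp = m[:, 0] + np.log(np.exp(z - m) @ opp_strategy)
    return -log_mean_exp / gamma

def softmax(values, tau):
    """Logit quantal response: higher temperature tau means noisier play."""
    e = np.exp((values - values.max()) / tau)
    return e / e.sum()

def risk_averse_qre(R1, R2, gamma=1.0, tau=1.0, eta=0.5, iters=5000, tol=1e-10):
    """Damped fixed-point iteration for a risk-averse logit QRE of a bimatrix game.

    R1[a, b], R2[a, b]: rewards of the row and column player when the row player
    plays a and the column player plays b. Returns mixed strategies (x, y).
    """
    A, B = R1.shape
    x, y = np.full(A, 1.0 / A), np.full(B, 1.0 / B)
    for _ in range(iters):
        # Each player responds to the other's current mix (simultaneous update).
        bx = softmax(entropic_value(R1, y, gamma), tau)
        by = softmax(entropic_value(R2.T, x, gamma), tau)
        x_new = (1 - eta) * x + eta * bx
        y_new = (1 - eta) * y + eta * by
        step = max(np.abs(x_new - x).max(), np.abs(y_new - y).max())
        x, y = x_new, y_new
        if step < tol:
            break
    return x, y

R1 = np.array([[3.0, 0.0], [5.0, 1.0]])   # symmetric prisoner's dilemma (row player)
x, y = risk_averse_qre(R1, R1.T, gamma=0.5, tau=1.0)
print(x, y)
```

As `gamma` approaches 0, the entropic value tends to the expected payoff, so the same code yields a risk-neutral logit QRE. A larger `tau` smooths each response (the softmax's sensitivity scales with `1/tau`), which in this sketch makes the fixed-point map easier to contract. This loosely mirrors the regularity the summary attributes to RQE.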

Method

A two-timescale Actor-Critic algorithm with a fast-timescale actor and a slow-timescale critic is proven to converge globally to RQE in MGs, with finite-sample guarantees.
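The Method line can be sketched as a tabular loop for a two-player general-sum Markov game. The sketch assumes a joint-action Q critic per player, TD(0) policy evaluation for the critic, and an illustrative entropic-risk logit response for the actor. Following the summary, the actor takes the larger step size (`alpha_t`) and the critic the smaller one (`beta_t`), with `beta_t / alpha_t -> 0`. The schedules, the function `two_timescale_ac`, and every parameter are hypothetical choices for illustration, not the paper's algorithm or its finite-sample step sizes.

```python
import numpy as np

def entropic_value(payoff, opp, gamma):
    # Hypothetical risk model: entropic certainty equivalent of each own action a
    # against opponent mix `opp`: -(1/gamma) * log E_{b~opp}[exp(-gamma * payoff[a, b])].
    z = -gamma * payoff
    m = z.max(axis=1, keepdims=True)
    return -(m[:, 0] + np.log(np.exp(z - m) @ opp)) / gamma

def softmax(v, tau):
    e = np.exp((v - v.max()) / tau)
    return e / e.sum()

def two_timescale_ac(R, P, delta=0.9, gamma=1.0, tau=1.0, steps=20_000, seed=0):
    """Illustrative two-timescale actor-critic for a 2-player general-sum Markov game.

    R[i, s, a1, a2]: reward of player i; P[s, a1, a2, s']: transition probabilities.
    Actor (fast, step alpha_t): damped step of each policy at the visited state
    toward a risk-averse logit response to its critic. Critic (slow, step beta_t):
    TD(0) on joint-action Q-values under the current policies.
    """
    rng = np.random.default_rng(seed)
    _, S, A1, A2 = R.shape
    Q = np.zeros((2, S, A1, A2))              # critics Q_i(s, a1, a2)
    pi1 = np.full((S, A1), 1.0 / A1)          # row player's stationary policy
    pi2 = np.full((S, A2), 1.0 / A2)          # column player's stationary policy
    s = 0
    for t in range(steps):
        alpha = (t + 1) ** -0.6               # fast timescale: actor step size
        beta = (t + 1) ** -0.9                # slow timescale: critic step size
        a1 = rng.choice(A1, p=pi1[s])
        a2 = rng.choice(A2, p=pi2[s])
        s_next = rng.choice(S, p=P[s, a1, a2])
        # Critic (slow): TD(0) toward r_i + delta * E_{pi}[Q_i(s', ., .)].
        for i in range(2):
            v_next = pi1[s_next] @ Q[i, s_next] @ pi2[s_next]
            td = R[i, s, a1, a2] + delta * v_next - Q[i, s, a1, a2]
            Q[i, s, a1, a2] += beta * td
        # Actor (fast): both responses use the pre-update policies (simultaneous move).
        br1 = softmax(entropic_value(Q[0, s], pi2[s], gamma), tau)
        br2 = softmax(entropic_value(Q[1, s].T, pi1[s], gamma), tau)
        pi1[s] = (1 - alpha) * pi1[s] + alpha * br1
        pi2[s] = (1 - alpha) * pi2[s] + alpha * br2
        s = s_next
    return pi1, pi2, Q

rng = np.random.default_rng(1)
S, A = 3, 2
R = rng.uniform(size=(2, S, A, A))               # R[i, s, a1, a2] in [0, 1)
P = rng.dirichlet(np.ones(S), size=(S, A, A))    # each P[s, a1, a2, :] sums to 1
pi1, pi2, Q = two_timescale_ac(R, P, steps=5000)
print(np.round(pi1, 3), np.round(pi2, 3))
```

In this arrangement the actor sees an almost frozen critic and has time to settle near the risk-averse response to the current Q-values. The critic then evaluates that approximately equilibrated joint policy. This is the reverse of the classic single-agent actor-critic, where the critic is usually the fast component.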

In practice

Apply RQE for risk-averse multi-agent learning.
Use two-timescale Actor-Critic for MARL convergence.

Topics

Multi-Agent Reinforcement Learning
Markov Games
Actor-Critic Algorithms
Risk-averse Equilibria
Quantal Response Equilibria

Best for: Research Scientist, AI Researcher, AI Scientist, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.MA updates on arXiv.org.