Wasserstein Formulation of Reinforcement Learning. An Optimal Transport Perspective on Policy Optimization

2026-04-16 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, quick

Summary

This work introduces a geometric framework for Reinforcement Learning (RL) that treats policies as maps into the Wasserstein space of action probabilities. It defines a Riemannian structure induced by stationary distributions and proves that this structure exists in a general setting. The framework defines the tangent space of policies, characterizes geodesics, and addresses the measurability of vector fields from the state space to the tangent space of action probability measures. A general RL optimization problem is then formulated, and a gradient flow is constructed using Otto's calculus, together with a formal second-order analysis that computes the gradient and Hessian of the energy. Numerical examples demonstrate the method in two regimes: for low-dimensional problems, the gradient is computed directly from the theoretical formalism; for high-dimensional problems, policies are parameterized with neural networks and optimized via an ergodic approximation of the cost.

Key takeaway

For research scientists developing advanced RL algorithms, this geometric framework offers a new perspective on policy optimization. The Wasserstein formulation and Otto's calculus can lead to new gradient-based methods, potentially improving the stability and convergence of complex RL systems. A natural next step is to explore applying the second-order analysis (the gradient and Hessian of the energy) to existing policy gradient algorithms.

Key insights

RL policies can be analyzed geometrically as maps into Wasserstein space, which makes policy optimization by gradient flow possible.

Principles

Stationary distributions induce a Riemannian structure on the space of policies.
Otto's calculus enables gradient flow construction.
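
For orientation, the standard Otto calculus on the space of probability measures with finite second moment can be written as below. This is the textbook setting, not the paper's policy-level construction: the metric weighted by stationary distributions is not reproduced here, and the symbols (F for the energy, \mu for a measure, \varphi and \psi for test functions) are illustrative choices rather than the authors' notation.

```latex
% Standard Otto calculus on P_2(R^d); notation is illustrative, not the paper's.
\begin{align*}
T_\mu \mathcal{P}_2
  &= \overline{\{\nabla\varphi : \varphi \in C_c^\infty(\mathbb{R}^d)\}}^{\,L^2(\mu)}, \\
\langle \nabla\varphi, \nabla\psi \rangle_\mu
  &= \int \nabla\varphi \cdot \nabla\psi \, d\mu, \\
\operatorname{grad}_W F(\mu)
  &= -\nabla \cdot \Big( \mu \, \nabla \frac{\delta F}{\delta \mu} \Big), \\
\partial_t \mu_t
  &= -\operatorname{grad}_W F(\mu_t)
   = \nabla \cdot \Big( \mu_t \, \nabla \frac{\delta F}{\delta \mu}(\mu_t) \Big).
\end{align*}
```

As a sanity check, taking F to be the entropy \int \mu \log \mu gives \delta F / \delta \mu = \log \mu + 1, and the flow reduces to the heat equation \partial_t \mu_t = \Delta \mu_t.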

Method

Define a Riemannian structure on policies in Wasserstein space, characterize geodesics, formulate RL as an optimization problem, and construct a gradient flow using Otto's calculus for policy optimization.
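
The ingredients named above can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes a single state, a one-dimensional action space, a policy represented by equally weighted action samples, and a hypothetical quadratic action cost. Under those assumptions the Wasserstein geodesic is obtained by matching sorted samples, and the gradient flow of the expected cost moves each sample along the negative cost gradient.

```python
import numpy as np

def w2_distance_1d(x, y):
    """W2 distance between two equal-size, uniformly weighted 1-D samples."""
    xs, ys = np.sort(x), np.sort(y)
    return float(np.sqrt(np.mean((xs - ys) ** 2)))

def geodesic_1d(x, y, t):
    """Samples of the measure at time t on the W2 geodesic from x to y.

    In one dimension the optimal transport map is monotone, so it pairs
    the sorted samples of x with the sorted samples of y.
    """
    xs, ys = np.sort(x), np.sort(y)
    return (1.0 - t) * xs + t * ys

# Hypothetical action cost c(a) = (a - 1)^2 and its derivative (assumption).
def cost(a):
    return (a - 1.0) ** 2

def cost_grad(a):
    return 2.0 * (a - 1.0)

def flow_step(actions, step_size):
    """Explicit Euler step of the W2 gradient flow of F(mu) = E_mu[c].

    For this linear energy the velocity field is -c'(a), so every
    action sample is simply moved downhill on the cost.
    """
    return actions - step_size * cost_grad(actions)

rng = np.random.default_rng(0)
actions0 = rng.normal(loc=-2.0, scale=1.0, size=500)  # initial policy samples
shifted = actions0 + 3.0                               # same policy, translated
midpoint = geodesic_1d(actions0, shifted, 0.5)

actions = actions0.copy()
for _ in range(50):
    actions = flow_step(actions, step_size=0.1)

initial_cost = float(np.mean(cost(actions0)))
final_cost = float(np.mean(cost(actions)))
```

The same particle picture does not carry over directly to multi-state problems, where the policy is a family of action measures indexed by the state and the metric depends on the stationary distribution.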

In practice

Compute gradients directly for low-dimensional problems.
Parameterize policies with neural networks for high-dimensional tasks.
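
The idea behind an ergodic approximation of the cost can be sketched as follows: for an ergodic chain, the time average of the cost along one long trajectory approximates its expectation under the stationary distribution. The example below assumes a fixed policy that has already been folded into a hypothetical 3-state transition matrix with a hypothetical per-state cost. It shows only the averaging step, not the paper's neural-network training loop.

```python
import numpy as np

def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to one."""
    eigvals, eigvecs = np.linalg.eig(P.T)
    idx = np.argmin(np.abs(eigvals - 1.0))
    v = np.real(eigvecs[:, idx])
    return v / v.sum()

def ergodic_cost(P, cost, n_steps, rng, burn_in=1000):
    """Time-averaged cost along a single simulated trajectory."""
    n_states = P.shape[0]
    state = 0
    total = 0.0
    for t in range(burn_in + n_steps):
        if t >= burn_in:  # discard early steps taken before mixing
            total += cost[state]
        state = rng.choice(n_states, p=P[state])
    return total / n_steps

# Hypothetical chain induced by a fixed policy (assumption, for illustration).
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.1, 0.3, 0.6]])
cost = np.array([1.0, 0.0, 2.0])

rng = np.random.default_rng(0)
exact = float(stationary_distribution(P) @ cost)              # expectation under rho
estimate = ergodic_cost(P, cost, n_steps=30_000, rng=rng)     # trajectory average
```

The trajectory average needs no explicit stationary distribution, which is what makes this kind of estimate usable when the state space is too large to enumerate.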

Topics

Reinforcement Learning
Optimal Transport
Wasserstein Space
Riemannian Geometry
Gradient Flow

Best for: Research Scientist, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.