Wasserstein Formulation of Reinforcement Learning. An Optimal Transport Perspective on Policy Optimization
Summary
This work introduces a geometric framework for Reinforcement Learning (RL) that treats policies as maps into the Wasserstein space of action probabilities. It establishes a Riemannian structure derived from stationary distributions and proves its existence in a general setting. The framework defines the tangent space of policies, characterizes geodesics, and addresses the measurability of vector fields from state space to the tangent space of action probability measures. A general RL optimization problem is formulated, and a gradient flow is constructed using Otto's calculus, including a formal second-order analysis in which the gradient and Hessian of the energy are computed. The method is demonstrated with numerical examples. For low-dimensional problems, the gradient is computed directly from the theoretical formalism. For high-dimensional problems, policies are parameterized with neural networks and optimized via an ergodic approximation of the cost.
Key takeaway
For Research Scientists developing advanced RL algorithms, this geometric framework offers a novel perspective on policy optimization. Understanding the Wasserstein formulation and Otto's calculus can lead to new gradient-based methods, potentially improving the stability and convergence of complex RL systems. You should explore applying the framework's formal second-order analysis (the gradient and Hessian of the energy) to existing policy gradient algorithms.
Key insights
RL policies can be analyzed geometrically as maps into the Wasserstein space of action probabilities, which makes optimization by gradient flow possible.
Principles
- Stationary distributions induce a Riemannian structure on the space of policies.
- Otto's calculus enables gradient flow construction.
Method
Define a Riemannian structure on policies in Wasserstein space, characterize geodesics, formulate RL as an optimization problem, and construct a gradient flow using Otto's calculus for policy optimization.
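For readers new to Otto's calculus, the ingredients named above can be written out in their standard form on the Wasserstein space of probability measures on R^d. This is a sketch of the textbook objects only. The symbols (E for the energy, rho_pi for the stationary state distribution of policy pi, pi_s for the action distribution at state s) are illustrative assumptions and not necessarily the paper's notation, and the last line is one plausible way to lift the metric to policies by averaging over states, not a formula quoted from the paper.

```latex
\begin{align*}
% Tangent space at \mu: gradients of test functions, closed in L^2(\mu)
T_\mu \mathcal{P}_2(\mathbb{R}^d)
  &= \overline{\{\nabla\varphi : \varphi \in C_c^\infty(\mathbb{R}^d)\}}^{\,L^2(\mu)} \\
% Otto's Riemannian metric on the tangent space
\langle \nabla\varphi, \nabla\psi \rangle_\mu
  &= \int \nabla\varphi \cdot \nabla\psi \, d\mu \\
% Wasserstein gradient of an energy E, via its first variation
\operatorname{grad}_W E(\mu)
  &= -\nabla \cdot \Big( \mu \, \nabla \frac{\delta E}{\delta \mu}(\mu) \Big) \\
% Gradient flow: \partial_t \mu_t = -\operatorname{grad}_W E(\mu_t)
\partial_t \mu_t
  &= \nabla \cdot \Big( \mu_t \, \nabla \frac{\delta E}{\delta \mu}(\mu_t) \Big) \\
% ASSUMPTION (illustrative): policy-level metric weighted by the stationary distribution
\langle v, w \rangle_\pi
  &= \int_S \int_A v(s,a) \cdot w(s,a) \, d\pi_s(a) \, d\rho_\pi(s)
\end{align*}
```

For a linear energy E(mu) = ∫ V dmu the first variation is V itself, so the flow reduces to transporting mass along -∇V.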
In practice
- Compute gradients directly for low-dimensional problems.
- Parameterize policies with neural networks for high-dimensional tasks.
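The low-dimensional case can be illustrated with a toy sketch that is not taken from the paper. It assumes a single-state problem with a scalar action, a hypothetical quadratic cost, and a policy represented by a cloud of action samples (particles). For the linear energy E(mu) = ∫ cost dmu, the Wasserstein gradient flow moves each particle along the negative gradient of the cost, which the code discretises with explicit Euler steps. All function names here are made up for illustration.

```python
import random


def cost(a):
    """Hypothetical one-step cost of a scalar action; minimised at a = 1."""
    return (a - 1.0) ** 2


def cost_grad(a):
    """Derivative of the hypothetical cost with respect to the action."""
    return 2.0 * (a - 1.0)


def expected_cost(particles):
    """Monte Carlo estimate of E(mu) = integral of cost against the policy mu."""
    return sum(cost(a) for a in particles) / len(particles)


def wasserstein_flow(particles, step=0.1, n_steps=100):
    """Explicit Euler discretisation of the Wasserstein gradient flow of E.

    For a linear energy the first variation is the cost itself, so each
    particle follows da/dt = -cost'(a). This is an assumption of the toy
    setup, not the paper's general construction.
    """
    particles = list(particles)
    for _ in range(n_steps):
        particles = [a - step * cost_grad(a) for a in particles]
    return particles


# Initial policy: Gaussian samples centred far from the optimum.
rng = random.Random(0)
initial = [rng.gauss(-2.0, 1.0) for _ in range(500)]
final = wasserstein_flow(initial)

print("expected cost before:", expected_cost(initial))
print("expected cost after: ", expected_cost(final))
```

Under these assumptions the particle cloud contracts toward the minimiser of the cost. In a high-dimensional setting the particle representation would be replaced by a parameterized policy such as a neural network, as the second bullet describes.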
Topics
- Reinforcement Learning
- Optimal Transport
- Wasserstein Space
- Riemannian Geometry
- Gradient Flow
Best for: Research Scientist, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.