$L^2$ over Wasserstein: Statistical Analysis for Optimal Transport

2026-05-21 · Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Mathematics & Computational Sciences · Depth: Expert, medium

Summary

The $L^{2}$ over Wasserstein space ($L^{2}_{W}(\mathbb{R}^{d})$) extends classical optimal transport theory to random probability measures, providing a framework to account for statistical uncertainty. This work establishes that $L^{2}_{W}(\mathbb{R}^{d})$ inherits the formal Riemannian structure of the Wasserstein space, characterizing its distances and geodesic geometry. This structure induces random flows with Wasserstein gradient flow sample paths. The framework unifies statistical convergence results for optimal transport using empirical measures, including an $L^{2}_{W}(\mathbb{R}^{d})$ law of large numbers and adjusted Fournier-Guillin rates. It also refines Schwartz's consistency theorem to the Wasserstein topology for Bayesian non-parametrics, deducing posterior convergence in $L^{2}_{W}(\mathbb{R}^{d})$. Additionally, the theory demonstrates that random token sampling for transformer models using self-attention flow paths can be embedded into the $L^{2}_{W}(\mathbb{R}^{d})$ framework, offering a unified treatment for random optimal transport in inference and generative modeling.

Key takeaway

For AI Scientists and Research Scientists developing generative models or inference systems, this framework offers a principled way to incorporate statistical uncertainty into optimal transport applications. You can analyze the convergence of empirical measures and of posterior distributions in a single space with geodesic structure, which puts the theoretical guarantees for both on a common footing. Consider applying the $L^{2}_{W}(\mathbb{R}^{d})$ framework to improve the theoretical grounding of random sampling techniques in transformer architectures.

Key insights

The $L^{2}$ over Wasserstein space unifies optimal transport under statistical uncertainty, extending its geometric and flow properties to random measures.

Principles

$L^{2}_{W}(\mathbb{R}^{d})$ inherits the formal Riemannian structure of the Wasserstein space.
Optimal transport can account for statistical uncertainty.
Random gradient flows extend Wasserstein dynamics.
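
To make the first principle concrete, here is a hedged sketch of how Wasserstein geodesics lift to random measures. It is a standard displacement-interpolation computation, not a restatement of the paper's theorem. It assumes that for almost every $\omega$ an optimal transport map $T_\omega$ from $\xi(\omega)$ to $\eta(\omega)$ exists (for example when $\xi(\omega)$ is absolutely continuous) and that $\omega \mapsto T_\omega$ is measurable. The symbols $T_\omega$ and $\xi_t$ are introduced here for illustration only.

```latex
\begin{aligned}
\xi_t(\omega) &\coloneqq \big((1-t)\,\mathrm{id} + t\,T_\omega\big)_{\#}\,\xi(\omega), \qquad t\in[0,1],\\
W_2\big(\xi_s(\omega),\xi_t(\omega)\big) &= |t-s|\,W_2\big(\xi(\omega),\eta(\omega)\big),\\
d(\xi_s,\xi_t) &= \mathbb{E}_{\omega}\big[W_2^{2}(\xi_s(\omega),\xi_t(\omega))\big]^{\frac{1}{2}} = |t-s|\,d(\xi,\eta).
\end{aligned}
```

Under these assumptions, interpolating each realization along its own Wasserstein geodesic gives a constant-speed path between $\xi$ and $\eta$ for the expected-squared-$W_2$ distance $d$.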

Method

The framework defines $L^{2}_{W}(\mathbb{R}^{d})$ as $L^{2}(\Omega;(\mathcal{P}_{2}(\mathbb{R}^{d}),W_{2}))$, using $d(\xi,\eta)\coloneqq\mathbb{E}_{\omega}[W_{2}^{2}(\xi(\omega),\eta(\omega))]^{\frac{1}{2}}$ to characterize distances and geodesic geometry.
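
The distance above can be approximated numerically. The following is a minimal sketch, not the paper's procedure. It assumes $d=1$, where $W_2$ between two equal-size empirical measures reduces to matching sorted samples. It also assumes each realization $\xi(\omega_k)$ is represented by a finite set of atoms. The function names and the Gaussian test measures are hypothetical choices made for illustration.

```python
import numpy as np


def w2_squared_1d(x, y):
    """Squared W_2 between two equal-size empirical measures on the real line.

    In one dimension the optimal coupling pairs sorted samples
    (the quantile coupling), so no linear program is needed.
    """
    x_sorted = np.sort(np.asarray(x, dtype=float))
    y_sorted = np.sort(np.asarray(y, dtype=float))
    if x_sorted.shape != y_sorted.shape:
        raise ValueError("samples must have the same size")
    return float(np.mean((x_sorted - y_sorted) ** 2))


def l2w_distance(xi_samples, eta_samples):
    """Monte Carlo estimate of d(xi, eta) = E[W_2^2(xi(w), eta(w))]^(1/2).

    Row k of each array holds the atoms of the realized measure at omega_k.
    Both random measures must be evaluated at the same omega_k, because
    d is defined on a common probability space.
    """
    squared = [w2_squared_1d(x, y) for x, y in zip(xi_samples, eta_samples)]
    return float(np.sqrt(np.mean(squared)))


rng = np.random.default_rng(0)
n_omega, n_atoms = 200, 500

# Hypothetical random measures: xi(w) is centred at a random location m(w),
# and eta(w) is the same realization shifted by exactly 2.
m = rng.normal(size=(n_omega, 1))
base = rng.normal(size=(n_omega, n_atoms))
xi_samples = m + base
eta_samples = m + 2.0 + base

d_hat = l2w_distance(xi_samples, eta_samples)
print(f"estimated d(xi, eta) = {d_hat:.6f}")
```

Because each realization of $\eta$ is a translate of the matching realization of $\xi$, every per-$\omega$ $W_2$ equals the shift size, so the estimate should match the shift up to floating-point error. In higher dimensions the sorted-sample shortcut no longer applies and a general optimal transport solver would be needed.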

In practice

Embed transformer random token sampling.
Analyze empirical measure convergence rates.
Refine Bayesian posterior consistency.
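
As a rough illustration of the second item, the sketch below tracks how the $W_2$ distance between two independent empirical measures of the same law shrinks as the sample size grows. It is a toy experiment under assumptions not taken from the paper: $d=1$, a standard Gaussian law, and a two-sample proxy in place of the distance to the true measure. It does not reproduce the adjusted Fournier-Guillin rates.

```python
import numpy as np


def w2_1d(x, y):
    """W_2 between two equal-size empirical measures on the real line."""
    return float(np.sqrt(np.mean((np.sort(x) - np.sort(y)) ** 2)))


def mean_two_sample_w2(n, n_rep, rng):
    """Average W_2 between two independent n-point samples of N(0, 1)."""
    values = [
        w2_1d(rng.normal(size=n), rng.normal(size=n)) for _ in range(n_rep)
    ]
    return float(np.mean(values))


rng = np.random.default_rng(1)
sizes = [50, 200, 800, 3200]
errors = [mean_two_sample_w2(n, 100, rng) for n in sizes]

for n, err in zip(sizes, errors):
    print(f"n={n:5d}  mean two-sample W2 = {err:.4f}")
```

Plotting `errors` against `sizes` on a log-log scale gives an empirical decay slope that can be compared with a theoretical rate for the chosen dimension and moment assumptions.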

Topics

Optimal Transport
Random Probability Measures
Wasserstein Geometry
Bayesian Non-parametrics
Transformer Models
Statistical Uncertainty

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.