Stable Deep Reinforcement Learning via Isotropic Gaussian Representations

Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

Deep reinforcement learning (DRL) training is frequently unstable, marked by non-stationarity, representation collapse, and neuron dormancy. This research shows that isotropic Gaussian embeddings are provably stable when tracking time-varying targets. Building on this result, the authors propose Sketched Isotropic Gaussian Regularization (SIGReg), a computationally inexpensive regularizer that shapes representations toward an isotropic Gaussian distribution during training. Empirically, SIGReg improves performance and stability across diverse domains. With PQN, it improved 51 of 57 Atari games (89.5%), with a mean AUC improvement of 889%; with PPO, it improved 68% of games, with a 25% average AUC gain. SIGReg also reduces representation collapse and neuron dormancy in settings such as CIFAR-10 and Isaac Gym, often matching or outperforming more complex second-order optimizers such as Kronecker-factored optimization while being more efficient.

Key takeaway

Non-stationarity is a core challenge for machine learning engineers building deep reinforcement learning systems. Integrate Sketched Isotropic Gaussian Regularization (SIGReg) into your DRL pipelines: this lightweight method shapes representations toward an isotropic Gaussian distribution, which improves training stability, reduces representation collapse, and mitigates neuron dormancy. In the reported experiments, SIGReg improves performance and sample efficiency, often matching more complex optimizers without their overhead.

Key insights

Isotropic Gaussian representations fundamentally stabilize deep RL by mitigating non-stationarity, preventing collapse, and reducing neuron dormancy.

Method

Sketched Isotropic Gaussian Regularization (SIGReg) projects embeddings onto random directions. It then applies a univariate distribution-matching loss to each projection, enforcing isotropy and Gaussianity efficiently.
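The two steps described above (random projections, then a univariate distribution-matching loss per projection) can be illustrated with a minimal NumPy sketch. The summary does not say which univariate loss the authors use, so this sketch assumes a characteristic-function distance to the standard normal; the function name `sigreg_loss` and the parameters `num_directions`, `num_points`, and `t_max` are hypothetical names chosen for illustration, not the paper's API.

```python
import numpy as np


def sigreg_loss(embeddings, num_directions=64, num_points=17, t_max=3.0, rng=None):
    """Illustrative SIGReg-style penalty (an assumption, not the authors' code).

    embeddings: array of shape (batch, dim).
    Returns a scalar that is small when every 1-D projection of the batch
    looks like a sample from N(0, 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    _, dim = embeddings.shape

    # Step 1: sketch the embeddings by projecting onto random unit directions.
    directions = rng.standard_normal((dim, num_directions))
    directions /= np.linalg.norm(directions, axis=0, keepdims=True)
    projections = embeddings @ directions  # (batch, num_directions)

    # Step 2: univariate distribution matching for each projection.
    # Assumed choice: compare the empirical characteristic function (ECF)
    # with the characteristic function of N(0, 1), exp(-t^2 / 2).
    t = np.linspace(-t_max, t_max, num_points)
    phase = projections[:, :, None] * t  # (batch, num_directions, num_points)
    ecf_real = np.cos(phase).mean(axis=0)  # real part of the ECF
    ecf_imag = np.sin(phase).mean(axis=0)  # imaginary part of the ECF
    target = np.exp(-0.5 * t**2)  # N(0, 1) characteristic function (real-valued)
    squared_error = (ecf_real - target) ** 2 + ecf_imag**2

    # Gaussian weights emphasize small |t|, where the ECF is least noisy.
    weights = np.exp(-0.5 * t**2)
    weights /= weights.sum()

    # Weighted average over t, then mean over the random directions.
    return float((squared_error * weights).sum(axis=1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    isotropic = rng.standard_normal((2000, 16))  # already isotropic Gaussian
    collapsed = np.zeros((2000, 16))             # fully collapsed representation
    print("isotropic:", sigreg_loss(isotropic, rng=np.random.default_rng(1)))
    print("collapsed:", sigreg_loss(collapsed, rng=np.random.default_rng(1)))
```

In this sketch a collapsed batch scores far higher than an isotropic Gaussian batch. In an actual training loop the same computation would presumably be written in an autodiff framework and added to the RL objective with a weighting coefficient; that integration is not described in this summary and is left as an assumption.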

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.