Retry Policy Gradients in Continuous Action Spaces

Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

Soichiro Nishimori and Paavo Parmas of The University of Tokyo introduce Retry Policy Gradients for continuous action spaces, extending retry-based objectives such as ReMax beyond discrete settings. Their algorithm, ReMax Actor-Critic (ReMAC), is an off-policy actor-critic method that uses a pathwise derivative estimator to optimize the ReMax objective. ReMAC promotes stochastic exploration without explicit entropy regularization: the objective reshapes the policy-gradient landscape, biasing updates towards higher policy entropy and damping gradient magnitudes, which slows convergence. The authors show that Adam's adaptive normalization can mitigate this damping effect. In empirical evaluations on six Brax continuous-control tasks, including Ant and HalfCheetah, ReMAC with retry budgets M>1 achieves performance comparable to Soft Actor-Critic (SAC) and consistently yields higher policy entropy than the M=1 setting, particularly for M=4 and M=8 with Adam's default ε=10⁻⁸.

Key takeaway

Machine learning engineers developing continuous-control agents can consider ReMax Actor-Critic (ReMAC) as an alternative to entropy-regularized methods like SAC. With retry budgets M>1, ReMAC can achieve comparable performance while encouraging exploration and higher policy entropy without an entropy bonus. Adam's ε parameter is worth tuning to balance exploration against convergence speed: because Adam's normalization mitigates ReMax's damping effect, increasing ε can restore it.
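The interaction between Adam's ε and gradient damping can be illustrated with the standard Adam update rule. The sketch below is not taken from the paper: the function name `adam_update`, the gradient values 1.0 and 1e-3, and the enlarged ε of 1e-2 are illustrative assumptions. It applies Adam to a constant scalar gradient and compares the resulting step size for a full-size gradient and a damped one.

```python
import math


def adam_update(grad, steps=10, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return the size of Adam's update after `steps` steps of a constant scalar gradient."""
    m = v = 0.0
    update = 0.0
    for t in range(1, steps + 1):
        m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
        v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
        m_hat = m / (1 - beta1 ** t)                # bias correction
        v_hat = v / (1 - beta2 ** t)
        update = lr * m_hat / (math.sqrt(v_hat) + eps)
    return update


full, damped = 1.0, 1e-3  # hypothetical undamped vs. damped gradient magnitudes

# With the default eps, normalization makes the step almost scale-invariant.
ratio_default_eps = adam_update(damped) / adam_update(full)

# With a larger eps (illustrative value), the damped gradient yields a smaller step.
ratio_large_eps = adam_update(damped, eps=1e-2) / adam_update(full, eps=1e-2)

print(f"step ratio, eps=1e-8: {ratio_default_eps:.4f}")
print(f"step ratio, eps=1e-2: {ratio_large_eps:.4f}")
```

With a tiny ε the denominator is dominated by the gradient's own magnitude, so a gradient damped by three orders of magnitude still produces nearly the same step. Raising ε towards the scale of the damped gradient lets the damping show up in the step size again.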

Key insights

ReMax extends to continuous action spaces, promoting exploration by reshaping policy gradients without entropy bonuses.

Method

ReMAC is an off-policy actor-critic algorithm. It uses a pathwise derivative estimator to optimize the ReMax objective, replacing SAC's entropy bonus with the ReMax loss.
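A toy sketch can make the pathwise estimator concrete. This summary does not state the exact form of the ReMax objective, so the code assumes it is the expected best critic value over M retries, E[max_i Q(a_i)], with actions reparameterized as a_i = μ + σ·ε_i. The quadratic `critic`, the one-dimensional Gaussian policy, and all function names are hypothetical stand-ins; this is not the ReMAC implementation, which learns its critic off-policy.

```python
import math
import random


def critic(a, target=2.0):
    """Toy stand-in for a learned critic Q(s, a); peaks at `target`."""
    return -(a - target) ** 2


def critic_grad(a, target=2.0):
    """Analytic dQ/da for the toy critic."""
    return -2.0 * (a - target)


def remax_pathwise_grad(mu, log_sigma, M, batch=5000, seed=0):
    """Monte Carlo pathwise gradient of E[max_i Q(a_i)] with a_i = mu + sigma * eps_i.

    Returns (d/d mu, d/d log_sigma), averaged over `batch` draws of M retries.
    """
    rng = random.Random(seed)
    sigma = math.exp(log_sigma)
    g_mu = 0.0
    g_log_sigma = 0.0
    for _ in range(batch):
        eps = [rng.gauss(0.0, 1.0) for _ in range(M)]
        actions = [mu + sigma * e for e in eps]
        # Only the best retry carries gradient through the max.
        best = max(range(M), key=lambda i: critic(actions[i]))
        dq_da = critic_grad(actions[best])
        g_mu += dq_da                              # da/dmu = 1
        g_log_sigma += dq_da * sigma * eps[best]   # da/dlog_sigma = sigma * eps
    return g_mu / batch, g_log_sigma / batch


# Policy mean far from the critic's optimum, moderate spread.
g1 = remax_pathwise_grad(mu=0.0, log_sigma=math.log(0.5), M=1)
g8 = remax_pathwise_grad(mu=0.0, log_sigma=math.log(0.5), M=8)
print("M=1 (d mu, d log_sigma):", g1)
print("M=8 (d mu, d log_sigma):", g8)
```

In this toy setting the M=1 gradient pushes log σ down, while the M=8 gradient pushes it up, and the gradient on μ is smaller for M=8 than for M=1. That mirrors, on a small scale, the two effects described above: a bias towards higher entropy and damped gradients.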

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.