Reflex: Reinforcement Learning with Reflection Symmetry Exploitation in State-Based Continuous Control

2026-06-06 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

Reflex is a reinforcement learning paradigm designed to improve sample efficiency in state-based continuous control tasks by exploiting reflection symmetry. The approach formalizes two types of reflection, axial and bilateral, and integrates them into both on-policy (PPO) and off-policy (SAC, TD3) RL algorithms through symmetry regularization mechanisms. Unlike prior work focusing on image-based RL or rotational symmetry, Reflex specifically targets state-based environments, where symmetries are often implicit. Evaluated on OpenAI Gym and DeepMind Control benchmarks, Reflex consistently outperformed standard baselines in both final performance and sample efficiency. For instance, PPO-based methods showed up to approximately 30% improvement in final performance on bilateral reflection tasks. The method uses a decaying regularization weight, w_t=w_0(1-t/T), with an initial weight w_0=0.1 proving optimal for Reflex-PPO.
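
The decaying schedule w_t=w_0(1-t/T) can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes t is the current training step and T the total number of steps, and the function name `symmetry_weight` and the clamping of out-of-range steps are choices made here for the example.

```python
def symmetry_weight(step: int, total_steps: int, w0: float = 0.1) -> float:
    """Linearly decayed regularization weight w_t = w_0 * (1 - t / T).

    Assumption: steps outside [0, T] are clamped so the weight stays in [0, w0].
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    frac = min(max(step / total_steps, 0.0), 1.0)
    return w0 * (1.0 - frac)


# Weight at the start, a quarter of the way, halfway, and at the end of training.
schedule = [symmetry_weight(t, 1000) for t in (0, 250, 500, 1000)]
```

With w0=0.1 the weight starts at 0.1 and reaches zero at the final step, so the symmetry term shapes early learning and fades out as training finishes.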

Key takeaway

For machine learning engineers developing state-based continuous control agents, consider integrating reflection symmetry into your RL algorithms. Reflex consistently improves sample efficiency and final performance by leveraging axial or bilateral reflection. Apply symmetry regularization to both actor and critic, using a decaying weight schedule (e.g., an initial weight of 0.1). This approach reduces environment interaction costs and enhances learning robustness, particularly for tasks with inherent left-right symmetry.

Key insights

Exploiting reflection symmetry in state-based RL significantly improves sample efficiency and performance.

Principles

Optimal value functions and policies are G-invariant in G-invariant MDPs.
Symmetry regularization enhances information sharing across mirrored states.
Decaying regularization weight balances bias and flexibility.
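
The first principle can be written out using the standard definition of a group-invariant MDP. The notation below (g acting on states and actions, R, P, V^*, Q^*, \pi^*) is assumed for illustration and may differ from the paper's. For a reflection, the group is G = {e, g} with g applied twice giving the identity.

```latex
% G-invariant MDP: rewards and transitions are unchanged by every g in G
R(g s, g a) = R(s, a), \qquad
P(g s' \mid g s, g a) = P(s' \mid s, a) \qquad \forall g \in G

% Consequence for the optimal value functions
V^*(g s) = V^*(s), \qquad Q^*(g s, g a) = Q^*(s, a)

% There exists an optimal policy that respects the symmetry
\pi^*(g a \mid g s) = \pi^*(a \mid s)
```

Mirrored state-action pairs therefore share the same optimal value, which is the sense in which regularization lets information flow between a state and its reflection.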

Method

Reflex integrates reflection symmetry into RL algorithms (PPO, SAC, TD3) via symmetry regularization terms for the actor and critic, or via symmetric target averaging.
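
One way such terms could look is sketched below with NumPy. The summary does not give the exact losses, so the squared-error form, the function names, and the reflection matrices `S` (states) and `A` (actions) are all assumptions made for this illustration, not the paper's definitions.

```python
import numpy as np


def critic_symmetry_loss(q_fn, states, actions, S, A):
    """Mean squared gap between Q(s, a) and Q(S s, A a)."""
    q = q_fn(states, actions)
    q_mirror = q_fn(states @ S.T, actions @ A.T)
    return float(np.mean((q - q_mirror) ** 2))


def actor_symmetry_loss(pi_fn, states, S, A):
    """Mean squared gap between pi(S s) and A pi(s) for a deterministic actor."""
    a = pi_fn(states)
    a_mirror = pi_fn(states @ S.T)
    return float(np.mean(np.sum((a_mirror - a @ A.T) ** 2, axis=-1)))


def symmetric_target(q_fn, next_states, next_actions, S, A):
    """Average the bootstrap value over a state-action pair and its mirror image."""
    q = q_fn(next_states, next_actions)
    q_mirror = q_fn(next_states @ S.T, next_actions @ A.T)
    return 0.5 * (q + q_mirror)


# Toy check: hypothetical 2-D states and 1-D actions, reflected by sign flips.
S = np.diag([-1.0, -1.0])
A = np.diag([-1.0])
rng = np.random.default_rng(0)
states = rng.normal(size=(8, 2))
actions = rng.normal(size=(8, 1))


def symmetric_q(s, a):
    return -np.sum(s ** 2, axis=1) - a[:, 0] ** 2  # unchanged by the reflection


def asymmetric_q(s, a):
    return s[:, 0] + a[:, 0]  # changes sign under the reflection


def equivariant_pi(s):
    return -0.5 * s[:, :1]  # satisfies pi(-s) = -pi(s)


loss_sym = critic_symmetry_loss(symmetric_q, states, actions, S, A)
loss_asym = critic_symmetry_loss(asymmetric_q, states, actions, S, A)
loss_pi = actor_symmetry_loss(equivariant_pi, states, S, A)
```

In a training loop, a term like this would presumably be added to the base PPO, SAC, or TD3 loss, scaled by the decaying weight. The symmetric critic and equivariant actor incur no penalty, while the asymmetric critic does.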

In practice

Apply axial reflection for systems like CartPole.
Use bilateral reflection for agents with intrinsic left-right symmetry.
Implement a decaying regularization weight for stability.
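
The two reflection types can be illustrated with small helpers. The CartPole mapping follows the Gym observation order (cart position, cart velocity, pole angle, pole angular velocity) and its discrete actions (0 pushes left, 1 pushes right). The bilateral example uses a made-up four-joint action layout; real agents have their own index layouts, and the helper names are hypothetical.

```python
import numpy as np


def reflect_cartpole(obs: np.ndarray, action: int) -> tuple[np.ndarray, int]:
    """Axial reflection: negate every observation component and swap the two pushes."""
    return -obs, 1 - action


def bilateral_reflection_matrix(perm: list[int], signs: list[float]) -> np.ndarray:
    """Build a signed permutation matrix M with (M v)[i] = signs[i] * v[perm[i]]."""
    n = len(perm)
    M = np.zeros((n, n))
    M[np.arange(n), perm] = signs
    return M


# Axial reflection of one CartPole transition input.
obs = np.array([0.1, -0.5, 0.02, 0.3])
mirrored_obs, mirrored_action = reflect_cartpole(obs, 0)

# Hypothetical action layout: [left_hip, left_knee, right_hip, right_knee].
# Bilateral reflection swaps the left and right limbs.
M_action = bilateral_reflection_matrix(perm=[2, 3, 0, 1], signs=[1.0, 1.0, 1.0, 1.0])
action = np.array([0.3, -0.1, 0.7, 0.2])
mirrored = M_action @ action
```

Both maps are involutions: applying them twice returns the original input. That makes a quick sanity check before a reflection is wired into a regularizer.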

Topics

Reinforcement Learning
Reflection Symmetry
State-based Control
Sample Efficiency
PPO
SAC
TD3

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.