Investigating Action Encodings in Recurrent Neural Networks in Reinforcement Learning

Source: cs.LG updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems) · Depth: Expert, extended

Summary

This research investigates how action information is incorporated into the state update function of recurrent neural networks (RNNs) used by reinforcement learning (RL) agents, a design choice that matters for real-world deployment. The study empirically evaluates four architectural designs, Additive Action (AA), Multiplicative Action (MA), Factored (Fac), and Deep Additive Action (DAA), each applied to both standard RNNs and Gated Recurrent Units (GRUs). Experiments on Ring World, TMaze, Directional TMaze, Image Directional TMaze, and LunarLander-v2 show that the choice of action encoding significantly affects an agent's performance in both prediction and control tasks. The multiplicative operation consistently outperforms the other variants, often learning faster and with a smaller state vector, even in environments with complex observations such as images. The paper also highlights open challenges for recurrent architectures in RL: practical online learning, active data collection, and the need for insights beyond simple learning curves.

Key takeaway

Research scientists designing recurrent neural networks for reinforcement learning should consider multiplicative action encoding first in their RNN or GRU architectures. In the reported experiments, this approach delivers superior performance and more effective state separation across environments and observation types, often with fewer parameters. It is not universally optimal, but it significantly improves learning efficiency and prediction accuracy, which suggests re-evaluating recurrent architectures traditionally borrowed from supervised learning.

Key insights

Multiplicative action encoding in RNNs significantly enhances RL agent performance and state representation.
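
To make the contrast concrete, here is a minimal NumPy sketch of the additive and multiplicative encodings in a plain tanh RNN cell. This summary does not define the variants, so the forms below are assumptions: the additive cell concatenates a one-hot action with the hidden state and observation under one shared weight matrix, while the multiplicative cell selects a separate weight matrix per action. All names (`W_add`, `W_mul`, `additive_step`, and so on) and the toy sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_obs, n_actions = 4, 3, 2  # toy sizes, chosen only for illustration


def one_hot(a, n):
    v = np.zeros(n)
    v[a] = 1.0
    return v


# Assumed additive form: one shared matrix applied to [h; o; one_hot(a)].
W_add = rng.normal(scale=0.5, size=(n_hidden, n_hidden + n_obs + n_actions))
b_add = np.zeros(n_hidden)


def additive_preact(h, o, a):
    x = np.concatenate([h, o, one_hot(a, n_actions)])
    return W_add @ x + b_add


def additive_step(h, o, a):
    return np.tanh(additive_preact(h, o, a))


# Assumed multiplicative form: a 3-way weight tensor indexed by the action,
# so each action applies its own matrix to [h; o].
W_mul = rng.normal(scale=0.5, size=(n_actions, n_hidden, n_hidden + n_obs))
b_mul = np.zeros((n_actions, n_hidden))


def multiplicative_preact(h, o, a):
    x = np.concatenate([h, o])
    return W_mul[a] @ x + b_mul[a]


def multiplicative_step(h, o, a):
    return np.tanh(multiplicative_preact(h, o, a))


h = np.zeros(n_hidden)
o = rng.normal(size=n_obs)
h_add = [additive_step(h, o, a) for a in range(n_actions)]
h_mul = [multiplicative_step(h, o, a) for a in range(n_actions)]
```

Under these assumptions, the additive cell lets the action shift the pre-activation only by a fixed, action-specific offset (a column of `W_add`), whatever the state and observation are. The multiplicative cell lets the action change the entire linear map applied to them.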

Method

The study empirically evaluates additive, multiplicative, factored, and deep additive action encoding architectures within RNNs and GRUs, using off-policy semi-gradient TD(0) for prediction and Q-learning for control across diverse RL environments.
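
The two learning rules named above can be sketched with a linear value head. This is a simplified illustration, not the paper's training setup: it assumes the feature vector `x` stands in for the recurrent state, updates only the head weights (no backpropagation through the RNN), and uses hypothetical function names and step sizes. `rho` is the importance sampling ratio between target and behavior policies.

```python
import numpy as np


def td0_update(w, x, x_next, reward, rho, alpha=0.1, gamma=0.9, terminal=False):
    """Off-policy semi-gradient TD(0) for a linear value estimate v(x) = w @ x."""
    v = w @ x
    v_next = 0.0 if terminal else w @ x_next
    delta = reward + gamma * v_next - v  # TD error; the target is treated as a constant
    return w + alpha * rho * delta * x, delta


def q_learning_update(W, x, a, x_next, reward, alpha=0.1, gamma=0.9, terminal=False):
    """Q-learning for linear action values q(x, a) = W[a] @ x."""
    q_next = 0.0 if terminal else np.max(W @ x_next)
    delta = reward + gamma * q_next - W[a] @ x
    W = W.copy()  # leave the caller's weights untouched
    W[a] += alpha * delta * x
    return W, delta


# Toy transition with hand-picked features (illustrative values only).
x = np.array([1.0, 0.0, 0.0])
x_next = np.array([0.0, 1.0, 0.0])

w, td_error = td0_update(np.zeros(3), x, x_next, reward=1.0, rho=0.5)
W, q_error = q_learning_update(np.zeros((2, 3)), x, 1, x_next, reward=1.0)
```

"Semi-gradient" refers to differentiating only the current estimate `w @ x` and not the bootstrapped target, which is why the update direction is simply `x` scaled by the TD error.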

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.LG updates on arXiv.org.