Investigating Action Encodings in Recurrent Neural Networks in Reinforcement Learning

2026-05-19 · Source: cs.LG updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

This research investigates how action information is incorporated into the state update function of recurrent neural networks (RNNs) in reinforcement learning (RL) agents, a design choice the authors treat as critical for real-world deployment. The study empirically evaluates four architectural designs, Additive Action (AA), Multiplicative Action (MA), Factored (Fac), and Deep Additive Action (DAA), using both standard RNNs and Gated Recurrent Units (GRUs). Experiments in Ring World, TMaze, Directional TMaze, Image Directional TMaze, and LunarLander-v2 show that the choice of action encoding significantly affects an RL agent's performance in both prediction and control tasks. The multiplicative variant outperforms the others in most settings, often learning faster and with a smaller state vector, even in environments with complex observations such as images. The paper also highlights open challenges for recurrent architectures in RL: practical online learning, active data collection, and the need for insights beyond simple learning curves.
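The difference between the additive and multiplicative variants named above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes one-hot discrete actions, a plain tanh RNN cell, and that the multiplicative variant amounts to selecting a separate weight matrix per action. All function and variable names are hypothetical, and the Factored and Deep Additive variants are not shown.

```python
import math
import random

def matvec(W, v):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def one_hot(a, n):
    return [1.0 if i == a else 0.0 for i in range(n)]

def additive_step(W, b, h, obs, action, n_actions):
    # Assumed additive form: the one-hot action is concatenated to the input,
    # so the action only adds one column of W to the pre-activation.
    z = h + obs + one_hot(action, n_actions)
    return [math.tanh(p + bi) for p, bi in zip(matvec(W, z), b)]

def multiplicative_step(W_per_action, b_per_action, h, obs, action):
    # Assumed multiplicative form: the action selects the whole weight
    # matrix, so it changes the linear map applied to (h, obs).
    W, b = W_per_action[action], b_per_action[action]
    z = h + obs
    return [math.tanh(p + bi) for p, bi in zip(matvec(W, z), b)]

rng = random.Random(0)
hidden, obs_dim, n_actions = 4, 3, 2

W_add = rand_matrix(hidden, hidden + obs_dim + n_actions, rng)
b_add = [0.0] * hidden
W_mul = [rand_matrix(hidden, hidden + obs_dim, rng) for _ in range(n_actions)]
b_mul = [[0.0] * hidden for _ in range(n_actions)]

h0 = [0.0] * hidden
obs = [1.0, 0.0, 0.5]

h_add_0 = additive_step(W_add, b_add, h0, obs, 0, n_actions)
h_add_1 = additive_step(W_add, b_add, h0, obs, 1, n_actions)
h_mul_0 = multiplicative_step(W_mul, b_mul, h0, obs, 0)
h_mul_1 = multiplicative_step(W_mul, b_mul, h0, obs, 1)
```

Under these assumptions the multiplicative cell stores n_actions weight matrices instead of one, so it has more parameters per hidden unit. That is compatible with the summary's observation that it often works with a smaller state vector.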

Key takeaway

Research scientists designing recurrent neural networks for reinforcement learning should prioritize multiplicative action encoding in their RNN or GRU architectures. In the reported experiments this approach shows stronger performance and more effective state separation across most environments and observation types, often with fewer parameters. Although it is not universally optimal, the multiplicative method significantly improves learning efficiency and prediction accuracy, which suggests re-evaluating architectures traditionally borrowed from supervised learning.

Key insights

Multiplicative action encoding in RNNs significantly enhances RL agent performance and state representation.

Principles

Action encoding method impacts RL agent performance.
Multiplicative updates can reduce required sequence length.
Online learning presents unique challenges for RNNs in RL.

Method

The study empirically evaluates additive, multiplicative, factored, and deep additive action encoding architectures within RNNs and GRUs, using off-policy semi-gradient TD(0) for prediction and Q-learning for control across diverse RL environments.
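The two learning rules named above follow standard textbook forms, sketched here on a linear value head for readability. This is an illustration under stated assumptions, not the paper's training code: the feature vector stands in for the RNN state (the paper trains the recurrent network itself), `rho` is an assumed importance-sampling ratio for the off-policy correction, and all names are hypothetical.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def td0_update(w, feat, reward, next_feat, gamma, alpha, rho=1.0, terminal=False):
    # Semi-gradient TD(0): the bootstrap target is treated as a constant,
    # so the gradient is taken only through v(s) = w . feat.
    v = dot(w, feat)
    v_next = 0.0 if terminal else dot(w, next_feat)
    delta = reward + gamma * v_next - v
    # rho reweights the update by the target/behaviour policy ratio.
    new_w = [wi + alpha * rho * delta * xi for wi, xi in zip(w, feat)]
    return new_w, delta

def q_learning_target(reward, next_q_values, gamma, terminal=False):
    # Q-learning bootstraps from the greedy action value in the next state.
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)

w = [0.0, 0.0]
w_new, delta = td0_update(w, feat=[1.0, 0.0], reward=1.0,
                          next_feat=[0.0, 1.0], gamma=0.9, alpha=0.5)
target = q_learning_target(1.0, [0.2, 0.5], gamma=0.9)
```

In a recurrent agent the same error term would be backpropagated through the state update over a truncated window of past steps, which is where the choice of action encoding enters.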

In practice

Prioritize multiplicative action encoding for RNNs in RL.
Consider combined cells for adaptive architecture selection.
Investigate active data collection to mitigate long temporal dependencies.

Topics

Reinforcement Learning
Recurrent Neural Networks
Action Encoding
Multiplicative Updates
State Representation

Code references

mkschleg/ActionRNNs.jl

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.LG updates on arXiv.org.