ARROW: Augmented Replay for RObust World models

Source: cs.AI updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems) · Depth: Expert, extended

Summary

ARROW (Augmented Replay for RObust World models) is a model-based continual reinforcement learning algorithm that extends DreamerV3 with a memory-efficient, distribution-matching replay scheme. Instead of a single fixed-size buffer, ARROW uses two complementary buffers: a short-term FIFO buffer and a long-term global distribution matching buffer, each holding 2^18 observations, for a total of 2^19 observations. ARROW was evaluated on challenging continual RL settings, including Atari games (without shared structure) and Procgen CoinRun variants (with shared structure), and showed substantially less catastrophic forgetting than DreamerV3. On Atari, for instance, it reduced forgetting more than six-fold (0.197 vs. 1.217), and on CoinRun it achieved near-zero forgetting. The algorithm also exhibited a better stability-plasticity trade-off, with WC-ACC scores of up to 0.615 on Atari, and strong performance recovery in two-cycle training scenarios.
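The figures above can be checked with a few lines of arithmetic. The sketch below recomputes the total buffer capacity and the Atari forgetting ratio from the numbers quoted in this summary; the variable names are illustrative only.

```python
# Buffer capacities quoted in the summary, in observations.
SHORT_TERM_CAPACITY = 2 ** 18  # short-term FIFO buffer
LONG_TERM_CAPACITY = 2 ** 18   # long-term global distribution matching buffer
TOTAL_CAPACITY = SHORT_TERM_CAPACITY + LONG_TERM_CAPACITY

# Forgetting scores on the Atari sequence as reported (lower is better).
forgetting_dreamerv3 = 1.217
forgetting_arrow = 0.197

# How many times less forgetting ARROW shows than DreamerV3.
reduction_factor = forgetting_dreamerv3 / forgetting_arrow

print(SHORT_TERM_CAPACITY, TOTAL_CAPACITY)  # 262144 524288
print(TOTAL_CAPACITY == 2 ** 19)            # True
print(round(reduction_factor, 2))           # 6.18
```

The ratio of about 6.18 is what backs the "more than six-fold" claim.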

Key takeaway

For machine learning engineers deploying reinforcement learning agents in environments that require continual skill acquisition, ARROW offers an effective way to mitigate catastrophic forgetting. Consider implementing a dual-buffer replay system that combines short-term recency with long-term distribution matching to maintain performance across diverse tasks. In the reported experiments, this approach markedly improved stability and knowledge retention, especially in settings without shared task structure, which makes it a promising basis for more reliable lifelong learning systems.
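A minimal sketch of such a dual-buffer replay system follows. It assumes the long-term "global distribution matching" buffer is maintained by reservoir sampling, a standard way to keep a fixed-size, approximately uniform sample of everything seen so far; ARROW's exact retention rule may differ. The class name DualReplayBuffer and its attributes are hypothetical, and each stored item stands in for a single observation (DreamerV3 trains on sequences, which this sketch omits).

```python
import random
from collections import deque


class DualReplayBuffer:
    """Hypothetical dual buffer: short-term FIFO plus long-term reservoir.

    Reservoir sampling is an assumed stand-in for ARROW's global
    distribution matching buffer; the paper's exact rule may differ.
    """

    def __init__(self, short_capacity, long_capacity, seed=0):
        self.short_term = deque(maxlen=short_capacity)  # evicts the oldest item when full
        self.long_term = []                             # fixed-size sample of the whole stream
        self.long_capacity = long_capacity
        self.num_seen = 0                               # total items added so far
        self.rng = random.Random(seed)

    def add(self, item):
        # Recency: every item enters the FIFO buffer.
        self.short_term.append(item)
        self.num_seen += 1
        if len(self.long_term) < self.long_capacity:
            self.long_term.append(item)
        else:
            # Reservoir sampling (Algorithm R): keep the new item with
            # probability long_capacity / num_seen by overwriting a random slot.
            j = self.rng.randrange(self.num_seen)
            if j < self.long_capacity:
                self.long_term[j] = item


buf = DualReplayBuffer(short_capacity=100, long_capacity=100, seed=0)
# Two tasks in sequence, 500 observations each; items are (task_id, step).
for task_id in range(2):
    for step in range(500):
        buf.add((task_id, step))
print(sorted({t for t, _ in buf.short_term}))  # [1]: only the latest task
print(sorted({t for t, _ in buf.long_term}))
```

After the task switch, the FIFO buffer fills entirely with the newest task's data, while the reservoir keeps a roughly proportional share of earlier tasks. That retained share is what allows replay to counteract forgetting when tasks share no structure.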

Key insights

Adding augmented replay to a world model significantly reduces catastrophic forgetting in continual reinforcement learning.

Method

ARROW extends DreamerV3 with two replay buffers, a short-term FIFO buffer and a long-term global distribution matching buffer, from which training data are sampled uniformly. It also uses spliced rollouts and fixed-entropy regularization for exploration.
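One reading of "sampled uniformly" is that each training example is drawn uniformly from the combined contents of both buffers, so with two full buffers of equal size (2^18 each) roughly half of each batch comes from each buffer. The sketch below assumes that interpretation (another would be a fixed 50/50 split per batch). The helper sample_batch is hypothetical, and spliced rollouts and the entropy term are not modeled.

```python
import random
from collections import deque


def sample_batch(short_term, long_term, batch_size, rng):
    """Hypothetical helper: draw batch_size items uniformly, with replacement,
    from the union of the short-term and long-term buffers."""
    n_short, n_long = len(short_term), len(long_term)
    total = n_short + n_long
    if total == 0:
        raise ValueError("cannot sample from two empty buffers")
    batch = []
    for _ in range(batch_size):
        i = rng.randrange(total)  # every stored item is equally likely
        batch.append(short_term[i] if i < n_short else long_term[i - n_short])
    return batch


rng = random.Random(0)
short_term = deque((("recent", k) for k in range(50)), maxlen=50)
long_term = [("archive", k) for k in range(50)]
batch = sample_batch(short_term, long_term, batch_size=16, rng=rng)
print(len(batch))  # 16
```

Sampling over the union means a buffer's share of each batch tracks its fill level, so early in training, when the long-term buffer is still filling, batches lean toward recent data.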

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.