Offline Reinforcement Learning for Plasma Control in Nuclear Fusion: Codebase and Benchmark
Summary
RL4F is an open-source benchmark for offline reinforcement learning (RL) in nuclear fusion plasma control, introduced to give the field a standard way to measure progress. It provides closed-loop evaluation environments and baseline comparisons across four full-profile tracking tasks: rotation, density, temperature, and pressure. The underlying dynamics function is built from 5,882 historical DIII-D tokamak discharges, comprising 945,828 transitions. Imitation learning and offline RL baselines were evaluated under a unified protocol. Offline model-based RL methods generally achieved the best average performance, particularly on rotation, temperature, and pressure tracking, though no single method dominated all tasks. The benchmark highlights the critical role of dynamics modeling in complex, long-horizon plasma control, and its codebase, datasets, and evaluation framework are released to support further research.
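Turning historical discharges into an offline RL dataset amounts to slicing each discharge's time series into consecutive (state, action, next state) transitions and concatenating them. On average, the figures above work out to roughly 945,828 / 5,882 ≈ 161 transitions per discharge. The sketch below is a minimal illustration under stated assumptions. The function names and dictionary keys are hypothetical, not RL4F's API. It also assumes that rewards are computed later for each tracking task, so the same transitions can be reused for all four tasks.

```python
import numpy as np

def discharge_to_transitions(states, actions):
    """Split one discharge into (s, a, s') transitions (hypothetical helper).

    states:  array (T + 1, state_dim), plasma state at each control step
    actions: array (T, action_dim), actuator settings applied between steps
    """
    if states.shape[0] != actions.shape[0] + 1:
        raise ValueError("need exactly one more state than actions")
    return {
        "obs": states[:-1],       # state before each action
        "actions": actions,       # actuator command at that step
        "next_obs": states[1:],   # state after the action
    }

def build_dataset(discharges):
    """Concatenate transitions from many (states, actions) discharges."""
    parts = [discharge_to_transitions(s, a) for s, a in discharges]
    return {k: np.concatenate([p[k] for p in parts], axis=0) for k in parts[0]}
```

For example, two discharges with 3 and 5 control steps yield a flat dataset of 8 transitions. Task-specific rewards (for example, a rotation or pressure tracking error) would then be attached per task.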
Key takeaway
For machine learning engineers developing plasma control systems, this benchmark underscores the value of offline model-based reinforcement learning, especially for complex, long-horizon tasks such as temperature and pressure profile tracking. Prioritize methods that account for model uncertainty, such as MOPO, but recognize that no single algorithm is optimal across all plasma control objectives. Consider evaluating several model-based approaches to find the best fit for each profile regulation task.
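MOPO (Model-based Offline Policy Optimization) accounts for model uncertainty by penalizing the model-predicted reward. Its penalized reward is r̃(s, a) = r̂(s, a) − λ·u(s, a), where u(s, a) estimates how unreliable the learned dynamics are at (s, a). The sketch below uses one common choice of u: the largest norm of the predicted next-state standard deviation across members of a probabilistic ensemble. The function name and the default λ = 1.0 are illustrative assumptions. This is not RL4F's implementation, and the uncertainty estimator it uses may differ.

```python
import numpy as np

def mopo_penalized_reward(reward, ensemble_stds, lam=1.0):
    """Uncertainty-penalized reward in the style of MOPO (illustrative sketch).

    reward:        array (batch,), model-predicted reward per synthetic transition
    ensemble_stds: array (n_models, batch, state_dim), predicted std of the next
                   state from each member of a probabilistic dynamics ensemble
    lam:           penalty coefficient lambda (hypothetical default)
    """
    # u(s, a): for each transition, the largest std norm over ensemble members
    uncertainty = np.linalg.norm(ensemble_stds, axis=-1).max(axis=0)
    return reward - lam * uncertainty
```

Larger λ makes the policy more conservative, keeping it in regions where the learned plasma model is confident. Smaller λ lets it exploit the model more aggressively.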
Key insights
RL4F provides a standardized offline RL benchmark for multi-actuator, long-horizon nuclear fusion plasma control.
Principles
- Offline model-based RL generally outperforms model-free methods in fusion plasma control.
- No single offline RL algorithm dominates all plasma profile tracking tasks.
- Tracking near the plasma core is more challenging because profile values there have larger magnitudes and vary more over time.
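The magnitude part of the last principle can be illustrated with a simple error calculation. The sketch assumes a hypothetical target profile that peaks at the core and a controller that overshoots by the same 10% at every radial location. The error functions and the negative-mean-absolute-error reward are illustrative choices, not RL4F's reward definitions.

```python
import numpy as np

def tracking_errors(profile, target):
    """Absolute and relative tracking error at each radial location.

    profile, target: arrays (n_radial,), ordered from core (index 0) to edge
    """
    abs_err = np.abs(profile - target)
    # Relative error divides out the local target magnitude
    rel_err = abs_err / np.maximum(np.abs(target), 1e-8)
    return abs_err, rel_err

def tracking_reward(profile, target):
    """Hypothetical tracking reward: negative mean absolute error."""
    return -float(np.mean(np.abs(profile - target)))

target = np.array([10.0, 5.0, 1.0])   # peaked at the core
profile = 1.1 * target                # same 10% miss everywhere
abs_err, rel_err = tracking_errors(profile, target)
```

Here the relative error is 0.1 at every location, but the absolute errors are about [1.0, 0.5, 0.1]. The core therefore dominates any unnormalized tracking loss.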
Method
RL4F trains a reference recurrent probabilistic neural network (RPNN) dynamics model from 18,000 DIII-D discharges, then generates synthetic trajectories for offline policy learning and closed-loop evaluation.
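Generating synthetic trajectories from a recurrent probabilistic model means running a policy inside the model. At each step, a recurrent hidden state summarizes the history, the model predicts a distribution over the next plasma state, and a sample from that distribution becomes the next input. The toy sketch below makes several assumptions: a single tanh recurrent layer, a diagonal Gaussian over the state change, random untrained weights, and hypothetical class and function names. The RPNN in RL4F is almost certainly larger and trained on discharge data. The sketch shows only the rollout mechanics.

```python
import numpy as np

class TinyRecurrentGaussianModel:
    """Toy recurrent probabilistic dynamics model (random, untrained weights)."""

    def __init__(self, state_dim, action_dim, hidden_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = state_dim + action_dim
        self.W_in = rng.normal(scale=0.1, size=(hidden_dim, in_dim))
        self.W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        self.W_mu = rng.normal(scale=0.1, size=(state_dim, hidden_dim))
        self.W_logstd = rng.normal(scale=0.1, size=(state_dim, hidden_dim))
        self.hidden_dim = hidden_dim

    def init_hidden(self):
        return np.zeros(self.hidden_dim)

    def step(self, h, state, action, rng):
        """Update the hidden state, then sample the next plasma state."""
        x = np.concatenate([state, action])
        h = np.tanh(self.W_in @ x + self.W_h @ h)
        mu = self.W_mu @ h                                    # mean state change
        std = np.exp(np.clip(self.W_logstd @ h, -5.0, 2.0))  # bounded std
        next_state = state + mu + std * rng.standard_normal(state.shape)
        return h, next_state

def rollout(model, policy, s0, horizon, seed=0):
    """Run a policy inside the model to produce one synthetic trajectory."""
    rng = np.random.default_rng(seed)
    h, s = model.init_hidden(), s0
    states, actions = [s0], []
    for _ in range(horizon):
        a = policy(s)
        h, s = model.step(h, s, a, rng)
        actions.append(a)
        states.append(s)
    return np.stack(states), np.stack(actions)

model = TinyRecurrentGaussianModel(state_dim=4, action_dim=2)
states, actions = rollout(model, lambda s: np.zeros(2), np.zeros(4), horizon=10)
```

The same rollout loop serves both purposes named above. Its trajectories can augment the offline dataset for policy learning, or a trained policy can be plugged in to score it in closed loop.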
In practice
- Develop controllers for rotation, density, temperature, and pressure profiles.
- Use neutral-beam power and torque, gas puffing, and electron-cyclotron heating as actuators.
Topics
- Offline Reinforcement Learning
- Nuclear Fusion
- Plasma Control
- Tokamak Devices
- DIII-D
- Recurrent Probabilistic Neural Networks
- Profile Tracking
Best for: AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.LG updates on arXiv.org.