Offline Reinforcement Learning for Plasma Control in Nuclear Fusion: Codebase and Benchmark

2026-06-09 · Source: cs.LG updates on arXiv.org · Field: Science & Research — Artificial Intelligence & Machine Learning, Engineering & Applied Sciences, Research Methodology & Innovation · Depth: Expert, extended

Summary

RL4F, an open-source benchmark for offline reinforcement learning (RL) in nuclear fusion plasma control, has been introduced to give the field a standard way to measure progress. It provides closed-loop evaluation environments and baseline comparisons across four full-profile tracking tasks: rotation, density, temperature, and pressure. The underlying dynamics function is built from data on 5,882 historical DIII-D tokamak discharges, comprising 945,828 transitions. Imitation learning and offline RL baselines were evaluated under a unified protocol. Offline model-based RL methods generally achieved the best average performance, particularly on rotation, temperature, and pressure tracking, though no single method dominated all tasks. The benchmark highlights the critical role of dynamics modeling in complex, long-horizon plasma control. Its codebase, datasets, and evaluation framework are released to foster further research.
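Closed-loop evaluation can be pictured as rolling a policy forward inside a learned dynamics model and scoring how closely the predicted profile follows a target. The sketch below is illustrative only. It assumes a negative mean-squared-error tracking reward and uses hypothetical names (`closed_loop_return`, `tracking_reward`, `toy_dynamics`, `proportional_policy`). RL4F's actual reward definition, interfaces, and horizon may differ.

```python
from typing import Callable, List, Sequence

Profile = List[float]  # profile values at fixed normalized-radius points

def tracking_reward(profile: Sequence[float], target: Sequence[float]) -> float:
    """Negative mean squared error between a predicted profile and its target (assumed reward)."""
    return -sum((p - t) ** 2 for p, t in zip(profile, target)) / len(target)

def closed_loop_return(
    policy: Callable[[Profile, Profile], List[float]],
    dynamics: Callable[[Profile, List[float]], Profile],
    initial_profile: Profile,
    target: Profile,
    horizon: int,
) -> float:
    """Roll a policy out inside a (learned) dynamics model and sum tracking rewards."""
    profile = list(initial_profile)
    total = 0.0
    for _ in range(horizon):
        action = policy(profile, target)     # actuator command from current state and target
        profile = dynamics(profile, action)  # model-predicted next profile
        total += tracking_reward(profile, target)
    return total

# Toy stand-ins (hypothetical): each radial point moves by its action, and a
# proportional controller closes half of the remaining gap per step.
def toy_dynamics(profile: Profile, action: List[float]) -> Profile:
    return [p + a for p, a in zip(profile, action)]

def proportional_policy(profile: Profile, target: Profile, gain: float = 0.5) -> List[float]:
    return [gain * (t - p) for p, t in zip(profile, target)]

if __name__ == "__main__":
    print(closed_loop_return(proportional_policy, toy_dynamics, [0.0, 0.0], [1.0, 1.0], horizon=3))
    # prints -0.328125
```

In a benchmark setting, `dynamics` would be the learned model rather than a hand-written toy. The point of the closed loop is that errors compound: each action changes the state that the next action sees.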

Key takeaway

For machine learning engineers developing plasma control systems, this benchmark underscores the value of offline model-based reinforcement learning for complex, long-horizon tasks. In the reported results, model-based methods led on rotation, temperature, and pressure profile tracking. Prioritize methods that account for model uncertainty, such as MOPO (Model-based Offline Policy Optimization), but recognize that no single algorithm is optimal across all plasma control objectives. Evaluating several model-based approaches is the practical way to find the best fit for a specific profile regulation task.
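"Accounting for model uncertainty" can be made concrete with MOPO's reward penalty. The model-predicted reward is reduced by a coefficient λ times an uncertainty estimate, taken as the maximum over an ensemble of probabilistic dynamics models. The sketch below makes several assumptions. It assumes diagonal Gaussian ensemble heads. It measures uncertainty as the largest L2 norm of the predicted standard-deviation vectors, which is a common implementation choice. The function name and the λ value are illustrative and are not RL4F's configuration.

```python
import math
from typing import Sequence

def mopo_penalized_reward(
    model_reward: float,
    ensemble_stds: Sequence[Sequence[float]],
    penalty_coef: float,
) -> float:
    """MOPO-style reward: r_tilde = r_hat - lambda * u(s, a).

    ensemble_stds[i] holds ensemble member i's predicted standard deviation for
    every next-state dimension at one (s, a). The uncertainty u is the largest
    L2 norm of these vectors across members, so a state-action pair is penalized
    whenever any member predicts high variance.
    """
    uncertainty = max(math.sqrt(sum(s * s for s in stds)) for stds in ensemble_stds)
    return model_reward - penalty_coef * uncertainty

if __name__ == "__main__":
    # Member 0 is the most uncertain (norm 5.0), so u = 5.0 and the penalty is 0.5 * 5.0.
    print(mopo_penalized_reward(10.0, [[3.0, 4.0], [0.0, 1.0]], penalty_coef=0.5))
    # prints 7.5
```

A larger λ keeps the learned policy closer to regions where the dynamics model is confident, at the cost of more conservative control.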

Key insights

RL4F provides a standardized offline RL benchmark for multi-actuator, long-horizon nuclear fusion plasma control.

Principles

Offline model-based RL generally achieves the best average performance in fusion plasma control, ahead of model-free offline RL and imitation learning baselines.
No single offline RL algorithm dominates all plasma profile tracking tasks.
Tracking near the plasma core is more challenging because profile values there have larger magnitudes and vary more over time.
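One way to see where a controller struggles is to split the tracking error by normalized radius ρ. The sketch below is a hypothetical diagnostic, not RL4F's evaluation metric. The ρ < 0.5 core boundary and the unnormalized RMS error are assumptions. Because it uses raw rather than normalized errors, the metric is sensitive to the larger magnitudes found near the core. Dividing by the target's scale would be one way to separate magnitude from genuine tracking difficulty.

```python
import math
from typing import Dict, Sequence

def regional_rms_error(
    rho: Sequence[float],
    predicted: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
    core_boundary: float = 0.5,
) -> Dict[str, float]:
    """RMS tracking error over time, split into core (rho < boundary) and edge regions.

    predicted[t][j] and target[t][j] are profile values at time step t and radius rho[j].
    A region with no radial points gets NaN.
    """
    sq = {"core": 0.0, "edge": 0.0}
    count = {"core": 0, "edge": 0}
    for pred_t, targ_t in zip(predicted, target):
        for r, p, y in zip(rho, pred_t, targ_t):
            region = "core" if r < core_boundary else "edge"
            sq[region] += (p - y) ** 2
            count[region] += 1
    return {k: math.sqrt(sq[k] / count[k]) if count[k] else float("nan") for k in sq}
```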

Method

RL4F trains a reference recurrent probabilistic neural network (RPNN) dynamics model on 18,000 DIII-D discharges. It then uses the model to generate synthetic trajectories for offline policy learning and closed-loop evaluation.
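A recurrent probabilistic dynamics model of this kind can be sketched as a recurrent cell whose output head predicts a Gaussian over the next state. Synthetic trajectories are then produced by sampling from that Gaussian one step at a time. The sketch below is a minimal, untrained illustration in pure Python, and it is not the RL4F architecture. The Elman cell, the residual mean parameterization, the log-variance clamp, and names such as `TinyRPNN` and `sample_trajectory` are all assumptions. A real model would be fit by minimizing `gaussian_nll` over logged discharges.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]

def matvec(w: List[Vector], x: Vector) -> Vector:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

class TinyRPNN:
    """Hypothetical Elman-style recurrent network with a diagonal-Gaussian output head."""

    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)
        in_dim = state_dim + action_dim + hidden_dim

        def init(rows: int, cols: int) -> List[Vector]:
            # Small random weights; training is out of scope for this sketch.
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w_h = init(hidden_dim, in_dim)
        self.w_mu = init(state_dim, hidden_dim)
        self.w_logvar = init(state_dim, hidden_dim)
        self.hidden_dim = hidden_dim

    def initial_hidden(self) -> Vector:
        return [0.0] * self.hidden_dim

    def step(self, state: Vector, action: Vector, hidden: Vector) -> Tuple[Vector, Vector, Vector]:
        # Update the hidden state from the current state, action, and previous hidden state.
        h = [math.tanh(z) for z in matvec(self.w_h, state + action + hidden)]
        # Assumed residual form: predict the change in state, then add it to the current state.
        mean = [s + d for s, d in zip(state, matvec(self.w_mu, h))]
        # Clamp log-variance to an assumed range for numerical stability.
        logvar = [max(-10.0, min(2.0, v)) for v in matvec(self.w_logvar, h)]
        return mean, logvar, h

def gaussian_nll(target: Vector, mean: Vector, logvar: Vector) -> float:
    """Negative log-likelihood of target under a diagonal Gaussian (constant term included)."""
    return 0.5 * sum(lv + (y - m) ** 2 / math.exp(lv) + math.log(2 * math.pi)
                     for y, m, lv in zip(target, mean, logvar))

def sample_trajectory(model: TinyRPNN, state: Vector, actions: List[Vector], seed: int = 0) -> List[Vector]:
    """Generate a synthetic trajectory by sampling each next state from the model."""
    rng = random.Random(seed)
    hidden = model.initial_hidden()
    states = [list(state)]
    for action in actions:
        mean, logvar, hidden = model.step(states[-1], action, hidden)
        states.append([rng.gauss(m, math.exp(0.5 * lv)) for m, lv in zip(mean, logvar)])
    return states
```

The recurrent hidden state lets the model carry information about the discharge history beyond the current measurement. The sampled, rather than mean, rollouts expose the policy to the model's predicted spread.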

In practice

Develop controllers for rotation, density, temperature, and pressure profiles.
Use neutral-beam power and torque, gas puffing, and electron-cyclotron heating as actuators.
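The actuator list implies an action vector with one entry per actuator channel. The sketch below is a hypothetical illustration of how a logged discharge becomes the one-step (state, action, next state) transitions that an offline RL dataset contains. The actuator names, their ordering, and the four-channel layout are assumptions. RL4F's actual data schema may differ, for example in how many neutral-beam channels it models.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical actuator ordering for the action vector (names are illustrative).
ACTUATORS = ("nbi_power", "nbi_torque", "gas_puff", "ech_power")

@dataclass(frozen=True)
class Transition:
    state: List[float]       # measured profiles (and other plasma signals) at step t
    action: List[float]      # actuator commands applied at step t, ordered as ACTUATORS
    next_state: List[float]  # measured state at step t + 1

def discharge_to_transitions(
    states: Sequence[List[float]], actions: Sequence[List[float]]
) -> List[Transition]:
    """Slice one discharge (T + 1 states, T actions) into T one-step transitions."""
    if len(states) != len(actions) + 1:
        raise ValueError("expected exactly one more state than actions")
    for a in actions:
        if len(a) != len(ACTUATORS):
            raise ValueError(f"each action must have {len(ACTUATORS)} entries")
    return [Transition(list(states[t]), list(actions[t]), list(states[t + 1]))
            for t in range(len(actions))]
```

Pooling such records across many discharges gives a fixed dataset of transitions. Offline RL methods learn from that dataset without further interaction with the device.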

Topics

Offline Reinforcement Learning
Nuclear Fusion
Plasma Control
Tokamak Devices
DIII-D
Recurrent Probabilistic Neural Networks
Profile Tracking

Code references

LucasCJYSDL/Offline-RL-Kit-for-Nuclear-Fusion

Best for: AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.LG updates on arXiv.org.