Offline Reinforcement Learning for Plasma Control in Nuclear Fusion: Codebase and Benchmark

Source: cs.LG updates on arXiv.org · Field: Science & Research — Artificial Intelligence & Machine Learning, Engineering & Applied Sciences, Research Methodology & Innovation · Depth: Expert, extended

Summary

RL4F is an open-source benchmark for offline reinforcement learning (RL) in nuclear fusion plasma control, introduced to standardize how progress is measured. It provides closed-loop evaluation environments and baseline comparisons across four full-profile tracking tasks: rotation, density, temperature, and pressure. The underlying dynamics function is built from 5,882 historical DIII-D tokamak discharges, comprising 945,828 transitions. Evaluating imitation learning and offline RL baselines under a unified protocol showed that offline model-based RL methods generally achieve the best average performance, particularly on rotation, temperature, and pressure tracking, though no single method dominates all tasks. The benchmark highlights the critical role of dynamics modeling in complex, long-horizon plasma control, and it releases its codebase, datasets, and evaluation framework to foster further research.

Key takeaway

For Machine Learning Engineers developing plasma control systems, this benchmark underscores the value of offline model-based reinforcement learning, especially for complex, long-horizon tasks like temperature and pressure profile tracking. You should prioritize methods that account for model uncertainty, such as MOPO, but also recognize that no single algorithm is universally optimal across all plasma control objectives. Consider evaluating multiple model-based approaches to identify the best fit for specific profile regulation challenges.
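To make "accounting for model uncertainty" concrete, the following is a minimal sketch of the MOPO-style reward penalty, in which the reward is reduced by a multiple of an uncertainty estimate: r̃(s, a) = r(s, a) − λ·u(s, a). It assumes the common heuristic of taking u as the largest norm of the predicted standard deviation across an ensemble of dynamics models. The function name, argument layout, and numbers are hypothetical and are not taken from the RL4F codebase.

```python
import math


def mopo_penalized_reward(reward, ensemble_stds, lam=1.0):
    """Return reward minus lam times an ensemble-based uncertainty estimate.

    reward: scalar reward predicted for one (state, action) pair.
    ensemble_stds: one predicted standard-deviation vector per ensemble member
                   (hypothetical layout, chosen for illustration).
    lam: penalty coefficient (lambda in the formula above).
    """
    # Uncertainty heuristic: largest L2 norm of the predicted std over the ensemble.
    uncertainty = max(
        math.sqrt(sum(s * s for s in std)) for std in ensemble_stds
    )
    return reward - lam * uncertainty


# Two ensemble members disagree on how uncertain the next state is;
# the more pessimistic one (norm 0.5) sets the penalty.
penalized = mopo_penalized_reward(1.0, [[0.3, 0.4], [0.0, 0.1]], lam=0.5)
```

A larger λ makes the learned policy more conservative, keeping it closer to regions the dynamics model has seen; the appropriate value is task dependent and would need tuning per profile-tracking objective.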

Key insights

RL4F provides a standardized offline RL benchmark for multi-actuator, long-horizon nuclear fusion plasma control.

Method

RL4F trains a reference recurrent probabilistic neural network (RPNN) dynamics model on 18,000 DIII-D discharges, then uses it to generate synthetic trajectories for offline policy learning and closed-loop evaluation.
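The closed-loop evaluation idea can be sketched as follows: a policy repeatedly picks an action, a dynamics model returns the next profile, and the tracking error against a target profile is accumulated. This is an illustrative toy only. The linear `toy_dynamics` function stands in for a learned probabilistic model such as the RPNN, and the mean-squared-error metric, function names, and all numbers are assumptions rather than the benchmark's actual interface.

```python
import random


def toy_dynamics(profile, action, rng):
    # Hypothetical stand-in for a learned model: each profile point relaxes
    # toward the commanded value, with small Gaussian noise.
    return [p + 0.2 * (a - p) + rng.gauss(0.0, 0.01)
            for p, a in zip(profile, action)]


def proportional_policy(profile, target, gain=2.0):
    # Command a value that overshoots the current profile toward the target.
    return [p + gain * (t - p) for p, t in zip(profile, target)]


def passive_policy(profile, target):
    # Baseline that commands the current profile, i.e. applies no correction.
    return list(profile)


def closed_loop_tracking_error(policy, dynamics, initial_profile, target,
                               horizon=50, seed=0):
    """Roll the policy out against the dynamics; return (mean MSE, final profile)."""
    rng = random.Random(seed)
    profile = list(initial_profile)
    errors = []
    for _ in range(horizon):
        action = policy(profile, target)
        profile = dynamics(profile, action, rng)
        mse = sum((p - t) ** 2 for p, t in zip(profile, target)) / len(target)
        errors.append(mse)
    return sum(errors) / horizon, profile


target = [1.0, 0.8, 0.5]   # hypothetical target profile at three radial points
start = [0.0, 0.0, 0.0]
controlled_err, final_profile = closed_loop_tracking_error(
    proportional_policy, toy_dynamics, start, target)
passive_err, _ = closed_loop_tracking_error(
    passive_policy, toy_dynamics, start, target)
```

Evaluating every method against the same dynamics function, initial conditions, and horizon is what makes scores comparable across algorithms; in this toy setup the proportional policy should track the target far more closely than the passive baseline.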

Best for: AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.LG updates on arXiv.org.