TokaMark: A Comprehensive Benchmark for MAST Tokamak Plasma Models
Summary
TokaMark is a new, open-source benchmark designed to evaluate AI models for predicting plasma dynamics in fusion energy reactors, specifically using real experimental data from the Mega Ampere Spherical Tokamak (MAST). It addresses the lack of curated datasets and standardized benchmarks in fusion AI. TokaMark unifies access to multi-modal, heterogeneous fusion data, harmonizes formats, and provides 14 tasks across four groups: equilibrium reconstruction, magnetics dynamics, profile dynamics, and MHD activity. The benchmark includes a multi-branch convolutional encoder-decoder baseline model, trained on 11,573 shots from the FAIR-MAST dataset with an 80%/10%/10% split. It uses a hierarchical evaluation protocol and provides Python tools for data loading and processing.
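To make the 80%/10%/10% split concrete, here is a minimal sketch that partitions shot identifiers into train, validation, and test sets. The summary does not say how shots are assigned to each set, so the shuffle, the seed, the rounding, and the helper name `split_shots` are all assumptions for illustration, not TokaMark's own tooling.

```python
import random

def split_shots(shot_ids, fractions=(0.8, 0.1, 0.1), seed=0):
    """Partition shot IDs into train/val/test lists (hypothetical helper).

    Splitting by whole shot keeps every window of a given shot in a
    single partition; this is an assumption, not a documented rule.
    """
    ids = sorted(shot_ids)
    random.Random(seed).shuffle(ids)          # reproducible shuffle
    n_train = int(fractions[0] * len(ids))
    n_val = int(fractions[1] * len(ids))
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]              # remainder goes to test
    return train, val, test

# Placeholder IDs standing in for the 11,573 FAIR-MAST shots.
train, val, test = split_shots(range(11573))
print(len(train), len(val), len(test))
```

With this rounding, the remainder of the integer truncation falls into the test partition; a real split may distribute it differently.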
Key takeaway
For AI scientists and machine learning engineers developing models for fusion energy, TokaMark offers a standardized platform. Use this open benchmark to compare your models rigorously against the provided baseline, and focus on the profile dynamics and MHD activity groups, where the baseline leaves significant room for improvement: its normalized root-mean-square error (NRMSE) exceeds 0.17 there, and on task 4-5 it exceeds 1. Progress on these tasks should accelerate the development of the robust, data-driven plasma models essential for commercially viable fusion.
Key insights
TokaMark provides a unified, open benchmark with 14 tasks and a baseline for AI models in fusion plasma modeling.
Principles
- Fusion data is multi-modal, multi-rate, and often incomplete.
- AI models can learn latent plasma representations from raw data.
- Standardized benchmarks accelerate progress and ensure reproducibility.
Method
TokaMark defines each of its 14 tasks by input and output windows, extracts those windows with a sliding-window approach at a 0.001-second (1 ms) stride, and scores predictions with NRMSE, aggregated through a hierarchical evaluation protocol (samples -> windows -> signals -> tasks -> shots).
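The windowing and aggregation described above can be sketched as follows. This is an illustrative reading under stated assumptions, not TokaMark's implementation: the function names, the nested-dictionary layout, the normalization by the target's standard deviation, and the use of plain means at every level are all assumptions. Only the 0.001-second stride and the aggregation order come from the summary.

```python
import numpy as np

def sliding_windows(signal, sample_rate_hz, window_s, stride_s=0.001):
    """Cut a 1-D signal into overlapping windows of window_s seconds."""
    win = int(round(window_s * sample_rate_hz))
    step = max(1, int(round(stride_s * sample_rate_hz)))
    if len(signal) < win:
        raise ValueError("signal is shorter than one window")
    starts = range(0, len(signal) - win + 1, step)
    return np.stack([signal[s:s + win] for s in starts])

def window_nrmse(pred, target, eps=1e-12):
    """RMSE over the samples of each window, divided by the target's
    standard deviation (the normalization choice is an assumption)."""
    rmse = np.sqrt(np.mean((pred - target) ** 2, axis=-1))
    return rmse / (np.std(target) + eps)

def hierarchical_score(results):
    """Aggregate samples -> windows -> signals -> tasks -> shots.

    results[shot][task][signal] = (pred, target), each an array of
    shape (n_windows, n_samples). Plain means are used at every level.
    """
    shot_scores = []
    for tasks in results.values():
        task_scores = []
        for signals in tasks.values():
            signal_scores = [window_nrmse(p, t).mean()   # windows -> signal
                             for p, t in signals.values()]
            task_scores.append(np.mean(signal_scores))   # signals -> task
        shot_scores.append(np.mean(task_scores))         # tasks -> shot
    return float(np.mean(shot_scores))                   # shots -> score

# Toy usage: a 1 kHz signal, 4 ms windows, 1 ms stride.
windows = sliding_windows(np.arange(10.0), sample_rate_hz=1000, window_s=0.004)
target = np.array([[0.0, 1.0], [2.0, 3.0]])
toy = {"shot_a": {"task_x": {"sig_1": (target + 1.0, target)}}}
score = hierarchical_score(toy)
```

Keeping the levels as nested loops makes it easy to stop early, for example to report one score per task rather than a single number.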
In practice
- Evaluate AI models on 14 diverse fusion plasma tasks.
- Utilize the provided multi-branch convolutional baseline.
- Develop models robust to multi-fidelity and missing data.
Topics
- TokaMark
- Fusion Energy
- Tokamak Plasma Modeling
- AI Benchmarking
- Multi-modal Data
- MAST Tokamak
- Plasma Diagnostics
Best for: AI Scientist, Research Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.