Exascale Multi-Task Graph Foundation Models for Imbalanced, Multi-Fidelity Atomistic Data
Summary
Researchers have developed an exascale workflow for materials discovery based on atomistic graph foundation models (GFMs) built on HydraGNN. The workflow trains jointly on 16 open first-principles datasets, comprising over 544 million structures and 85+ elements, using a multi-task architecture with per-dataset heads and a scalable ADIOS2/DDStore data pipeline. On the Frontier supercomputer, the team ran six large-scale DeepHyper hyperparameter optimization campaigns in FP64, which yielded a PaiNN-based lead model. That model enables billion-scale screening: it evaluates 1.1 billion atomistic structures in 50 seconds (about 22 million structures per second), a task that would otherwise require years of first-principles computation. The work also quantifies precision-performance tradeoffs across BF16, FP32, and FP64, demonstrates transfer to twelve chemically diverse downstream tasks, and reports strong and weak scaling on the Frontier, Aurora, and Perlmutter supercomputers.
Key takeaway
For AI Scientists and Machine Learning Engineers developing materials discovery platforms, this work demonstrates that exascale-trained atomistic GFMs can turn previously intractable first-principles screening into a practical workflow. Consider adopting a multi-task architecture with per-dataset heads, backed by a robust data pipeline, to handle heterogeneous, imbalanced datasets so that models remain scalable and transferable across diverse chemical domains. Prioritize FP64 training when downstream tasks require high accuracy.
Key insights
Exascale multi-task graph foundation models accelerate materials discovery by enabling billion-scale screening and data-efficient fine-tuning.
Principles
- Multi-task learning stabilizes optimization on imbalanced, multi-fidelity datasets.
- Exascale model selection must balance accuracy, throughput, and computational cost.
- Pre-trained GFMs capture reusable physical structure for data-scarce adaptation.
Method
The workflow integrates HydraGNN, ADIOS2, and DDStore for scalable training. It uses multi-task learning with shared message-passing layers and dataset-specific output heads, alongside large-scale hyperparameter optimization and precision-sensitivity characterization.
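The shared-encoder, per-dataset-head layout described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not HydraGNN's API or the paper's PaiNN-based model: the class `MultiTaskModel`, the helper `message_passing`, the dataset names, and the neighbour-mean update rule are all hypothetical and chosen only to show the structure.

```python
import random


def message_passing(features, edges, weight):
    """One shared message-passing step (toy rule, assumed for illustration):
    each node adds `weight` times the mean of its neighbours' features."""
    neighbours = {i: [] for i in range(len(features))}
    for i, j in edges:  # undirected edges
        neighbours[i].append(j)
        neighbours[j].append(i)
    updated = []
    for i, h in enumerate(features):
        nbrs = neighbours[i]
        mean = [
            sum(features[j][k] for j in nbrs) / len(nbrs) if nbrs else 0.0
            for k in range(len(h))
        ]
        updated.append([h[k] + weight * mean[k] for k in range(len(h))])
    return updated


class MultiTaskModel:
    """Shared encoder plus one linear output head per dataset (hypothetical)."""

    def __init__(self, dataset_names, dim, seed=0):
        rng = random.Random(seed)
        self.shared_weight = 0.5  # parameter shared by every dataset
        # One independent head per dataset, so differing labels or fidelities
        # never compete for the same output parameters.
        self.heads = {
            name: [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for name in dataset_names
        }

    def encode(self, features, edges):
        """Shared layers: message passing, then sum-pooling over nodes."""
        h = message_passing(features, edges, self.shared_weight)
        return [sum(node[k] for node in h) for k in range(len(h[0]))]

    def predict_all(self, features, edges):
        """Run the shared encoder once and apply every dataset head to it."""
        z = self.encode(features, edges)
        return {
            name: sum(w * x for w, x in zip(head, z))
            for name, head in self.heads.items()
        }


if __name__ == "__main__":
    model = MultiTaskModel(["dataset_a", "dataset_b"], dim=2)
    # Two-atom toy graph with one bond and two features per atom.
    print(model.predict_all([[1.0, 0.0], [0.0, 1.0]], [(0, 1)]))
```

In this layout only the heads are dataset-specific, so a training step on one dataset updates the shared layers and that dataset's head while leaving the other heads untouched.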
In practice
- Train in FP64 so that accuracy is preserved when the model is later run at lower precision for inference.
- Employ composition-conditioned branch weighting for novel chemical systems.
- Optimize inference with encoder reuse and fused gradient techniques.
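Why precision matters can be seen in a small experiment that accumulates many small contributions at different precisions. The sketch below is an emulation using only the standard library, not the paper's benchmarking method: the helpers `to_fp32`, `to_bf16`, and `accumulate` are hypothetical, and BF16 is approximated by truncating an FP32 value to its top 16 bits, whereas real hardware typically rounds to nearest and often accumulates in FP32.

```python
import struct


def to_fp32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x):
    """Emulate BF16 by keeping only the top 16 bits of the FP32 encoding.
    Assumption: truncation instead of round-to-nearest, for simplicity."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def accumulate(values, cast):
    """Sum `values`, casting each addend and each partial sum with `cast`."""
    total = 0.0
    for v in values:
        total = cast(total + cast(v))
    return total


# 10,000 small contributions whose exact sum is 10.
values = [0.001] * 10000
sum_fp64 = accumulate(values, float)
sum_fp32 = accumulate(values, to_fp32)
sum_bf16 = accumulate(values, to_bf16)

if __name__ == "__main__":
    print("FP64:", sum_fp64)
    print("FP32:", sum_fp32)
    print("BF16 (emulated):", sum_bf16)
```

In the emulated BF16 case the running total stalls far below 10: once the spacing between representable values near the total exceeds the size of each addend, further additions are lost. The FP32 and FP64 sums stay close to 10, with FP64 the closest.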
Topics
- Atomistic Graph Foundation Models
- Exascale Computing
- Multi-Task Learning
- HydraGNN
- Materials Discovery
Best for: AI Scientist, Research Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.