Exascale Multi-Task Graph Foundation Models for Imbalanced, Multi-Fidelity Atomistic Data

Source: cs.AI updates on arXiv.org · Field: Science & Research (Physical Sciences & Chemistry, Engineering & Applied Sciences, Artificial Intelligence & Machine Learning) · Depth: Expert, extended

Summary

Researchers present an exascale workflow for materials discovery built on atomistic graph foundation models (GFMs) implemented in HydraGNN. The workflow trains jointly on 16 open first-principles datasets, comprising over 544 million structures and 85+ elements, using a multi-task architecture with per-dataset heads and a scalable ADIOS2/DDStore data pipeline. On the Frontier supercomputer, the team ran six large-scale DeepHyper hyperparameter optimization campaigns in FP64, which led to a PaiNN-based lead model. That model enables billion-scale screening: it evaluates 1.1 billion atomistic structures in 50 seconds, a task that would otherwise require years of first-principles computation. The work also quantifies precision-performance tradeoffs across BF16, FP32, and FP64, demonstrates transfer across twelve chemically diverse downstream tasks, and reports strong and weak scaling on the Frontier, Aurora, and Perlmutter supercomputers.
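To put the screening figure in perspective, the short calculation below converts the reported numbers into a throughput and compares it with a first-principles baseline. The 60-second cost per structure is a purely hypothetical placeholder, not a figure from the source; real first-principles costs vary by orders of magnitude with system size and method.

```python
# Figures taken from the summary above.
N_STRUCTURES = 1.1e9   # atomistic structures screened
WALL_SECONDS = 50.0    # reported wall-clock time for the screening run

# ASSUMPTION (not from the source): a hypothetical cost of one
# first-principles calculation, expressed in single-worker seconds.
ASSUMED_FP_SECONDS_PER_STRUCTURE = 60.0

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def screening_throughput(n_structures, wall_seconds):
    """Structures evaluated per second of wall-clock time."""
    return n_structures / wall_seconds


def single_worker_years(n_structures, seconds_per_structure):
    """Years one worker would need at a fixed per-structure cost."""
    return n_structures * seconds_per_structure / SECONDS_PER_YEAR


throughput = screening_throughput(N_STRUCTURES, WALL_SECONDS)
years = single_worker_years(N_STRUCTURES, ASSUMED_FP_SECONDS_PER_STRUCTURE)

print(f"GFM screening throughput: {throughput:.2e} structures/s")
print(f"Baseline at the assumed cost: {years:.0f} single-worker years")
```

Changing the assumed per-structure cost rescales the baseline linearly; the throughput figure depends only on the two reported numbers.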

Key takeaway

For AI scientists and machine learning engineers building materials discovery platforms, this work shows that exascale-trained atomistic GFMs can turn previously intractable first-principles screening into a practical workflow. Consider adopting a multi-task architecture with per-dataset heads, backed by a robust data pipeline, to handle heterogeneous, imbalanced datasets while keeping models scalable and transferable across diverse chemical domains. Where downstream tasks require high accuracy, favor FP64 training, and consult the reported BF16/FP32/FP64 tradeoffs before lowering precision.
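The precision formats named above differ mainly in how many significand bits they keep. The sketch below only emulates rounding a single stored value to FP32 and BF16 using the Python standard library; it does not reproduce the mixed-precision training studied in the work. The energy value is a made-up example, and the helper names are illustrative.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


def to_bf16(x: float) -> float:
    """Round to BF16, i.e. the top 16 bits of the FP32 encoding,
    using round-to-nearest-even. NaN and infinity are not handled."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    # Bias of 0x7FFF plus the lowest kept bit makes ties round to even.
    bias = 0x7FFF + ((bits >> 16) & 1)
    rounded = (((bits + bias) >> 16) << 16) & 0xFFFFFFFF
    return struct.unpack(">f", struct.pack(">I", rounded))[0]


def relative_error(exact: float, approx: float) -> float:
    return abs(approx - exact) / abs(exact)


# Hypothetical total energy in eV, chosen only for illustration.
energy = -1234.56789

err_fp32 = relative_error(energy, to_fp32(energy))
err_bf16 = relative_error(energy, to_bf16(energy))

print(f"FP32 relative error: {err_fp32:.2e}")
print(f"BF16 relative error: {err_bf16:.2e}")
```

BF16 keeps the FP32 exponent range but only 8 bits of significand precision, so its rounding error on a value of this magnitude is several orders larger than that of FP32. Whether that matters depends on the accuracy a given downstream task needs.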

Key insights

Exascale multi-task graph foundation models accelerate materials discovery by enabling billion-scale screening and data-efficient fine-tuning.

Method

The workflow integrates HydraGNN, ADIOS2, and DDStore for scalable training. It uses multi-task learning with shared message-passing layers and dataset-specific output heads, alongside large-scale hyperparameter optimization and precision-sensitivity characterization.
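The idea of shared message-passing layers with dataset-specific output heads can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not HydraGNN's API and not the PaiNN architecture: the layer is a simple sum-over-neighbours update, the weights are random and untrained, and the class name, dataset names, and graph are hypothetical.

```python
import math
import random


def message_passing(features, edges, weight):
    """One shared layer: each node adds its neighbours' features to its own,
    then applies a shared linear map followed by tanh."""
    dim = len(weight)
    agg = [list(row) for row in features]   # start from each node's own features
    for i, j in edges:                      # edges are treated as undirected
        for k in range(dim):
            agg[i][k] += features[j][k]
            agg[j][k] += features[i][k]
    return [[math.tanh(sum(weight[r][c] * row[c] for c in range(dim)))
             for r in range(dim)] for row in agg]


class MultiTaskGNN:
    """Shared trunk plus one linear output head per dataset (toy sketch)."""

    def __init__(self, dim, dataset_names, n_layers=2, seed=0):
        rng = random.Random(seed)
        # Shared message-passing weights, used for every dataset.
        self.shared = [[[rng.uniform(-0.5, 0.5) for _ in range(dim)]
                        for _ in range(dim)] for _ in range(n_layers)]
        # One independent head per dataset.
        self.heads = {name: [rng.uniform(-0.5, 0.5) for _ in range(dim)]
                      for name in dataset_names}

    def encode(self, features, edges):
        h = features
        for weight in self.shared:
            h = message_passing(h, edges, weight)
        # Mean-pool node states into a single graph embedding.
        return [sum(row[k] for row in h) / len(h) for k in range(len(h[0]))]

    def predict(self, features, edges, dataset):
        embedding = self.encode(features, edges)
        head = self.heads[dataset]          # route to this dataset's head
        return sum(w * e for w, e in zip(head, embedding))


# Hypothetical three-atom chain with 4-dimensional node features.
features = [[1.0, 0.0, 0.0, 0.5], [0.0, 1.0, 0.0, 0.2], [0.0, 0.0, 1.0, 0.1]]
edges = [(0, 1), (1, 2)]

model = MultiTaskGNN(dim=4, dataset_names=["dataset_a", "dataset_b"])
pred_a = model.predict(features, edges, "dataset_a")
pred_b = model.predict(features, edges, "dataset_b")
print(pred_a, pred_b)
```

Both predictions share the same graph embedding and differ only through the head that reads it. In training, a sample's loss would update the shared layers and only the head of the dataset it came from, which is one way labels computed at different levels of fidelity can coexist in a single model.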

Best for: AI Scientist, Research Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.