cuNNQS-SCI: A Fully GPU-Accelerated Framework for High-Performance Configuration Interaction Selection with Neural Network Quantum States

Source: Artificial Intelligence · Field: Science & Research (Mathematics & Computational Sciences, Physical Sciences & Chemistry, Engineering & Applied Sciences) · Depth: Expert, quick

Summary

cuNNQS-SCI is a new, fully GPU-accelerated framework designed to enhance the scalability and performance of the Neural Network Quantum State-Selected Configuration Interaction (NNQS-SCI) method. The original NNQS-SCI, while accurate, faced severe limitations in larger systems due to its hybrid CPU-GPU architecture, specifically CPU-based global de-duplication and host-resident coupled-configuration generation. cuNNQS-SCI addresses these bottlenecks by integrating a distributed, load-balanced global de-duplication algorithm, employing fine-grained CUDA kernels for exact coupled configuration generation, and incorporating a GPU memory-centric runtime with pooling, streaming mini-batches, and overlapped offloading. This design allows for significantly larger configuration spaces and shifts the computational bottleneck to on-device inference. Evaluated on an NVIDIA A100 cluster with 64 GPUs, cuNNQS-SCI achieved up to a 2.32X end-to-end speedup over the baseline NNQS-SCI while maintaining chemical accuracy and demonstrating over 90% parallel efficiency.
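
To make the idea of distributed global de-duplication concrete, here is a minimal sketch in plain Python. It is not the cuNNQS-SCI implementation: it assumes configurations are stored as integer occupation bitstrings, simulates the ranks as Python lists instead of GPUs, and uses a hypothetical `owner_rank` hash to decide which rank owns each configuration. The point it illustrates is that equal configurations always map to the same owner, so each owner can remove duplicates locally without a central CPU pass.

```python
def owner_rank(config: int, n_ranks: int) -> int:
    """Hypothetical owner assignment: hash a configuration to one rank."""
    # Multiplicative hash on 64 bits; using the high bits spreads similar
    # bit patterns across ranks (a simple stand-in for load balancing).
    h = (config * 0x9E3779B97F4A7C15) & 0xFFFFFFFFFFFFFFFF
    return (h >> 32) % n_ranks


def distributed_dedup(local_configs, n_ranks):
    """Simulate global de-duplication across n_ranks workers.

    local_configs[r] is the candidate list produced on rank r.
    Returns one sorted, duplicate-free list per owner rank.
    """
    # Step 1: each rank drops its own duplicates, then routes every
    # configuration to its owner (the analogue of an all-to-all exchange).
    inbox = [[] for _ in range(n_ranks)]
    for configs in local_configs:
        for c in set(configs):
            inbox[owner_rank(c, n_ranks)].append(c)

    # Step 2: each owner de-duplicates what it received. Copies of the same
    # configuration always arrive at the same owner, so the union of the
    # per-owner results is globally unique.
    return [sorted(set(received)) for received in inbox]


# Three simulated ranks with overlapping candidates.
local = [[0b0011, 0b0101, 0b0011], [0b0101, 0b1001], [0b1001, 0b0110, 0b0011]]
unique_per_rank = distributed_dedup(local, n_ranks=3)
all_unique = sorted(c for part in unique_per_rank for c in part)
```

In this toy run `all_unique` holds each of the four distinct configurations exactly once, however the hash happens to split them across ranks.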

Key takeaway

For AI Scientists and Research Scientists working on quantum chemistry simulations, cuNNQS-SCI offers a significant advancement by enabling the application of NNQS-SCI to much larger systems. Its 2.32X speedup and high parallel efficiency on GPU clusters mean you can tackle previously intractable problems with improved computational throughput. Consider adopting fully GPU-accelerated frameworks to overcome CPU-GPU communication bottlenecks in your high-performance computing workflows.

Key insights

Fully GPU-accelerating NNQS-SCI overcomes CPU bottlenecks, enabling larger quantum system simulations with significant speedups.

Method

cuNNQS-SCI integrates distributed de-duplication, uses specialized CUDA kernels for coupled configuration generation, and employs a GPU memory-centric runtime with pooling, streaming mini-batches, and overlapped offloading.
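
The coupled-configuration and mini-batch steps can be sketched on the CPU as follows. This is an illustrative assumption, not the framework's CUDA kernels: it encodes a configuration as an integer bitstring over spin orbitals, enumerates every single and double excitation (the only configurations a two-body Hamiltonian can couple to), and ignores spin and symmetry restrictions as well as any matrix-element screening. The helper names `coupled_configs` and `mini_batches` are hypothetical.

```python
from itertools import combinations


def coupled_configs(config: int, n_orbitals: int):
    """List all single and double excitations of an occupation bitstring."""
    occ = [i for i in range(n_orbitals) if (config >> i) & 1]
    vir = [i for i in range(n_orbitals) if ((config >> i) & 1) == 0]

    out = []
    # Single excitations: move one electron from occupied i to virtual a.
    for i in occ:
        for a in vir:
            out.append(config ^ (1 << i) ^ (1 << a))
    # Double excitations: move two electrons (i, j) to two virtuals (a, b).
    for i, j in combinations(occ, 2):
        for a, b in combinations(vir, 2):
            out.append(config ^ (1 << i) ^ (1 << j) ^ (1 << a) ^ (1 << b))
    return out


def mini_batches(items, batch_size):
    """Yield fixed-size slices so only one batch needs to be resident at a time."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Two electrons in four spin orbitals: 4 singles + 1 double = 5 coupled configs.
coupled = coupled_configs(0b0011, n_orbitals=4)
batch_sizes = [len(batch) for batch in mini_batches(coupled, batch_size=2)]
```

On a GPU, the nested loops would plausibly be flattened so that each thread handles one (configuration, excitation) pair, and each mini-batch would be passed to network inference while the next one is prepared; both points are assumptions about how such a design could look, not details confirmed by the source.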

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.