Bayesian Optimization with Gaussian Processes to Accelerate Stationary Point Searches
Summary
This paper introduces a unified Bayesian Optimization (BO) framework that uses Gaussian Process Regression (GPR) to accelerate stationary point searches on potential energy surfaces (PES), which are central to understanding chemical reactions and material properties. The framework treats minimization, single-ended saddle searches (the Dimer method), and double-ended saddle searches (the Nudged Elastic Band, NEB) as instances of one six-step surrogate loop. It employs GPR with derivative observations, inverse-distance kernels, and active learning. Key extensions include an Optimal Transport GP (OT-GP) with farthest point sampling (FPS) based on the Earth Mover's Distance (EMD), MAP regularization via variance barriers, oscillation detection, and an adaptive trust radius. Random Fourier features (RFF) are also integrated to improve scaling for high-dimensional systems. The accompanying Rust code, "chemgp-core," provides a practical implementation that ties the theoretical formulation to executable code, and it shows substantial reductions in expensive electronic structure evaluations (e.g., 5-10x for NEB, 10x for Dimer) while maintaining accuracy.
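The random Fourier feature idea can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not chemgp-core code: it approximates a squared-exponential kernel k(x, y) = exp(-|x - y|² / (2ℓ²)) by the dot product of D random cosine features z(x) = sqrt(2/D) cos(Wx + b), with the rows of W drawn from N(0, I/ℓ²) and the phases b from U[0, 2π). The splitmix64 generator, the Box-Muller sampler, and the names `Rff`, `exact_se`, and `dot` are hypothetical choices made to keep the example self-contained.

```rust
use std::f64::consts::PI;

/// Tiny deterministic RNG (splitmix64) so the sketch needs no external crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
    /// Uniform sample in the open interval (0, 1).
    fn uniform(&mut self) -> f64 {
        ((self.next_u64() >> 11) as f64 + 0.5) / (1u64 << 53) as f64
    }
    /// Standard normal sample via the Box-Muller transform.
    fn normal(&mut self) -> f64 {
        let (u1, u2) = (self.uniform(), self.uniform());
        (-2.0 * u1.ln()).sqrt() * (2.0 * PI * u2).cos()
    }
}

/// Random Fourier feature map for the kernel exp(-|x - y|^2 / (2 l^2)).
struct Rff {
    w: Vec<Vec<f64>>, // D frequency vectors, each drawn from N(0, I / l^2)
    b: Vec<f64>,      // D phases drawn from U[0, 2*pi)
}

impl Rff {
    fn new(dim: usize, n_features: usize, length_scale: f64, seed: u64) -> Self {
        let mut rng = SplitMix64(seed);
        let w: Vec<Vec<f64>> = (0..n_features)
            .map(|_| (0..dim).map(|_| rng.normal() / length_scale).collect::<Vec<f64>>())
            .collect();
        let b: Vec<f64> = (0..n_features).map(|_| 2.0 * PI * rng.uniform()).collect();
        Rff { w, b }
    }

    /// z(x) = sqrt(2/D) * cos(W x + b), so that z(x) . z(y) approximates k(x, y).
    fn features(&self, x: &[f64]) -> Vec<f64> {
        let scale = (2.0 / self.w.len() as f64).sqrt();
        self.w
            .iter()
            .zip(&self.b)
            .map(|(wi, bi)| {
                let proj: f64 = wi.iter().zip(x).map(|(a, c)| a * c).sum();
                scale * (proj + bi).cos()
            })
            .collect()
    }
}

/// Exact squared-exponential kernel, for comparison.
fn exact_se(x: &[f64], y: &[f64], l: f64) -> f64 {
    let d2: f64 = x.iter().zip(y).map(|(a, b)| (a - b).powi(2)).sum();
    (-d2 / (2.0 * l * l)).exp()
}

fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(p, q)| p * q).sum()
}

fn main() {
    let rff = Rff::new(3, 4000, 1.0, 42);
    let (x, y) = ([0.1, -0.3, 0.5], [0.4, 0.2, 0.0]);
    let approx = dot(&rff.features(&x), &rff.features(&y));
    println!("exact = {:.4}, RFF approx = {:.4}", exact_se(&x, &y, 1.0), approx);
}
```

One common motivation for this construction is that a GP built on z(x) can work with D × D feature-space matrices instead of a kernel matrix whose size grows with the number of observations (and, with gradient observations, with the system dimension as well); the approximation error shrinks roughly as 1/sqrt(D).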
Key takeaway
For computational chemists and materials scientists performing PES explorations, this GPR-accelerated framework offers a robust method to drastically reduce computational cost. By leveraging local surrogates and active learning, you can achieve 5-10x fewer expensive electronic structure calculations for saddle point searches and minimizations. Consider adopting this unified Bayesian optimization approach, especially for large systems or high-throughput workflows, to accelerate your research without sacrificing accuracy.
Key insights
GPR with active learning and inverse-distance kernels significantly accelerates stationary point searches on potential energy surfaces.
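The invariance argument behind inverse-distance kernels fits in a few lines: map a configuration to the vector of inverse interatomic distances 1/r_ij and apply a squared-exponential kernel to that vector. Because every r_ij is unchanged by rigid translations and rotations, so is the kernel value. This is a minimal sketch; the single shared length scale, the water-like test geometry, and the names `inverse_distances`, `inv_dist_kernel`, and `rigid_motion` are assumptions, and the paper's kernel may differ in detail (for example, by using per-pair length scales).

```rust
/// Inverse interatomic distances 1/r_ij for all atom pairs i < j.
fn inverse_distances(coords: &[[f64; 3]]) -> Vec<f64> {
    let mut feats = Vec::new();
    for i in 0..coords.len() {
        for j in (i + 1)..coords.len() {
            let r2: f64 = (0..3).map(|k| (coords[i][k] - coords[j][k]).powi(2)).sum();
            feats.push(1.0 / r2.sqrt());
        }
    }
    feats
}

/// Squared-exponential kernel on inverse-distance features (one shared length scale).
fn inv_dist_kernel(a: &[[f64; 3]], b: &[[f64; 3]], sigma2: f64, length: f64) -> f64 {
    let (fa, fb) = (inverse_distances(a), inverse_distances(b));
    let d2: f64 = fa.iter().zip(&fb).map(|(x, y)| (x - y).powi(2)).sum();
    sigma2 * (-d2 / (2.0 * length * length)).exp()
}

/// Rotate about the z axis by `theta`, then translate by `shift`.
fn rigid_motion(coords: &[[f64; 3]], theta: f64, shift: [f64; 3]) -> Vec<[f64; 3]> {
    let (s, c) = theta.sin_cos();
    coords
        .iter()
        .map(|p| {
            [
                c * p[0] - s * p[1] + shift[0],
                s * p[0] + c * p[1] + shift[1],
                p[2] + shift[2],
            ]
        })
        .collect()
}

fn main() {
    // Water-like geometry in angstroms: O at the origin, two H atoms.
    let water = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]];
    let moved = rigid_motion(&water, 0.7, [1.0, -2.0, 0.5]);
    let mut stretched = water;
    stretched[1][0] = 1.10; // lengthen one O-H bond

    println!("k(x, Rx + t)    = {:.6}", inv_dist_kernel(&water, &moved, 1.0, 0.5));
    println!("k(x, stretched) = {:.6}", inv_dist_kernel(&water, &stretched, 1.0, 0.5));
}
```

The rigidly moved copy gives a kernel value of 1 up to round-off, while the stretched geometry gives a value below 1, so the kernel responds only to internal geometry.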
Principles
- Local surrogates cut the number of expensive evaluations by up to an order of magnitude.
- Calibrated uncertainty enables effective active learning.
- Analytical derivatives are crucial for GP numerical stability.
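The point about analytical derivatives can be illustrated with the kernel itself. When a GP uses derivative observations, its covariance matrix contains kernel derivatives; for the squared-exponential kernel k(x, x') = σ² exp(-|x - x'|² / (2ℓ²)), the covariance between a value at x and a gradient component at x' is ∂k/∂x'_d = k(x, x') (x_d - x'_d) / ℓ². The sketch below compares this closed form with central finite differences, whose truncation and round-off errors depend on the step h and can degrade a gradient-augmented covariance matrix. The function names and parameter values are illustrative assumptions, not chemgp-core code.

```rust
/// Squared-exponential kernel k(x, x') = sigma2 * exp(-|x - x'|^2 / (2 l^2)).
fn se_kernel(x: &[f64], xp: &[f64], sigma2: f64, l: f64) -> f64 {
    let d2: f64 = x.iter().zip(xp).map(|(a, b)| (a - b).powi(2)).sum();
    sigma2 * (-d2 / (2.0 * l * l)).exp()
}

/// Analytic gradient of k with respect to its second argument x'.
fn se_kernel_grad_xp(x: &[f64], xp: &[f64], sigma2: f64, l: f64) -> Vec<f64> {
    let k = se_kernel(x, xp, sigma2, l);
    x.iter().zip(xp).map(|(a, b)| k * (a - b) / (l * l)).collect()
}

/// Central finite-difference gradient with respect to x', for comparison.
fn se_kernel_grad_fd(x: &[f64], xp: &[f64], sigma2: f64, l: f64, h: f64) -> Vec<f64> {
    (0..xp.len())
        .map(|d| {
            let (mut plus, mut minus) = (xp.to_vec(), xp.to_vec());
            plus[d] += h;
            minus[d] -= h;
            (se_kernel(x, &plus, sigma2, l) - se_kernel(x, &minus, sigma2, l)) / (2.0 * h)
        })
        .collect()
}

fn main() {
    let (x, xp) = ([0.3, -1.2], [0.9, -0.4]);
    let analytic = se_kernel_grad_xp(&x, &xp, 1.0, 0.8);
    // A too-large h suffers truncation error; a too-small h suffers round-off.
    for h in [1e-2, 1e-5, 1e-11] {
        let fd = se_kernel_grad_fd(&x, &xp, 1.0, 0.8, h);
        let err = analytic.iter().zip(&fd).map(|(a, b)| (a - b).abs()).fold(0.0, f64::max);
        println!("h = {h:e}: max |analytic - FD| = {err:.2e}");
    }
}
```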
Method
The unified six-step Bayesian surrogate loop involves training a GP, optimizing on the surrogate, checking trust constraints, evaluating the oracle, selecting the next query point, and updating the training set.
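To make the loop concrete, here is a deliberately simplified sketch on a one-dimensional toy "oracle" (a tilted double well standing in for an electronic structure call). It is an illustration under stated assumptions, not chemgp-core: the GP uses function values only (the paper also uses derivative observations), hyperparameters are fixed rather than trained, the surrogate is minimized by grid search, the trust check uses a fixed radius, and the mapping of code steps to the six steps is approximate. All names and constants are hypothetical.

```rust
/// Toy 1D "oracle": a tilted double well standing in for an expensive PES evaluation.
fn oracle(x: f64) -> f64 {
    (x * x - 1.0).powi(2) + 0.3 * x
}

/// Squared-exponential kernel with unit amplitude and length scale `l`.
fn kern(a: f64, b: f64, l: f64) -> f64 {
    (-(a - b).powi(2) / (2.0 * l * l)).exp()
}

/// Solves A x = b for symmetric positive-definite A via Cholesky (A = L L^T).
fn solve_spd(a: &[Vec<f64>], b: &[f64]) -> Vec<f64> {
    let n = b.len();
    let mut l = vec![vec![0.0; n]; n];
    for i in 0..n {
        for j in 0..=i {
            let s: f64 = (0..j).map(|k| l[i][k] * l[j][k]).sum();
            l[i][j] = if i == j { (a[i][i] - s).sqrt() } else { (a[i][j] - s) / l[j][j] };
        }
    }
    let mut z = vec![0.0; n]; // forward substitution: L z = b
    for i in 0..n {
        let s: f64 = (0..i).map(|k| l[i][k] * z[k]).sum();
        z[i] = (b[i] - s) / l[i][i];
    }
    let mut x = vec![0.0; n]; // back substitution: L^T x = z
    for i in (0..n).rev() {
        let s: f64 = (i + 1..n).map(|k| l[k][i] * x[k]).sum();
        x[i] = (z[i] - s) / l[i][i];
    }
    x
}

/// GP posterior mean with fixed hyperparameters; a real implementation would
/// also optimize them in step 1 (e.g., by maximizing the marginal likelihood).
struct Gp { xs: Vec<f64>, alpha: Vec<f64>, y_mean: f64, l: f64 }

impl Gp {
    fn fit(xs: &[f64], ys: &[f64], l: f64, noise: f64) -> Gp {
        let y_mean = ys.iter().sum::<f64>() / ys.len() as f64;
        let k: Vec<Vec<f64>> = (0..xs.len())
            .map(|i| {
                (0..xs.len())
                    .map(|j| kern(xs[i], xs[j], l) + if i == j { noise } else { 0.0 })
                    .collect::<Vec<f64>>()
            })
            .collect();
        let centered: Vec<f64> = ys.iter().map(|y| y - y_mean).collect();
        Gp { xs: xs.to_vec(), alpha: solve_spd(&k, &centered), y_mean, l }
    }

    fn mean(&self, x: f64) -> f64 {
        let s: f64 = self.xs.iter().zip(&self.alpha).map(|(&xi, a)| a * kern(x, xi, self.l)).sum();
        self.y_mean + s
    }
}

/// Runs the loop from starting points `x0`; returns (best x, best V, oracle calls).
fn surrogate_minimize(x0: &[f64], max_iter: usize) -> (f64, f64, usize) {
    let (l, noise, trust_radius, tol) = (0.5, 1e-6, 0.4, 1e-4);
    let mut xs = x0.to_vec();
    let mut ys: Vec<f64> = xs.iter().map(|&x| oracle(x)).collect();
    for _ in 0..max_iter {
        let gp = Gp::fit(&xs, &ys, l, noise); // 1. train the surrogate
        // 2. optimize on the surrogate (coarse grid search over [-2, 2])
        let mut cand = (0..=400)
            .map(|i| -2.0 + 0.01 * i as f64)
            .min_by(|a, b| gp.mean(*a).total_cmp(&gp.mean(*b)))
            .unwrap();
        // 3. trust check: stay within trust_radius of the nearest training point
        let near = *xs
            .iter()
            .min_by(|a, b| (*a - cand).abs().total_cmp(&(*b - cand).abs()))
            .unwrap();
        if (cand - near).abs() > trust_radius {
            cand = near + trust_radius * (cand - near).signum();
        }
        let y = oracle(cand); // 4. evaluate the expensive oracle
        // 5. selection/stopping: stop once the surrogate already predicts the oracle here
        let converged = (y - gp.mean(cand)).abs() < tol;
        xs.push(cand); // 6. update the training set
        ys.push(y);
        if converged {
            break;
        }
    }
    let (best, v) = ys.iter().copied().enumerate().min_by(|a, b| a.1.total_cmp(&b.1)).unwrap();
    (xs[best], v, ys.len())
}

fn main() {
    let (x, v, calls) = surrogate_minimize(&[-0.2, 0.2], 30);
    println!("best x = {x:.4}, V = {v:.5}, oracle calls = {calls}");
}
```

The number of oracle calls is the quantity the framework tries to minimize; everything else in the loop runs on the cheap surrogate.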
In practice
- Use inverse-distance kernels for molecular systems to ensure invariance.
- Implement FPS with EMD to manage training set size and diversity.
- Apply MAP regularization to stabilize hyperparameter optimization, and use an adaptive trust radius to keep surrogate steps within the region where the model is reliable.
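Farthest point sampling keeps a training subset diverse by repeatedly adding the configuration that is farthest from everything already selected. The sketch below uses, as a stand-in distance, the one-dimensional Earth Mover's (Wasserstein-1) distance between the sorted inverse-distance vectors of two configurations with the same number of atoms; for equal-weight empirical distributions with equal sample counts, this reduces to the mean absolute difference of the sorted values. The descriptor choice, the toy descriptor values, and the names `emd_1d` and `farthest_point_sampling` are assumptions for illustration; the paper's OT-GP distance may be defined differently.

```rust
/// Wasserstein-1 (EMD) distance between two equal-size 1D empirical distributions:
/// sort both samples and average the absolute differences of matched order statistics.
fn emd_1d(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "this shortcut needs equal sample counts");
    let (mut sa, mut sb) = (a.to_vec(), b.to_vec());
    sa.sort_by(f64::total_cmp);
    sb.sort_by(f64::total_cmp);
    sa.iter().zip(&sb).map(|(x, y)| (x - y).abs()).sum::<f64>() / a.len() as f64
}

/// Greedy farthest point sampling: start from `first`, then repeatedly add the
/// unselected candidate whose distance to its nearest selected point is largest.
fn farthest_point_sampling(descs: &[Vec<f64>], k: usize, first: usize) -> Vec<usize> {
    let mut selected = vec![first];
    // min_dist[i] = distance from candidate i to its closest selected point
    let mut min_dist: Vec<f64> = descs.iter().map(|d| emd_1d(d, &descs[first])).collect();
    while selected.len() < k.min(descs.len()) {
        let next = (0..descs.len())
            .filter(|i| !selected.contains(i))
            .max_by(|&i, &j| min_dist[i].total_cmp(&min_dist[j]))
            .unwrap();
        selected.push(next);
        for i in 0..descs.len() {
            min_dist[i] = min_dist[i].min(emd_1d(&descs[i], &descs[next]));
        }
    }
    selected
}

fn main() {
    // Hypothetical inverse-distance descriptors of five 3-atom configurations.
    let descs = vec![
        vec![1.04, 1.04, 0.66],
        vec![1.05, 1.03, 0.66], // near-duplicate of configuration 0
        vec![0.91, 1.04, 0.61],
        vec![0.70, 0.72, 0.45],
        vec![1.30, 1.28, 0.80],
    ];
    let picked = farthest_point_sampling(&descs, 3, 0);
    println!("selected configurations: {picked:?}");
}
```

With these toy values the near-duplicate configuration 1 is never selected, which is the behavior that keeps the training set compact without discarding distinct geometries.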
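For the adaptive trust radius, one common rule from classical trust-region optimization (used here as an assumed stand-in, not the paper's own schedule) compares the energy decrease actually observed from the oracle with the decrease the surrogate predicted. The radius shrinks when the ratio is poor and grows when the surrogate was accurate and the step reached the boundary. The thresholds 0.25 and 0.75, the factors 0.5 and 2, the bounds, and the function name `update_trust_radius` are conventional illustrative choices, not values from chemgp-core.

```rust
/// Hypothetical trust-radius update driven by the agreement ratio
/// rho = (actual energy decrease) / (decrease predicted by the surrogate).
fn update_trust_radius(
    radius: f64,
    step_len: f64,
    actual_decrease: f64,
    predicted_decrease: f64,
    (r_min, r_max): (f64, f64),
) -> f64 {
    // Treat a non-positive predicted decrease as a failed surrogate step.
    let rho = if predicted_decrease > 0.0 {
        actual_decrease / predicted_decrease
    } else {
        -1.0
    };
    let new_radius = if rho < 0.25 {
        0.5 * radius // surrogate was badly wrong: shrink
    } else if rho > 0.75 && step_len >= 0.99 * radius {
        2.0 * radius // good agreement and the step hit the boundary: expand
    } else {
        radius // acceptable agreement: keep
    };
    new_radius.clamp(r_min, r_max)
}

fn main() {
    let bounds = (0.05, 1.0);
    let mut r = 0.2;
    // (step length, actual decrease, predicted decrease) for a few hypothetical steps
    for (step, actual, predicted) in [(0.2, 0.09, 0.10), (0.4, 0.02, 0.12), (0.2, 0.05, 0.06)] {
        r = update_trust_radius(r, step, actual, predicted, bounds);
        println!("step {step:.2}: rho = {:.2}, new radius = {r:.3}", actual / predicted);
    }
}
```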
Topics
- Bayesian Optimization
- Gaussian Process Regression
- Stationary Point Search
- Inverse-Distance Kernel
- Dimer Method
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.