Bayesian Optimization with Gaussian Processes to Accelerate Stationary Point Searches

2026-04-17 · Source: stat.ML updates on arXiv.org · Field: Science & Research — Physical Sciences & Chemistry, Mathematics & Computational Sciences, Research Methodology & Innovation · Depth: Expert, extended

Summary

This paper introduces a unified Bayesian Optimization (BO) framework using Gaussian Process Regression (GPR) to accelerate stationary point searches on potential energy surfaces (PES), crucial for understanding chemical reactions and material properties. The framework unifies minimization, single-point saddle searches (Dimer method), and double-ended saddle searches (Nudged Elastic Band, NEB) through a six-step surrogate loop. It employs GPR with derivative observations, inverse-distance kernels, and active learning. Key extensions include Optimal Transport GP (OT-GP) with farthest point sampling (FPS) using Earth Mover's Distance (EMD), MAP regularization via variance barriers, oscillation detection, and an adaptive trust radius. Random Fourier features (RFF) are also integrated to improve scaling for high-dimensional systems. The accompanying Rust code, "chemgp-core," demonstrates practical implementation, bridging theoretical formulation with executable code and showing significant reductions in expensive electronic structure evaluations (e.g., 5-10x for NEB, 10x for Dimer) while maintaining accuracy.

Key takeaway

For computational chemists and materials scientists exploring potential energy surfaces, this GPR-accelerated framework cuts computational cost by combining local surrogates with active learning. The reported savings are roughly 5-10x fewer electronic structure evaluations for NEB and about 10x fewer for Dimer saddle searches, with accuracy maintained. Consider adopting this unified Bayesian optimization approach, especially for large systems or high-throughput workflows.

Key insights

GPR with active learning and inverse-distance kernels significantly accelerates stationary point searches on potential energy surfaces.

Principles

Local surrogates reduce expensive evaluations by up to an order of magnitude (5-10x reported).
Calibrated uncertainty enables effective active learning.
Analytical derivatives are crucial for GP numerical stability.
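
The third principle can be made concrete with a small sketch. To train on derivatives (forces) as well as energies, a GP needs the kernel's derivatives with respect to its inputs, and closed forms avoid the truncation and round-off error that finite differences introduce. The example assumes a 1D squared-exponential kernel purely for illustration, not the inverse-distance kernel the paper uses, and all function names are hypothetical.

```python
import math

def k(x, xp, length=1.0):
    """Squared-exponential kernel, used here only as a simple stand-in."""
    return math.exp(-0.5 * ((x - xp) / length) ** 2)

def dk_dxp(x, xp, length=1.0):
    """Analytic dk/dx': covariance between a value at x and a derivative at x'."""
    return (x - xp) / length**2 * k(x, xp, length)

def d2k_dx_dxp(x, xp, length=1.0):
    """Analytic d2k/(dx dx'): covariance between derivatives at x and at x'."""
    r = x - xp
    return (1.0 / length**2 - r**2 / length**4) * k(x, xp, length)

def central_diff(f, t, h=1e-5):
    """Central finite difference, used only to cross-check the closed forms."""
    return (f(t + h) - f(t - h)) / (2.0 * h)

# Cross-check both closed forms against finite differences at one point pair.
x, xp = 0.3, 1.1
err_first = abs(central_diff(lambda t: k(x, t), xp) - dk_dxp(x, xp))
err_mixed = abs(central_diff(lambda t: dk_dxp(t, xp), x) - d2k_dx_dxp(x, xp))
print(err_first, err_mixed)
```

The finite-difference check is there only to validate the formulas; in a derivative-aware kernel matrix the closed forms would fill the energy-derivative and derivative-derivative blocks directly.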

Method

The unified six-step Bayesian surrogate loop involves training a GP, optimizing on the surrogate, checking trust constraints, evaluating the oracle, selecting the next query point, and updating the training set.
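
The loop above can be sketched on a 1D toy problem. This is a minimal illustration under stated assumptions, not the paper's algorithm or the chemgp-core API: it uses energies only (no derivative observations), a squared-exponential kernel, a fixed rather than adaptive trust radius, and a grid search on the surrogate. The trust-clamped surrogate minimizer is taken directly as the next query point, so the selection step is trivial here. All function names are hypothetical.

```python
import math

def rbf(a, b, length=0.5):
    """Squared-exponential kernel in 1D (stand-in for an inverse-distance kernel)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_gp_mean(xs, ys, jitter=1e-6):
    """Return the GP posterior mean as a function, with a constant prior mean."""
    pts = list(xs)
    prior = sum(ys) / len(ys)
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(pts)]
         for i, a in enumerate(pts)]
    alpha = solve(K, [y - prior for y in ys])
    return lambda x: prior + sum(w * rbf(x, p) for w, p in zip(alpha, pts))

def surrogate_minimize(oracle, x0, trust_radius=0.5, max_calls=20, tol=1e-3):
    xs = [x0, x0 + 0.1]                      # two seed evaluations
    ys = [oracle(x) for x in xs]
    while len(xs) < max_calls:
        mean = fit_gp_mean(xs, ys)           # 1. train the GP
        best = xs[ys.index(min(ys))]
        grid = [best - 2.0 + 0.01 * i for i in range(401)]
        cand = min(grid, key=mean)           # 2. optimize on the surrogate
        # 3. trust check: clamp the step to the trust radius around the best point
        cand = max(best - trust_radius, min(best + trust_radius, cand))
        if min(abs(cand - x) for x in xs) < tol:
            break                            # candidate is already sampled: stop
        y = oracle(cand)                     # 4-5. query the oracle at the chosen point
        xs.append(cand)                      # 6. update the training set
        ys.append(y)
    i = ys.index(min(ys))
    return xs[i], ys[i], len(xs)

# Toy oracle with its minimum at x = 1; each call stands in for an expensive evaluation.
best_x, best_y, calls = surrogate_minimize(lambda x: (x - 1.0) ** 2, x0=0.0)
print(best_x, best_y, calls)
```

The same skeleton carries over to saddle searches by swapping step 2 for a Dimer or NEB relaxation on the surrogate; only the inner optimizer changes, which is the sense in which the loop is unified.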

In practice

Use inverse-distance kernels for molecular systems so the surrogate is invariant to rigid rotations and translations.
Implement FPS with EMD to manage training set size and diversity.
Apply MAP regularization to stabilize hyperparameter optimization, and an adaptive trust radius to keep surrogate steps within the region the training data supports.
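
The first two practices can be sketched together. The example maps Cartesian coordinates to inverse interatomic distances, which do not change under rigid rotation and translation, and then runs a greedy farthest point sampling over those features. It is an illustrative sketch, not the chemgp-core implementation: the function names are hypothetical, the geometry is a toy, and plain Euclidean distance between feature vectors is swapped in where the paper uses Earth Mover's Distance.

```python
import math

def inverse_distances(coords):
    """Map [(x, y, z), ...] to the vector of 1/r_ij over atom pairs i < j."""
    feats = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            feats.append(1.0 / math.dist(coords[i], coords[j]))
    return feats

def rigid_move(coords, angle, shift):
    """Rotate about the z axis by `angle`, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in coords]

def farthest_point_sampling(points, k, dist=math.dist, start=0):
    """Greedily pick k indices, each farthest from the points already chosen."""
    chosen = [start]
    min_d = [dist(p, points[start]) for p in points]
    while len(chosen) < min(k, len(points)):
        nxt = max(range(len(points)), key=lambda i: min_d[i])
        chosen.append(nxt)
        # Track each point's distance to its nearest chosen point.
        min_d = [min(d, dist(p, points[nxt])) for d, p in zip(min_d, points)]
    return chosen

# Toy three-atom geometry (arbitrary units, not a converged structure).
mol = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
moved = rigid_move(mol, angle=0.7, shift=(1.0, -2.0, 3.0))
drift = max(abs(a - b)
            for a, b in zip(inverse_distances(mol), inverse_distances(moved)))

# Stretch one bond to generate a few configurations, then keep a diverse subset.
frames = [[mol[0], (0.96 + 0.1 * n, 0.0, 0.0), mol[2]] for n in range(6)]
features = [inverse_distances(f) for f in frames]
kept = farthest_point_sampling(features, k=3)
print(drift, kept)
```

Passing a different `dist` callable is how a permutation-aware metric such as EMD could be slotted in; the sampling logic itself does not depend on the metric.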

Topics

Bayesian Optimization
Gaussian Process Regression
Stationary Point Search
Inverse-Distance Kernel
Dimer Method

Code references

HaoZeke/ChemGP

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.