MST-Direct at Scale: Multivariate and Conditional Geostatistical Simulation via Sinkhorn Optimal Transport
Summary
MST-Direct, a Matching-via-Sinkhorn-Transport approach to geostatistical simulation, has been extended to multivariate, conditional, and large-grid regimes. The extension addresses three earlier limitations. First, scalability: a sparse, candidate-restricted Sinkhorn matcher with O(nC) memory lifts the previous ceiling of a few thousand nodes, so that 100x100 and 200x200 (40,000-node) grids run in under a minute. Second, dimensionality: the method supports many variables (six in the demonstration) by matching target value tuples onto an independent FFT-MA Gaussian backbone. Third, conditioning: hard data are incorporated by pinning the data tuples and kriging the backbone. Validated against the Direct Multivariate Simulation (DMS) 6-variate benchmark, MST-Direct reproduces the joint distribution with zero histogram error and honors hard data exactly, outperforming the Projection Pursuit Multivariate Transform (PPMT), which remains an approximation.
Key takeaway
Geostatistical modelers and data scientists working with complex, non-linear multivariate dependencies should consider MST-Direct for simulations on large grids, especially when the joint distribution must be preserved exactly and hard data must be honored exactly. On the reported benchmark it outperforms approximate methods such as PPMT, so uncertainty quantification rests on accurate spatial and multivariate statistics.
Key insights
MST-Direct now scales to large grids, multiple variables, and conditional data while preserving exact joint distributions.
Principles
- Optimal Transport ensures exact joint distribution preservation.
- Sparse Sinkhorn matching enables large-scale geostatistical simulation.
- Kriging and data pinning facilitate exact hard-data conditioning.
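The first and third principles can be illustrated with a small sketch. It assumes that the simulation assigns each target tuple to exactly one grid node (a bijection) and that each hard datum is one of the target tuples; both are assumptions made for this example, and the random permutation below only stands in for the transport-based matching. All names (`targets`, `data_nodes`, `data_tuples`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_nodes, n_vars = 1000, 6

# Hypothetical target tuples: one 6-variate sample per grid node.
targets = rng.lognormal(size=(n_nodes, n_vars))

# Hypothetical hard data: the tuple in row data_tuples[k] is observed at node data_nodes[k].
data_nodes = np.array([3, 250, 999])
data_tuples = np.array([10, 20, 30])

# Pin the data tuples to their nodes first.
assign = np.full(n_nodes, -1)
assign[data_nodes] = data_tuples

# Distribute the remaining tuples over the remaining nodes with any bijection.
# A random permutation is a placeholder for the optimal-transport matching.
free_nodes = np.setdiff1d(np.arange(n_nodes), data_nodes)
free_tuples = np.setdiff1d(np.arange(n_nodes), data_tuples)
assign[free_nodes] = rng.permutation(free_tuples)

sim = targets[assign]  # simulated field, one tuple per node

# Hard data are honored exactly.
print(np.array_equal(sim[data_nodes], targets[data_tuples]))
# The set of simulated tuples equals the set of target tuples,
# so every marginal and joint histogram is reproduced without error.
print(np.array_equal(np.unique(sim, axis=0), np.unique(targets, axis=0)))
```

Because the field is only a reordering of the target tuples, histogram error is zero by construction; the matching decides where each tuple goes, which is what controls spatial structure.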
Method
MST-Direct uses sparse, candidate-restricted Sinkhorn optimal transport to match target tuples onto an FFT-MA Gaussian backbone. It incorporates relational passes and greedy bijection completion, with kriging for hard-data conditioning.
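A minimal sketch of candidate-restricted Sinkhorn matching followed by greedy bijection completion is given here. It is not the authors' implementation: the squared-distance cost, the choice of nearest targets as candidates, the values `C=8`, `eps=0.5`, and `iters=50`, and all function names are assumptions made for illustration, and the relational passes and kriging step are omitted. The point is the storage pattern: the kernel is kept as an n-by-C array rather than n-by-n. For a 40,000-node grid a dense float64 kernel would need 40,000² × 8 bytes ≈ 12.8 GB, whereas with C = 8 candidates per node the sparse kernel needs about 2.56 MB.

```python
import numpy as np

def nearest_candidates(nodes, targets, C, chunk=1024):
    """Squared distances and indices of the C nearest targets for each node.
    Brute force in chunks for this toy; a spatial index would be usual at scale."""
    n = len(nodes)
    d2 = np.empty((n, C))
    idx = np.empty((n, C), dtype=np.int64)
    for s in range(0, n, chunk):
        block = ((nodes[s:s + chunk, None, :] - targets[None, :, :]) ** 2).sum(-1)
        part = np.argpartition(block, C - 1, axis=1)[:, :C]  # C smallest, unordered
        idx[s:s + chunk] = part
        d2[s:s + chunk] = np.take_along_axis(block, part, axis=1)
    return d2, idx

def sparse_sinkhorn_match(nodes, targets, C=8, eps=0.5, iters=50):
    """Return assign, where assign[i] is the target row given to node i."""
    n = len(nodes)
    d2, cand = nearest_candidates(nodes, targets, C)
    K = np.exp(-d2 / eps)            # Gibbs kernel on candidates only: (n, C)
    u, v = np.ones(n), np.ones(n)    # row and column scalings, unit marginals
    tiny = 1e-300                    # guards against division by zero
    for _ in range(iters):
        u = 1.0 / np.maximum((K * v[cand]).sum(axis=1), tiny)
        col = np.bincount(cand.ravel(), weights=(K * u[:, None]).ravel(), minlength=n)
        v = 1.0 / np.maximum(col, tiny)
    P = u[:, None] * K * v[cand]     # sparse transport plan, still (n, C)

    # Greedy bijection completion: accept the heaviest plan entries first,
    # skipping any node or target that is already taken.
    rows, slots = np.unravel_index(np.argsort(-P, axis=None), P.shape)
    assign = np.full(n, -1)
    used = np.zeros(n, dtype=bool)
    for i, s in zip(rows, slots):
        j = cand[i, s]
        if assign[i] < 0 and not used[j]:
            assign[i] = j
            used[j] = True
    # Nodes whose candidates were all taken receive the leftover targets.
    assign[assign < 0] = np.flatnonzero(~used)
    return assign

rng = np.random.default_rng(0)
n = 400
nodes = rng.standard_normal((n, 2))    # stand-in for backbone values at each node
targets = rng.standard_normal((n, 2))  # stand-in for target tuples in score space
assign = sparse_sinkhorn_match(nodes, targets)
sim = targets[assign]
print(np.array_equal(np.sort(assign), np.arange(n)))  # True when assign is a bijection
```

The leftover step here pairs unmatched nodes and targets arbitrarily, which keeps the result a bijection but ignores cost; a production method would presumably handle those pairs more carefully.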
In practice
- Simulate complex non-linear multivariate geological models.
- Generate conditional realizations honoring specific hard data.
- Apply to large spatial grids for uncertainty quantification.
Topics
- Geostatistical Simulation
- Optimal Transport
- Sinkhorn Algorithm
- Multivariate Analysis
- Hard Data Conditioning
- FFT-MA Gaussian Backbone
Best for: AI Scientist, Research Scientist, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.LG updates on arXiv.org.