MST-Direct at Scale: Multivariate and Conditional Geostatistical Simulation via Sinkhorn Optimal Transport

Source: cs.LG updates on arXiv.org · Field: Science & Research (Environmental Science & Earth Systems, Mathematics & Computational Sciences, Artificial Intelligence & Machine Learning) · Depth: Expert, long

Summary

MST-Direct, a Matching-via-Sinkhorn-Transport approach to geostatistical simulation, has been extended to the multivariate, conditional, and large-grid regimes. The extension addresses three earlier limitations. First, a sparse, candidate-restricted Sinkhorn matcher with O(nC) memory lifts the previous ceiling of a few thousand nodes, so that 200x200 (40,000-node) and 100x100 grids run in under a minute. Second, the method supports many variables (six in the demonstration) by matching target value tuples onto an independent FFT-MA Gaussian backbone. Third, it adds hard-data conditioning by pinning the data tuples and kriging the backbone. Validated against the Direct Multivariate Simulation (DMS) 6-variate benchmark, MST-Direct reproduces the joint distribution with zero histogram error and honors hard data exactly, outperforming the Projection Pursuit Multivariate Transform (PPMT), which remains an approximation.
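To put the O(nC) memory figure in perspective, the short calculation below compares a dense n-by-n transport kernel with a candidate-restricted one on the 40,000-node grid. It assumes that C denotes the number of candidate tuples kept per node and that entries are 8-byte floats; the value C = 64 and the helper name `matcher_memory_bytes` are hypothetical and chosen only for illustration.

```python
def matcher_memory_bytes(n, C=None, bytes_per_entry=8):
    """Bytes needed to store one kernel: dense (n x n) if C is None, else sparse (n x C)."""
    entries = n * n if C is None else n * C
    return entries * bytes_per_entry


n = 200 * 200                            # 40,000 grid nodes
dense = matcher_memory_bytes(n)          # full n x n kernel
sparse = matcher_memory_bytes(n, C=64)   # C = 64 is an illustrative assumption

print(f"dense:  {dense / 1e9:.1f} GB")   # dense:  12.8 GB
print(f"sparse: {sparse / 1e6:.1f} MB")  # sparse: 20.5 MB
```

Under these assumptions, a single dense kernel would take about 12.8 GB, while the restricted kernel fits in roughly 20 MB, which is consistent with the jump from a few thousand nodes to tens of thousands.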

Key takeaway

For geostatistical modelers and data scientists working with complex, non-linear multivariate dependencies, MST-Direct is worth considering for simulation on large grids when the joint distribution must be reproduced exactly and hard data must be honored exactly. On the reported benchmark it outperforms approximate transforms such as PPMT, so downstream uncertainty quantification rests on spatial and multivariate statistics that match the targets.

Key insights

MST-Direct now scales to large grids, multiple variables, and conditional data while preserving exact joint distributions.

Method

MST-Direct uses sparse, candidate-restricted Sinkhorn optimal transport to match target value tuples onto an independent FFT-MA Gaussian backbone. The matching incorporates relational passes and a greedy bijection completion; hard-data conditioning works by pinning the data tuples and kriging the backbone.
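The matching step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the squared-Euclidean cost, the regularization `eps`, and the index-order completion rule are all assumptions, the relational passes and kriging are omitted because the summary does not describe them, and i.i.d. Gaussian draws stand in for both the FFT-MA backbone and the target tuples. Candidates are found by brute force to keep the sketch dependency-free; a spatial index would be needed to actually stay at O(nC) memory.

```python
import numpy as np


def candidate_sets(backbone, targets, C):
    """Indices and costs of the C nearest target tuples for each backbone node."""
    # Dense (n, n) distances: acceptable for a demo only, not for large grids.
    d2 = ((backbone[:, None, :] - targets[None, :, :]) ** 2).sum(axis=-1)
    cand = np.argpartition(d2, C - 1, axis=1)[:, :C]   # (n, C) tuple indices
    cost = np.take_along_axis(d2, cand, axis=1)        # (n, C) squared distances
    return cand, cost


def sparse_sinkhorn(cand, cost, eps=0.05, n_iter=200):
    """Sinkhorn scaling on an (n, C) candidate-restricted kernel with uniform marginals."""
    n = cand.shape[0]
    # Subtracting each row's minimum only rescales u and keeps exp() well conditioned.
    K = np.exp(-(cost - cost.min(axis=1, keepdims=True)) / eps)
    a = np.full(n, 1.0 / n)
    b = np.full(n, 1.0 / n)
    v = np.ones(n)
    for _ in range(n_iter):
        u = a / (K * v[cand]).sum(axis=1)
        # Column sums of the sparse kernel via a scatter-add over candidate indices.
        col = np.bincount(cand.ravel(), weights=(K * u[:, None]).ravel(), minlength=n)
        v = np.divide(b, col, out=np.zeros(n), where=col > 0)
    return u[:, None] * K * v[cand]                    # plan entries, shape (n, C)


def greedy_bijection(cand, plan):
    """Turn the soft plan into a one-to-one node -> tuple assignment."""
    n = cand.shape[0]
    order = np.argsort(-plan, axis=None)               # heaviest plan entries first
    rows, slots = np.unravel_index(order, plan.shape)
    assign = np.full(n, -1)
    used = np.zeros(n, dtype=bool)
    for i, c in zip(rows, slots):
        j = cand[i, c]
        if assign[i] < 0 and not used[j]:
            assign[i] = j
            used[j] = True
    # Completion (assumed rule): pair leftover nodes with leftover tuples in index order.
    assign[np.flatnonzero(assign < 0)] = np.flatnonzero(~used)
    return assign


rng = np.random.default_rng(0)
n, d, C = 400, 2, 16
backbone = rng.standard_normal((n, d))   # stand-in for the Gaussian backbone
targets = rng.standard_normal((n, d))    # stand-in for rescaled target tuples

cand, cost = candidate_sets(backbone, targets, C)
plan = sparse_sinkhorn(cand, cost)
assign = greedy_bijection(cand, plan)
simulated = targets[assign]              # one target tuple placed on each node
```

Because `assign` is a permutation, `simulated` contains exactly the target tuples, each used once. That is the sense in which a matching-based method can reproduce the target histogram and joint distribution with zero error, regardless of how well the Sinkhorn plan has converged.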

Best for: AI Scientist, Research Scientist, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.LG updates on arXiv.org.