Robust Simulation-Based Inference Under Missing Data via Neural Processes
Summary
RISE (Robust Inference under imputed SimulatEd data) is a method introduced by Verma et al. [Ver25R] that addresses missing data in simulation-based inference (SBI). Standard SBI methods, in particular neural posterior estimation (NPE), assume complete observations and produce biased posterior estimates when data are incomplete: even 10% missingness can cause the posterior to drift, and 60% can invalidate scientific conclusions. RISE learns imputation and parameter inference jointly within a single framework, extending the NPE loss function so that both components are optimized simultaneously. It uses Neural Processes (NPs) for imputation, which model distributions over functions and therefore capture uncertainty about the missing values. It supports different missingness mechanisms, namely missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR), by factorizing its latent variable. In empirical evaluations, RISE outperforms baselines such as NPE-NN and Simformer on benchmarks including GLU, GLM, and Two Moons, especially under MNAR conditions, while adding only about 50% more training time.
Key takeaway
If you run simulation-based inference on real-world datasets that contain missing values, consider adopting RISE. The method directly addresses the bias that incomplete observations introduce into posterior estimates. Because it models imputation and inference jointly, RISE is more robust than naive approaches that impute first and infer afterwards, and it gives more accurate parameter estimates, particularly when the missingness mechanism is complex. Before training, make sure your simulator is well-specified and that you have enough domain knowledge to choose the appropriate missingness assumption.
Key insights
RISE jointly learns data imputation and parameter inference in SBI to robustly handle missing data and prevent biased posteriors.
Principles
- Missing data biases SBI posteriors.
- Imputation and inference are interdependent.
- Missingness mechanisms affect inference.
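To make the three missingness mechanisms concrete, here is a minimal sketch in plain Python. It is not taken from the RISE code base: the helper names `make_mask` and `observed_mean`, the two-feature toy data, and the logistic missingness probabilities are all assumptions chosen for illustration. Only the second feature `x1` can go missing, and the sketch compares the mean of the surviving `x1` values with the mean of the complete data.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def make_mask(x, mechanism, rng, rate=0.3):
    """Return a list of 0/1 flags, where 1 means x1 is missing.

    x is a list of (x0, x1) pairs; x0 is always observed (toy assumption).
    """
    mask = []
    for x0, x1 in x:
        if mechanism == "MCAR":
            p = rate               # independent of the data
        elif mechanism == "MAR":
            p = sigmoid(2.0 * x0)  # depends only on the always-observed x0
        elif mechanism == "MNAR":
            p = sigmoid(2.0 * x1)  # depends on the value that goes missing
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        mask.append(1 if rng.random() < p else 0)
    return mask


def observed_mean(x, mask):
    """Mean of x1 over the entries that were not masked out."""
    kept = [x1 for (_, x1), m in zip(x, mask) if m == 0]
    return sum(kept) / len(kept)


rng = random.Random(0)
# Toy data: two independent standard-normal features.
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20000)]
full_mean = sum(x1 for _, x1 in data) / len(data)
means = {
    m: observed_mean(data, make_mask(data, m, rng))
    for m in ("MCAR", "MAR", "MNAR")
}
print(f"complete data: {full_mean:+.3f}")
for name, value in means.items():
    print(f"{name:>4} observed: {value:+.3f}")
```

In this toy setup the MNAR mask preferentially removes large values of `x1`, so the observed mean is pulled well below the complete-data mean, while the MCAR mean stays close to it. The MAR mean also stays close here only because `x0` and `x1` were drawn independently; with correlated features, a MAR mask would shift the marginal mean as well. Any summary statistic or inference network fed only the surviving values inherits this kind of distortion.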
Method
RISE extends NPE by jointly optimizing a neural posterior estimator and a Neural Process-based imputation model, which learns distributions over missing values and their mechanisms (MCAR, MAR, MNAR) from simulated data.
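One plausible way to write such a joint objective is sketched below. The notation is an assumption made for illustration and is not copied from [Ver25R]: \phi denotes the parameters of the Neural Process imputation model \hat{p}_\phi, \psi the parameters of the neural posterior estimator q_\psi, and x_obs and x_mis the observed and missing parts of a simulated data set after a missingness mask has been applied. The exact objective in the paper may differ, for example in how x_mis is sampled inside the expectation.

```latex
\mathcal{L}(\phi, \psi)
  = -\,\mathbb{E}_{(\theta,\, x_{\mathrm{obs}},\, x_{\mathrm{mis}})}
    \Big[
      \log \hat{p}_{\phi}\big(x_{\mathrm{mis}} \mid x_{\mathrm{obs}}\big)
      + \log q_{\psi}\big(\theta \mid x_{\mathrm{obs}},\, x_{\mathrm{mis}}\big)
    \Big]
```

The first term trains the imputation model to predict masked values from the observed ones, and the second is the usual NPE negative log-likelihood evaluated on the completed data. Because simulated data are fully known before masking, both terms can be computed during training, and minimizing their sum over \phi and \psi together is what makes the two components inform each other.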
In practice
- Use RISE for SBI with up to 60% missing data.
- Select missingness assumption (MCAR/MAR/MNAR) for training.
- Leverage the PyTorch implementation at Aalto-QuML/RISE.
Topics
- Simulation-based Inference
- Missing Data Imputation
- Neural Posterior Estimation
- Neural Processes
- Bayesian Parameter Estimation
Code references
- Aalto-QuML/RISE
- mackelab/sbi
- mackelab/simformer
- sbi-benchmark/sbibm
- montefiore-ai/trust-crisis-in-simulation-based-inference
Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Research feeds | TransferLab — appliedAI Institute.