Robust Simulation-Based Inference Under Missing Data via Neural Processes

2025-11-20 · Source: Research feeds | TransferLab — appliedAI Institute · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, long

Summary

RISE (Robust Inference under imputed SimulatEd data) is a novel method introduced by Verna et al. [Ver25R] that addresses the significant problem of missing data in simulation-based inference (SBI). Traditional SBI methods, particularly neural posterior estimation (NPE), assume complete observations, leading to biased posterior estimates when data is incomplete; even 10% missingness can make the posterior drift, and 60% can invalidate scientific conclusions. RISE tackles this by jointly learning imputation and parameter inference within a unified framework, extending the NPE loss function to optimize both components simultaneously. It uses Neural Processes (NP) for flexible imputation, allowing it to model distributions over functions and capture uncertainty, and supports different missingness mechanisms (MCAR, MAR, MNAR) by factorizing its latent variable. Empirical evaluations show RISE outperforms baselines such as NPE-NN and Simformer on benchmarks such as GLU, GLM, and Two Moons, especially under MNAR conditions, while adding only about 50% more training time.

Key takeaway

If you are a research scientist applying simulation-based inference to real-world datasets that contain missing values, consider adopting RISE. It directly addresses the bias that incomplete observations introduce, which can severely compromise posterior estimates. Because it models imputation and inference jointly, RISE is more robust than naive approaches and should yield more accurate and reliable parameter estimates, particularly under complex missingness mechanisms. Before using it, make sure your simulator is well-specified and that you have the domain knowledge to choose the appropriate missingness assumption for training.

Key insights

RISE jointly learns data imputation and parameter inference in SBI to robustly handle missing data and prevent biased posteriors.

Principles

Missing data biases SBI posteriors.
Imputation and inference are interdependent.
Missingness mechanisms affect inference.
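The first and third principles can be seen in a toy example that involves no neural networks at all. It uses a hypothetical Gaussian simulator, a naive estimator that simply averages the observed entries, and two hand-written masking rules. This sketch rests on those assumptions and is not part of RISE; it only shows why the missingness mechanism matters.

```python
import random
import statistics

def simulate(theta, n, rng):
    """Hypothetical toy simulator: n i.i.d. Gaussian observations centred on theta."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]

def mask_mcar(x, rate, rng):
    """MCAR: each entry is hidden with the same probability, independent of its value."""
    return [v if rng.random() >= rate else None for v in x]

def mask_mnar(x, threshold):
    """MNAR: entries above a threshold are hidden, so missingness depends on the unseen value."""
    return [v if v <= threshold else None for v in x]

def naive_estimate(x_masked):
    """Naive complete-case estimate of theta: the mean of the observed entries only."""
    observed = [v for v in x_masked if v is not None]
    return statistics.fmean(observed)

rng = random.Random(0)
theta_true = 2.0
x = simulate(theta_true, 10_000, rng)
print(f"complete data:          {statistics.fmean(x):.2f}")
print(f"MCAR (30% missing):     {naive_estimate(mask_mcar(x, 0.3, rng)):.2f}")
print(f"MNAR (x > 2.5 missing): {naive_estimate(mask_mnar(x, theta_true + 0.5)):.2f}")
```

Both rules hide roughly 30% of the entries. The MCAR estimate stays close to the true value of 2. The MNAR rule hides exactly the large values, which pulls the estimate down to about 1.49, the mean of a normal distribution truncated half a standard deviation above its centre. An estimator trained only on complete simulations has no way to correct for this kind of shift.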

Method

RISE extends NPE by jointly optimizing a neural posterior estimator and a Neural Process-based imputation model, which learns distributions over missing values and their mechanisms (MCAR, MAR, MNAR) from simulated data.
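The Method description can be made concrete with a schematic objective. The version below is an assumption about its general shape, not the exact loss from Verna et al. It adds a weighted imputation term to the standard NPE loss and conditions the posterior estimator on imputed values. Here q_φ is the posterior estimator, p_ψ the Neural Process imputer, m the missingness mask, x_obs and x_mis the observed and missing parts of x, and λ a hypothetical weight.

```latex
% Standard NPE loss on complete simulations
\mathcal{L}_{\mathrm{NPE}}(\phi) = \mathbb{E}_{p(\theta)\,p(x \mid \theta)}\big[-\log q_\phi(\theta \mid x)\big]

% Schematic joint objective (assumed form, for illustration only)
\mathcal{L}(\phi, \psi) = \mathbb{E}_{p(\theta)\,p(x \mid \theta)\,p(m \mid x)}\Big[
  -\,\mathbb{E}_{\tilde{x}_{\mathrm{mis}} \sim p_\psi(\cdot \mid x_{\mathrm{obs}}, m)}
     \log q_\phi\big(\theta \mid x_{\mathrm{obs}}, \tilde{x}_{\mathrm{mis}}, m\big)
  \;-\; \lambda \log p_\psi\big(x_{\mathrm{mis}} \mid x_{\mathrm{obs}}, m\big)
\Big]
```

Every training example is simulated, so the true x_mis is known and the second term can be evaluated directly. The MCAR, MAR, or MNAR assumption enters through the mask distribution p(m | x) used to generate training data. If the imputed values are sampled with a reparameterization, gradients from the first term also reach ψ. In that case the imputer is shaped by the inference task as well as by reconstruction.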

In practice

Use RISE for SBI with up to 60% missing data.
Select missingness assumption (MCAR/MAR/MNAR) for training.
Leverage the PyTorch implementation at Aalto-QuML/RISE.
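At training time, selecting the missingness assumption amounts to choosing how masks are drawn for simulated data. The sketch below shows one way to generate (θ, x, mask) training triples under each assumption. The simulator, the prior, and the masking rules (`toy_simulator`, `make_mask`, `build_training_set`) are hypothetical illustrations and are not taken from the Aalto-QuML/RISE code.

```python
import math
import random
from typing import List, Tuple

def toy_simulator(theta: float, rng: random.Random, dim: int = 10) -> List[float]:
    """Hypothetical stand-in simulator: a noisy linear trend with slope theta."""
    return [theta * t / dim + rng.gauss(0.0, 0.1) for t in range(dim)]

def make_mask(x: List[float], mechanism: str, rate: float, rng: random.Random) -> List[int]:
    """Return a mask (1 = observed, 0 = missing) under the chosen missingness assumption."""
    if mechanism == "MCAR":
        # Each entry goes missing with probability `rate`, independent of any value.
        return [0 if rng.random() < rate else 1 for _ in x]
    if mechanism == "MAR":
        # x[0] is always observed; the other entries go missing more often when x[0]
        # is large, so missingness depends only on observed data (about `rate` on average).
        p = min(1.0, 2.0 * rate / (1.0 + math.exp(-10.0 * x[0])))
        return [1] + [0 if rng.random() < p else 1 for _ in x[1:]]
    if mechanism == "MNAR":
        # Self-masking: the round(rate * len(x)) largest values go missing,
        # so missingness depends on the unobserved values themselves.
        k = round(rate * len(x))
        cutoff = sorted(x, reverse=True)[k - 1] if k > 0 else math.inf
        return [0 if v >= cutoff else 1 for v in x]
    raise ValueError(f"unknown mechanism: {mechanism}")

def build_training_set(
    n: int, mechanism: str, rate: float, seed: int = 0
) -> List[Tuple[float, List[float], List[int]]]:
    """Simulate (theta, complete x, mask) triples for a given missingness assumption."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        theta = rng.uniform(-2.0, 2.0)  # hypothetical uniform prior
        x = toy_simulator(theta, rng)
        data.append((theta, x, make_mask(x, mechanism, rate, rng)))
    return data

train = build_training_set(1000, mechanism="MNAR", rate=0.3)
theta, x, mask = train[0]
print(sum(mask), "of", len(mask), "entries observed")  # prints "7 of 10 entries observed"
```

The complete simulated x is stored next to the mask. This is what makes joint training possible: the imputation model can be scored against the values that were hidden, which is never possible with real incomplete observations.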

Topics

Simulation-based Inference
Missing Data Imputation
Neural Posterior Estimation
Neural Processes
Bayesian Parameter Estimation

Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Research feeds | TransferLab — appliedAI Institute.