Score-Based Causal Discovery of Latent Variable Causal Models
Summary
The paper introduces SALAD (Score-bAsed Latent cAusal Discovery), a score-based method for identifying causal structures in which the latent variables are themselves causally related. Traditional constraint-based approaches depend on the order in which tests are run and propagate early errors; SALAD instead comes with identifiability guarantees and a score that is both score-equivalent and consistent. The method formulates scoring functions, including a BIC score, and characterizes the degrees of freedom of the marginal distribution over observed variables under two structural assumptions: linear 1-factor models (Silva et al., 2003) and more general latent hierarchical structures (Huang et al., 2022). It provides both an exact and a continuous search procedure. In the reported experiments, SALAD achieves F1 scores of 0.99 for 1-factor models and 0.92 for hierarchical structures with 100 samples, outperforming the constraint-based baselines FOFC, HUANG, and GIN.
Key takeaway
For AI Scientists and Research Scientists working on causal inference with unobserved confounders, SALAD offers an alternative to constraint-based methods. Its reported performance, particularly on small datasets (e.g., 100 samples), suggests that score-based approaches can recover latent causal structures more accurately than constraint-based baselines. Its exact and continuous search procedures are worth exploring when your data plausibly follow a linear 1-factor model or a latent hierarchical structure.
Key insights
Score-based causal discovery with identifiability guarantees effectively uncovers latent variable causal structures, outperforming constraint-based methods.
Principles
- Properly formulated scoring functions can achieve score equivalence and consistency for latent variable models.
- Characterizing degrees of freedom is crucial for score-based causal discovery with latent variables.
- Score-based methods mitigate error propagation issues common in constraint-based approaches.
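To make the role of degrees of freedom concrete, here is a worked count for the simplest textbook case: a single latent variable with p observed indicators, linear and with independent noise. This is an illustrative special case under classical factor-analysis assumptions (latent variance fixed to 1, p ≥ 3), not the paper's general characterization; the symbols λ, Ψ, and d are introduced here only for the example.

```latex
\begin{align*}
X &= \lambda L + \varepsilon, \qquad \operatorname{Var}(L) = 1, \qquad
     \operatorname{Cov}(\varepsilon) = \Psi = \operatorname{diag}(\psi_1, \dots, \psi_p) \\
\Sigma &= \lambda \lambda^{\top} + \Psi \\
d_{\text{one latent}} &= \underbrace{p}_{\text{loadings}} + \underbrace{p}_{\text{noise variances}} = 2p \\
d_{\text{saturated}} &= \frac{p(p+1)}{2} \\
p = 4:\quad d_{\text{one latent}} &= 8, \qquad d_{\text{saturated}} = 10, \qquad
     d_{\text{saturated}} - d_{\text{one latent}} = 2
\end{align*}
```

The gap of 2 for p = 4 is the number of independent constraints the latent structure imposes on the observed covariance matrix. A penalized score such as BIC rewards the smaller parameter count only when the data satisfy those constraints well enough, which is why the count has to be right for each candidate structure.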
Method
SALAD minimizes a scoring function (e.g., the BIC score) over candidate graph structures. The score builds on the characterization of degrees of freedom and on generalized faithfulness assumptions, and the search runs either by exact enumeration or by continuous optimization with Gumbel-Softmax.
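The Gumbel-Softmax relaxation mentioned above can be sketched in a few lines of NumPy. This is a generic illustration of the sampling trick, not SALAD's implementation: the function name `gumbel_softmax`, the interpretation of the logits as "which latent parent does each observed variable attach to", and the temperature values are all assumptions made for the example.

```python
import numpy as np

def gumbel_softmax(logits, temperature, rng):
    """Draw a relaxed one-hot sample per row of `logits`.

    Adds Gumbel(0, 1) noise to the logits and applies a tempered softmax.
    Lower temperatures push each row closer to a one-hot vector.
    """
    # Gumbel noise via inverse transform; the lower bound avoids log(0).
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    gumbel_noise = -np.log(-np.log(u))
    z = (logits + gumbel_noise) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Hypothetical setup: 5 observed variables, 2 candidate latent parents each.
assignment_logits = np.zeros((5, 2))
soft_assignment = gumbel_softmax(assignment_logits, temperature=0.5, rng=rng)

# Strongly peaked logits: the sample should land on the first category.
peaked = gumbel_softmax(np.array([[10.0, -10.0]]), temperature=0.5, rng=rng)
```

In a gradient-based search the logits would be learnable parameters and the soft sample would feed a differentiable score, so a discrete structure choice can be optimized with standard optimizers. An autodiff framework would be needed for that step; NumPy is used here only to show the sampling itself.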
In practice
- Use the BIC score in finite-sample settings for latent variable causal discovery.
- Consider continuous optimization methods for computational efficiency in structure search.
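As a rough illustration of BIC-based model comparison, the sketch below scores two Gaussian models on the same data: one with independent variables and one with an unrestricted (saturated) covariance matrix. It uses the common convention BIC = -2 log L + d log n, where lower is better. These two models are stand-ins chosen because they have closed-form fits; they are not the latent structures or the degrees-of-freedom formulas from the paper, and all function names are hypothetical.

```python
import numpy as np

def gaussian_loglik(X, cov):
    """Log-likelihood of the rows of X under N(sample mean, cov)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    _, logdet = np.linalg.slogdet(cov)
    # Sum over rows of the quadratic form x^T cov^{-1} x.
    quad = np.einsum("ij,jk,ik->", Xc, np.linalg.inv(cov), Xc)
    return -0.5 * (n * p * np.log(2 * np.pi) + n * logdet + quad)

def bic(loglik, dof, n):
    """BIC with the 'lower is better' sign convention."""
    return -2.0 * loglik + dof * np.log(n)

def compare_models(X):
    """Return the BIC of an independent and a saturated Gaussian model."""
    n, p = X.shape
    S = np.cov(X, rowvar=False, bias=True)  # maximum-likelihood covariance
    # Parameter counts: p means plus p variances, or p means plus p(p+1)/2.
    bic_indep = bic(gaussian_loglik(X, np.diag(np.diag(S))), 2 * p, n)
    bic_sat = bic(gaussian_loglik(X, S), p + p * (p + 1) // 2, n)
    return {"independent": bic_indep, "saturated": bic_sat}

rng = np.random.default_rng(0)
n, p = 100, 4

# Data driven by one shared latent factor: the variables are correlated.
latent = rng.normal(size=(n, 1))
loadings = np.array([[0.9, 0.8, 0.7, 0.9]])
X_factor = latent @ loadings + 0.5 * rng.normal(size=(n, p))

# Data with no shared structure: the variables are independent.
X_indep = rng.normal(size=(n, p))

scores_factor = compare_models(X_factor)
scores_indep = compare_models(X_indep)
```

On the correlated data the saturated model should receive the lower BIC, while on the independent data the log n penalty on its six extra parameters should outweigh the small likelihood gain. The same trade-off between fit and parameter count is what a score-based structure search exploits, with the candidate models replaced by latent variable structures.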
Topics
- Causal Discovery
- Latent Variables
- Score-Based Methods
- BIC Score
- Graphical Models
- Structure Learning
Best for: AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.