Probabilistic Joint and Individual Variation Explained (ProJIVE) for Data Integration
Summary
Researchers Raphiel J. Murden, Ganzhong Tian, Deqiang Qiu, and Benjamin B. Risk introduce Probabilistic Joint and Individual Variation Explained (ProJIVE), a statistical model and Expectation-Maximization (EM) algorithm for integrating multiple datasets. ProJIVE extends probabilistic Principal Component Analysis (pPCA) to two or more datasets and estimates joint and individual components simultaneously. The model assumes mutual orthogonality between joint and individual subject scores, which distinguishes it from existing methods such as JIVE, R.JIVE, and AJIVE. In simulation studies, ProJIVE estimated joint subject scores and variable loadings more accurately, particularly in mixed-dimension settings and when the data did not strictly conform to Gaussian assumptions. The authors applied ProJIVE to Alzheimer's Disease Neuroimaging Initiative (ADNI) data, integrating brain morphometry (cortical thickness, surface area, and volume) with cognitive measures from 587 participants. ProJIVE's joint subject scores were significantly associated with genetic risk (ApoE4), AD diagnosis, and costly PET biomarkers (AV45 and FDG), suggesting that the method can recover biologically meaningful sources of variation.
Key takeaway
For research scientists working with multi-modal biological or clinical data, ProJIVE offers a robust way to decompose complex datasets into shared and dataset-specific components. Consider ProJIVE for its accuracy in estimating joint subject scores in simulations and for the association of those scores with key biomarkers and diagnoses, including when the data depart from Gaussian assumptions. This can make findings more interpretable and may reduce reliance on costly or invasive diagnostic methods such as PET imaging.
Key insights
ProJIVE is a probabilistic model extending pPCA for accurate multi-dataset integration, identifying joint and individual variations.
Principles
- Model joint and individual variations as subject random effects.
- Assume mutual orthogonality between joint and individual scores.
- Maximum likelihood estimation can improve accuracy and interpretability.
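The first two principles can be made concrete with a small simulation. This is a minimal sketch under our own assumptions, not the authors' code: each block k is generated as x_k = W_J,k z_J + W_I,k z_I,k + e_k, with independent standard-normal joint and individual scores and block-specific isotropic noise. The function `simulate_projive_style` and all parameter values are hypothetical. Because the scores are independent, joint and individual scores are uncorrelated (orthogonal in expectation), and only the joint part contributes to covariance between blocks.

```python
import numpy as np


def simulate_projive_style(n, p, r_joint, r_indiv, noise_var, seed=0):
    """Simulate K blocks from an assumed ProJIVE-style generative model (hypothetical).

    Block k for subject i: x_ik = W_J[k] zJ_i + W_I[k] zI_ik + e_ik,
    with e_ik ~ N(0, noise_var[k] * I). All scores are independent N(0, I)
    random effects, so joint and individual scores are uncorrelated.
    """
    rng = np.random.default_rng(seed)
    z_joint = rng.standard_normal((n, r_joint))      # shared by every block
    blocks, W_J, W_I, z_indiv = [], [], [], []
    for p_k, r_k, s2 in zip(p, r_indiv, noise_var):
        WJk = rng.standard_normal((p_k, r_joint))    # joint loadings for block k
        WIk = rng.standard_normal((p_k, r_k))        # individual loadings for block k
        zIk = rng.standard_normal((n, r_k))          # scores specific to block k
        noise = rng.normal(scale=np.sqrt(s2), size=(n, p_k))
        blocks.append(z_joint @ WJk.T + zIk @ WIk.T + noise)
        W_J.append(WJk)
        W_I.append(WIk)
        z_indiv.append(zIk)
    return blocks, W_J, W_I, z_joint, z_indiv


blocks, W_J, W_I, z_joint, z_indiv = simulate_projive_style(
    n=50_000, p=[6, 4], r_joint=1, r_indiv=[1, 1], noise_var=[0.5, 1.0])

# Only joint variation is shared, so Cov(x_1, x_2) should approximate W_J[0] W_J[1]^T
X1 = blocks[0] - blocks[0].mean(axis=0)
X2 = blocks[1] - blocks[1].mean(axis=0)
cross_cov = X1.T @ X2 / (X1.shape[0] - 1)
print(np.abs(cross_cov - W_J[0] @ W_J[1].T).max())          # small
print(np.corrcoef(z_joint[:, 0], z_indiv[0][:, 0])[0, 1])  # near 0
```

Treating scores as random effects is what makes the cross-block covariance depend on the joint loadings alone; individual loadings only affect within-block covariance.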
Method
ProJIVE employs an Expectation-Maximization (EM) algorithm to estimate variable loadings, noise variances, and subject scores, generalizing probabilistic PCA to multiple datasets with block-specific isotropic error assumptions.
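The EM updates can be sketched by stacking the blocks as x = W z + e, where W has structured zeros (block k loads only on the joint scores and on its own individual scores) and the noise covariance is diag(σ_1² I, ..., σ_K² I). This is a minimal illustration under our assumptions (independent standard-normal scores, column-centered data, a fixed iteration count), not the authors' implementation; `projive_style_em` and its helpers are hypothetical names. The E-step computes Gaussian posterior moments of the scores; because the noise covariance is diagonal, the M-step splits into one least-squares problem per block followed by a noise-variance update.

```python
import numpy as np


def free_loading_mask(p, r_joint, r_indiv):
    """Block k may load on the joint factors and on its own individual factors only."""
    mask = np.zeros((sum(p), r_joint + sum(r_indiv)), dtype=bool)
    row, col = 0, r_joint
    for p_k, r_k in zip(p, r_indiv):
        mask[row:row + p_k, :r_joint] = True
        mask[row:row + p_k, col:col + r_k] = True
        row, col = row + p_k, col + r_k
    return mask


def gaussian_loglik(X, W, psi):
    """Log-likelihood of centered rows of X under x ~ N(0, W W^T + diag(psi))."""
    n, p = X.shape
    C = W @ W.T + np.diag(psi)
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * n * (p * np.log(2 * np.pi) + logdet
                       + np.trace(np.linalg.solve(C, X.T @ X / n)))


def projive_style_em(blocks, r_joint, r_indiv, n_iter=300, seed=0):
    """Hypothetical EM for structured pPCA with block-specific isotropic noise."""
    rng = np.random.default_rng(seed)
    X = np.hstack([B - B.mean(axis=0) for B in blocks])  # center each variable
    n = X.shape[0]
    p = [B.shape[1] for B in blocks]
    block_of = np.repeat(np.arange(len(p)), p)           # block index of each variable
    mask = free_loading_mask(p, r_joint, r_indiv)
    W = rng.normal(scale=0.1, size=mask.shape) * mask
    sigma2 = np.array([X[:, block_of == k].var() for k in range(len(p))])
    history = []
    for _ in range(n_iter):
        psi = sigma2[block_of]                            # per-variable noise variance
        history.append(gaussian_loglik(X, W, psi))
        # E-step: z_i | x_i ~ N(Minv W^T Psi^-1 x_i, Minv), Minv = (I + W^T Psi^-1 W)^-1
        WtPinv = W.T / psi
        Minv = np.linalg.inv(np.eye(W.shape[1]) + WtPinv @ W)
        Ez = X @ WtPinv.T @ Minv                          # n x r posterior means
        Ezz = n * Minv + Ez.T @ Ez                        # sum_i E[z_i z_i^T | x_i]
        XtEz = X.T @ Ez                                   # sum_i x_i E[z_i | x_i]^T
        # M-step: per block, regress on the allowed scores, then update sigma_k^2
        for k in range(len(p)):
            rows = block_of == k
            cols = mask[rows][0]
            A, B = Ezz[np.ix_(cols, cols)], XtEz[np.ix_(rows, cols)]
            Wk = np.linalg.solve(A, B.T).T
            W[np.ix_(rows, cols)] = Wk
            sq = np.sum(X[:, rows] ** 2) - 2 * np.sum(Wk * B) + np.trace(Wk @ A @ Wk.T)
            sigma2[k] = sq / (n * rows.sum())
    return W, sigma2, Ez, history


# Usage on simulated data: one joint and one individual factor per block
rng = np.random.default_rng(1)
n, p, true_s2 = 400, [8, 6], [0.5, 1.0]
zJ = rng.standard_normal((n, 1))
blocks = []
for p_k, s2 in zip(p, true_s2):
    zI = rng.standard_normal((n, 1))
    blocks.append(zJ @ rng.standard_normal((1, p_k)) + zI @ rng.standard_normal((1, p_k))
                  + rng.normal(scale=np.sqrt(s2), size=(n, p_k)))
W_hat, s2_hat, scores, ll = projive_style_em(blocks, r_joint=1, r_indiv=[1, 1])
print(s2_hat)  # estimated block noise variances
```

Because each iteration is an exact EM step, the recorded log-likelihood should be non-decreasing, which is a useful sanity check when modifying the updates.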
In practice
- Integrate neuroimaging and cognitive data for AD research.
- Identify features driving shared or individual variation.
- Use AIC or BIC to guide the choice of total and joint ranks.
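Information-criterion rank guidance can be illustrated on a single block. The sketch below swaps in a simpler, well-known technique: the closed-form maximized likelihood of single-block pPCA (Tipping and Bishop) is used to score candidate total ranks for one block. It does not reproduce ProJIVE's joint-rank selection, which would need the multi-block likelihood. The parameter count (loadings minus rotational freedom, plus one noise variance, with means excluded because they do not depend on the rank) is our assumption, and the function names are hypothetical.

```python
import numpy as np


def ppca_max_loglik(X, q):
    """Closed-form maximized pPCA log-likelihood for rank q (requires q < p)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    lam = np.sort(np.linalg.eigvalsh(Xc.T @ Xc / n))[::-1]  # sample covariance eigenvalues
    sigma2 = lam[q:].mean()                                 # MLE of the noise variance
    return -0.5 * n * (p * np.log(2 * np.pi) + np.sum(np.log(lam[:q]))
                       + (p - q) * np.log(sigma2) + p)


def ppca_n_params(p, q):
    """Assumed count: p*q loadings minus q(q-1)/2 rotations, plus one noise variance."""
    return p * q - q * (q - 1) // 2 + 1


def choose_rank(X, max_rank, criterion="bic"):
    """Return the rank minimizing AIC or BIC, plus the score for every candidate."""
    n, p = X.shape
    scores = {}
    for q in range(max_rank + 1):
        k = ppca_n_params(p, q)
        penalty = k * np.log(n) if criterion == "bic" else 2 * k
        scores[q] = -2 * ppca_max_loglik(X, q) + penalty
    return min(scores, key=scores.get), scores


# Usage on a simulated block with a strong rank-3 signal plus unit-variance noise
rng = np.random.default_rng(0)
n, p, true_rank = 500, 15, 3
signal = rng.standard_normal((n, true_rank)) @ (3 * rng.standard_normal((true_rank, p)))
X = signal + rng.standard_normal((n, p))
best, bic = choose_rank(X, max_rank=8)
print(best)
```

Each added component costs (p - q) extra parameters, so BIC's log(n) penalty usually outweighs the small likelihood gain from fitting noise directions; AIC's lighter penalty is more prone to over-selecting.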
Topics
- ProJIVE
- Multimodal Data Integration
- EM Algorithm
- Alzheimer's Disease
- Neuroimaging Biomarkers
Best for: Research Scientist, AI Researcher, AI Scientist, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.