Probabilistic Joint and Individual Variation Explained (ProJIVE) for Data Integration

Source: stat.ML updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics) · Depth: Advanced, extended

Summary

Raphiel J. Murden, Ganzhong Tian, Deqiang Qiu, and Benjamin B. Risk introduce Probabilistic Joint and Individual Variation Explained (ProJIVE), a statistical model and Expectation-Maximization (EM) algorithm for integrating multiple datasets measured on the same subjects. ProJIVE extends probabilistic Principal Component Analysis (pPCA) to two or more datasets and estimates joint and individual components simultaneously. The model assumes mutual orthogonality between joint and individual subject scores, which distinguishes it from existing methods such as JIVE, R.JIVE, and AJIVE. In simulation studies, ProJIVE estimated joint subject scores and variable loadings more accurately than the other methods, particularly in mixed-dimension settings and when the data did not strictly conform to Gaussian assumptions. The authors applied ProJIVE to Alzheimer's Disease Neuroimaging Initiative (ADNI) data, integrating brain morphometry (cortical thickness, surface area, volume) and cognitive measures from 587 participants. ProJIVE's joint subject scores were significantly associated with a genetic risk factor (ApoE4), AD diagnosis, and expensive PET biomarkers (AV45 and FDG), which suggests that the method learns biologically meaningful sources of variation.

Key takeaway

For research scientists working with multi-modal biological or clinical data, ProJIVE offers a way to decompose several datasets collected on the same subjects into shared and dataset-specific components. It is worth considering for its accuracy in estimating joint subject scores in the reported simulations, including with non-Gaussian data, and because those scores were associated with key biomarkers and diagnoses in the ADNI analysis. This can make findings easier to interpret and may reduce reliance on more expensive, invasive diagnostic methods.

Key insights

ProJIVE is a probabilistic model that extends pPCA to integrate multiple datasets, separating variation shared across datasets (joint) from variation specific to each dataset (individual).
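
One way to write down a model of this kind is sketched below. The notation is an assumption chosen for illustration and is not quoted from the paper: x_{ik} is the feature vector of subject i in dataset k, z_i holds the joint scores shared by all datasets, b_{ik} holds the individual scores of dataset k, and W_{Jk}, W_{Ik} are the corresponding loading matrices. Taking the joint and individual scores to be independent is how this sketch expresses the orthogonality assumption mentioned in the summary.

```latex
\begin{aligned}
x_{ik} &= \mu_k + W_{Jk}\, z_i + W_{Ik}\, b_{ik} + \varepsilon_{ik},
  \qquad k = 1, \dots, K, \\
z_i &\sim \mathcal{N}(0, I_{r_J}), \qquad
b_{ik} \sim \mathcal{N}(0, I_{r_{Ik}}), \qquad
\varepsilon_{ik} \sim \mathcal{N}(0, \sigma_k^2 I_{p_k}),
\end{aligned}
```

Here r_J is the joint rank, r_{Ik} the individual rank of dataset k, p_k its number of features, and sigma_k^2 its own isotropic noise variance. With K = 1 and no individual term, this reduces to ordinary pPCA.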

Principles

Method

ProJIVE employs an Expectation-Maximization (EM) algorithm to estimate variable loadings, noise variances, and subject scores, generalizing probabilistic PCA to multiple datasets with block-specific isotropic error assumptions.
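
The single-dataset special case that ProJIVE generalizes, probabilistic PCA, has a well-known EM algorithm. The sketch below implements that standard single-block algorithm only; it is not the ProJIVE procedure, which additionally splits the loadings into joint and individual parts and fits one noise variance per dataset. The function name `ppca_em`, its parameters, and the simulated data are illustrative assumptions.

```python
import numpy as np


def ppca_em(X, n_components, n_iter=200, seed=0):
    """EM for single-block probabilistic PCA (illustrative sketch).

    X is an (n_subjects, n_features) array. Returns the loadings W
    (n_features, n_components), the isotropic noise variance sigma2,
    and the posterior-mean subject scores (n_subjects, n_components).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xc = X - X.mean(axis=0)                  # centre each feature
    W = rng.normal(size=(d, n_components))   # random starting loadings
    sigma2 = 1.0
    eye = np.eye(n_components)
    for _ in range(n_iter):
        # E-step: posterior moments of the latent scores given W, sigma2
        M_inv = np.linalg.inv(W.T @ W + sigma2 * eye)
        Ez = Xc @ W @ M_inv                      # row i is E[z_i | x_i]
        Ezz = n * sigma2 * M_inv + Ez.T @ Ez     # sum_i E[z_i z_i^T | x_i]
        # M-step: update the loadings, then the noise variance
        W = (Xc.T @ Ez) @ np.linalg.inv(Ezz)
        resid = (
            np.sum(Xc ** 2)
            - 2.0 * np.sum((Xc @ W) * Ez)
            + np.trace(Ezz @ W.T @ W)
        )
        sigma2 = resid / (n * d)
    # Posterior-mean scores under the final parameter estimates
    M_inv = np.linalg.inv(W.T @ W + sigma2 * eye)
    scores = Xc @ W @ M_inv
    return W, sigma2, scores


# Usage on simulated data (all values here are made up for illustration)
rng = np.random.default_rng(1)
W_true = rng.normal(size=(10, 2))
Z = rng.normal(size=(500, 2))
X = Z @ W_true.T + 0.5 * rng.normal(size=(500, 10))   # true noise variance 0.25
W_hat, sigma2_hat, scores = ppca_em(X, n_components=2)
print(W_hat.shape, scores.shape, float(sigma2_hat))
```

The loadings are identified only up to a rotation of the latent space, so `W_hat` should be compared with `W_true` through the subspace they span rather than entry by entry.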

Best for: Research Scientist, AI Researcher, AI Scientist, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.