A Bayesian Boolean Matrix Factorization with Application to Copy Number Analysis in Cancer

· Source: stat.ML updates on arXiv.org · Field: Science & Research — Health & Medical Research, Mathematics & Computational Sciences, Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

The paper introduces a Bayesian Boolean Matrix Factorization (BBMF) model for binary data, with a focus on cancer genomics. It addresses limitations of existing heuristic Boolean Matrix Factorization (BooMF) methods by adding an explicit generative process, sparsity-inducing priors, and uncertainty quantification. Key features include an asymmetric two-parameter noise model, hierarchical Beta priors with a continuous spike-and-slab prior on factor-activation probabilities, and an aggregate factor-alignment similarity (AFAS) diagnostic. In simulation experiments with $K=70$ patients, $G=44$ chromosomal arms, and $R=4$ latent factors, BBMF recovered the true latent factors more accurately than Asso and GreConD+, with higher F1 and MCC scores and lower reconstruction error rates. Applied to real multiple myeloma copy-number alteration (CNA) data from $62$ patient-samples and $44$ chromosomal arms, BBMF identified interpretable bicliques that align with known hyperdiploid signatures, providing a biologically meaningful summary of tumor heterogeneity.
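To make the setup concrete, here is a minimal sketch of a Boolean factorization observed through asymmetric noise, together with the F1 and MCC metrics named above. It is an illustration under stated assumptions, not the paper's code: the function names, the rates `fp_rate` and `fn_rate`, and the factor densities 0.3 and 0.2 are all hypothetical, and the independent Bernoulli draws for the factors stand in for the paper's hierarchical priors. Only the dimensions $K=70$, $G=44$, $R=4$ come from the summary.

```python
import numpy as np

def boolean_product(U, V):
    """Boolean matrix product: entry (k, g) is 1 if some factor r has U[k, r] = V[r, g] = 1."""
    return (U.astype(int) @ V.astype(int) > 0).astype(int)

def add_asymmetric_noise(Z, fp_rate, fn_rate, rng):
    """Flip 0 -> 1 with probability fp_rate and 1 -> 0 with probability fn_rate."""
    flip_prob = np.where(Z == 1, fn_rate, fp_rate)
    flips = rng.random(Z.shape) < flip_prob
    return np.where(flips, 1 - Z, Z)

def f1_and_mcc(truth, pred):
    """F1 score and Matthews correlation coefficient for two binary arrays."""
    tp = float(np.sum((truth == 1) & (pred == 1)))
    tn = float(np.sum((truth == 0) & (pred == 0)))
    fp = float(np.sum((truth == 0) & (pred == 1)))
    fn = float(np.sum((truth == 1) & (pred == 0)))
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) > 0 else 0.0
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return f1, mcc

# Dimensions from the summary; densities and noise rates are assumed values.
rng = np.random.default_rng(0)
K, G, R = 70, 44, 4
U = (rng.random((K, R)) < 0.3).astype(int)   # patient-by-factor memberships
V = (rng.random((R, G)) < 0.2).astype(int)   # factor-by-arm patterns
Z = boolean_product(U, V)                    # noise-free binary signal
X = add_asymmetric_noise(Z, fp_rate=0.02, fn_rate=0.10, rng=rng)

f1, mcc = f1_and_mcc(Z, X)
print(f"observed vs. clean signal: F1={f1:.3f}, MCC={mcc:.3f}")
```

Using two separate rates lets false positives and false negatives differ, which is the point of an asymmetric noise model. A symmetric model would force a single flip probability on both kinds of error.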

Key takeaway

AI scientists and machine learning engineers working with binary genomic data should consider Bayesian Boolean Matrix Factorization (BBMF) for uncovering interpretable latent structure. In the reported simulations it recovered latent factors and reconstructed the data more accurately than the heuristic baselines Asso and GreConD+, which matters for complex patterns such as chromosomal alterations in cancer. Its uncertainty quantification and sparsity-inducing priors support robust, biologically meaningful insights into disease evolution.

Key insights

Bayesian Boolean Matrix Factorization (BBMF) improves binary data decomposition by quantifying uncertainty and reliably recovering latent factors.

Method

BBMF performs posterior inference by Gibbs sampling with closed-form full conditionals. The underlying generative model combines sparsity-inducing priors with an asymmetric two-parameter noise model.
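As a rough illustration of what a closed-form full conditional looks like in this setting, the sketch below resamples the entries of one Boolean factor matrix while everything else is held fixed. It is a simplified stand-in and not the paper's sampler: the names `gibbs_sweep_U`, `fp_rate`, `fn_rate`, and `pi` are hypothetical, the prior is a flat Bernoulli(`pi`) rather than the hierarchical Beta and spike-and-slab priors, and the updates for the second factor matrix and the noise rates are omitted.

```python
import numpy as np

def row_loglik(x_row, z_row, fp_rate, fn_rate):
    """Log-likelihood of one observed row given its noise-free Boolean reconstruction."""
    p_one = np.where(z_row == 1, 1.0 - fn_rate, fp_rate)  # P(x = 1 | z)
    return np.sum(x_row * np.log(p_one) + (1 - x_row) * np.log1p(-p_one))

def gibbs_sweep_U(X, U, V, fp_rate, fn_rate, pi, rng):
    """One Gibbs sweep over U with V and the noise rates held fixed (updates U in place)."""
    K, R = U.shape
    for k in range(K):
        for r in range(R):
            logp = np.empty(2)
            for u in (0, 1):
                U[k, r] = u
                z_row = (U[k] @ V > 0).astype(int)         # Boolean product for row k
                log_prior = np.log(pi) if u == 1 else np.log1p(-pi)
                logp[u] = log_prior + row_loglik(X[k], z_row, fp_rate, fn_rate)
            # The full conditional is Bernoulli with log-odds logp[1] - logp[0].
            p1 = 1.0 / (1.0 + np.exp(logp[0] - logp[1]))
            U[k, r] = int(rng.random() < p1)
    return U

# Toy check with two non-overlapping factors and noise-free observations.
rng = np.random.default_rng(1)
V = np.array([[1] * 5 + [0] * 5,
              [0] * 5 + [1] * 5])
U_true = (rng.random((6, 2)) < 0.5).astype(int)
X = (U_true @ V > 0).astype(int)

U_hat = gibbs_sweep_U(X, np.zeros_like(U_true), V,
                      fp_rate=0.01, fn_rate=0.01, pi=0.5, rng=rng)
print("entries recovered:", int(np.sum(U_hat == U_true)), "of", U_true.size)
```

Because each entry is binary, its conditional distribution needs only two likelihood evaluations, which is why such updates are available in closed form. A full sampler would repeat the sweep many times and keep the draws, so that posterior frequencies of each entry provide the uncertainty estimates.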

Best for: AI Scientist, Research Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.