A Bayesian Boolean Matrix Factorization with Application to Copy Number Analysis in Cancer

2026-06-17 · Source: stat.ML updates on arXiv.org · Field: Science & Research — Health & Medical Research, Mathematics & Computational Sciences, Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

A Bayesian Boolean Matrix Factorization (BBMF) model is introduced for analyzing binary data, particularly in cancer genomics. This model addresses limitations of existing heuristic Boolean Matrix Factorization (BooMF) methods by incorporating a generative process, sparsity-inducing priors, and uncertainty quantification. Key features include an asymmetric two-parameter noise model, hierarchical Beta priors with a continuous spike-and-slab prior for factor-activation probabilities, and an aggregate factor-alignment similarity (AFAS) diagnostic. In simulation experiments with $K=70$ patients, $G=44$ chromosomal arms, and $R=4$ latent factors, BBMF demonstrated superior recovery of true latent factors and achieved higher F1 and MCC scores with lower reconstruction error rates compared to Asso and GreConD+. Applied to real multiple myeloma copy-number alteration (CNA) data from $62$ patient-samples and $44$ chromosomal arms, BBMF successfully identified interpretable bicliques that align with known hyperdiploid signatures, providing a biologically meaningful summary of tumor heterogeneity.
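The comparison metrics named above (F1, MCC, and reconstruction error rate) can be illustrated with a minimal sketch. It assumes each metric is computed cell by cell between a reference binary matrix and a reconstructed one; the summary does not say exactly how the paper aggregates them, and the helper names and toy matrices below are hypothetical.

```python
from math import sqrt

def confusion_counts(truth, recon):
    """Count TP, FP, FN, TN between two equally shaped 0/1 matrices."""
    tp = fp = fn = tn = 0
    for row_t, row_r in zip(truth, recon):
        for t, r in zip(row_t, row_r):
            if t and r:
                tp += 1
            elif r:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn

def f1_score(tp, fp, fn, tn):
    """Harmonic mean of precision and recall on the 1-entries."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; 0.0 when it is undefined."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def error_rate(tp, fp, fn, tn):
    """Fraction of cells where the reconstruction disagrees."""
    return (fp + fn) / (tp + fp + fn + tn)

# Toy example (illustrative data only): one missed 1-entry out of 12 cells.
truth = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
recon = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1]]
counts = confusion_counts(truth, recon)
print(f1_score(*counts), mcc(*counts), error_rate(*counts))
```

MCC is a useful companion to F1 here because it also accounts for true negatives, which dominate sparse alteration matrices.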

Key takeaway

AI scientists and machine learning engineers working with binary genomic data should consider Bayesian Boolean Matrix Factorization (BBMF) for uncovering interpretable latent structure. In the reported simulations it recovered the true factors more accurately and reconstructed the data with fewer errors than the heuristic baselines Asso and GreConD+, and on multiple myeloma data it surfaced patterns consistent with known hyperdiploid signatures. Its uncertainty quantification and sparsity-inducing priors support more robust, biologically meaningful conclusions about tumor heterogeneity.

Key insights

Bayesian Boolean Matrix Factorization (BBMF) improves binary data decomposition by quantifying uncertainty and reliably recovering latent factors.

Principles

Boolean factorization reveals coordinated feature changes in binary data.
Asymmetric noise models enhance accuracy for genomic measurements.
Hierarchical priors can prune the effective factorization rank during inference.
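The first two principles can be made concrete with a small generative sketch. It assumes the two noise parameters are a false-positive rate and a false-negative rate, which is one common reading of an "asymmetric two-parameter noise model" but is not confirmed by this summary. The names `boolean_product`, `add_asymmetric_noise`, `fp_rate`, and `fn_rate` are hypothetical, and the matrices are toy data.

```python
import random

def boolean_product(U, V):
    """Z[i][j] = OR over r of (U[i][r] AND V[r][j])."""
    rows, rank, cols = len(U), len(V), len(V[0])
    return [[int(any(U[i][r] and V[r][j] for r in range(rank)))
             for j in range(cols)]
            for i in range(rows)]

def add_asymmetric_noise(Z, fp_rate, fn_rate, rng):
    """Flip 0 -> 1 with probability fp_rate and 1 -> 0 with probability fn_rate."""
    X = []
    for row in Z:
        noisy = []
        for z in row:
            u = rng.random()
            if z:
                noisy.append(0 if u < fn_rate else 1)
            else:
                noisy.append(1 if u < fp_rate else 0)
        X.append(noisy)
    return X

# Three samples, two latent factors, four features (illustrative only).
U = [[1, 0], [1, 1], [0, 1]]          # which factors each sample carries
V = [[1, 1, 0, 0], [0, 0, 1, 1]]      # which features each factor switches on
Z = boolean_product(U, V)
# Z == [[1, 1, 0, 0], [1, 1, 1, 1], [0, 0, 1, 1]]
X = add_asymmetric_noise(Z, fp_rate=0.02, fn_rate=0.10, rng=random.Random(0))
```

Because the product uses OR rather than addition, a sample carrying two overlapping factors still shows a 1, not a 2, in the shared features. That is what makes each factor readable as a coordinated set of features.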

Method

BBMF uses Gibbs sampling with closed-form full conditionals, applying a generative process with sparsity-inducing priors and an asymmetric two-parameter noise model for posterior inference.
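To show what a closed-form full conditional looks like for a Boolean factor entry, here is a generic sketch of a single Gibbs update. It is not the paper's sampler: it assumes a plain Bernoulli prior `p_r` on each entry of the sample-by-factor matrix and a noise model with fixed false-positive and false-negative rates, and it leaves out the hierarchical Beta and spike-and-slab layers. All function and parameter names are hypothetical.

```python
import random

def cell_likelihood(x, z, fp_rate, fn_rate):
    """P(x | z) for one observed cell under the assumed asymmetric noise."""
    if z:
        return 1.0 - fn_rate if x else fn_rate
    return fp_rate if x else 1.0 - fp_rate

def conditional_prob_u(i, r, X, U, V, p_r, fp_rate, fn_rate):
    """P(U[i][r] = 1 | X, all other entries), computed in closed form."""
    rank, cols = len(V), len(V[0])
    weight = {0: 1.0 - p_r, 1: p_r}  # prior mass for each candidate value
    for j in range(cols):
        # Is cell (i, j) already switched on by some other factor?
        other = any(U[i][s] and V[s][j] for s in range(rank) if s != r)
        for u in (0, 1):
            z = int(other or (u and V[r][j]))
            weight[u] *= cell_likelihood(X[i][j], z, fp_rate, fn_rate)
    return weight[1] / (weight[0] + weight[1])

def gibbs_update_u(i, r, X, U, V, p_r, fp_rate, fn_rate, rng):
    """Resample U[i][r] in place from its full conditional; return P(1)."""
    prob = conditional_prob_u(i, r, X, U, V, p_r, fp_rate, fn_rate)
    U[i][r] = int(rng.random() < prob)
    return prob

# Toy check: one sample whose observed row matches factor 0 exactly.
X = [[1, 1, 0, 0]]
U = [[0, 0]]
V = [[1, 1, 0, 0], [0, 0, 1, 1]]
prob_factor0 = conditional_prob_u(0, 0, X, U, V, 0.5, 0.05, 0.10)
prob_factor1 = conditional_prob_u(0, 1, X, U, V, 0.5, 0.05, 0.10)
```

In this toy case the conditional strongly favors turning factor 0 on and factor 1 off. For matrices with many columns, accumulating log-weights instead of raw products avoids numerical underflow.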

In practice

Apply BBMF to identify discrete latent patterns in binary genomic data.
Use AFAS to monitor factor recovery stability in MCMC chains.
Consider BBMF for cancer genomics to reveal tumor evolution drivers.
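The summary does not define AFAS, so the following is only a stand-in that illustrates the general idea of a permutation-invariant factor-alignment score. It assumes the score is the mean Jaccard similarity between reference factors and a posterior draw under the best one-to-one matching. The paper's actual AFAS definition may differ, and `aligned_similarity` is a hypothetical name.

```python
from itertools import permutations

def jaccard(a, b):
    """Jaccard similarity of two 0/1 vectors; two empty vectors count as identical."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 1.0

def aligned_similarity(factors_a, factors_b):
    """Mean Jaccard under the best one-to-one matching of factors.

    Both inputs must hold the same number of factors. Brute force over
    permutations is fine for small ranks such as R = 4 (24 matchings).
    """
    rank = len(factors_a)
    best = 0.0
    for perm in permutations(range(rank)):
        score = sum(jaccard(factors_a[r], factors_b[perm[r]])
                    for r in range(rank)) / rank
        best = max(best, score)
    return best

# Toy example: the draw lists the factors in swapped order, one of them partial.
reference = [[1, 1, 0, 0], [0, 0, 1, 1]]
draw = [[0, 0, 1, 1], [1, 0, 0, 0]]
score = aligned_similarity(reference, draw)
```

Tracking a score like this across MCMC iterations gives a single trace that is insensitive to label switching, so a drop signals a real change in the recovered factors and not a reordering.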

Topics

Bayesian Boolean Matrix Factorization
Copy Number Alterations
Cancer Genomics
Multiple Myeloma
Latent Factor Models
Gibbs Sampling

Best for: AI Scientist, Research Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.