Unsupervised feature selection using Bayesian Tucker decomposition
Summary
Researchers from Chuo University and Hitotsubashi University proposed Bayesian Tucker decomposition (BTuD) for unsupervised feature selection, building on existing tensor decomposition (TD) methods. Prior Bayesian TD approaches assume that the decomposed components follow a Gaussian distribution. BTuD instead assumes that the residuals follow a Gaussian distribution, analogous to linear regression. The method was applied successfully to several synthetic datasets, to global coupled maps with randomized coupling strength (RCS-GCM), and to gene expression profiles (GEO ID GSE142068). In practice, BTuD relies on Higher-Order Orthogonal Iteration (HOOI) to reach convergence, particularly when the regularization parameter $\alpha=0$. The study showed that BTuD can identify relevant features without pre-assigned labels or prior knowledge, such as sinusoidal functions in synthetic data and tissue-specific gene sets in biological data.
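HOOI, named above as the practical solver, alternates over the tensor modes. At each mode it projects the data onto the current subspaces of the other modes and then refreshes that mode's factor from a truncated SVD. The following is a minimal NumPy sketch of generic HOOI for a dense tensor with user-chosen multilinear ranks. The helpers `unfold`, `mode_product`, `hooi` and `reconstruct` are hypothetical names rather than the authors' code, and the BTuD priors and $\alpha$ term are not modelled.

```python
import numpy as np

def unfold(x, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)

def mode_product(x, m, mode):
    """n-mode product x ×_mode m, where m has shape (new_dim, x.shape[mode])."""
    return np.moveaxis(np.tensordot(m, x, axes=(1, mode)), 0, mode)

def hooi(x, ranks, n_iter=50, tol=1e-12):
    """Generic Higher-Order Orthogonal Iteration (not the BTuD model itself)."""
    # HOSVD initialisation: leading left singular vectors of each unfolding.
    factors = [np.linalg.svd(unfold(x, n), full_matrices=False)[0][:, :r]
               for n, r in enumerate(ranks)]
    prev_fit = -np.inf
    for _ in range(n_iter):
        for n in range(x.ndim):
            # Project x onto the current subspaces of every mode except n ...
            y = x
            for m in range(x.ndim):
                if m != n:
                    y = mode_product(y, factors[m].T, m)
            # ... then refresh factor n from the leading singular vectors.
            u = np.linalg.svd(unfold(y, n), full_matrices=False)[0]
            factors[n] = u[:, :ranks[n]]
        core = x
        for m in range(x.ndim):
            core = mode_product(core, factors[m].T, m)
        # Orthonormal factors give ||x - x_hat||^2 = ||x||^2 - ||core||^2,
        # so monitoring ||core|| is enough to track the fit.
        fit = np.linalg.norm(core)
        if fit - prev_fit <= tol * np.linalg.norm(x):
            break
        prev_fit = fit
    return core, factors

def reconstruct(core, factors):
    """Multiply the core by each factor along its mode."""
    x = core
    for m, f in enumerate(factors):
        x = mode_product(x, f, m)
    return x
```

For a tensor with exact multilinear rank, `reconstruct(*hooi(x, ranks))` recovers `x` up to rounding. The columns of each factor play the role of the singular value vectors that later steps inspect for feature selection.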
Key takeaway
For data scientists and research scientists working with high-dimensional, unlabeled tensor data, BTuD offers an unsupervised feature selection tool. Consider adding BTuD to your analysis pipeline, especially when labels are unavailable or when conventional methods struggle with multi-way data structure. The approach can uncover structured features in datasets such as gene expression profiles or numerical model outputs, and it offers an alternative to supervised techniques when labels are missing.
Key insights
BTuD offers robust unsupervised feature selection by modeling residuals as Gaussian, enhancing traditional Tucker decomposition.
Principles
- Residuals, not components, should follow a Gaussian distribution for effective feature selection.
- Alternating optimization can approximate complex Bayesian inference problems.
- Unsupervised methods can identify structured features even without explicit labels.
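The first principle can be made concrete for a three-mode tensor. The following assumes i.i.d. Gaussian residuals with variance $\sigma^2$ and omits priors and the $\alpha$ term, so the paper's full Bayesian model is not reproduced. Under these assumptions the negative log-likelihood reduces to a least-squares objective. Fixing every block except one then turns that objective into an ordinary linear regression, solved with the Moore-Penrose pseudoinverse $M_A^{+}$:

```latex
x_{ijk} = \sum_{p=1}^{P}\sum_{q=1}^{Q}\sum_{r=1}^{R} g_{pqr}\, a_{ip}\, b_{jq}\, c_{kr} + \varepsilon_{ijk},
\qquad \varepsilon_{ijk} \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)

-\log p(\mathcal{X} \mid \mathcal{G}, A, B, C)
  = \frac{1}{2\sigma^2} \sum_{i,j,k} \Big( x_{ijk} - \sum_{p,q,r} g_{pqr}\, a_{ip}\, b_{jq}\, c_{kr} \Big)^2
  + \frac{IJK}{2} \log\!\left(2\pi\sigma^2\right)

\text{Fixing } \mathcal{G}, B, C:\quad
\min_{A} \big\| X_{(1)} - A\, M_A \big\|_F^2,
\qquad (M_A)_{p,(j,k)} = \sum_{q,r} g_{pqr}\, b_{jq}\, c_{kr},
\qquad \hat{A} = X_{(1)}\, M_A^{+}
```

The same argument applies to $B$, $C$ and the core $\mathcal{G}$. Together these give four regression subproblems.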
Method
BTuD splits the Tucker decomposition into four linear regression subproblems and solves them iteratively, alternately estimating the singular value vector matrices (factor matrices) and the core tensor. It uses the Moore-Penrose pseudoinverse for each regression and HOOI for practical convergence.
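The regression view in the Method paragraph can be sketched for a three-mode tensor. Each block (A, B, C, then the core G) is refreshed by a least-squares solve with `np.linalg.pinv`. The sketch assumes plain, unregularized least squares from a random start. It does not reproduce the paper's priors, the $\alpha$ term or its exact update order, and `tucker_als` is a hypothetical name.

```python
import numpy as np

def tucker_als(x, ranks, n_iter=100, seed=0):
    """Alternating least squares for a 3-mode Tucker model (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    I, J, K = x.shape
    P, Q, R = ranks
    A = rng.standard_normal((I, P))
    B = rng.standard_normal((J, Q))
    C = rng.standard_normal((K, R))
    G = rng.standard_normal((P, Q, R))
    # Mode unfoldings, each with the remaining indices flattened in C order.
    x1 = x.reshape(I, J * K)
    x2 = np.moveaxis(x, 1, 0).reshape(J, I * K)
    x3 = np.moveaxis(x, 2, 0).reshape(K, I * J)
    errors = []
    for _ in range(n_iter):
        # Subproblem 1: x1 ≈ A @ M_A, a linear regression for A.
        M_A = np.einsum("pqr,jq,kr->pjk", G, B, C).reshape(P, J * K)
        A = x1 @ np.linalg.pinv(M_A)
        # Subproblem 2: x2 ≈ B @ M_B.
        M_B = np.einsum("pqr,ip,kr->qik", G, A, C).reshape(Q, I * K)
        B = x2 @ np.linalg.pinv(M_B)
        # Subproblem 3: x3 ≈ C @ M_C.
        M_C = np.einsum("pqr,ip,jq->rij", G, A, B).reshape(R, I * J)
        C = x3 @ np.linalg.pinv(M_C)
        # Subproblem 4: vec(x) ≈ (A ⊗ B ⊗ C) vec(G). The pseudoinverse of a
        # Kronecker product is the Kronecker product of the pseudoinverses.
        G = np.einsum("ijk,pi,qj,rk->pqr", x,
                      np.linalg.pinv(A), np.linalg.pinv(B), np.linalg.pinv(C))
        recon = np.einsum("pqr,ip,jq,kr->ijk", G, A, B, C)
        errors.append(np.linalg.norm(x - recon) / np.linalg.norm(x))
    return G, (A, B, C), errors
```

Each block update is an exact least-squares minimizer given the other blocks, so the relative reconstruction error recorded after each sweep does not increase.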
In practice
- Apply BTuD for feature selection in high-dimensional tensor data.
- Use HOOI for efficient BTuD computation when $\alpha=0$.
- Assess the selected features with P-values corrected by the Benjamini-Hochberg (BH) criterion.
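The validation step can be sketched with a recipe that is common in TD-based unsupervised feature selection. The recipe is assumed here rather than taken from the source. Under the null, each component $u_i$ of a chosen singular value vector is treated as Gaussian, so $(u_i/\sigma)^2$ follows $\chi^2$ with one degree of freedom. The resulting P-values are BH-corrected, and features whose adjusted P falls below a threshold are kept. The function names and the 0.01 threshold are illustrative.

```python
import math
import numpy as np

def chi2_pvalues(u):
    """Assumed null: u_i / sigma ~ N(0, 1), with sigma the sample SD of u."""
    u = np.asarray(u, dtype=float)
    z2 = (u / u.std()) ** 2
    # P(chi^2_1 > z2) = P(|Z| > sqrt(z2)) = erfc(sqrt(z2 / 2))
    return np.array([math.erfc(math.sqrt(v / 2.0)) for v in z2])

def bh_adjust(p):
    """Benjamini-Hochberg adjusted P-values (step-up, kept monotone)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Running minimum from the largest P-value down enforces monotonicity.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

def select_features(u, threshold=0.01):
    """Indices of features whose BH-adjusted P-value is below `threshold`."""
    return np.flatnonzero(bh_adjust(chi2_pvalues(u)) < threshold)
```

For example, `bh_adjust([0.01, 0.04, 0.03, 0.5])` gives approximately `[0.04, 0.0533, 0.0533, 0.5]`.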
Topics
- Bayesian Tucker Decomposition
- Unsupervised Feature Selection
- Tensor Decomposition
- Higher-Order Orthogonal Iteration
- Gene Expression Analysis
Best for: AI Scientist, Research Scientist, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.