Unsupervised feature selection using Bayesian Tucker decomposition

Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Mathematics & Computational Sciences · Depth: Expert, extended

Summary

Researchers from Chuo University and Hitotsubashi University propose Bayesian Tucker decomposition (BTuD) for unsupervised feature selection, building on existing tensor decomposition (TD) methods. Unlike prior Bayesian TD approaches, which assume a Gaussian distribution for the decomposed components, BTuD assumes that the residuals obey a Gaussian distribution, analogous to linear regression. The authors apply the method to several synthetic datasets, to global coupled maps with randomized coupling strength (RCS-GCM), and to gene expression profiles (GEO ID GSE142068). In practice, BTuD relies on higher-order orthogonal iteration (HOOI) to reach a converged decomposition, particularly when the regularization parameter $\alpha=0$. The study reports that BTuD identifies relevant features, such as sinusoidal functions in the synthetic data and tissue-specific gene sets in the biological data, without pre-assigned labels or prior knowledge.
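
The residual assumption can be written out as a worked equation. This is a sketch for a three-mode tensor; the symbols ($x_{ijk}$, $G$, $u^{(n)}$, $\varepsilon_{ijk}$, $\sigma$) are illustrative choices and are not taken from the paper:

$$
x_{ijk} = \sum_{\ell_1,\ell_2,\ell_3} G_{\ell_1 \ell_2 \ell_3}\, u^{(1)}_{\ell_1 i}\, u^{(2)}_{\ell_2 j}\, u^{(3)}_{\ell_3 k} + \varepsilon_{ijk},
\qquad \varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2).
$$

If the core $G$ and two of the three factor matrices are held fixed, the right-hand side is linear in the remaining factor matrix; if all three factor matrices are fixed, it is linear in $G$. Under Gaussian residuals, each of these conditional estimates is therefore an ordinary least-squares fit, which is the sense in which the model is analogous to linear regression.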

Key takeaway

For Data Scientists and Research Scientists working with high-dimensional, unlabeled tensor data, BTuD offers an unsupervised feature selection tool. Consider integrating BTuD into your analysis pipeline when the data have a natural multi-way structure and no labels are available. The approach can uncover relevant features in datasets such as gene expression profiles or numerical model outputs, and it is an alternative where supervised techniques cannot be applied.

Key insights

BTuD performs unsupervised feature selection by modeling the residuals of a Tucker decomposition as Gaussian, rather than placing a Gaussian assumption on the decomposed components.

Method

BTuD splits the Tucker decomposition into four linear regression subproblems and solves them iteratively, estimating the singular value matrices and the core tensor in turn. Each regression is solved with the Moore-Penrose pseudoinverse, and HOOI is used to obtain convergence in practice.
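
To make the pieces above concrete, here is a minimal NumPy sketch of plain HOOI for a three-mode tensor, plus a pseudoinverse regression for the core. It is not the authors' BTuD implementation: the function names (`unfold`, `multi_mode_dot`, `hooi`, `core_by_regression`, `make_demo_tensor`), the synthetic data, and the ranks are all hypothetical. Ranking features by the absolute value of the leading feature-mode vector is also an assumption made for illustration, not the selection rule used in the paper.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-`mode` unfolding: that axis becomes rows, all others are flattened."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def multi_mode_dot(tensor, matrices, skip=None):
    """Multiply `tensor` along every mode n by matrices[n], optionally skipping one."""
    out = tensor
    for mode, mat in enumerate(matrices):
        if mode != skip:
            # Contract mat's columns with axis `mode`, then put the new axis back.
            out = np.moveaxis(np.tensordot(mat, out, axes=(1, mode)), 0, mode)
    return out


def hooi(tensor, ranks, n_iter=50, tol=1e-12):
    """Tucker decomposition by HOOI; returns (core, factors), factors orthonormal."""
    # HOSVD initialisation: leading left singular vectors of each unfolding.
    factors = [np.linalg.svd(unfold(tensor, n), full_matrices=False)[0][:, :r]
               for n, r in enumerate(ranks)]
    prev = -np.inf
    for _ in range(n_iter):
        for n, r in enumerate(ranks):
            # Project onto the other modes' subspaces, then refresh mode n.
            proj = multi_mode_dot(tensor, [f.T for f in factors], skip=n)
            factors[n] = np.linalg.svd(unfold(proj, n), full_matrices=False)[0][:, :r]
        core = multi_mode_dot(tensor, [f.T for f in factors])
        fit = np.linalg.norm(core)  # stop once the captured norm stops growing
        if fit - prev < tol:
            break
        prev = fit
    return core, factors


def core_by_regression(tensor, factors):
    """Least-squares core: vec(X) = kron(U1, U2, U3) vec(G) + residual, via pinv."""
    design = factors[0]
    for f in factors[1:]:
        design = np.kron(design, f)
    ranks = tuple(f.shape[1] for f in factors)
    return (np.linalg.pinv(design) @ tensor.reshape(-1)).reshape(ranks)


def make_demo_tensor(seed=0):
    """Hypothetical data: features 0..9 carry a sinusoid, the other 40 are noise."""
    rng = np.random.default_rng(seed)
    n_feat, n_samp = 50, 20
    loading = np.zeros(n_feat)
    loading[:10] = 1.0
    wave = np.sin(2 * np.pi * np.arange(n_samp) / n_samp)
    cond = np.array([1.0, 0.8, 1.2])
    signal = np.einsum("i,j,k->ijk", loading, wave, cond)
    return signal + 0.1 * rng.standard_normal(signal.shape)


if __name__ == "__main__":
    X = make_demo_tensor()
    core, factors = hooi(X, ranks=(2, 2, 2))
    # With orthonormal factors, the regression core equals the projected core.
    print(np.allclose(core_by_regression(X, factors), core))
    # Illustrative ranking: largest absolute loadings on the leading feature vector.
    print(sorted(np.argsort(-np.abs(factors[0][:, 0]))[:10].tolist()))
```

Because HOOI keeps the factor matrices orthonormal, the pseudoinverse of the Kronecker design matrix reduces to its transpose, so the regression estimate of the core coincides with the usual HOOI projection. This is one way to see why an unregularized regression formulation and HOOI can agree; how the regularization parameter $\alpha$ changes the picture is not covered by this sketch.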

Best for: AI Scientist, Research Scientist, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.