Covariate-dependent Hierarchical Dirichlet Processes

Source: JMLR · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics, Computational Biology) · Depth: Expert, quick

Summary

A new hierarchical Bayesian approach, Covariate-dependent Hierarchical Dirichlet Processes (CD-HDP), is proposed for density estimation and cluster identification across related groups. The method combines hierarchical Dirichlet processes with dependent Dirichlet processes so that cluster weights can vary with covariates. Kernel functions accommodate multiple and mixed covariate types, and component-specific likelihoods accommodate different output types. By borrowing information across groups while quantifying group differences, CD-HDP makes the relationship between covariates and clusters easier to discern. Posterior inference uses a Markov chain Monte Carlo algorithm, with a data augmentation trick to handle the intractable normalized weights. The model is demonstrated on simulated data and on two applications: single-cell RNA sequencing (scRNA-seq) data, where it reveals additional cell subgroups, and calcium imaging data, where it yields interpretable neural activity clusters.
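The kernel-based covariate dependence described above can be illustrated with a minimal sketch. It assumes a scalar covariate, a Gaussian kernel, and a finite number of components; the names `q`, `centers`, and `bandwidth` are hypothetical and are not taken from the paper, whose actual kernels and parameterization may differ.

```python
import math

def covariate_dependent_weights(x, q, centers, bandwidth):
    """Return p_j(x) = q_j K(x; c_j) / sum_k q_k K(x; c_k) for a scalar x.

    Assumption: K is a Gaussian kernel with a shared bandwidth. This is an
    illustrative choice, not the paper's specification.
    """
    # Work on the log scale and subtract the maximum for numerical stability.
    log_unnorm = [
        math.log(q_j) - 0.5 * ((x - c_j) / bandwidth) ** 2
        for q_j, c_j in zip(q, centers)
    ]
    shift = max(log_unnorm)
    unnorm = [math.exp(v - shift) for v in log_unnorm]
    total = sum(unnorm)  # normalizing constant, recomputed for every x
    return [u / total for u in unnorm]

# Hypothetical baseline weights and kernel locations for three clusters.
q = [0.5, 0.3, 0.2]
centers = [-2.0, 0.0, 2.0]

weights_left = covariate_dependent_weights(-2.0, q, centers, bandwidth=1.0)
weights_right = covariate_dependent_weights(2.0, q, centers, bandwidth=1.0)
```

The same baseline weights `q` give different cluster probabilities at different covariate values: the first cluster dominates near `x = -2` and the third near `x = 2`. The normalizing sum depends on `x`, which is the kind of term that makes posterior inference on such weights awkward.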

Key takeaway

Research scientists working with complex biological data such as scRNA-seq or calcium imaging should consider applying Covariate-dependent Hierarchical Dirichlet Processes (CD-HDP). Because it incorporates covariate information, the method can reveal more nuanced subgroups and more interpretable clusters, potentially leading to deeper biological insights than traditional hierarchical models. Compare its density estimation and cluster identification against existing methods on your own data before adopting it.

Key insights

CD-HDP integrates covariates into hierarchical Bayesian nonparametrics for flexible density estimation and cluster identification.

Principles

Method

The CD-HDP model uses a data augmentation trick to handle intractable normalized weights, enabling posterior inference via a Markov chain Monte Carlo algorithm for density estimation and cluster identification.
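To make the idea of augmenting away a normalizing constant concrete, here is a minimal sketch of one standard trick of this kind; it is not claimed to be the paper's exact scheme. It assumes a finite set of unnormalized weights w_j with independent Gamma(a_j, 1) priors, no covariates, and fixed allocation counts n_j. The likelihood contains 1/T^n with T = sum_j w_j, and the identity 1/T^n = (1/Γ(n)) ∫ u^(n-1) exp(-uT) du lets an auxiliary variable u make the weights conditionally independent. The function name `gibbs_normalized_weights` and all inputs are hypothetical.

```python
import random

def gibbs_normalized_weights(counts, prior_shapes, n_iter, seed=0):
    """Gibbs sampler for normalized weights using an auxiliary gamma variable.

    Assumptions (illustrative only): w_j ~ Gamma(prior_shapes[j], rate 1)
    a priori, and the likelihood is prod_j (w_j / T)^counts[j], T = sum(w).
    """
    rng = random.Random(seed)
    n = sum(counts)  # must be positive for the auxiliary draw below
    w = [1.0] * len(counts)
    draws = []
    for _ in range(n_iter):
        total = sum(w)
        # u | w ~ Gamma(shape=n, rate=T); gammavariate takes scale = 1 / rate.
        u = rng.gammavariate(n, 1.0 / total)
        # Given u, the weights factorize: w_j | u ~ Gamma(a_j + n_j, rate=1 + u).
        w = [
            rng.gammavariate(a + c, 1.0 / (1.0 + u))
            for a, c in zip(prior_shapes, counts)
        ]
        s = sum(w)
        draws.append([v / s for v in w])
    return draws

# Hypothetical allocation counts for three clusters and a flat prior.
counts = [30, 10, 0]
prior_shapes = [1.0, 1.0, 1.0]
draws = gibbs_normalized_weights(counts, prior_shapes, n_iter=2000)
post_mean = [sum(d[j] for d in draws) / len(draws) for j in range(len(counts))]
```

In this simplified setting the normalized weights have a known Dirichlet posterior with mean (a_j + n_j) / (sum of a + n), so `post_mean` can be checked against 31/43, 11/43, and 1/43. With covariate-dependent weights the normalizing constant changes with each observation's covariates, so an augmentation along these lines would presumably need auxiliary variables at the observation level.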

Best for: Research Scientist, AI Researcher, AI Scientist, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by JMLR.