Hellinger Multimodal Variational Autoencoders
Summary
Multimodal variational autoencoders (VAEs) are a central tool for weakly supervised generative learning. This work introduces HELVAE, a multimodal VAE built on Hellinger aggregation, which is derived from Hölder pooling with α=0.5. Existing models such as MMVAE commonly sub-sample modalities during training, which limits generative quality; HELVAE avoids sub-sampling altogether. Empirically, HELVAE learns more expressive latent representations, performs better across modalities, and achieves better trade-offs between generative coherence and quality. It outperforms leading multimodal VAE models on the benchmark datasets PolyMNIST (five modalities), CUB Image-Captions, and bimodal CelebA, while also being more computationally efficient.
Key takeaway
For machine learning engineers developing multimodal generative models, HELVAE offers an alternative to existing multimodal VAEs. Its Hellinger aggregation avoids sub-sampling and gives a better trade-off between generative coherence and quality, especially with several modalities. Consider HELVAE or its MoHELVAE variant when you need better latent representations and more semantically consistent cross-modal generation, particularly where balancing quality and coherence is critical.
Key insights
HELVAE uses Hellinger aggregation from Hölder pooling (α=0.5) to improve multimodal VAE coherence and quality without sub-sampling.
Principles
- Hölder pooling with α=0.5 induces soft dependencies between experts.
- Avoiding sub-sampling improves multimodal VAE generative quality.
- Probabilistic opinion pooling generalizes PoE and MoE.
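The principles above can be made concrete with the standard power-mean form of Hölder pooling. The notation here (expert densities p_m, weights w_m, pooled density q_α) is assumed for illustration and may differ from the paper's own.

```latex
% Hölder (power-mean) pooling of M expert densities p_1, ..., p_M,
% with weights w_m >= 0, \sum_m w_m = 1, and exponent \alpha \in (0, 1]:
q_\alpha(z) \;\propto\; \Big( \sum_{m=1}^{M} w_m \, p_m(z)^{\alpha} \Big)^{1/\alpha}

% \alpha = 1: linear pooling, i.e. a mixture of experts (MoE)
q_1(z) = \sum_{m=1}^{M} w_m \, p_m(z)

% \alpha \to 0: log-linear pooling, i.e. a weighted product of experts (PoE)
q_0(z) \;\propto\; \prod_{m=1}^{M} p_m(z)^{w_m}

% \alpha = 1/2: the Hellinger case
q_{1/2}(z) \;\propto\; \Big( \sum_{m=1}^{M} w_m \sqrt{p_m(z)} \Big)^{2}
  = \sum_{m=1}^{M} \sum_{n=1}^{M} w_m \, w_n \sqrt{p_m(z)\, p_n(z)}
```

The cross terms with m ≠ n in the last line are large only where two experts overlap, which is one way to read the "soft dependencies between experts" in the first principle.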
Method
HELVAE aggregates the unimodal Gaussian posteriors with Hellinger aggregation: it takes the Hölder pool with α=0.5 and projects the pooled density onto a diagonal Gaussian by moment matching. Because every modality contributes to this single aggregated posterior, training requires no sub-sampling.
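As a rough illustration of the moment-matching step, the sketch below assumes the α=0.5 pool has the power-mean form q(z) ∝ (Σ_m w_m √p_m(z))² with diagonal Gaussian experts. Under that assumption the pool expands into a mixture of M² Gaussians whose weights are proportional to w_i w_j times the Bhattacharyya coefficient of experts i and j, and its mean and variance can be matched exactly. The function name `hellinger_aggregate` and its signature are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def hellinger_aggregate(mus, variances, weights=None):
    """Moment-match the alpha=0.5 power-mean pool of M diagonal Gaussians.

    Hypothetical helper for illustration only (not the paper's code).
    mus, variances: array-likes of shape (M, D); weights: shape (M,), summing to 1.
    Returns (mean, variance), each of shape (D,).
    """
    mus = np.asarray(mus, dtype=float)
    variances = np.asarray(variances, dtype=float)
    num_experts = mus.shape[0]
    if weights is None:
        # Assumption: equal weights when none are given.
        w = np.full(num_experts, 1.0 / num_experts)
    else:
        w = np.asarray(weights, dtype=float)

    # Pairwise views with shape (M, M, D): index i on axis 0, j on axis 1.
    var_i, var_j = variances[:, None, :], variances[None, :, :]
    mu_i, mu_j = mus[:, None, :], mus[None, :, :]
    var_sum = var_i + var_j

    # sqrt(p_i * p_j) is proportional to a Gaussian with these parameters.
    var_ij = 2.0 * var_i * var_j / var_sum
    mu_ij = (mu_i * var_j + mu_j * var_i) / var_sum

    # Log Bhattacharyya coefficient per dimension; it equals 0 when i == j.
    log_bc = (
        0.5 * (np.log(2.0) + 0.5 * np.log(var_i) + 0.5 * np.log(var_j) - np.log(var_sum))
        - (mu_i - mu_j) ** 2 / (4.0 * var_sum)
    )

    # Mixture weights pi_ij ∝ w_i * w_j * BC_ij, normalized in log space.
    log_pi = np.log(w)[:, None] + np.log(w)[None, :] + log_bc.sum(axis=-1)
    log_pi -= log_pi.max()
    pi = np.exp(log_pi)
    pi /= pi.sum()

    # First and second moments of the M^2-component mixture.
    mean = np.einsum("ij,ijd->d", pi, mu_ij)
    second_moment = np.einsum("ij,ijd->d", pi, var_ij + mu_ij ** 2)
    return mean, second_moment - mean ** 2


# Usage: two unit-variance experts that disagree about a 1-D latent.
mean, var = hellinger_aggregate([[-1.0], [1.0]], [[1.0], [1.0]])
```

With a single expert the function returns that expert's mean and variance unchanged. For the two-expert usage example, the matched mean is 0 by symmetry and the matched variance falls between 1 (each expert's own variance) and 2 (the variance of the equal-weight mixture of the two experts).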
In practice
- Apply Hellinger aggregation for robust multimodal posterior approximation.
- Consider MoHELVAE for enhanced coherence when the number of modalities is small.
Topics
- Multimodal VAEs
- Hellinger Aggregation
- Hölder Pooling
- Generative Models
- Latent Representations
- Probabilistic Opinion Pooling
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Student
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.