Initialization is Half the Battle: Generating Diverse Images from a Guidance Potential Posterior
Summary
Generative models often suffer from mode collapse despite their high fidelity. Diversity-inducing Initialization (DivIn) addresses this by treating the choice of initial noise as sampling from a guidance potential posterior, which re-weights the prior toward diversity-rich regions. The authors argue that standard Gaussian initialization leads to collapse because it is agnostic to the guidance potential landscape. To sample from this posterior efficiently, DivIn uses Langevin dynamics to steer the initial noise away from collapsing regions while keeping it anchored to the valid data manifold. The method operates at inference time and is compatible with both diffusion and flow matching models. Experiments in class-to-image and text-to-image settings are reported to show superior performance, and because DivIn is orthogonal to trajectory-based methods, combining the two significantly expands the diversity-quality Pareto frontier.
Key takeaway
Machine learning engineers aiming to improve diversity in generative models should consider Diversity-inducing Initialization (DivIn). The method improves output diversity in class-to-image and text-to-image tasks by sampling the initial noise from a guidance potential posterior instead of a plain Gaussian. It is compatible with diffusion and flow matching models and can significantly expand the diversity-quality Pareto frontier, especially when combined with existing trajectory-based techniques.
Key insights
Initializing generative models from a guidance potential posterior, rather than from standard Gaussian noise, enhances output diversity and improves the diversity-quality trade-off.
Principles
- Mode collapse stems from initialization that ignores the guidance potential landscape.
- Re-weighting the prior toward diversity-rich regions improves generation.
- Orthogonal diversity methods can be combined for greater gains.
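One way to read the re-weighting principle is as an exponential tilt of the Gaussian prior. The form below is an illustrative assumption, not the paper's exact definition: the potential $\Phi$ (large where initial noise tends to collapse) and the weight $\lambda$ are hypothetical symbols introduced here.

```latex
% Assumed illustrative form: Gaussian prior tilted by a guidance potential.
q(z) \;\propto\; p(z)\,\exp\!\bigl(-\lambda\,\Phi(z)\bigr),
\qquad p(z) = \mathcal{N}(z;\,0,\,I)

% Score of the tilted density: the first term anchors z to the prior,
% the second pushes z away from high-potential (collapsing) regions.
\nabla_z \log q(z) \;=\; -z \;-\; \lambda\,\nabla_z \Phi(z)
```

With $\lambda = 0$ this reduces to standard Gaussian initialization, which is the potential-agnostic case the summary identifies as the source of collapse.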
Method
DivIn formulates initial noise selection as sampling from a guidance potential posterior. It uses Langevin dynamics to navigate the initialization landscape, steering the noise away from collapsing regions while keeping it anchored to the data manifold.
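The Langevin step described above can be sketched in a few lines of NumPy. This is a minimal toy under stated assumptions, not the authors' implementation: it assumes a target of the form q(z) ∝ N(z; 0, I) · exp(-weight · Phi(z)), and the bump potential, `grad_bump_potential`, `langevin_init`, and all parameter values are hypothetical stand-ins for the paper's guidance potential and settings.

```python
import numpy as np

def grad_bump_potential(z, center, width=1.0):
    """Gradient of a toy potential Phi(z) = exp(-||z - center||^2 / (2 * width^2)).

    Hypothetical stand-in: Phi is large near `center`, which plays the role
    of a "collapsing" region of the initialization landscape.
    """
    diff = z - center
    phi = np.exp(-np.sum(diff**2, axis=-1, keepdims=True) / (2.0 * width**2))
    return -diff / width**2 * phi

def langevin_init(z0, grad_potential, weight=1.0, step=0.01, n_steps=200, rng=None):
    """Unadjusted Langevin dynamics on the assumed tilted-Gaussian target."""
    rng = np.random.default_rng() if rng is None else rng
    z = np.array(z0, dtype=float)
    for _ in range(n_steps):
        # Score of the target: -z anchors to the Gaussian prior,
        # -weight * grad Phi pushes away from the high-potential region.
        score = -z - weight * grad_potential(z)
        z = z + step * score + np.sqrt(2.0 * step) * rng.standard_normal(z.shape)
    return z

# Usage: refine a batch of standard Gaussian draws in a 2-D toy latent space.
rng = np.random.default_rng(0)
z0 = rng.standard_normal((1000, 2))
collapse_center = np.zeros(2)  # assumed location of the collapsing region
z = langevin_init(z0, lambda x: grad_bump_potential(x, collapse_center),
                  weight=5.0, n_steps=300, rng=rng)
print(np.linalg.norm(z0, axis=1).mean(), np.linalg.norm(z, axis=1).mean())
```

In this toy, the refined samples sit farther from the assumed collapse center than the raw Gaussian draws, while the `-z` term keeps them from drifting arbitrarily far from the prior. A real pipeline would pass the refined noise to an unchanged diffusion or flow matching sampler.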
In practice
- Apply DivIn at inference time for diffusion models.
- Use DivIn with flow matching models.
- Combine DivIn with trajectory-based diversity methods.
Topics
- Generative Models
- Mode Collapse
- Diversity-inducing Initialization
- Diffusion Models
- Flow Matching
- Langevin Dynamics
- Text-to-Image Generation
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.