Initialization is Half the Battle: Generating Diverse Images from a Guidance Potential Posterior

Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision & Pattern Recognition · Depth: Expert, quick

Summary

Generative models often produce high-fidelity samples yet still suffer from mode collapse. A new method, Diversity-inducing Initialization (DivIn), addresses this by treating the selection of initial noise as sampling from a guidance potential posterior, which re-weights the prior towards diversity-rich regions. DivIn attributes the collapse seen under standard Gaussian initialization to the fact that such initialization is agnostic to the guidance potential landscape. To sample from the posterior efficiently, DivIn uses Langevin dynamics to navigate the initialization landscape, steering initial noise away from collapsing regions while keeping it anchored to the valid data manifold. As an inference-time diversity enhancement, DivIn is compatible with both diffusion and flow matching models. Extensive experiments are reported to show superior performance in class-to-image and text-to-image scenarios. DivIn is orthogonal to trajectory-based methods, and combining the two significantly expands the diversity-quality Pareto frontier.
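One way to read the description above in symbols, offered as a sketch under assumptions rather than the paper's exact formulation: let $z_0$ be the initial noise, $c$ the condition (class or prompt), and $\Phi_c$ a guidance potential taken to be large in collapsing regions. The symbols, the sign convention, and the step size $\eta$ are illustrative choices that this summary does not specify.

```latex
% Assumed form: Gaussian prior re-weighted by a guidance potential
\tilde p(z_0 \mid c) \;\propto\; \mathcal{N}(z_0;\, 0,\, I)\,\exp\!\big(-\Phi_c(z_0)\big)

% Score of the assumed posterior (prior term plus potential term)
\nabla_z \log \tilde p(z \mid c) \;=\; -z \;-\; \nabla_z \Phi_c(z)

% Unadjusted Langevin update used to draw an approximate sample
z^{(k+1)} \;=\; z^{(k)} \;+\; \eta\,\nabla_z \log \tilde p\big(z^{(k)} \mid c\big) \;+\; \sqrt{2\eta}\;\xi^{(k)},
\qquad \xi^{(k)} \sim \mathcal{N}(0, I)
```

Under this reading, the $-z$ term comes from the Gaussian prior and pulls iterates back towards typical noise, while the $-\nabla_z \Phi_c$ term pushes them away from high-potential regions.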

Key takeaway

Machine learning engineers aiming to enhance diversity in generative models should consider implementing Diversity-inducing Initialization (DivIn). The method improves output diversity in class-to-image and text-to-image tasks by selecting the initial noise from a guidance potential posterior instead of a plain Gaussian. DivIn is compatible with diffusion and flow matching models, and integrating it can significantly expand the diversity-quality Pareto frontier, especially when combined with existing trajectory-based techniques.

Key insights

Initializing generation from a guidance potential posterior, rather than from standard Gaussian noise, significantly enhances output diversity and improves the diversity-quality trade-off.

Principles

Method

DivIn formulates initial noise selection as sampling from a guidance potential posterior. It uses Langevin dynamics to navigate the initialization landscape, steering noise away from collapsing regions while anchoring it to the data manifold.
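The idea of refining initial noise with Langevin dynamics can be illustrated on a toy problem. The sketch below is not DivIn itself: it assumes a hypothetical 2-D potential (a Gaussian bump marking a single "collapsing region") in place of a real model's guidance potential, and every function name and parameter here is invented for illustration. It runs the unadjusted Langevin algorithm on a standard Gaussian prior re-weighted by that potential.

```python
import numpy as np

def potential_grad(z, center, height, width):
    """Gradient of a hypothetical bump potential
    Phi(z) = height * exp(-||z - center||^2 / (2 * width^2)).
    Stand-in for a model-derived guidance potential (assumption)."""
    diff = z - center
    sq_dist = np.sum(diff**2, axis=-1, keepdims=True)
    bump = height * np.exp(-sq_dist / (2.0 * width**2))
    return -bump * diff / width**2

def posterior_score(z, center, height, width):
    """Score of the unnormalized density N(z; 0, I) * exp(-Phi(z)).
    The -z term is the Gaussian prior; it keeps samples near typical noise."""
    return -z - potential_grad(z, center, height, width)

def langevin_refine(z0, center, height=5.0, width=0.5,
                    step=0.01, n_steps=300, rng=None):
    """Unadjusted Langevin algorithm started from prior samples z0."""
    rng = np.random.default_rng() if rng is None else rng
    z = z0.copy()
    for _ in range(n_steps):
        noise = rng.standard_normal(z.shape)
        z = z + step * posterior_score(z, center, height, width) \
              + np.sqrt(2.0 * step) * noise
    return z

def frac_near(z, center, radius):
    """Fraction of samples within `radius` of `center`."""
    return float(np.mean(np.linalg.norm(z - center, axis=-1) < radius))

rng = np.random.default_rng(0)
center = np.zeros(2)                       # hypothetical collapsing region
z_prior = rng.standard_normal((2000, 2))   # standard Gaussian initialization
z_refined = langevin_refine(z_prior, center, rng=rng)

print("near collapse region, prior:  ", frac_near(z_prior, center, 0.5))
print("near collapse region, refined:", frac_near(z_refined, center, 0.5))
```

In this toy setting the refined samples should land in the penalized region far less often than plain Gaussian draws, while the prior term keeps their norms close to those of ordinary noise. In a real pipeline the refined noise would then be passed to the diffusion or flow matching sampler unchanged.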

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.