Don't Settle at the Mode! Mitigating Diversity Collapse in Pretrained Flow Models via Feature Self-Guidance

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Computer Vision) · Depth: Expert, quick

Summary

New research introduces an efficient, training-free self-guidance mechanism that mitigates diversity collapse in pretrained flow models. Advanced flow models generate high-quality images, but they often produce near-identical samples under identical conditioning. Existing remedies such as latent guidance or sample selection struggle to fix this without significant inference overhead. The proposed approach, termed feature self-guidance, disperses the internal features of the flow model during batch generation. A manifold regularization step then projects the dispersed features back onto the data manifold, so that samples become more diverse without losing alignment with the input conditions. The method integrates as a plug-and-play module, adds only marginal inference cost, and is reported to improve both diversity and fidelity across several conditional flow models, including text-to-image, depth-to-image, and reference image generation.

Key takeaway

If you are a computer vision engineer building generative applications and your pretrained flow models suffer from diversity collapse, consider integrating feature self-guidance. This training-free, plug-and-play mechanism disperses the model's internal features and regularizes them back to the data manifold, improving output diversity without sacrificing fidelity or adding substantial inference overhead. It applies to both multi-step and few-step text-to-image, depth-to-image, and reference image generation.

Key insights

Diversity collapse in flow models can be mitigated by dispersing the model's internal features (feature self-guidance) and then regularizing them back onto the data manifold.

Principles

Method

Disperse internal features of a flow model during batch generation via feature self-guidance, then project these features back onto the data manifold using regularization.
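
The two steps above can be sketched on a toy batch of feature vectors. This is a minimal illustration under stated assumptions, not the paper's implementation: the summary does not specify the dispersion rule or the manifold projection, so `disperse_features` uses a simple pairwise repulsion and `regularize_to_reference` uses norm matching as a hypothetical stand-in for manifold regularization. All function names and the `step` parameter are invented for this example.

```python
import numpy as np


def disperse_features(feats, step=0.1, eps=1e-8):
    """Push each feature vector in a batch away from the others.

    feats: array of shape (B, D), one internal feature vector per sample.
    Assumed repulsion rule: move each row along the mean of the unit
    directions pointing away from every other row in the batch.
    """
    diff = feats[:, None, :] - feats[None, :, :]         # diff[i, j] = f_i - f_j
    dist = np.linalg.norm(diff, axis=-1, keepdims=True)  # shape (B, B, 1)
    directions = diff / (dist + eps)                     # zero on the diagonal
    push = directions.sum(axis=1) / max(feats.shape[0] - 1, 1)
    return feats + step * push


def regularize_to_reference(dispersed, reference, eps=1e-8):
    """Hypothetical stand-in for manifold regularization.

    Rescales each dispersed row so its norm matches the corresponding
    original row, keeping features at a scale the model has seen.
    """
    ref_norm = np.linalg.norm(reference, axis=-1, keepdims=True)
    new_norm = np.linalg.norm(dispersed, axis=-1, keepdims=True)
    return dispersed * ref_norm / (new_norm + eps)


def feature_self_guidance(feats, step=0.1):
    """Disperse, then pull back toward the original feature scale."""
    return regularize_to_reference(disperse_features(feats, step), feats)


def mean_pairwise_distance(feats):
    """Average Euclidean distance over all ordered pairs of distinct rows."""
    diff = feats[:, None, :] - feats[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    b = feats.shape[0]
    return dist.sum() / (b * (b - 1))


# Toy "collapsed" batch: four samples clustered around one point.
rng = np.random.default_rng(0)
collapsed = rng.normal(size=(1, 16)) + 0.01 * rng.normal(size=(4, 16))
guided = feature_self_guidance(collapsed, step=0.1)
print(mean_pairwise_distance(collapsed), mean_pairwise_distance(guided))
```

In a real flow model the features would come from an intermediate layer at each sampling step, and the regularization would need to respect the model's actual feature distribution; the norm matching here only illustrates the idea of a corrective step that follows dispersion.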

Best for: Research Scientist, AI Scientist, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.