FVD: Inference-Time Alignment of Diffusion Models via Fleming-Viot Resampling

Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

Fleming-Viot Diffusion (FVD) is an inference-time alignment method for diffusion models that targets the diversity collapse seen in Sequential Monte Carlo (SMC)-based samplers. In place of multinomial resampling, FVD uses a Fleming-Viot birth-death mechanism that combines independent, reward-based survival decisions with stochastic rebirth noise. This preserves broader trajectory support while exploring the reward-tilted distribution, and it requires neither value function approximation nor costly rollouts. FVD is fully parallelizable and scales with inference compute. Empirically, it reports a 7% improvement in ImageReward on DrawBench, a 14–20% improvement in FID on class-conditional tasks over strong baselines, and speedups of up to 66x over value-based methods such as DTS. The method also includes an adaptive controller that sets alignment strength with a Robbins-Monro update.
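For orientation, a reward-tilted distribution is commonly written in the form below. The notation is an assumption borrowed from common usage rather than taken from the paper: p_θ denotes the pretrained model's sample distribution, r the reward, and λ the alignment strength.

```latex
p^{*}(x) \;\propto\; p_{\theta}(x)\,\exp\!\big(\lambda\, r(x)\big), \qquad \lambda \ge 0
```

Under this form, λ = 0 recovers the base model, and larger λ concentrates mass on high-reward samples, which is where diversity collapse and over-optimization become a concern.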

Key takeaway

For research scientists and computer vision engineers who build or deploy reward-aligned diffusion models, FVD is an alternative to SMC-style resampling that needs no expensive fine-tuning and no learned value function. Its Fleming-Viot resampling and adaptive control of alignment strength mitigate diversity collapse and reward over-optimization, yielding higher-quality and more diverse samples. It is worth evaluating for class-conditional posterior sampling and text-to-image generation, especially when reward maximization must be balanced against sample diversity.

Key insights

FVD uses a Fleming-Viot birth-death mechanism to prevent diversity collapse in diffusion model inference, improving sample quality and efficiency.

Principles

Method

FVD replaces multinomial resampling with a Fleming-Viot birth-death mechanism: each particle survives with a reward-based probability, and dead particles are stochastically reborn from survivors. An adaptive Robbins-Monro controller adjusts the alignment strength based on the particle absorption rate.
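The birth-death step and the controller described in this paragraph can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The survival rule exp(lam * (r_i - r_max)), the uniform choice of parent, the Gaussian rebirth noise, the 1/k step schedule, and every function and parameter name are hypothetical choices made for the example. The toy loop also omits the diffusion denoising step that would run between resampling steps in a real sampler.

```python
import numpy as np


def fv_resample(particles, rewards, lam, sigma, rng):
    """One Fleming-Viot-style birth-death step (illustrative sketch).

    particles: (N, D) array of particle states
    rewards:   (N,) array of reward estimates
    lam:       alignment strength (>= 0); larger values kill low-reward particles more often
    sigma:     standard deviation of the Gaussian rebirth noise
    Returns (new_particles, absorption_rate).
    """
    x = np.asarray(particles, dtype=float)
    n, d = x.shape
    # Assumed survival rule: the best particle survives with probability 1,
    # every other particle with probability exp(lam * (r_i - r_max)) <= 1.
    survive_prob = np.exp(lam * (rewards - rewards.max()))
    # Independent survival decision per particle (no multinomial draw).
    alive = rng.random(n) < survive_prob
    survivors = np.flatnonzero(alive)
    dead = np.flatnonzero(~alive)

    new_x = x.copy()
    if dead.size:
        # Each dead particle is reborn as a noisy copy of a uniformly chosen survivor.
        parents = rng.choice(survivors, size=dead.size)
        new_x[dead] = x[parents] + sigma * rng.standard_normal((dead.size, d))
    return new_x, dead.size / n


def robbins_monro_update(lam, absorption_rate, target_rate, step, k):
    """Stochastic-approximation update of lam with decaying step size step / k.

    Raises lam when fewer particles die than targeted, lowers it otherwise,
    and clamps the result at zero.
    """
    return max(0.0, lam + (step / k) * (target_rate - absorption_rate))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    particles = rng.standard_normal((64, 2))
    lam = 1.0
    for k in range(1, 21):
        # Toy reward peaked at (1, 1); a real sampler would score denoised samples.
        rewards = -np.sum((particles - 1.0) ** 2, axis=1)
        particles, rate = fv_resample(particles, rewards, lam, sigma=0.1, rng=rng)
        lam = robbins_monro_update(lam, rate, target_rate=0.3, step=1.0, k=k)
    print(f"final lam={lam:.3f}, last absorption rate={rate:.3f}")
```

Because every survival decision is an independent comparison and rebirth is a gather plus noise, the step vectorizes across particles, which is consistent with the parallelism claimed in the summary. The target absorption rate plays the role of a user-facing knob here; how the paper parameterizes it is not stated in this summary.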

Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.