Diffusion Alignment Beyond KL: Variance Minimisation as Effective Policy Optimiser
Summary
A new framework called Variance Minimisation Policy Optimisation (VMPO) has been introduced to adapt pretrained diffusion models for sampling from reward-tilted distributions. This method reinterprets diffusion alignment as a Sequential Monte Carlo (SMC) process, where the denoising model serves as a proposal and reward guidance generates importance weights. VMPO focuses on minimizing the variance of log importance weights, departing from traditional Kullback-Leibler (KL) based objectives. The authors prove that this variance objective is minimized by the reward-tilted target distribution and that its gradient matches that of KL-based alignment under on-policy sampling. This approach unifies existing diffusion alignment techniques and suggests novel design pathways.
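To make the two claims above concrete, here is a simplified single-step sketch of the argument. The notation is an assumption for illustration and is not taken from the original article: p_ref is the pretrained model, r the reward, beta a temperature, q_theta the fine-tuned model, and Z the normalising constant. The paper itself works with an SMC formulation over the denoising process, so read this as the underlying intuition rather than the authors' derivation.

```latex
% Assumed notation (illustrative, not from the original article):
%   p_ref: pretrained model, r: reward, \beta > 0: temperature,
%   q_\theta: fine-tuned model, Z: normalising constant.
\begin{align}
p^{*}(x) &= \frac{1}{Z}\, p_{\mathrm{ref}}(x)\, \exp\!\big(r(x)/\beta\big), \\
\log w_\theta(x) &= \log p_{\mathrm{ref}}(x) + r(x)/\beta - \log q_\theta(x).
\end{align}
% If q_\theta = p^{*}, the log weight is the constant \log Z, so its variance vanishes:
\begin{equation}
q_\theta = p^{*} \;\Longrightarrow\; \log w_\theta(x) = \log Z
\;\Longrightarrow\; \operatorname{Var}_{x \sim q_\theta}\!\big[\log w_\theta(x)\big] = 0 .
\end{equation}
% On-policy gradient: samples come from q_\theta, and the sampling distribution q
% is held fixed while differentiating. The second line uses E_q[\nabla \log q] = 0.
\begin{align}
\nabla_\theta\, \tfrac{1}{2} \operatorname{Var}_{q}\!\big[\log w_\theta\big]\Big|_{q = q_\theta}
&= -\,\mathbb{E}_{q_\theta}\!\Big[\big(\log w_\theta - \mathbb{E}_{q_\theta}[\log w_\theta]\big)\, \nabla_\theta \log q_\theta\Big] \\
&= -\,\mathbb{E}_{q_\theta}\!\big[\log w_\theta\, \nabla_\theta \log q_\theta\big]
 = \nabla_\theta\, \mathrm{KL}\big(q_\theta \,\|\, p^{*}\big).
\end{align}
```

The factor of 1/2 is a normalisation chosen here so that the two gradients coincide exactly; with a different convention they agree up to a constant factor.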
Key takeaway
For research scientists working on diffusion model alignment, VMPO offers an alternative to KL-based objectives. Minimising the variance of log importance weights gives a single theoretical framework that recovers existing alignment methods and points to new designs for reward-tilted sampling.
Key insights
VMPO minimizes log importance weight variance for diffusion alignment, unifying existing methods and suggesting new designs.
Principles
- Diffusion alignment admits an SMC interpretation.
- The variance of log importance weights is minimised by the reward-tilted target distribution.
Method
VMPO formulates diffusion alignment by minimizing the variance of log importance weights, rather than directly optimizing a KL-based objective, leveraging an SMC interpretation.
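The idea can be checked numerically on a toy problem. The sketch below is not the authors' implementation: it replaces the diffusion model with a three-outcome categorical distribution, and every name in it (`log_p_ref`, `reward`, `beta`, `theta`, and the helper functions) is hypothetical. It computes the variance of log importance weights, confirms that the variance vanishes at the reward-tilted target, and compares the on-policy gradient of the half-variance with the gradient of the reverse KL using finite differences.

```python
import math

def softmax(theta):
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def log_weights(theta, log_p_ref, reward, beta):
    # log w_i = log p_ref_i + r_i / beta - log q_theta_i (target left unnormalised)
    q = softmax(theta)
    return [lp + r / beta - math.log(qi) for lp, r, qi in zip(log_p_ref, reward, q)]

def half_variance(theta, sample_probs, log_p_ref, reward, beta):
    # 0.5 * Var[log w], with the sampling probabilities held fixed (on-policy, detached)
    lw = log_weights(theta, log_p_ref, reward, beta)
    mean = sum(p * l for p, l in zip(sample_probs, lw))
    return 0.5 * sum(p * (l - mean) ** 2 for p, l in zip(sample_probs, lw))

def reverse_kl(theta, log_p_ref, reward, beta):
    # KL(q_theta || p_tilt) up to the additive constant log Z
    q = softmax(theta)
    lw = log_weights(theta, log_p_ref, reward, beta)
    return -sum(qi * l for qi, l in zip(q, lw))

def finite_diff_grad(f, theta, eps=1e-5):
    # Central finite differences, one coordinate at a time
    grad = []
    for j in range(len(theta)):
        up = list(theta)
        up[j] += eps
        down = list(theta)
        down[j] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

# Hypothetical toy setup: three outcomes, arbitrary rewards and temperature
log_p_ref = [math.log(p) for p in (0.5, 0.3, 0.2)]
reward = [0.0, 1.0, 2.0]
beta = 0.5
theta = [0.3, -0.2, 0.1]

# Logits of the reward-tilted target: log p_ref + r / beta
theta_star = [lp + r / beta for lp, r in zip(log_p_ref, reward)]
var_at_target = 2 * half_variance(theta_star, softmax(theta_star), log_p_ref, reward, beta)
var_at_theta = 2 * half_variance(theta, softmax(theta), log_p_ref, reward, beta)

# On-policy: sample from the current model, but do not differentiate through sampling
q_fixed = softmax(theta)
grad_var = finite_diff_grad(lambda t: half_variance(t, q_fixed, log_p_ref, reward, beta), theta)
grad_kl = finite_diff_grad(lambda t: reverse_kl(t, log_p_ref, reward, beta), theta)

print("variance at target:", var_at_target)
print("variance at theta: ", var_at_theta)
print("grad of half-variance:", grad_var)
print("grad of reverse KL:   ", grad_kl)
```

Holding `q_fixed` constant while differentiating is what "on-policy" means in this sketch: the samples come from the current model, but the gradient flows only through the log weights. Off-policy, with samples from some other distribution, the two gradients would generally differ.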
In practice
- Recover existing diffusion alignment methods.
- Explore new design directions beyond KL.
Topics
- Diffusion Alignment
- Variance Minimisation
- Policy Optimization
- Diffusion Models
- Kullback-Leibler Divergence
Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.