Learning When to Denoise: Optimizing Asynchronous Schedules for Latent Diffusion
Summary
"Learning When to Denoise" introduces a method for optimizing the asynchronous schedules of multi-representation latent diffusion models, schedules that strongly affect visual synthesis quality. The approach learns the denoising schedule by formulating asynchronous flow matching across multiple representation spaces, using a schedule-corrected objective that keeps the local noising-time weights fixed. The schedule is drawn from a flexible parametric class of convex, monotone functions and is learned with less than 1% additional training compute. On ImageNet 256x256, a 675M-parameter XL backbone showed substantial gains. With AutoGuidance, the 200-epoch model reached FID 1.05, matching the 800-epoch SFD-XL baseline with 4x fewer training epochs. At 600 epochs it improved further to FID 1.02, surpassing the 1B-parameter SFD-XXL (FID 1.04). Without guidance, the 200-epoch model reached FID 2.37, outperforming the 800-epoch SFD-XL (FID 2.54).
Key takeaway
For Machine Learning Engineers developing multi-representation diffusion models, the asynchronous denoising schedule is worth optimizing. Learning it can improve FID scores: in the reported ImageNet 256x256 experiments, a learned schedule matched an 800-epoch baseline after only 200 epochs, a 4x reduction in training. Consider a learned parametric schedule to cut computational cost and speed up development; in the same experiments, a 675M-parameter model surpassed a 1B-parameter baseline.
Key insights
Learning denoising schedules in multi-representation diffusion models significantly boosts performance and training efficiency.
Principles
- Asynchronous schedules are critical for multi-representation diffusion.
- A learned schedule matched an 800-epoch baseline in 200 epochs (4x fewer training epochs).
- Convex, monotone parametric schedules are effective.
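To make "convex, monotone parametric schedule" concrete, here is a minimal sketch of one such family. It is an assumption, not the paper's parametrization, which this summary does not specify. A shared global time tau in [0, 1] is mapped to a local noising time g(tau) = sum_i w_i * tau^(p_i), with softmax weights w_i and exponents p_i = 1 + softplus(r_i) >= 1. Each term is convex and nondecreasing with value 0 at tau = 0 and 1 at tau = 1, so the mixture inherits all three properties. The class name PowerMixtureSchedule, its parameters, and the representation names "semantic" and "latent" are hypothetical. Under the assumed convention that t = 1 is clean data, convexity gives g(tau) <= tau, so a space on a strongly convex schedule stays noisier for longer and is denoised later.

```python
import math
from typing import List


def softplus(x: float) -> float:
    # Numerically stable log(1 + e^x).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max before exponentiating to avoid overflow.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


class PowerMixtureSchedule:
    """Hypothetical convex, monotone schedule g: [0, 1] -> [0, 1].

    g(tau) = sum_i w_i * tau ** p_i with w = softmax(logits) and
    p_i = 1 + softplus(raw_i) >= 1. Every term is convex and nondecreasing on
    [0, 1], and the weights sum to 1, so g(0) = 0 and g(1) = 1.
    """

    def __init__(self, logits: List[float], raw_exponents: List[float]):
        assert len(logits) == len(raw_exponents) > 0
        self.weights = softmax(logits)
        self.exponents = [1.0 + softplus(r) for r in raw_exponents]

    def __call__(self, tau: float) -> float:
        # Local noising time for a global time tau in [0, 1].
        return sum(w * tau ** p for w, p in zip(self.weights, self.exponents))

    def derivative(self, tau: float) -> float:
        # g'(tau) >= 0 on [0, 1], which is what makes the schedule monotone.
        return sum(w * p * tau ** (p - 1.0) for w, p in zip(self.weights, self.exponents))


# Usage: map one shared global time to per-space local times (names are hypothetical).
semantic = PowerMixtureSchedule(logits=[0.0], raw_exponents=[-6.0])  # p close to 1: nearly tau itself
latent = PowerMixtureSchedule(logits=[0.0, 0.0], raw_exponents=[0.0, 1.0])  # lags behind tau
for tau in (0.25, 0.5, 0.75):
    print(f"tau={tau:.2f}  semantic t={semantic(tau):.3f}  latent t={latent(tau):.3f}")
```

Because the logits and raw exponents are unconstrained real numbers, they could be trained by gradient descent while convexity and monotonicity hold by construction. That is one plausible reason to prefer a reparametrized family like this over explicitly constrained optimization.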
Method
The method formulates asynchronous flow matching over multiple representation spaces, using a schedule-corrected objective. It learns a flexible, parametric, convex, and monotone schedule with less than 1% additional training compute.
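The schedule-corrected objective can be sketched under one reading of the summary, which is an assumption rather than the paper's stated formula. Sample a shared global time tau ~ U[0, 1], noise each representation space at its own local time t_k = g_k(tau), and weight each space's flow-matching loss by g_k'(tau). By change of variables, E_tau[g_k'(tau) * f(g_k(tau))] equals the integral of f(t) over [0, 1], so in this reading each space keeps a fixed weighting over its local noising time however the schedule bends, and the schedule affects training through which noise levels the spaces share when one joint network sees them together. The linear path x_t = t * x + (1 - t) * noise, the names PowerSchedule and async_fm_loss, and the toy representation names are all hypothetical.

```python
import random
from typing import Callable, Dict, List

Vec = List[float]


class PowerSchedule:
    """Hypothetical schedule g(tau) = tau ** p with p >= 1: convex, monotone, g(0)=0, g(1)=1."""

    def __init__(self, p: float):
        assert p >= 1.0
        self.p = p

    def __call__(self, tau: float) -> float:
        return tau ** self.p

    def derivative(self, tau: float) -> float:
        return self.p * tau ** (self.p - 1.0)


def async_fm_loss(model: Callable[[Dict[str, Vec], Dict[str, float]], Dict[str, Vec]],
                  x_data: Dict[str, Vec], noise: Dict[str, Vec],
                  schedules: Dict[str, PowerSchedule], tau: float) -> float:
    # 1) Map the shared global time tau to a local noising time per representation space.
    local_t = {k: schedules[k](tau) for k in x_data}
    # 2) Noise each space at its own local time (assumed linear path, t = 1 is clean data).
    x_t = {k: [t * d + (1.0 - t) * n for d, n in zip(x_data[k], noise[k])]
           for k, t in local_t.items()}
    # 3) One joint network predicts a velocity for every space from the jointly noised inputs.
    pred = model(x_t, local_t)
    loss = 0.0
    for k in x_data:
        target = [d - n for d, n in zip(x_data[k], noise[k])]  # velocity of the linear path
        mse = sum((v_hat - v) ** 2 for v_hat, v in zip(pred[k], target)) / len(target)
        # 4) Assumed schedule correction: weight by g_k'(tau). For tau ~ U[0, 1],
        #    E[g'(tau) f(g(tau))] = integral_0^1 f(t) dt, so each space keeps a fixed,
        #    schedule-independent weighting over its own local time.
        loss += schedules[k].derivative(tau) * mse
    return loss


# Usage with toy data and a placeholder model that always predicts zero velocity.
rng = random.Random(0)
x_data = {"semantic": [0.5, -0.2], "latent": [1.0, 0.3, -0.7]}
noise = {k: [rng.gauss(0.0, 1.0) for _ in v] for k, v in x_data.items()}
schedules = {"semantic": PowerSchedule(1.0), "latent": PowerSchedule(2.0)}
zero_model = lambda x_t, t: {k: [0.0] * len(v) for k, v in x_t.items()}
print(async_fm_loss(zero_model, x_data, noise, schedules, tau=rng.random()))
```

In a real model the vectors would be tensors and the schedule parameters would receive gradients alongside the network; plain Python lists keep the sketch dependency-free.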
In practice
- Apply learned schedules to multi-representation diffusion.
- Optimize denoising timing for improved FID scores.
- Reduce training epochs for large diffusion models.
Topics
- Latent Diffusion Models
- Asynchronous Schedules
- Image Synthesis
- Denoising Optimization
- Multi-representation Models
- FID Score
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.