Learning When to Denoise: Optimizing Asynchronous Schedules for Latent Diffusion

· Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, quick

Summary

"Learning When to Denoise" presents a method for learning the asynchronous denoising schedule in multi-representation latent diffusion models, a factor that strongly affects visual synthesis performance. The approach formulates asynchronous flow matching across multiple representation spaces and trains with a schedule-corrected objective that keeps the local noising-time weights fixed. The schedule is drawn from a flexible parametric class of convex, monotone functions and is learned with less than 1% additional training compute. Results are reported on ImageNet 256x256 with a 675M-parameter XL backbone. With AutoGuidance, the 200-epoch model reaches FID 1.05, matching the 800-epoch SFD-XL baseline with a quarter of the training epochs, and improves to FID 1.02 at 600 epochs, ahead of the 1B-parameter SFD-XXL at FID 1.04. Without guidance, the 200-epoch model reaches FID 2.37, compared with 2.54 for the 800-epoch SFD-XL.

Key takeaway

For machine learning engineers building multi-representation diffusion models, the asynchronous denoising schedule is worth optimizing. In the reported ImageNet 256x256 results, a learned schedule matched the baseline's FID with 4x fewer training epochs (200 versus 800), and a 675M-parameter model edged past a 1B-parameter baseline. Consider a learned, parametric schedule as a way to cut training cost: the reported overhead is under 1% of training compute.

Key insights

Learning the denoising schedule in multi-representation diffusion models improves both sample quality (FID) and training efficiency: the 200-epoch model matches the 800-epoch SFD-XL baseline.

Method

The method formulates asynchronous flow matching over multiple representation spaces, using a schedule-corrected objective. It learns a flexible, parametric, convex, and monotone schedule with less than 1% additional training compute.
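
To make the idea concrete, here is a minimal sketch of a convex, monotone parametric schedule and of noising two representations at different local times. It is not the paper's implementation. The schedule family (a softmax-weighted mix of powers t**p with p >= 1), the exponents, the function names, the linear noise-to-data path, and the use of the schedule's slope g'(t) as the correction weight are all assumptions chosen for illustration; the summary does not specify any of them.

```python
import math
import random

# Hypothetical exponents for the basis functions t**p. Any p >= 1 gives a
# function that is convex and nondecreasing on [0, 1].
EXPONENTS = (1.0, 2.0, 4.0)


def softmax(logits):
    """Map unconstrained parameters to positive weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def schedule(t, logits):
    """Local time g(t) = sum_k w_k * t**p_k, with g(0) = 0 and g(1) = 1."""
    weights = softmax(logits)
    return sum(w * t ** p for w, p in zip(weights, EXPONENTS))


def schedule_slope(t, logits):
    """Derivative g'(t) of the schedule."""
    weights = softmax(logits)
    return sum(w * p * t ** (p - 1.0) for w, p in zip(weights, EXPONENTS))


def interpolate(data, noise, s):
    """Linear flow-matching path: pure noise at s = 0, clean data at s = 1."""
    return [(1.0 - s) * n + s * d for d, n in zip(data, noise)]


def async_training_pair(data_a, data_b, logits, rng):
    """Noise two representations at different local times from one global t."""
    t = rng.random()
    s_a = t                    # representation A follows the global clock
    s_b = schedule(t, logits)  # representation B follows the learned schedule
    noise_a = [rng.gauss(0.0, 1.0) for _ in data_a]
    noise_b = [rng.gauss(0.0, 1.0) for _ in data_b]
    return {
        "t": t,
        "s_a": s_a,
        "s_b": s_b,
        "x_a": interpolate(data_a, noise_a, s_a),
        "x_b": interpolate(data_b, noise_b, s_b),
        # Velocity targets with respect to each representation's local time.
        "v_a": [d - n for d, n in zip(data_a, noise_a)],
        "v_b": [d - n for d, n in zip(data_b, noise_b)],
        # Assumed correction: weighting B's loss by g'(t) keeps its weighting
        # over local time uniform when t is sampled uniformly.
        "weight_b": schedule_slope(t, logits),
    }
```

In this sketch the schedule parameters are the three logits, so learning the schedule adds only a handful of trainable values. Because every basis function satisfies t**p <= t on [0, 1], representation B never runs ahead of representation A; which representation should lead in practice is a modeling choice the sketch does not settle.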

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.