DASH: Dual-Branch Score Distillation for Guidance-Calibrated Compact Diffusion Models
Summary
DASH is a dual-branch distillation framework for compressing class-conditional diffusion models. Existing distillation methods often leave the unconditional score branch unsupervised, so the gap between conditional and unconditional predictions that classifier-free guidance relies on is underdetermined in the student. This can lead to ineffective guidance and degenerate predictions. DASH resolves the problem by supervising both score branches independently, applying distinct branch constraints to each training sample and adding an anchor term that regularizes conditional predictions toward the ground-truth noise. The framework also includes TIRT Transfer, which copies the teacher's per-timestep importance curriculum to the student so the student does not have to relearn it. On CIFAR-10 and CIFAR-100, DASH achieves 5.9x model compression while staying within 4 FID points of the teacher under 50-step DDIM sampling, significantly outperforming models trained from scratch. Ablation studies attribute over 60% of the total distillation gain to unconditional supervision, with curriculum transfer and anchor regularization offering complementary benefits.
Key takeaway
Machine learning engineers compressing class-conditional diffusion models should consider DASH's dual-branch distillation framework. Current compression techniques often compromise classifier-free guidance by leaving the unconditional score branch unsupervised, which makes guidance ineffective in the student. By supervising both score branches independently and applying TIRT Transfer, DASH reaches a reported 5.9x compression while preserving guidance fidelity and staying within 4 FID points of the teacher, outperforming training from scratch.
Key insights
Leaving the unconditional score branch unsupervised during diffusion model distillation causes classifier-free guidance to fail in the student; supervising both branches is essential for effective compression.
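To make the failure mode concrete, here is a minimal sketch of the standard classifier-free guidance combination. It assumes the common convention in which a guidance scale of 1 recovers the conditional prediction; the function name `cfg_noise` and the toy arrays are illustrative and are not taken from the DASH paper.

```python
import numpy as np

def cfg_noise(eps_cond, eps_uncond, w):
    """Combine the two noise predictions with guidance scale w.

    Convention assumed here: w = 0 gives the unconditional prediction,
    w = 1 the conditional one, and w > 1 extrapolates past it.
    """
    return eps_uncond + w * (eps_cond - eps_uncond)

# Toy predictions standing in for network outputs (hypothetical values).
eps_cond = np.array([0.2, -0.1, 0.4])
eps_uncond = np.array([0.0, 0.0, 0.1])

guided = cfg_noise(eps_cond, eps_uncond, w=3.0)

# If a student's unconditional branch collapses onto its conditional branch,
# the guidance term vanishes and the scale w has no effect.
collapsed = cfg_noise(eps_cond, eps_cond, w=3.0)
```

Because the guided output depends on the difference between the two branches, a student whose unconditional branch is unconstrained gives that difference no reliable meaning.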
Principles
- Supervise both score branches.
- Transfer the teacher's curriculum.
- Regularize conditional predictions.
Method
DASH independently supervises both score branches using distinct constraints and an anchor term, while also transferring the teacher's per-timestep importance curriculum.
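The combination described above could be sketched as a three-term loss. This is an illustration under stated assumptions, not the paper's objective: the mean-squared-error form, the weights `lambda_uncond` and `lambda_anchor`, and the choice to evaluate both branches on the same batch are all hypothetical, since this summary does not specify DASH's exact branch constraints.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))

def dual_branch_loss(s_cond, s_uncond, t_cond, t_uncond, noise,
                     lambda_uncond=1.0, lambda_anchor=0.1):
    """Hypothetical dual-branch distillation loss.

    s_* are student predictions, t_* are teacher predictions, and noise is
    the ground-truth noise added to the training sample.
    """
    cond_term = mse(s_cond, t_cond)        # match teacher's conditional branch
    uncond_term = mse(s_uncond, t_uncond)  # match teacher's unconditional branch
    anchor_term = mse(s_cond, noise)       # anchor conditional branch to true noise
    return cond_term + lambda_uncond * uncond_term + lambda_anchor * anchor_term

# Toy values standing in for network outputs (assumed, for illustration only).
noise = np.zeros(4)
t_cond, t_uncond = np.full(4, 0.1), np.full(4, -0.1)
s_cond, s_uncond = np.full(4, 0.3), np.full(4, 0.1)

loss = dual_branch_loss(s_cond, s_uncond, t_cond, t_uncond, noise)
```

Keeping the unconditional term separate from the conditional one means neither branch can be fit at the expense of the other, which is the property the summary attributes to DASH.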
In practice
- Supervise both the conditional and unconditional score branches when distilling compact diffusion models.
- Adopt the teacher's per-timestep importance curriculum for student training.
- Add an anchor term that regularizes conditional predictions toward the ground-truth noise.
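As a rough illustration of the curriculum step, the sketch below treats the teacher's per-timestep importance curriculum as an array of loss weights indexed by timestep. That representation is an assumption, as are the names `transfer_curriculum` and `weighted_loss`; the summary does not describe how TIRT Transfer stores or applies the curriculum.

```python
import numpy as np

def transfer_curriculum(teacher_weights):
    """Copy the teacher's per-timestep weights and freeze the copy.

    Marking the copy read-only reflects the idea that the student reuses
    the curriculum instead of relearning it.
    """
    weights = np.array(teacher_weights, dtype=np.float64)  # np.array copies by default
    weights.setflags(write=False)
    return weights

def weighted_loss(per_sample_loss, timesteps, weights):
    """Average per-sample losses, each scaled by the weight of its timestep."""
    return float(np.mean(weights[timesteps] * per_sample_loss))

# Hypothetical curriculum over three timesteps and a batch of two samples.
teacher_weights = np.array([0.5, 1.0, 2.0])
student_weights = transfer_curriculum(teacher_weights)

per_sample_loss = np.array([1.0, 1.0])
timesteps = np.array([0, 2])  # timestep index drawn for each sample
batch_loss = weighted_loss(per_sample_loss, timesteps, student_weights)
```

In this sketch the student never updates `student_weights`, so any cost of learning the curriculum is paid once, by the teacher.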
Topics
- Diffusion Models
- Model Compression
- Score Distillation
- Classifier-Free Guidance
- Dual-Branch Supervision
- TIRT Transfer
Best for: Research Scientist, AI Engineer, Computer Vision Engineer, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.