SGMD: Score Gradient Matching Distillation for Few-Step Video Diffusion Distillation

2026-05-28 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

Score Gradient Matching Distillation (SGMD) is a distillation method for few-step video diffusion models that addresses limitations of the widely used Distribution Matching Distillation (DMD) paradigm. DMD-style methods are costly to train because the generator evolves continuously, and their conservative reverse-KL matching can hinder strong motion dynamics. SGMD addresses both issues by optimizing the fake score directly toward the teacher, using a teacher stop-gradient Fisher objective as a stable distribution-matching target. It uses dual potentials: negative-residual (NR) for outer-loop correction and residual-contraction (RC) for inner-loop tracking. Benchmarked against DMD2, SGMD trains roughly 3x faster and markedly improves motion dynamics in 4-step distilled models while preserving temporal consistency. In human evaluations, raters preferred SGMD for motion quality and overall performance, with visual quality and text alignment remaining comparable.

Key takeaway

For machine learning engineers optimizing video diffusion models for faster inference, SGMD is a compelling alternative to DMD-style methods. If few-step distillation is costing you too much training time or producing conservative motion, consider SGMD: against DMD2 it reports a roughly 3x training speedup and substantially better motion quality in 4-step distilled models, with visual quality and text alignment remaining comparable. The referenced code (ModelTC/LightX2V) is the starting point for integrating the method into your workflows.

Key insights

SGMD optimizes the fake score directly toward the teacher with a stable teacher stop-gradient Fisher objective, giving faster training and stronger motion in distilled video models.

Principles

Direct fake score optimization accelerates training.
Stable distribution matching improves motion dynamics.
Dual potentials split the work: NR corrects the outer loop, RC tracks in the inner loop.

Method

SGMD optimizes the fake score toward the teacher, using a teacher stop-gradient Fisher objective for stable distribution matching. It employs a negative-residual (NR) potential for outer-loop correction and a residual-contraction (RC) potential for inner-loop tracking.
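
One plausible reading of the teacher stop-gradient Fisher objective is a Fisher-divergence-style squared distance between the fake score and the teacher score, with gradients blocked through the teacher. The notation below is assumed for illustration only: s_phi is the fake score with parameters phi, s_T is the teacher score, w(t) is a timestep weighting, and sg is the stop-gradient operator. This summary does not give the paper's exact weighting, sampling distribution, or the NR and RC potentials, so none of those are reproduced here.

```latex
\mathcal{L}_{\text{Fisher}}(\phi)
  = \mathbb{E}_{t,\,x_t}\!\left[\, w(t)\,
    \bigl\| s_\phi(x_t, t)
      - \operatorname{sg}\!\bigl[ s_{\mathrm{T}}(x_t, t) \bigr] \bigr\|_2^2 \right]
```

Under this reading, the teacher term acts as a fixed regression target, so only the fake score is updated and the target does not drift during optimization.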

In practice

Achieve a roughly 3x training speedup over DMD2 when distilling video models.
Improve motion dynamics in 4-step distilled models.
Preserve temporal consistency in distilled videos.
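
To make "4-step" concrete, the sketch below shows what few-step inference with a distilled generator can look like: one generator call per step, with the clean estimate re-noised to the next level in between. It is a generic illustration, not SGMD's sampler. The function names, the timestep schedule (1.0, 0.75, 0.5, 0.25), the linear interpolation x_t = (1 - t) * x0 + t * noise, and the toy generator are all assumptions, and plain Python lists stand in for video latents.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def few_step_sample(
    generator: Callable[[Vector, float], Vector],
    dim: int,
    timesteps: Sequence[float] = (1.0, 0.75, 0.5, 0.25),  # assumed schedule
    seed: int = 0,
) -> Vector:
    """Predict a clean sample at each step, then re-noise to the next level."""
    rng = random.Random(seed)
    # Start from pure Gaussian noise at the highest noise level.
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    x0 = x
    for i, t in enumerate(timesteps):
        # One generator call per step: estimate the clean sample.
        x0 = generator(x, t)
        if i + 1 < len(timesteps):
            # Re-noise the estimate to the next, lower noise level
            # (assumed linear interpolation between sample and noise).
            t_next = timesteps[i + 1]
            x = [(1.0 - t_next) * c + t_next * rng.gauss(0.0, 1.0) for c in x0]
    return x0


def toy_generator(x: Vector, t: float) -> Vector:
    # Hypothetical stand-in for a distilled video model.
    return [(1.0 - t) * v for v in x]


sample = few_step_sample(toy_generator, dim=8)
print(len(sample))
```

With the default schedule the generator is called exactly four times, which is where the inference saving over a many-step teacher comes from.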

Topics

Video Diffusion Models
Model Distillation
Score Gradient Matching
Few-Step Inference
Motion Dynamics
Temporal Consistency

Code references

ModelTC/LightX2V

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.