SpectralDiT: Timestep-Conditioned Spectral Residual Correction for Flow-Matching DiTs
Summary
SpectralDiT is a lightweight modification for flow-matching Diffusion Transformers (DiTs) that adds timestep-conditioned spectral correction to the MLP residual branch. The module decomposes each residual update into low- and high-frequency components on the patch-token grid and applies a zero-initialized additive gate, so the model initially matches the baseline DiT. On CIFAR-10 pixel-space generation, SpectralDiT improved FID from 20.78 to 19.71 at patch size 1 and also reduced the radial Fourier spectrum gap. Scaled to latent diffusion on ImageNet-100, the method achieved an 8.7% relative FID reduction under classifier-free guidance (CFG 2.0) while adding only 0.6% theoretical FLOPs and 1.36% parameters. All reported results are averaged over five seeds, and ablations confirm stable block-specific spectral correction patterns.
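The summary quotes the CIFAR-10 result as absolute FID values and the ImageNet-100 result as a relative reduction. As a small worked calculation, the sketch below puts the CIFAR-10 numbers on the same relative scale. The helper name `relative_reduction` is illustrative only, and the two settings differ (pixel-space vs. latent, with CFG 2.0 stated only for ImageNet-100), so the two percentages are not directly comparable.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of a lower-is-better metric such as FID."""
    return (baseline - improved) / baseline


# CIFAR-10 FID values quoted in the summary (patch size 1).
cifar10 = relative_reduction(20.78, 19.71)
print(f"CIFAR-10 relative FID reduction: {cifar10:.1%}")  # prints 5.1%
```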
Key takeaway
For machine learning engineers building Diffusion Transformers, SpectralDiT offers a low-cost way to improve generative quality. If you want to lower FID in a flow-matching DiT, consider adding timestep-conditioned spectral correction to the MLP residual branches. The reported cost is small (0.6% more theoretical FLOPs and 1.36% more parameters), and the reported gain on ImageNet-100 is an 8.7% relative FID reduction under CFG 2.0. Results are reported only for CIFAR-10 and ImageNet-100, so evaluate the effect on your own datasets.
Key insights
SpectralDiT enhances Diffusion Transformers by applying timestep-conditioned spectral correction to residual updates, improving FID with minimal overhead.
Principles
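- Spectral correction of residual updates improved FID in the reported DiT experiments.
- Low-overhead modifications can yield measurable gains: 0.6% more theoretical FLOPs bought an 8.7% relative FID reduction on ImageNet-100.
- Conditioning the correction on the timestep lets the spectral adjustment vary across timesteps.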
Method
SpectralDiT decomposes each MLP residual update into low- and high-frequency components on the patch-token grid, then applies a learned, timestep-conditioned additive gate to correct them. The gate is zero-initialized, so training starts from baseline DiT behavior.
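To make the method description concrete, here is a minimal NumPy sketch of the idea, not the authors' implementation. Everything specific in it is an assumption: the radial cutoff of 0.25 cycles per token, the FFT-based low/high split, the linear gate with one scalar per band, and the names `split_frequencies` and `SpectralGate`. The summary does not specify any of these details.

```python
import numpy as np


def split_frequencies(residual, cutoff=0.25):
    """Split a (H, W, C) residual on the token grid into low/high bands.

    Assumption: a hard radial low-pass mask in the 2D FFT domain.
    """
    h, w, _ = residual.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    mask = (np.sqrt(fy**2 + fx**2) <= cutoff)[:, :, None]
    spectrum = np.fft.fft2(residual, axes=(0, 1))
    low = np.fft.ifft2(spectrum * mask, axes=(0, 1)).real
    high = residual - low  # the two bands sum back to the input
    return low, high


class SpectralGate:
    """Hypothetical timestep-conditioned additive gate (one scalar per band)."""

    def __init__(self, t_dim):
        # Zero-initialized: both gate values are 0 until training moves them,
        # so the corrected residual equals the original residual at init.
        self.weight = np.zeros((t_dim, 2))
        self.bias = np.zeros(2)

    def __call__(self, residual, t_emb, cutoff=0.25):
        g_low, g_high = t_emb @ self.weight + self.bias
        low, high = split_frequencies(residual, cutoff)
        return residual + g_low * low + g_high * high


# Usage: tokens of shape (N, C) are reshaped onto their (H, W) patch grid.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((64, 16))   # 8 x 8 grid, 16 channels
t_emb = rng.standard_normal(32)          # stand-in timestep embedding
gate = SpectralGate(t_dim=32)
corrected = gate(tokens.reshape(8, 8, 16), t_emb).reshape(64, 16)
```

Because the gate is additive and starts at zero, `corrected` equals `tokens` before any training, which is the property the summary describes as matching the baseline DiT at initialization.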
In practice
- Apply spectral correction to the MLP residual branches of the DiT.
- Use zero-initialized gates for additive modifications so the modified model starts at the baseline.
- Evaluate both FID and the radial Fourier spectrum gap.
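For the last item above, one way to measure a radial Fourier spectrum gap is sketched below. The summary does not define the metric, so the definition here is an assumption: the mean absolute difference between the log radially averaged power spectra of real and generated images. The function names are illustrative.

```python
import numpy as np


def radial_power_spectrum(images):
    """Radially averaged power spectrum of a (N, H, W) grayscale batch."""
    _, h, w = images.shape
    spectrum = np.fft.fftshift(np.fft.fft2(images, axes=(1, 2)), axes=(1, 2))
    power = (np.abs(spectrum) ** 2).mean(axis=0)  # average over the batch
    # Integer distance of every frequency bin from the spectrum centre.
    yy, xx = np.indices((h, w))
    radius = np.hypot(yy - h // 2, xx - w // 2).astype(int).ravel()
    sums = np.bincount(radius, weights=power.ravel())
    counts = np.maximum(np.bincount(radius), 1)
    return sums / counts


def spectrum_gap(real, generated, eps=1e-12):
    """Assumed metric: mean |log P_real(r) - log P_gen(r)| over radii r."""
    p_real = radial_power_spectrum(real)
    p_gen = radial_power_spectrum(generated)
    return float(np.mean(np.abs(np.log(p_real + eps) - np.log(p_gen + eps))))
```

A gap of zero means the two sets have identical radial spectra under this definition. Samples that are too smooth lose high-frequency power and show a larger gap.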
Topics
- Diffusion Transformers
- Flow Matching
- Spectral Correction
- Generative Models
- ImageNet-100
- FID Score
Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.