SpectralDiT: Timestep-Conditioned Spectral Residual Correction for Flow-Matching DiTs
Summary
Jiayu Tian proposes SpectralDiT, a lightweight modification for flow-matching Diffusion Transformers (DiTs) that adds timestep-conditioned spectral correction to the MLP residual branch. The module decomposes each residual update into low- and high-frequency components on the patch-token grid and learns an additive gate that is zero-initialized, so that the model initially matches the baseline DiT. On CIFAR-10 pixel-space generation, SpectralDiT improved FID from 20.78 to 19.71 at patch size 1 and reduced the radial Fourier spectrum gap. When scaled to latent diffusion on ImageNet-100, the method achieved an 8.7% relative FID reduction under classifier-free guidance (CFG 2.0), with only 0.6% additional theoretical FLOPs and 1.36% additional parameters. Ablation studies on CIFAR-10 confirmed stable block-specific spectral correction patterns.
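The CIFAR-10 result above is given as absolute FID values, while the ImageNet-100 result is given as a relative reduction. As a quick sanity check, the standard relative-reduction formula can be applied to the CIFAR-10 numbers. The helper name `relative_reduction` is illustrative only and does not come from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional drop of a lower-is-better metric relative to its baseline."""
    return (baseline - improved) / baseline


# CIFAR-10 FID values quoted in the summary (patch size 1).
cifar_drop = relative_reduction(20.78, 19.71)
print(f"CIFAR-10 relative FID reduction: {cifar_drop:.1%}")  # prints 5.1%
```

The two settings differ (pixel-space CIFAR-10 versus latent ImageNet-100 with CFG), so the 5.1% and 8.7% figures describe separate experiments and are not directly comparable.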
Key takeaway
Machine learning engineers optimizing Diffusion Transformers for image generation should consider SpectralDiT's timestep-conditioned spectral correction. In the reported experiments, this lightweight modification improved FID, including an 8.7% relative reduction on ImageNet-100 under CFG 2.0, with minimal overhead (0.6% additional theoretical FLOPs, 1.36% additional parameters). The results suggest better generation quality and reduced spectral distortion at little computational cost, which makes it a practical enhancement for flow-matching DiT implementations.
Key insights
SpectralDiT enhances Diffusion Transformers by applying timestep-conditioned spectral correction to residual updates, improving generation quality with minimal overhead.
Principles
- Decomposing residual updates into spectral components can improve DiT performance.
- Timestep-conditioned gates allow adaptive spectral correction.
- Zero-initializing the gate makes the modified model start out matching the baseline DiT.
Method
SpectralDiT modifies flow-matching DiTs by adding a timestep-conditioned spectral correction module to the MLP residual branch, decomposing updates into frequency components and learning a zero-initialized additive gate.
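The mechanism described above can be sketched in NumPy for a single residual update. This is a minimal illustration under stated assumptions, not the paper's implementation: the radial frequency mask, the `cutoff` value, the use of one scalar gate per band, and the names `split_low_high`, `spectral_correction`, `W_gate`, and `b_gate` are all hypothetical choices made here for clarity.

```python
import numpy as np


def split_low_high(update, cutoff=0.25):
    """Split an (H, W, C) residual update into low/high bands on the token grid.

    Assumption: a hard radial mask in 2D FFT space; `cutoff` is in cycles/token.
    """
    H, W, _ = update.shape
    fy = np.fft.fftfreq(H)[:, None]          # vertical frequencies
    fx = np.fft.fftfreq(W)[None, :]          # horizontal frequencies
    low_mask = (np.sqrt(fy**2 + fx**2) <= cutoff)[:, :, None]
    spectrum = np.fft.fft2(update, axes=(0, 1))
    low = np.fft.ifft2(spectrum * low_mask, axes=(0, 1)).real
    high = update - low                      # the two bands sum to the update
    return low, high


def spectral_correction(update, t_emb, W_gate, b_gate, cutoff=0.25):
    """Add a timestep-conditioned, per-band correction to the residual update."""
    low, high = split_low_high(update, cutoff)
    gates = t_emb @ W_gate + b_gate          # shape (2,): low gate, high gate
    return update + gates[0] * low + gates[1] * high


rng = np.random.default_rng(0)
update = rng.standard_normal((8, 8, 4))      # MLP branch output on an 8x8 grid
t_emb = rng.standard_normal(16)              # hypothetical timestep embedding

# Zero-initialized gate parameters: the corrected update equals the original.
W_gate, b_gate = np.zeros((16, 2)), np.zeros(2)
corrected = spectral_correction(update, t_emb, W_gate, b_gate)
print(np.allclose(corrected, update))
```

Because the correction is additive and its gate starts at zero, the block initially computes the same function as the unmodified DiT block, and training can move the gates away from zero only where a per-band, per-timestep adjustment helps.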
In practice
- Apply spectral correction to the MLP residual branch of DiT blocks.
- Use zero-initialized gates for stable integration.
- Evaluate both FID and the radial Fourier spectrum gap.
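For the last point, one way to measure a radial Fourier spectrum gap is sketched below. The summary does not give the paper's exact definition, so this version is an assumption: it averages the 2D power spectrum over images, bins it by radial frequency, and reports the mean absolute difference of log power between real and generated sets. The function names, the bin count, and the `eps` constant are illustrative.

```python
import numpy as np


def radial_power_spectrum(images, n_bins=16):
    """Mean power per radial-frequency bin for a batch of (N, H, W) images."""
    _, H, W = images.shape
    power = (np.abs(np.fft.fft2(images, axes=(1, 2))) ** 2).mean(axis=0)
    fy = np.fft.fftfreq(H)[:, None]
    fx = np.fft.fftfreq(W)[None, :]
    radius = np.sqrt(fy**2 + fx**2).ravel()
    edges = np.linspace(0.0, radius.max(), n_bins + 1)
    # Map each frequency to a bin; the largest radius goes in the last bin.
    idx = np.clip(np.digitize(radius, edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=power.ravel(), minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    return sums / np.maximum(counts, 1)      # empty bins stay at zero


def spectrum_gap(real, generated, n_bins=16, eps=1e-12):
    """Mean absolute log-power difference between two radial spectra."""
    p_real = radial_power_spectrum(real, n_bins)
    p_gen = radial_power_spectrum(generated, n_bins)
    return float(np.mean(np.abs(np.log(p_real + eps) - np.log(p_gen + eps))))


rng = np.random.default_rng(0)
real = rng.standard_normal((32, 16, 16))     # stand-in for real images
generated = 0.5 * real                       # stand-in with attenuated power
print(spectrum_gap(real, real), spectrum_gap(real, generated))
```

A gap of zero means the two sets have identical binned spectra under this definition; larger values indicate that generated images carry too much or too little energy at some frequencies.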
Topics
- SpectralDiT
- Diffusion Transformers
- Flow Matching
- Spectral Correction
- Image Generation
- FID Score
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.