SpectralDiT: Timestep-Conditioned Spectral Residual Correction for Flow-Matching DiTs

Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital - Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, medium

Summary

Jiayu Tian proposes SpectralDiT, a lightweight modification for flow-matching Diffusion Transformers (DiTs) that adds a timestep-conditioned spectral correction to the MLP residual branch. The module decomposes each residual update into low- and high-frequency components on the patch-token grid and learns a zero-initialized additive gate, so the model starts out matching the baseline DiT. On CIFAR-10 pixel-space generation, SpectralDiT improved FID from 20.78 to 19.71 at patch size 1 and reduced the radial Fourier spectrum gap. Scaled to latent diffusion on ImageNet-100, the method achieved an 8.7% relative FID reduction under classifier-free guidance (CFG 2.0), with only 0.6% additional theoretical FLOPs and 1.36% additional parameters. Ablation studies on CIFAR-10 confirmed stable block-specific spectral correction patterns.
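
To put the two reported results on the same scale, the CIFAR-10 figures can be converted to a relative reduction. The snippet below is plain arithmetic on the FID values quoted in the summary. The helper name `relative_reduction` is ours, and the ImageNet-100 absolute FIDs are not given here, so only the CIFAR-10 case is computed.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional improvement of a lower-is-better metric such as FID."""
    return (baseline - improved) / baseline


# CIFAR-10, patch size 1: FID 20.78 -> 19.71 (values from the summary).
cifar10 = relative_reduction(20.78, 19.71)
print(f"CIFAR-10 relative FID reduction: {cifar10:.2%}")  # prints 5.15%
```

By this measure, the CIFAR-10 gain of about 5.15% is smaller than the 8.7% relative reduction reported for ImageNet-100 under CFG 2.0.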

Key takeaway

Machine learning engineers optimizing Diffusion Transformers for image generation should consider SpectralDiT's timestep-conditioned spectral correction. The modification lowers FID (an 8.7% relative reduction on ImageNet-100 under CFG 2.0) and reduces spectral distortion at small overhead (0.6% additional theoretical FLOPs, 1.36% additional parameters), which makes it a practical addition to flow-matching DiT implementations.

Key insights

SpectralDiT enhances Diffusion Transformers by applying timestep-conditioned spectral correction to residual updates, improving generation quality with minimal overhead.

Method

SpectralDiT modifies flow-matching DiTs by adding a timestep-conditioned spectral correction module to the MLP residual branch, decomposing updates into frequency components and learning a zero-initialized additive gate.
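
The mechanism described above can be illustrated with a minimal NumPy sketch. It is not the author's implementation. The summary does not specify the frequency filter, the gate parameterization, or where the timestep embedding comes from. The sketch therefore assumes a hard radial low-pass mask with a hypothetical `cutoff`, per-channel gates produced by a zero-initialized linear map of a timestep embedding `t_emb`, and the hypothetical names `split_frequencies` and `SpectralGate`.

```python
import numpy as np


def radial_low_pass_mask(h, w, cutoff=0.25):
    # Frequencies (cycles per token) along each grid axis, in FFT order.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2)
    # Assumption: a hard radial cutoff; the paper's filter may differ.
    return (radius <= cutoff).astype(np.float64)


def split_frequencies(update, h, w, cutoff=0.25):
    """Split a residual update of shape (h*w, d) into low/high-frequency parts."""
    n, d = update.shape
    grid = update.reshape(h, w, d)               # tokens back onto the patch grid
    spectrum = np.fft.fft2(grid, axes=(0, 1))    # 2D FFT over the spatial axes
    mask = radial_low_pass_mask(h, w, cutoff)[:, :, None]
    low = np.fft.ifft2(spectrum * mask, axes=(0, 1)).real
    high = grid - low                            # so low + high == update
    return low.reshape(n, d), high.reshape(n, d)


class SpectralGate:
    """Additive, timestep-conditioned correction of an MLP residual update."""

    def __init__(self, t_dim, d):
        # Zero initialization: both gates start at 0, so the block
        # initially returns the unmodified baseline update.
        self.weight = np.zeros((t_dim, 2 * d))
        self.bias = np.zeros(2 * d)

    def __call__(self, update, t_emb, h, w):
        low, high = split_frequencies(update, h, w)
        gates = t_emb @ self.weight + self.bias  # shape (2*d,)
        g_low, g_high = np.split(gates, 2)       # one gate per channel and band
        return update + g_low * low + g_high * high


# Usage: at initialization the corrected update equals the baseline update.
rng = np.random.default_rng(0)
h, w, d, t_dim = 4, 4, 8, 16
update = rng.normal(size=(h * w, d))
t_emb = rng.normal(size=t_dim)
gate = SpectralGate(t_dim, d)
corrected = gate(update, t_emb, h, w)
```

Because the correction is additive and the gate weights start at zero, training begins from the baseline DiT's behavior and moves away from it only as the gates learn nonzero values. The sketch shows that property but makes no claim about the FLOP or parameter overhead reported for the actual method.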

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.