TEASR: Training-Efficient Any-Step Diffusion Transformer for Real-World Image Super-Resolution

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Computer Vision) · Depth: Expert, quick

Summary

TEASR is a training-efficient, any-step diffusion transformer that targets the slow iterative sampling of diffusion models in Real-World Image Super-Resolution (Real-ISR). Existing one-step distillation methods require auxiliary teacher models. TEASR instead performs self-adversarial distillation within a single diffusion model, so it needs no extra teacher or discriminator. Combined with a timestep-aware rectification strategy, this stabilizes one-step generation across noise levels and lowers training cost enough to distill 20B-parameter diffusion models on a single GPU. TEASR also introduces a dual-branch diffusion transformer with a decoupled timestep condition, which improves sampling quality by separating the current noise state from the denoising target. In the reported experiments, TEASR supports any-step sampling and outperforms state-of-the-art methods across multiple datasets.
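To make the idea of a decoupled timestep condition concrete, here is a minimal Python sketch of any-step sampling in which the denoiser receives two separate inputs: the current noise level t and the target level s. Everything in it is an assumption for illustration and not taken from the paper: the linear interpolation path between clean image and noise, the function names `toy_denoiser` and `any_step_sample`, and the oracle denoiser that stands in for a trained network. It is not TEASR's architecture. It only shows why conditioning on both t and s lets the same model run with any number of steps.

```python
import numpy as np

def toy_denoiser(x_t, t, s, clean):
    """Hypothetical stand-in for a network conditioned on two timesteps.

    t is the noise level of the input x_t, s is the requested target level.
    Assumes a linear path x_t = (1 - t) * clean + t * noise. This oracle
    cheats by using the known clean image to recover the noise and then
    re-mixes it at level s. A real model would have to predict this.
    """
    noise = (x_t - (1.0 - t) * clean) / t
    return (1.0 - s) * clean + s * noise

def any_step_sample(x_start, clean, num_steps):
    """Move from pure noise (t = 1) to a clean image (t = 0) in num_steps jumps."""
    levels = np.linspace(1.0, 0.0, num_steps + 1)
    x = x_start
    # Each jump names both where the sample is (t) and where it should land (s).
    for t, s in zip(levels[:-1], levels[1:]):
        x = toy_denoiser(x, t, s, clean)
    return x

rng = np.random.default_rng(0)
clean = rng.random((4, 4))            # toy "high-resolution" target
noise = rng.standard_normal((4, 4))   # starting point at t = 1

one_step = any_step_sample(noise, clean, num_steps=1)
four_step = any_step_sample(noise, clean, num_steps=4)
```

With the oracle denoiser, the one-step and four-step results both match the clean image up to floating-point error. With a learned model, the step count becomes a trade-off between speed and quality.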

Key takeaway

For machine learning engineers building Real-ISR systems, TEASR is an alternative to conventional multi-step diffusion models. If you are constrained by GPU memory or need to trade inference speed against quality, its self-adversarial distillation lets you train large models on a single GPU without loading a separate teacher or discriminator. The distilled model can then be sampled with as many steps as your latency budget allows, and the authors report state-of-the-art image quality.
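A back-of-envelope calculation suggests why dropping the teacher and discriminator matters at this scale. The sketch below counts weight memory only and assumes 2 bytes per parameter (bf16 or fp16). It also assumes, as a hypothetical worst case, that a teacher or discriminator would be the same size as the student. It ignores activations, gradients, and optimizer state, and it says nothing about how TEASR itself fits training on one GPU. The helper name `weight_memory_gb` is made up for this example.

```python
def weight_memory_gb(num_params, bytes_per_param=2, copies=1):
    """Memory for model weights only, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param * copies / 1e9

PARAMS = 20e9  # the 20B-parameter scale cited in the summary

# One network, as in distillation within a single diffusion model.
single_model = weight_memory_gb(PARAMS)

# Student plus a same-size teacher (assumption: equal sizes).
with_teacher = weight_memory_gb(PARAMS, copies=2)

# Student, teacher, and a same-size discriminator (assumption: equal sizes).
with_teacher_and_discriminator = weight_memory_gb(PARAMS, copies=3)

print(single_model, with_teacher, with_teacher_and_discriminator)
```

Under these assumptions the weights alone take 40 GB for one model, 80 GB with a teacher, and 120 GB with a teacher and a discriminator, before any training state is counted.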

Key insights

TEASR uses self-adversarial distillation within a single diffusion model for efficient, flexible Real-ISR, eliminating auxiliary teachers.

Principles

Method

TEASR performs self-adversarial distillation within a single diffusion model, employing a timestep-aware rectification strategy to stabilize one-step generation across noise levels.

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.