Dual-Constrained Diffusion Image Compression for Operational Rate-Distortion-Perception Optimization

· Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, quick

Summary

Dual-Constrained Diffusion Image Compression (DCIC) is a framework for the rate-distortion-perception (RDP) trade-off in neural image compression, where fidelity to the source competes with perceptual realism. It pairs a learned codec with a diffusion-based decoder guided jointly by a distortion constraint and an idempotence constraint. The distortion constraint bounds reconstruction error, while the idempotence constraint, a surrogate for distributional perception, requires that re-encoding the reconstruction recovers the base codec output. Guidance is applied through iterative optimization with consistent noise injection, which provides common randomness without additional rate overhead. Dual attenuation factors $(K_D, K_P)$ allow continuous adjustment of the fidelity-realism trade-off from a single bitstream. DCIC$_{RDP}$ ($K_D = K_P = 1$) achieves superior BD-PSNR compared to other perceptual codecs, and DCIC$_{RP}$ ($K_D = 0$) matches dedicated perception-oriented methods in BD-FID; results are validated on CelebA-HQ, CLIC2020, and ImageNet-1K.
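
The two constraints and the role of $(K_D, K_P)$ can be written compactly. The notation below is assumed for illustration and is not taken from the source: $x$ is the source image, $f$ the encode-decode map of the base codec, $y = f(x)$ its output, $\hat{x}$ the diffusion reconstruction, and $\epsilon_D$, $\epsilon_P$ hypothetical tolerances. Measuring distortion against $y$ (the decoder has no access to $x$) and treating $K_D$, $K_P$ as weights on the two guidance terms are also assumptions.

```latex
\begin{aligned}
\text{distortion:}\quad & \|\hat{x} - y\|_2^2 \le \epsilon_D \\
\text{idempotence:}\quad & \|f(\hat{x}) - y\|_2^2 \le \epsilon_P \\
\text{guidance objective:}\quad & \mathcal{L}(\hat{x}) = K_D\,\|\hat{x} - y\|_2^2 + K_P\,\|f(\hat{x}) - y\|_2^2
\end{aligned}
```

Under this reading, $K_D = K_P = 1$ keeps both terms active (DCIC$_{RDP}$), while $K_D = 0$ leaves only the idempotence term (DCIC$_{RP}$).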

Key takeaway

For machine learning engineers working on image compression, DCIC offers a single framework for controlling the rate-distortion-perception trade-off. The $(K_D, K_P)$ attenuation factors let you adjust fidelity and perceptual realism from a single bitstream. You can therefore tailor reconstruction to the use case, favoring BD-PSNR for fidelity-critical tasks or matching the BD-FID of dedicated perception-oriented methods for perception-focused applications, without needing separate codecs.

Key insights

DCIC navigates the full RDP surface by integrating diffusion with dual distortion and idempotence constraints for flexible image compression.

Principles

Method

DCIC integrates a learned codec with a diffusion-based decoder, steering reverse denoising via iterative optimization with consistent noise injection, guided by distortion and idempotence constraints.
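
The guided reverse process described above can be illustrated with a toy sketch. This is not the authors' implementation: `toy_codec`, `toy_denoiser`, `seed_from_bitstream`, and `guided_decode` are hypothetical names, the codec is a simple pair-averaging projection rather than a learned model, and the step size and noise schedule are arbitrary. Two further assumptions are made: $K_D$ and $K_P$ are treated as weights on the two guidance gradients, and the shared noise is seeded from the bitstream itself, which is one plausible way to obtain common randomness without sending extra bits.

```python
import hashlib
import numpy as np


def toy_codec(x):
    """Hypothetical stand-in for the base codec (encode + decode).

    Averages adjacent pairs and repeats them. This is a linear projection,
    so toy_codec(toy_codec(x)) equals toy_codec(x).
    """
    pairs = x.reshape(-1, 2).mean(axis=1)
    return np.repeat(pairs, 2)


def seed_from_bitstream(bitstream):
    """Assumption: derive the noise seed from the bitstream, costing no extra rate."""
    return int.from_bytes(hashlib.sha256(bitstream).digest()[:8], "big")


def toy_denoiser(x, t):
    """Placeholder for a learned diffusion denoiser: mild shrinkage only."""
    return x / (1.0 + 0.1 * t)


def guided_decode(y, bitstream, k_d=1.0, k_p=1.0, steps=50, lr=0.2):
    """Toy reverse process steered by distortion and idempotence gradients."""
    rng = np.random.default_rng(seed_from_bitstream(bitstream))
    x = rng.standard_normal(y.shape)
    for i in range(steps):
        t = 1.0 - i / steps  # pseudo-time running from 1 down to 1/steps
        x = toy_denoiser(x, t)
        # Gradient of 0.5 * ||x - y||^2 (distortion term).
        grad_d = x - y
        # Gradient of 0.5 * ||toy_codec(x) - y||^2 (idempotence term); this
        # form holds because toy_codec is a symmetric projection and y lies
        # in its range.
        grad_p = toy_codec(x) - y
        x = x - lr * (k_d * grad_d + k_p * grad_p)
        if i < steps - 1:
            # Consistent noise injection: same seed, same noise on every run.
            x = x + 0.5 * t * rng.standard_normal(y.shape)
    return x


source = np.sin(np.linspace(0.0, 3.0, 64))
y = toy_codec(source)                        # base codec output
bitstream = y.astype(np.float32).tobytes()   # stand-in for entropy-coded bits

x_rdp = guided_decode(y, bitstream, k_d=1.0, k_p=1.0)  # both constraints
x_rp = guided_decode(y, bitstream, k_d=0.0, k_p=1.0)   # idempotence only
```

In this toy setting, setting `k_d=0` leaves the component that the codec discards unconstrained, so `x_rp` drifts further from `y` than `x_rdp` while re-encoding either one still lands close to `y`. Decoding the same bitstream twice yields identical outputs because the noise sequence is fully determined by the seed.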

In practice

Topics

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.