Dual-Constrained Diffusion Image Compression for Operational Rate-Distortion-Perception Optimization

2026-06-11 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, quick

Summary

Dual-Constrained Diffusion Image Compression (DCIC) is a framework for the rate-distortion-perception (RDP) trade-off in neural image compression, which balances fidelity against perceptual realism. It pairs a learned codec with a diffusion-based decoder guided jointly by a distortion constraint and an idempotence constraint. The distortion constraint bounds reconstruction error, while the idempotence constraint, a surrogate for distributional perception, requires that re-encoding the reconstruction recovers the base codec output. Reverse denoising is steered by iterative optimization with consistent noise injection, which provides common randomness without additional rate overhead. Two attenuation factors $(K_D, K_P)$ allow continuous adjustment of the fidelity-realism trade-off from a single bitstream. DCIC$_{RDP}$ ($K_D = K_P = 1$) achieves higher BD-PSNR than other perceptual codecs, and DCIC$_{RP}$ ($K_D = 0$) matches dedicated perception-oriented methods in BD-FID, as validated on CelebA-HQ, CLIC2020, and ImageNet-1K.

Key takeaway

For machine learning engineers optimizing image compression, DCIC offers a single framework for controlling the rate-distortion-perception trade-off. The $(K_D, K_P)$ attenuation factors let you adjust fidelity and perceptual realism from a single bitstream. You can tailor compression to the use case: higher BD-PSNR than other perceptual codecs for fidelity-critical tasks, or BD-FID on par with dedicated perception-oriented methods for perception-focused applications, without maintaining separate codecs.

Key insights

DCIC navigates the full RDP surface by combining a diffusion-based decoder with two constraints, distortion and idempotence, for flexible image compression.

Principles

The RDP trade-off requires distributional constraints.
Idempotence can serve as a surrogate for perceptual requirements.
Common randomness is key to realizing the RDP surface.
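
To make the common-randomness principle concrete, here is a minimal Python sketch of one way an encoder and a decoder could draw identical noise without sending extra bits. It is an illustration only: the helper `noise_from_bitstream` and the idea of hashing the bitstream into a seed are assumptions made for this example, not the mechanism described in the paper.

```python
import hashlib

import numpy as np


def noise_from_bitstream(bitstream: bytes, step: int, shape):
    """Hypothetical helper: derive Gaussian noise from data both sides already hold."""
    # Assumption: the seed is a hash of the bitstream plus the step index,
    # so no additional side information has to be transmitted.
    digest = hashlib.sha256(bitstream + step.to_bytes(4, "big")).digest()
    seed = int.from_bytes(digest[:8], "big")
    rng = np.random.default_rng(seed)
    return rng.standard_normal(shape)


# Both parties call the helper independently and obtain the same sample.
payload = b"example-bitstream"
encoder_noise = noise_from_bitstream(payload, step=3, shape=(2, 2))
decoder_noise = noise_from_bitstream(payload, step=3, shape=(2, 2))
assert np.array_equal(encoder_noise, decoder_noise)
```

Because the seed depends on the step index, each denoising step gets fresh noise that is still reproducible on both sides.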

Method

DCIC integrates a learned codec with a diffusion-based decoder, steering reverse denoising via iterative optimization with consistent noise injection, guided by distortion and idempotence constraints.
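
The interaction of the two constraints can be sketched on a toy problem. The Python example below is not the DCIC algorithm: it replaces the learned codec with block averaging (a linear projection, so it is idempotent), omits the diffusion denoiser entirely, and assumes that $K_D$ and $K_P$ simply scale the two guidance gradients. The names `block_average` and `guided_sampling` are hypothetical.

```python
import numpy as np


def block_average(x, block=4):
    # Toy stand-in for a codec: replace each block by its mean.
    # This is a symmetric projection, so applying it twice changes nothing.
    n = x.size // block
    means = x[: n * block].reshape(n, block).mean(axis=1)
    return np.repeat(means, block)


def guided_sampling(y, k_d, k_p, steps=50, step_size=0.5, seed=0):
    """Iteratively refine a noisy sample toward the constraints (toy sketch)."""
    rng = np.random.default_rng(seed)  # fixed seed: identical noise on every run
    x = rng.standard_normal(y.shape)
    for t in range(steps):
        sigma = 1.0 - (t + 1) / steps  # injected noise decays to zero
        grad_d = x - y                 # gradient of 0.5 * ||x - y||^2
        # Gradient of 0.5 * ||f(x) - y||^2; this form is valid because f is a
        # symmetric projection and y already lies in its range.
        grad_p = block_average(x) - y
        x = x - step_size * (k_d * grad_d + k_p * grad_p)
        x = x + 0.1 * sigma * rng.standard_normal(y.shape)
    return x


x0 = np.random.default_rng(1).standard_normal(16)  # stand-in "image"
y = block_average(x0)                               # base codec output

x_rp = guided_sampling(y, k_d=0.0, k_p=1.0)  # idempotence constraint only
x_d = guided_sampling(y, k_d=1.0, k_p=0.0)   # distortion constraint only
```

In this toy setting, the idempotence-only sample re-encodes to approximately `y` while keeping within-block detail that came from the noise, whereas the distortion-only sample collapses onto `y` itself.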

In practice

Adjust the fidelity-realism trade-off via the $(K_D, K_P)$ factors.
Achieve higher BD-PSNR than other perceptual codecs with DCIC$_{RDP}$ ($K_D = K_P = 1$).
Match the BD-FID of perception-oriented methods with DCIC$_{RP}$ ($K_D = 0$).
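
For readers unfamiliar with the metric, BD-PSNR is a Bjøntegaard delta: the average PSNR gap between two rate-distortion curves over their shared rate range. The Python sketch below follows the common cubic-fit formulation; the function name `bd_psnr` and all numbers are invented for illustration and are not results from the paper. BD-FID presumably applies the same averaging with FID in place of PSNR, where lower is better.

```python
import numpy as np


def bd_psnr(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average PSNR gain (dB) of the test codec over the anchor codec."""
    lr_a = np.log10(np.asarray(rate_anchor, dtype=float))
    lr_t = np.log10(np.asarray(rate_test, dtype=float))
    # Fit PSNR as a cubic polynomial in log-rate for each codec.
    p_a = np.polyfit(lr_a, np.asarray(psnr_anchor, dtype=float), 3)
    p_t = np.polyfit(lr_t, np.asarray(psnr_test, dtype=float), 3)
    # Integrate both fits over the overlapping log-rate interval.
    lo = max(lr_a.min(), lr_t.min())
    hi = min(lr_a.max(), lr_t.max())
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    # Mean vertical gap between the two curves.
    return (int_t - int_a) / (hi - lo)


# Synthetic example: the test curve sits exactly 1 dB above the anchor,
# so the result is approximately 1.0 by construction.
rates = [0.1, 0.2, 0.4, 0.8]            # bits per pixel (made-up values)
anchor = [28.0, 31.0, 34.0, 37.0]       # PSNR in dB (made-up values)
test = [p + 1.0 for p in anchor]
gain = bd_psnr(rates, anchor, rates, test)
```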

Topics

Neural Image Compression
Diffusion Models
Rate-Distortion-Perception
Idempotence Constraint
Image Quality Metrics
Learned Codecs

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.