Dual-Latent Collaborative Decoding for Fidelity-Perception Balanced Image Compression

2026-05-15 · Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, extended

Summary

Qi Mao, Zijian Wang, Zhengxue Cheng, Lingyu Zhu, and Siwei Ma introduce Mixture of Decoder Experts (MoDE), a dual-latent collaborative decoding framework that targets a better fidelity–perception balance in learned image compression (LIC). Existing LIC methods rely on a single latent representation, so they often struggle to maintain structural fidelity and perceptual realism at the same time, especially across varying bitrates. MoDE instead treats scalar-quantized (SQ) continuous latents as a fidelity-oriented expert and vector-quantized (VQ) discrete tokens as a perception-oriented expert. Both decoders stay frozen and are coordinated by two learned decoder-side modules: Expert-Specific Enhancement (ESE), which preserves branch-specific references, and Cross-Expert Modulation (CEM), which selectively transfers complementary information between the branches. A single shared dual-stream bitstream supports both fidelity-anchored (MoDE-F) and perception-anchored (MoDE-P) decoding. The authors report that MoDE outperforms the baselines they compare against on datasets including Kodak, CLIC2020, and Tecnick.

Key takeaway

For research scientists working on learned image compression, MoDE suggests a way to manage the fidelity–perception trade-off: assign fidelity and perception to separate decoder experts, then coordinate them on the decoder side through ESE and CEM. The dual-latent approach is worth considering when a codec must preserve both structural accuracy and perceptual realism across a wide range of bitrates.

Key insights

Decomposing image compression into fidelity and perception experts via dual-latent decoding balances conflicting reconstruction goals.

Principles

A single latent representation is overloaded when it must serve both fidelity and perception.
SQ and VQ latents offer complementary reconstruction strengths.
Decoder-side collaboration preserves expert specialization.
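
To make the SQ/VQ distinction concrete, here is a minimal NumPy sketch of the two quantizers in their generic textbook form. It is not the paper's implementation: the function names, the step size, and the toy codebook are all assumptions chosen for illustration.

```python
import numpy as np

def scalar_quantize(latent, step=1.0):
    """Round every latent value independently to the nearest multiple of `step`."""
    return np.round(latent / step) * step

def vector_quantize(vectors, codebook):
    """Replace each row of `vectors` with its nearest codebook row (squared L2)."""
    # Pairwise squared distances, shape (num_vectors, codebook_size).
    diffs = vectors[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    indices = np.argmin(dists, axis=1)  # discrete tokens
    return indices, codebook[indices]

# Toy data (hypothetical values, not taken from the paper).
latent = np.array([[0.2, 1.7], [-0.6, 0.4]])
codebook = np.array([[0.0, 0.0], [0.5, 1.5], [-1.0, 0.5]])

sq = scalar_quantize(latent)            # per-element grid values
idx, vq = vector_quantize(latent, codebook)  # one token index per vector
```

SQ keeps a value per latent element on a regular grid, so its precision scales with the step size. VQ transmits only one index per vector, and the decoder looks up the matching codebook entry.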

Method

MoDE coordinates frozen SQ (fidelity) and VQ (perception) decoders using Expert-Specific Enhancement (ESE) to maintain branch references and Cross-Expert Modulation (CEM) for gated, selective cross-expert information transfer during reconstruction.
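
The gated, selective transfer described above can be pictured with a generic gated residual update. This summary does not specify the internals of ESE or CEM, so the sketch below is only one plausible form: the function name, the sigmoid gate, the linear projection, and all weight shapes are assumptions rather than the authors' architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_residual_modulation(anchor, other, w_gate, w_proj, b_gate=0.0):
    """Add a gated, projected copy of `other` features to `anchor` features.

    anchor, other: (N, C) feature arrays from the two experts.
    w_gate: (2C, C) gate weights; w_proj: (C, C) projection weights.
    All parameters are hypothetical stand-ins for learned modules.
    """
    # The gate looks at both branches and yields a per-channel weight in (0, 1).
    gate = sigmoid(np.concatenate([anchor, other], axis=-1) @ w_gate + b_gate)
    # The anchor branch is kept intact; the other branch enters only as a residual.
    return anchor + gate * (other @ w_proj)

rng = np.random.default_rng(0)
sq_feat = rng.normal(size=(4, 8))  # stand-in for fidelity-expert features
vq_feat = rng.normal(size=(4, 8))  # stand-in for perception-expert features
w_gate = np.zeros((16, 8))         # zero weights give a constant gate of 0.5
w_proj = np.eye(8)                 # identity projection for easy checking

# Fidelity-anchored direction: SQ features are the anchor, VQ features modulate.
out_f = gated_residual_modulation(sq_feat, vq_feat, w_gate, w_proj)
# Perception-anchored direction: the roles are swapped.
out_p = gated_residual_modulation(vq_feat, sq_feat, w_gate, w_proj)
# A strongly negative gate bias closes the gate and leaves the anchor unchanged.
closed = gated_residual_modulation(sq_feat, vq_feat, w_gate, w_proj, b_gate=-50.0)
```

Because the update is a residual on the anchor branch, a closed gate falls back to the anchor expert's own features, which is one way to keep each expert's specialization intact.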

In practice

Use SQ for scalable structural fidelity.
Employ VQ for compact semantic and perceptual cues.
Implement gated residual modulation for controlled feature transfer.

Topics

Learned Image Compression
Fidelity-Perception Balance
Dual-Latent Decoding
Mixture of Decoder Experts
Scalar Quantization

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.