Dual-Latent Collaborative Decoding for Fidelity-Perception Balanced Image Compression
Summary
Qi Mao, Zijian Wang, Zhengxue Cheng, Lingyu Zhu, and Siwei Ma introduce Mixture of Decoder Experts (MoDE), a dual-latent collaborative decoding framework that targets a better fidelity–perception balance in learned image compression (LIC). Existing LIC methods rely on a single latent representation, so they often struggle to maintain structural fidelity and perceptual realism at the same time, especially across varying bitrates. MoDE instead treats scalar-quantized (SQ) continuous latents as a fidelity-oriented expert and vector-quantized (VQ) discrete tokens as a perception-oriented expert. The two decoders stay frozen and are coordinated by two learned decoder-side modules: Expert-Specific Enhancement (ESE), which preserves branch-specific references, and Cross-Expert Modulation (CEM), which performs selective complementary transfer between the experts. MoDE supports both fidelity-anchored (MoDE-F) and perception-anchored (MoDE-P) decoding from a shared dual-stream bitstream, and the authors report that it outperforms various baselines on Kodak, CLIC2020, and Tecnick.
Key takeaway
For research scientists working on learned image compression, MoDE offers a framework for managing the fidelity–perception trade-off. It assigns fidelity and perception to distinct decoder experts and coordinates them through ESE and CEM, rather than asking one latent to serve both goals. Consider this dual-latent, decoder-side collaboration approach when a codec must deliver both structural accuracy and perceptual realism, particularly across a wide range of bitrates.
Key insights
Decomposing image compression into fidelity and perception experts via dual-latent decoding balances conflicting reconstruction goals.
Principles
- A single latent representation is overloaded when it must carry both structural fidelity and perceptual realism.
- SQ and VQ latents offer complementary reconstruction strengths.
- Decoder-side collaboration preserves expert specialization.
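To make the SQ/VQ distinction in the principles above concrete, here is a minimal, illustrative sketch of the two quantizers in plain Python. It is not the paper's implementation: the function names, the step size, and the toy codebook are all assumptions chosen for illustration. Scalar quantization rounds each latent value independently, which keeps fine numeric detail, while vector quantization replaces each latent vector with the index of its nearest codebook entry, which yields compact discrete tokens.

```python
def scalar_quantize(latent, step=1.0):
    """Round each value independently to the nearest multiple of `step`.

    `step` is a hypothetical parameter; smaller steps keep more detail
    at the cost of more bits.
    """
    return [round(v / step) * step for v in latent]


def vector_quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry (squared L2)."""
    indices = []
    for vec in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(vec, code)) for code in codebook]
        indices.append(dists.index(min(dists)))
    return indices


def vq_decode(indices, codebook):
    """Look up the codebook entries for a list of token indices."""
    return [list(codebook[i]) for i in indices]


# Toy data (assumed, for illustration only).
sq_out = scalar_quantize([0.2, 1.7, -0.6], step=0.5)

codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0)]
tokens = vector_quantize([(0.9, 1.2), (-0.8, 0.7), (0.1, -0.2)], codebook)
vq_out = vq_decode(tokens, codebook)
```

In this toy setup the SQ output stays close to each input value, whereas the VQ output can only be one of three codebook vectors. That difference is one way to read the summary's claim that the two latents have complementary strengths.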
Method
MoDE coordinates frozen SQ (fidelity) and VQ (perception) decoders using Expert-Specific Enhancement (ESE) to maintain branch references and Cross-Expert Modulation (CEM) for gated, selective cross-expert information transfer during reconstruction.
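The summary does not give the exact form of CEM, so the following is only a rough sketch of what gated, selective transfer between two experts could look like. It assumes a simple element-wise rule, `anchor + sigmoid(gate_logit) * transfer`, where `anchor` holds features from the anchored expert and `transfer` holds features derived from the other expert. All names here are hypothetical, and in a real system the gate logits would be predicted by a learned module rather than passed in by hand.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_residual_modulation(anchor, transfer, gate_logits):
    """Add a gated residual from the other expert to the anchor features.

    Assumed form (not taken from the paper):
        out[i] = anchor[i] + sigmoid(gate_logits[i]) * transfer[i]
    A strongly negative logit closes the gate and leaves the anchor
    feature essentially unchanged.
    """
    if not (len(anchor) == len(transfer) == len(gate_logits)):
        raise ValueError("all inputs must have the same length")
    return [
        a + sigmoid(g) * t
        for a, t, g in zip(anchor, transfer, gate_logits)
    ]


# Toy features (assumed): gate closed, half open, and nearly fully open.
anchor = [1.0, 2.0, 3.0]
transfer = [0.5, -1.0, 2.0]
out = gated_residual_modulation(anchor, transfer, [-50.0, 0.0, 50.0])
```

Because the transfer enters as a residual, a closed gate falls back to the anchored expert's own features. That matches the summary's description of decoder-side collaboration that preserves expert specialization, though the actual ESE and CEM designs may differ.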
In practice
- Use SQ for scalable structural fidelity.
- Employ VQ for compact semantic and perceptual cues.
- Implement gated residual modulation for controlled feature transfer.
Topics
- Learned Image Compression
- Fidelity-Perception Balance
- Dual-Latent Decoding
- Mixture of Decoder Experts
- Scalar Quantization
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.