Adapting Diffusion Language Models for Lossless Pixel-Level Image Transmission

Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation, Communication Systems · Depth: Expert, extended

Summary

The paper introduces DDM-SSCC, a separate source-channel coding framework based on a discrete diffusion model and designed for lossless pixel-level image transmission. Semantic communication schemes often incur reconstruction errors; DDM-SSCC instead targets exact image recovery, which is crucial in fidelity-sensitive applications such as remote medical imaging. It adapts a diffusion language model to pixel-token restoration, employing synchronized reverse arithmetic coding with bidirectional attention. Key innovations include a Halton-guided denoising order for improved spatial coverage, a mask-ratio-aware cosine schedule that adapts the denoising pace, and a lightweight temperature calibration module for the probability tables. In experiments on the CIFAR10, DIV2K-LR-X4, and Kodak datasets under AWGN and Rayleigh fading channels, DDM-SSCC achieves superior exact-recovery performance, reaching perfect reconstruction above 2 dB unified SNR and outperforming representative lossless and semantic communication baselines.

Key takeaway

For machine learning engineers building robust image transmission systems, DDM-SSCC offers pixel-level lossless recovery, which is vital in fields such as remote medical imaging. The approach pairs a discrete diffusion model with synchronized arithmetic coding, and uses Halton-guided denoising and a mask-ratio-aware schedule to improve reliability and compression over noisy channels. In the reported experiments, it outperforms representative lossless and semantic communication baselines on exact image reconstruction.

Key insights

DDM-SSCC enables lossless pixel-level image transmission by adapting discrete diffusion models for synchronized arithmetic coding.
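To make the link between a model's probability tables and arithmetic coding concrete, here is a minimal Python sketch. It is not the paper's implementation. It only computes the ideal code length, the sum of -log2 p(token), that an arithmetic coder approaches when encoder and decoder share the same probability table. The function names, the toy logits, and the use of a single scalar temperature are all illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits (illustrative helper)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ideal_code_length_bits(logit_rows, tokens, temperature=1.0):
    """Sum of -log2 p(token): the cost an ideal arithmetic coder approaches.

    logit_rows[i] holds the (hypothetical) model logits for position i,
    and tokens[i] is the symbol actually transmitted at that position.
    """
    bits = 0.0
    for logits, token in zip(logit_rows, tokens):
        bits += -math.log2(softmax(logits, temperature)[token])
    return bits


# Toy case: the model is overconfident in symbol 0, but symbol 1 is sent.
overconfident = [[4.0, 0.0]]
cost_raw = ideal_code_length_bits(overconfident, [1], temperature=1.0)
cost_softened = ideal_code_length_bits(overconfident, [1], temperature=2.0)
```

In this toy case the softened table (temperature 2.0) spends fewer bits on the mispredicted symbol than the raw table, which is the general motivation for calibrating probability tables before entropy coding. How DDM-SSCC chooses its temperature is not specified in this summary.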

Principles

Method

DDM-SSCC uses a synchronized discrete-diffusion source coding protocol. Starting from a fully masked sequence, it progressively restores tokens using three components: a Halton-guided denoising order, a cosine schedule that paces token processing, and mask-ratio-aware temperature calibration of the probability tables.
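The Halton-guided order and the cosine schedule can be sketched together as follows. This is a hedged illustration, not the authors' code. It assumes a 2D Halton sequence with bases 2 and 3 mapped onto the pixel grid, and a standard cosine mask schedule in which the fraction of tokens still masked after step t of T is cos(pi/2 * t/T). The paper's mask-ratio-aware variant may differ, and all function names here are hypothetical.

```python
import math


def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in the given base."""
    result, frac = 0.0, 1.0 / base
    while index > 0:
        result += (index % base) * frac
        index //= base
        frac /= base
    return result


def halton_order(height, width):
    """List every (row, col) cell once, in the order a 2D Halton sequence first hits it."""
    seen, order, i = set(), [], 1
    while len(order) < height * width:
        col = min(int(radical_inverse(i, 2) * width), width - 1)
        row = min(int(radical_inverse(i, 3) * height), height - 1)
        if (row, col) not in seen:  # skip cells that were already visited
            seen.add((row, col))
            order.append((row, col))
        i += 1
    return order


def cosine_step_sizes(num_tokens, num_steps):
    """Tokens to restore at each step under an assumed cosine mask schedule."""
    masked = [
        math.floor(num_tokens * math.cos(math.pi / 2 * t / num_steps))
        for t in range(num_steps + 1)
    ]
    masked[-1] = 0  # guard against rounding: nothing stays masked at the end
    return [masked[t] - masked[t + 1] for t in range(num_steps)]


def denoising_plan(height, width, num_steps):
    """Split the Halton visiting order into per-step groups of positions."""
    order = halton_order(height, width)
    plan, start = [], 0
    for size in cosine_step_sizes(height * width, num_steps):
        plan.append(order[start:start + size])
        start += size
    return plan


plan = denoising_plan(4, 4, 4)  # 16 pixel tokens restored over 4 steps
```

Under these assumptions, early steps restore only a few tokens that are spread widely across the grid, and later steps restore larger groups once more context is available. Because the plan is deterministic, an encoder and a decoder that both compute it would visit positions in the same order, which is what a synchronized coding protocol requires.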

Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.