Learned JPEG Compression for DNN Vision

· Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

Learned JPEG Compression for DNN Vision (J4D) is a training framework that optimizes JPEG encoding parameters for deep neural network (DNN) inference performance rather than for human perception. The motivation is that a significant portion of image data is now consumed by AI models rather than by people. The central difficulty is representing the JPEG codec and its compression rate in a differentiable, closed-form manner. J4D addresses this with a differentiable soft quantizer based on a probabilistic quantization scheme. The soft quantizer yields a differentiable JPEG proxy and also permits analytical computation of the entropy of the coded source, which closely estimates the actual compression rate. With both pieces in place, the optimization problem can be solved by backpropagation. In the reported experiments, J4D achieves an accuracy increase of up to 11.60% at the same compression rate, or a compression rate reduction of up to 80.05% at the same accuracy, compared to default JPEG. J4D also shows promise for designing universal JPEG encoding parameters that work across various DNN architectures.
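To make the idea of a probabilistic soft quantizer with an analytical entropy more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's formulation: the softmax-over-levels form, the function names `soft_quantize` and `entropy_bits`, and the parameters `alpha` and `max_level` are all hypothetical choices made for this example. Each coefficient is assigned a probability over integer quantization levels, the soft output is the expected reconstruction, and the rate is approximated by the entropy of the average level distribution. NumPy computes only the forward pass; written in an autodiff framework, the same expressions would be smooth in the step size.

```python
import numpy as np

def soft_quantize(coeffs, step, alpha=20.0, max_level=64):
    """Illustrative probabilistic soft quantizer (assumed form, not the paper's).

    Returns the expected reconstruction and the per-coefficient probabilities
    over the integer levels -max_level..max_level. Coefficients beyond
    +/- max_level * step saturate at the outermost level.
    """
    coeffs = np.asarray(coeffs, dtype=np.float64)
    levels = np.arange(-max_level, max_level + 1, dtype=np.float64)
    scaled = coeffs[..., None] / step               # position in units of the step
    logits = -alpha * (scaled - levels) ** 2        # nearer levels score higher
    logits -= logits.max(axis=-1, keepdims=True)    # stabilise the softmax
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    soft = step * (probs * levels).sum(axis=-1)     # expected reconstruction value
    return soft, probs

def entropy_bits(probs):
    """Entropy in bits per coefficient of the average level distribution."""
    marginal = probs.reshape(-1, probs.shape[-1]).mean(axis=0)
    nonzero = marginal[marginal > 0]                # skip levels with zero mass
    return float(-(nonzero * np.log2(nonzero)).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coeffs = rng.laplace(scale=10.0, size=1000)     # stand-in for DCT coefficients
    for step in (2.0, 8.0, 32.0):
        soft, probs = soft_quantize(coeffs, step)
        hard = step * np.round(coeffs / step)       # ordinary hard quantization
        gap = np.abs(soft - hard).mean()
        print(f"step={step:5.1f}  entropy={entropy_bits(probs):.3f} bits  "
              f"mean |soft-hard|={gap:.3f}")
```

Larger values of `alpha` push the soft output toward ordinary rounding, while smaller values give smoother behaviour at the cost of a looser match to the real codec. A coarser step concentrates probability on fewer levels, which lowers the entropy estimate.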

Key takeaway

For machine learning engineers optimizing image data pipelines for deep neural networks, J4D is worth evaluating as a replacement for default JPEG settings. Compared to default JPEG, it is reported to achieve up to 11.60% higher model accuracy at the same compression rate, or a compression rate up to 80.05% lower at the same accuracy, which translates into reduced storage and bandwidth needs. The approach also opens avenues for developing universal compression settings across diverse DNN architectures.

Key insights

J4D optimizes JPEG for DNNs using a differentiable codec proxy and an entropy-based rate estimator, raising accuracy at a fixed compression rate or lowering the rate at a fixed accuracy.

Principles

Method

J4D trains JPEG encoding parameters by incorporating a differentiable soft quantizer and an information-theoretic rate estimator, allowing backpropagation to minimize rate while maximizing DNN inference performance.
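One common way to write a rate-versus-task trade-off of this kind is as a single penalized objective. The formula below is a sketch under assumptions and may differ from the paper's exact formulation. All symbols are introduced here for illustration: $Q$ denotes the learnable JPEG encoding parameters, $\tilde{x}_Q$ the output of the differentiable JPEG proxy applied to image $x$, $f$ the DNN, $\mathcal{L}$ the task loss against label $y$, $R(Q)$ the entropy-based rate estimate, and $\lambda \ge 0$ a weight that sets the trade-off.

```latex
\min_{Q} \;\; \mathbb{E}_{(x, y)}\!\left[ \mathcal{L}\big(f(\tilde{x}_Q),\, y\big) \right] \;+\; \lambda \, R(Q)
```

Because both terms are differentiable in $Q$ under these assumptions, gradients of the whole expression can be obtained by backpropagation, and sweeping $\lambda$ traces out different points on the rate-accuracy curve.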

In practice

Topics

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.