Residual Modeling for High-Fidelity Learned Compression of Scientific Data

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

Residual Modeling is a new approach to high-fidelity learned compression of scientific data. It targets the inefficiency of existing Guaranteed Autoencoder (GAE) methods at tight accuracy targets (NRMSE between $10^{-6}$ and $10^{-4}$), where GAE's per-block SVD/PCA-style residual correction comes to dominate the bit rate and erodes the learned base model's advantage. The work introduces two complementary residual coders: Lorenzo-Based Residual Coding (LBRC) and Neural-Guided Lorenzo Residual Coding (NGLR). LBRC is a training-free pipeline that adaptively quantizes the residual left by the learned model and losslessly encodes the quantized codes using 3D Lorenzo differencing and bit-plane coding. NGLR augments LBRC with a causal neural bias predictor, whose 0.94 MB of weights are included in the bitstream, to further reduce the entropy of the residual codes while keeping decoding deterministic. Across the E3SM, JHTDB, and ERA5 datasets, LBRC improves compression ratio (CR) over GAE by 30–60% and is competitive with SZ. NGLR raises CR by a further 10–40% over LBRC and outperforms SZ in the evaluated high-fidelity regime.
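Because the NGLR predictor weights travel inside the bitstream, they count against the compression ratio. The following is a minimal sketch of that accounting. The function name, the byte counts in the usage lines, and the reading of 0.94 MB as 0.94 × 10^6 bytes are illustrative assumptions, not figures or code from the paper.

```python
def effective_compression_ratio(original_bytes, payload_bytes, side_info_bytes=0.0):
    """Compression ratio when side information (e.g. model weights) is stored in the bitstream."""
    total = payload_bytes + side_info_bytes
    if total <= 0:
        raise ValueError("compressed size must be positive")
    return original_bytes / total


# Hypothetical sizes, chosen only to illustrate the arithmetic.
WEIGHTS_BYTES = 0.94e6  # assumption: 1 MB = 10**6 bytes
cr_without_weights = effective_compression_ratio(1e9, 1e8)
cr_with_weights = effective_compression_ratio(1e9, 1e8, WEIGHTS_BYTES)
```

Under these assumed sizes the fixed weight cost lowers the ratio only slightly; the same overhead would weigh more heavily on a small collection and less on a large one.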

Key takeaway

Machine Learning Engineers and Research Scientists developing high-fidelity lossy compression for scientific data should re-evaluate their residual correction strategy. If an application demands per-block NRMSE targets between $10^{-6}$ and $10^{-4}$, GAE-style SVD/PCA correction is likely to become the rate bottleneck. LBRC offers a training-free alternative that is competitive with SZ. NGLR reaches higher compression ratios by using a neural predictor to guide residual coding, at the cost of training that is adapted to each data collection.
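To make the per-block NRMSE target concrete, here is a minimal sketch of how a quantization step can be chosen so that a block meets a given target. It assumes NRMSE is the RMSE divided by the block's value range and uses a plain uniform quantizer; the paper's adaptive quantizer may differ, and all function names here are hypothetical.

```python
import numpy as np


def nrmse(original, reconstructed):
    """RMSE divided by the value range of the original block (assumed definition)."""
    value_range = float(original.max() - original.min())
    rmse = float(np.sqrt(np.mean((original - reconstructed) ** 2)))
    return rmse / value_range


def step_for_target(block, target_nrmse):
    """Uniform step whose worst-case error (step / 2) keeps NRMSE at or below the target."""
    value_range = float(block.max() - block.min())
    return 2.0 * target_nrmse * value_range


def quantize(residual, step):
    """Round the residual to the nearest multiple of step and return integer codes."""
    return np.rint(residual / step).astype(np.int64)


def dequantize(codes, step):
    return codes.astype(np.float64) * step


rng = np.random.default_rng(0)
block = rng.standard_normal((8, 8, 8))
# Stand-in for the learned model's reconstruction of the block.
base = block + 1e-3 * rng.standard_normal(block.shape)
residual = block - base
step = step_for_target(block, 1e-5)
reconstructed = base + dequantize(quantize(residual, step), step)
```

Since every element's error is at most half a step, the RMSE cannot exceed half a step either, so the bound holds by construction. The integer codes are what a lossless residual coder would then have to store.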

Key insights

High-fidelity learned compression requires specialized residual coding to maintain efficiency, not generic global correction.

Principles

Method

LBRC adaptively quantizes the learned residuals, then applies 3D Lorenzo differencing, zigzag mapping, bit-plane coding, and entropy coding. NGLR adds a causal neural bias predictor that refines the Lorenzo estimates.
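The lossless stages named above can be sketched on a block of integer quantization codes. This is an illustrative reading, not the authors' implementation: it treats neighbours outside the block as zero, uses the standard signed-to-unsigned zigzag map, and stops at the bit planes, leaving out the entropy coder and the NGLR predictor. All function names are hypothetical.

```python
import numpy as np


def lorenzo_diff_3d(codes):
    """Residual of the 3D Lorenzo predictor, with out-of-block neighbours treated as zero."""
    d = np.asarray(codes, dtype=np.int64)
    for axis in range(3):
        # A first difference along each axis composes into the 7-neighbour Lorenzo residual.
        d = np.diff(d, axis=axis, prepend=0)
    return d


def lorenzo_undo_3d(diffs):
    """Exact inverse: a cumulative sum along each axis."""
    x = np.asarray(diffs, dtype=np.int64)
    for axis in range(3):
        x = np.cumsum(x, axis=axis)
    return x


def zigzag(d):
    """Map signed integers to non-negative ones: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return np.where(d >= 0, 2 * d, -2 * d - 1)


def unzigzag(z):
    return np.where(z % 2 == 0, z // 2, -(z + 1) // 2)


def to_bit_planes(z):
    """Split non-negative integers into binary planes, least significant first."""
    n_planes = max(int(z.max()).bit_length(), 1)
    return [((z >> b) & 1).astype(np.uint8) for b in range(n_planes)]


def from_bit_planes(planes):
    z = np.zeros(planes[0].shape, dtype=np.int64)
    for b, plane in enumerate(planes):
        z |= plane.astype(np.int64) << b
    return z


rng = np.random.default_rng(0)
codes = rng.integers(-50, 50, size=(4, 5, 6))  # stand-in for quantized residual codes
planes = to_bit_planes(zigzag(lorenzo_diff_3d(codes)))
restored = lorenzo_undo_3d(unzigzag(from_bit_planes(planes)))
```

Every stage is integer-only and exactly invertible, which is what keeps decoding deterministic. On smooth data the Lorenzo residuals cluster near zero, so the higher bit planes are mostly empty and cheap for an entropy coder to represent.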

Best for: AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.