Residual Modeling for High-Fidelity Learned Compression of Scientific Data

2026-06-06 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

Residual Modeling is a new approach to high-fidelity learned compression of scientific data. It targets the inefficiency of existing Guaranteed Autoencoder (GAE) methods at tight accuracy targets (NRMSE between $10^{-6}$ and $10^{-4}$), where GAE's per-block SVD/PCA-style residual correction comes to dominate the bit rate and erodes the learned base model's advantage. The work introduces two complementary residual coders: Lorenzo-Based Residual Coding (LBRC) and Neural-Guided Lorenzo Residual Coding (NGLR). LBRC is a training-free pipeline that adaptively quantizes the learned model's residual and losslessly encodes the quantized codes using 3D Lorenzo differencing and bit-plane coding. NGLR augments LBRC with a causal neural bias predictor, whose 0.94 MB of weights are included in the bitstream, to further reduce the entropy of the residual codes while keeping decoding deterministic. Across the E3SM, JHTDB, and ERA5 datasets, LBRC improves compression ratio (CR) over GAE by 30–60% and is competitive with SZ. NGLR raises CR by a further 10–40% over LBRC and outperforms SZ in the evaluated high-fidelity regime.

Key takeaway

Machine Learning Engineers and Research Scientists developing high-fidelity lossy compression for scientific data should re-evaluate their residual correction strategy. If an application demands per-block NRMSE targets between $10^{-6}$ and $10^{-4}$, GAE-style SVD/PCA correction will likely become the rate bottleneck. LBRC offers a training-free alternative that is competitive with SZ. NGLR achieves higher compression ratios by using neural guidance for residual prediction, at the cost of collection-adaptive training.
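To make the per-block NRMSE targets concrete, the following is a minimal sketch of how a uniform quantization step for the residual could be tied to a target. It assumes NRMSE means RMSE divided by the block's value range, which this summary does not confirm, and the function names (`nrmse`, `step_for_target`, `correct_with_quantized_residual`) are hypothetical. The step is chosen conservatively: the worst-case error of uniform quantization is half a step, so RMSE cannot exceed that.

```python
import numpy as np

def nrmse(original, reconstructed):
    """RMSE divided by the value range of the original block (assumed definition)."""
    original = np.asarray(original, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    rmse = np.sqrt(np.mean((original - reconstructed) ** 2))
    return rmse / (original.max() - original.min())

def step_for_target(original, target_nrmse):
    """Uniform step whose worst-case error (step / 2) still meets the target."""
    value_range = float(np.max(original) - np.min(original))
    return 2.0 * target_nrmse * value_range

def correct_with_quantized_residual(original, base_reconstruction, target_nrmse):
    """Quantize the base model's residual and add it back to the base output."""
    step = step_for_target(original, target_nrmse)
    residual = original - base_reconstruction
    codes = np.rint(residual / step).astype(np.int64)  # integers to be coded losslessly
    return base_reconstruction + codes * step, codes, step

# Hypothetical block: a smooth field plus a stand-in "learned model" error.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 16)
original = np.sin(4 * grid)[:, None, None] + grid[None, :, None] * grid[None, None, :]
base = original + rng.normal(scale=1e-3, size=original.shape)

target = 1e-5
corrected, codes, step = correct_with_quantized_residual(original, base, target)
print(nrmse(original, base), nrmse(original, corrected))
```

Because the bound is worst-case, the achieved NRMSE usually lands below the target (quantization error that is roughly uniform has RMSE near step/√12), which is one reason an adaptive scheme might enlarge the step per block.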

Key insights

High-fidelity learned compression requires specialized residual coding to maintain efficiency, not generic global correction.

Principles

High-fidelity compression bottlenecks shift to residual correction.
Learned residuals demand specialized, local predictive coding.
Separate fidelity control from learned entropy reduction.

Method

LBRC adaptively quantizes the learned model's residuals, then applies 3D Lorenzo differencing, zigzag mapping, bit-plane coding, and entropy coding. NGLR adds a causal neural bias predictor that refines the Lorenzo estimates.
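The LBRC stages named above can be sketched as follows. This is an illustrative round trip under stated assumptions, not the paper's implementation: the quantizer is a fixed-step uniform one rather than an adaptive one, the final entropy coder is omitted, and all function names are hypothetical. It relies on the fact that the 3D Lorenzo prediction residual equals a first-order difference taken along each of the three axes, so the inverse is a cumulative sum along each axis.

```python
import numpy as np

def quantize(residual, step):
    """Uniform quantization to integer codes; |residual - code * step| <= step / 2."""
    return np.rint(np.asarray(residual) / step).astype(np.int64)

def lorenzo3d_forward(codes):
    """3D Lorenzo residual: first difference along each axis, zero boundary."""
    d = codes
    for axis in range(3):
        d = np.diff(d, axis=axis, prepend=0)
    return d

def lorenzo3d_inverse(diffs):
    """Undo the differencing with a cumulative sum along each axis."""
    out = diffs
    for axis in range(3):
        out = np.cumsum(out, axis=axis)
    return out

def zigzag(n):
    """Map signed integers to non-negative ones: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return np.where(n >= 0, 2 * n, -2 * n - 1)

def unzigzag(z):
    return np.where(z % 2 == 0, z // 2, -(z + 1) // 2)

def bit_planes(z):
    """Split non-negative integers into binary planes, most significant first."""
    nbits = max(1, int(z.max()).bit_length())
    return [((z >> b) & 1).astype(np.uint8) for b in range(nbits - 1, -1, -1)]

def from_bit_planes(planes):
    z = np.zeros(planes[0].shape, dtype=np.int64)
    for plane in planes:
        z = (z << 1) | plane
    return z

# Hypothetical residual block; a real one would come from the learned base model.
rng = np.random.default_rng(0)
residual = rng.normal(scale=1e-3, size=(8, 8, 8))
step = 1e-5

codes = quantize(residual, step)
planes = bit_planes(zigzag(lorenzo3d_forward(codes)))    # would go to an entropy coder
decoded = lorenzo3d_inverse(unzigzag(from_bit_planes(planes)))
assert np.array_equal(decoded, codes)                    # lossless on the integer codes
```

Everything after quantization operates on integers, so the only loss is the quantization error itself. That separation is what lets the fidelity bound be fixed by the step size independently of how well the later stages compress.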

In practice

Use LBRC for training-free, high-fidelity scientific data compression.
Apply NGLR for higher CR, accepting neural model training cost.
Integrate into CAESAR-V or similar learned block compressors.
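The causal constraint behind NGLR can be illustrated with a small sketch. This summary does not describe the network's architecture or the context it sees, so the code below substitutes a hypothetical hand-written `toy_bias` function for the neural predictor and uses three already-visited neighbours as its context; both are assumptions made purely for illustration. The point it shows is that, as long as the bias depends only on previously decoded values and is computed identically on both sides, the decoder can rebuild the exact same predictions and the scheme stays lossless on the quantized codes.

```python
import numpy as np

def causal_prediction(q, i, j, k, bias_fn):
    """Lorenzo prediction plus an integer bias, using only already-visited voxels."""
    def at(a, b, c):
        # Out-of-block neighbours count as zero; guard against negative-index wraparound.
        return int(q[a, b, c]) if min(a, b, c) >= 0 else 0

    lorenzo = (at(i - 1, j, k) + at(i, j - 1, k) + at(i, j, k - 1)
               - at(i - 1, j - 1, k) - at(i - 1, j, k - 1) - at(i, j - 1, k - 1)
               + at(i - 1, j - 1, k - 1))
    context = (at(i - 1, j, k), at(i, j - 1, k), at(i, j, k - 1))
    return lorenzo + int(bias_fn(context))

def encode(codes, bias_fn):
    """Raster-scan the block and store the prediction error at each voxel."""
    diffs = np.zeros_like(codes)
    for i, j, k in np.ndindex(*codes.shape):
        diffs[i, j, k] = codes[i, j, k] - causal_prediction(codes, i, j, k, bias_fn)
    return diffs

def decode(diffs, bias_fn):
    """Replay the same scan; each prediction uses only values decoded so far."""
    codes = np.zeros_like(diffs)
    for i, j, k in np.ndindex(*diffs.shape):
        codes[i, j, k] = diffs[i, j, k] + causal_prediction(codes, i, j, k, bias_fn)
    return codes

def toy_bias(context):
    """Hypothetical stand-in for the neural bias predictor (not the paper's model)."""
    return round(0.25 * (context[2] - context[1]))

rng = np.random.default_rng(0)
codes = rng.integers(-50, 50, size=(4, 5, 6))
diffs = encode(codes, toy_bias)
assert np.array_equal(decode(diffs, toy_bias), codes)
```

With a bias function that always returns zero, this reduces to plain 3D Lorenzo differencing. In a real system the bias predictor would have to produce bit-identical outputs on the encoder and decoder, which is why deterministic decoding is called out as a requirement.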

Topics

Lossy Compression
Scientific Data
Residual Coding
Neural Compression
Lorenzo Prediction
Error-Bounded Compression

Best for: AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.