Residual Modeling for High-Fidelity Learned Compression of Scientific Data

2026-06-03 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

Residual Modeling for High-Fidelity Learned Compression of Scientific Data introduces two novel residual coders, LBRC and NGLR, to address limitations of existing Guaranteed Autoencoder (GAE) methods in high-fidelity lossy compression of massive spatiotemporal scientific data. GAE struggles at block-level NRMSE targets from 10^-6 to 10^-4, where per-block residual corrections dominate the total bit rate. LBRC is a deterministic, training-free pipeline that adaptively quantizes learned residuals and encodes them losslessly, using techniques such as 3D Lorenzo differencing and entropy coding. NGLR extends LBRC with a causal neural predictor that further reduces the entropy of the residual codes. Evaluated on the E3SM, JHTDB, and ERA5 datasets, LBRC improves compression ratio over GAE by 30-60% and is competitive with the error-bounded lossy compressor SZ. NGLR achieves an additional 10-40% improvement over LBRC, outperforming SZ in the high-fidelity regime.
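As the summary describes it, GAE enforces its guarantee block by block: any block whose reconstruction misses the NRMSE target receives a residual correction, and at targets of 10^-6 to 10^-4 the cost of those corrections dominates the bit rate. The following is a minimal sketch of that check, not the paper's code. It assumes NRMSE is normalized by each block's value range and that blocks are 8 x 8 x 8; both choices, and the helper names block_nrmse and blocks_needing_correction, are hypothetical.

```python
import numpy as np

def block_nrmse(original, recon, block=(8, 8, 8)):
    """Return {block_origin: NRMSE} for each (possibly partial) 3D block.

    Assumption: NRMSE = RMSE / (max - min) of the original block. The paper's
    exact normalization may differ (e.g. a global or per-variable range).
    """
    bz, by, bx = block
    nz, ny, nx = original.shape
    scores = {}
    for z in range(0, nz, bz):
        for y in range(0, ny, by):
            for x in range(0, nx, bx):
                o = original[z:z + bz, y:y + by, x:x + bx]
                r = recon[z:z + bz, y:y + by, x:x + bx]
                rmse = float(np.sqrt(np.mean((o - r) ** 2)))
                value_range = float(o.max() - o.min())
                # Constant blocks: fall back to absolute RMSE (an arbitrary choice).
                scores[(z, y, x)] = rmse / value_range if value_range > 0 else rmse
    return scores

def blocks_needing_correction(scores, tau):
    """Blocks whose NRMSE exceeds tau and would therefore need a residual correction."""
    return sorted(k for k, v in scores.items() if v > tau)
```

For example, blocks_needing_correction(block_nrmse(x, x_hat), 1e-5) lists the origins of the blocks where a hypothetical autoencoder reconstruction x_hat misses a 10^-5 target; each of those blocks is where residual bits get spent.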

Key takeaway

For research scientists optimizing high-fidelity lossy compression of massive scientific datasets, Guaranteed Autoencoder (GAE) methods lose efficiency because per-block residual corrections come to dominate the bit rate. LBRC and NGLR are worth investigating: in the 10^-6 to 10^-4 block-level NRMSE range, LBRC improves compression ratio over GAE by 30-60%, and NGLR adds a further 10-40% over LBRC while outperforming SZ. Tailoring the residual coder in this way preserves the advantages of learned compression at stringent accuracy targets.

Key insights

Tailored residual representations preserve learned compression advantages in high-fidelity scientific data scenarios.

Principles

Learned residuals require specialized coding.
High-fidelity compression demands efficient residual handling.
Deterministic decoding is achievable with neural predictors.
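The source does not describe NGLR's network, so the sketch below only illustrates why a causal predictor keeps decoding deterministic. A fixed integer-weight linear predictor over three causal neighbours stands in for the neural network; the encoder and decoder run it on the same already-decoded integer codes in the same raster order, so the decoder reproduces every prediction bit-exactly. WEIGHTS, SHIFT, and the function names are hypothetical. Floating-point inference can differ across hardware, which is one reason this sketch sticks to integer arithmetic.

```python
import numpy as np

# Hypothetical integer weights for the causal neighbours (x-1, y-1, z-1),
# standing in for a trained predictor whose weights were quantized to integers.
WEIGHTS = (2, 1, 1)
SHIFT = 2  # divide by 2**SHIFT = 4 (the sum of the weights)

def _predict(q, z, y, x):
    """Causal prediction from already-(de)coded neighbours; zero outside the array."""
    a = int(q[z, y, x - 1]) if x > 0 else 0
    b = int(q[z, y - 1, x]) if y > 0 else 0
    c = int(q[z - 1, y, x]) if z > 0 else 0
    s = WEIGHTS[0] * a + WEIGHTS[1] * b + WEIGHTS[2] * c
    return s >> SHIFT  # floor division by 4, identical on encoder and decoder

def causal_encode(codes):
    """Replace each integer code by its prediction error, in raster order."""
    res = np.empty_like(codes)
    nz, ny, nx = codes.shape
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                # Lossless stage: the decoder will see exactly these neighbour values.
                res[z, y, x] = codes[z, y, x] - _predict(codes, z, y, x)
    return res

def causal_decode(res):
    """Invert causal_encode by re-running the same predictor on decoded values."""
    out = np.zeros_like(res)
    nz, ny, nx = res.shape
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                out[z, y, x] = res[z, y, x] + _predict(out, z, y, x)
    return out
```

Every neighbour the predictor reads precedes the current voxel in the z, y, x loop order, which is what makes the scheme causal and the round trip exact.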

Method

LBRC adaptively quantizes learned residuals and then encodes them losslessly, using techniques such as 3D Lorenzo differencing and entropy coding; NGLR adds a causal neural predictor to further reduce the entropy of the residual codes.
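3D Lorenzo differencing is a standard predictor in error-bounded compressors such as SZ: each value is predicted from its seven already-coded neighbours in the unit cube behind it, and only the prediction error is entropy coded. The sketch below assumes the residual has already been quantized to integer codes, implements the predictor as a first difference along each axis with zero padding at the lower boundary, and uses zero-order empirical entropy as a rough stand-in for the entropy coder's output size. The paper's boundary handling and choice of entropy coder are not described in the source, and the function names are hypothetical.

```python
import numpy as np

def lorenzo3d(codes):
    """3D Lorenzo residual of an integer array, zero-padded at the low boundary.

    A first difference along each axis in turn equals f[x,y,z] minus the prediction
    f[x-1,y,z] + f[x,y-1,z] + f[x,y,z-1]
    - f[x-1,y-1,z] - f[x-1,y,z-1] - f[x,y-1,z-1] + f[x-1,y-1,z-1].
    """
    d = np.asarray(codes, dtype=np.int64)
    for axis in range(3):
        d = np.diff(d, axis=axis, prepend=0)
    return d

def inverse_lorenzo3d(residual):
    """Exact inverse: cumulative sums along each axis undo the differences."""
    q = np.asarray(residual, dtype=np.int64)
    for axis in range(3):
        q = np.cumsum(q, axis=axis)
    return q

def empirical_entropy_bits(symbols):
    """Zero-order Shannon entropy in bits/symbol: a rough proxy for coded size."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())
```

On a smooth integer ramp such as z + y + x, the differenced array is almost all zeros, so its entropy falls far below that of the raw codes while inverse_lorenzo3d recovers them exactly. Learned residuals are likely less smooth than this, so the reduction on real data will be smaller; that remaining entropy is what NGLR's learned predictor is meant to cut.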

In practice

Apply LBRC for 30-60% compression-ratio gains over GAE.
Use NGLR for a further 10-40% gain over LBRC, outperforming SZ.
Expect these gains in the high-fidelity regime of block-level NRMSE targets from 10^-6 to 10^-4.
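Meeting a block-level NRMSE target ties the quantization step of each block's residual to the target. The source does not give LBRC's adaptation rule, so the sketch below substitutes a simple one: try the step suggested by the uniform-noise model (RMSE of about step/sqrt(12)), verify it against the target, and otherwise fall back to step = 2 * tau * range, for which the rounding error never exceeds tau * range per value. The function name choose_step and the two-candidate rule are assumptions for illustration, and NRMSE is again assumed to be normalized by the block's value range.

```python
import numpy as np

def choose_step(residual, value_range, tau):
    """Pick a uniform quantization step for one block's residual.

    Hypothetical rule: NRMSE = RMSE / value_range must not exceed tau.
    Returns (step, integer codes); the decoder reconstructs codes * step.
    """
    target = tau * value_range
    if target <= 0:
        raise ValueError("tau * value_range must be positive")
    # Aggressive candidate from the uniform-noise model: RMSE ~ step / sqrt(12).
    step = float(np.sqrt(12.0) * target)
    codes = np.round(residual / step).astype(np.int64)
    rmse = float(np.sqrt(np.mean((residual - codes * step) ** 2)))
    if rmse <= target:
        return step, codes
    # Safe fallback: |error| <= step / 2 == target for every value (up to
    # floating-point rounding), so the block RMSE cannot exceed the target.
    step = 2.0 * target
    return step, np.round(residual / step).astype(np.int64)
```

Tighter targets shrink the step and widen the spread of integer codes, which is why residual bits dominate the rate as the target moves toward 10^-6.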

Topics

Learned Compression
Residual Modeling
High-Fidelity Data
Scientific Simulation Data
Lossy Compression Algorithms
NRMSE

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.