Residual Modeling for High-Fidelity Learned Compression of Scientific Data
Summary
The paper introduces two residual coders, LBRC and NGLR, to address limitations of existing Guaranteed Autoencoder (GAE) methods in high-fidelity lossy compression of massive spatiotemporal scientific data. GAE struggles at block-level NRMSE targets from 10^-6 to 10^-4, where per-block residual corrections dominate the total bit rate. LBRC is a deterministic, training-free pipeline that adaptively quantizes the learned residuals and encodes them losslessly using techniques such as 3D Lorenzo differencing and entropy coding. NGLR extends LBRC with a causal neural predictor that further reduces the entropy of the residual codes. Evaluated on the E3SM, JHTDB, and ERA5 datasets, LBRC improves compression ratio over GAE by 30-60% and is competitive with SZ. NGLR achieves an additional 10-40% improvement over LBRC and outperforms SZ in the high-fidelity regime.
Key takeaway
For research scientists optimizing high-fidelity lossy compression of massive scientific datasets, traditional Guaranteed Autoencoder (GAE) methods become inefficient because residual corrections dominate the bit rate. Consider LBRC, which improves compression ratio over GAE by 30-60% and is competitive with SZ, or NGLR, which adds a further 10-40% over LBRC and outperforms SZ, in the 10^-6 to 10^-4 NRMSE range. These tailored residual coding techniques preserve the advantages of learned compression while meeting stringent accuracy targets.
Key insights
Tailored residual representations preserve learned compression advantages in high-fidelity scientific data scenarios.
Principles
- Learned residuals require specialized coding.
- High-fidelity compression demands efficient residual handling.
- Deterministic decoding is achievable with neural predictors.
Method
LBRC adaptively quantizes the learned residuals, then encodes them losslessly using 3D Lorenzo differencing and entropy coding; NGLR adds a causal neural predictor that lowers the entropy of the residual codes.
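To make the quantize-then-difference step concrete, here is a minimal Python sketch. It is not the paper's implementation: the fixed uniform step size stands in for LBRC's adaptive quantization, the zero boundary condition is an assumption, and the function names (`quantize`, `lorenzo3d_forward`, `lorenzo3d_inverse`, `empirical_entropy_bits`) are hypothetical. The empirical entropy is only a rough proxy for what an entropy coder would spend per symbol.

```python
import numpy as np

def quantize(residual, step):
    """Uniform scalar quantization (assumption: fixed step, not adaptive).
    Returns integer codes q with |residual - q * step| <= step / 2."""
    return np.rint(residual / step).astype(np.int64)

def lorenzo3d_forward(codes):
    """3D Lorenzo differencing with a zero boundary.
    Taking a first difference along each of the three axes in turn equals
    subtracting the 7-neighbor Lorenzo prediction from every sample."""
    d = codes
    for axis in range(3):
        d = np.diff(d, axis=axis, prepend=0)
    return d

def lorenzo3d_inverse(diffs):
    """Exact inverse of lorenzo3d_forward: cumulative sum along each axis."""
    d = diffs
    for axis in range(3):
        d = np.cumsum(d, axis=axis)
    return d

def empirical_entropy_bits(symbols):
    """Zeroth-order empirical entropy in bits per symbol."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

if __name__ == "__main__":
    # Synthetic smooth residual block, used only for illustration.
    i, j, k = np.meshgrid(np.arange(16), np.arange(16), np.arange(16), indexing="ij")
    residual = 1e-3 * np.sin(0.3 * i + 0.2 * j + 0.1 * k)

    step = 1e-5
    codes = quantize(residual, step)
    diffs = lorenzo3d_forward(codes)

    # Integer differencing is lossless, so decoding recovers the codes exactly.
    assert np.array_equal(lorenzo3d_inverse(diffs), codes)

    print("entropy of raw codes   :", empirical_entropy_bits(codes))
    print("entropy after Lorenzo  :", empirical_entropy_bits(diffs))
```

Because the differencing operates on integers, the decoder reproduces the quantized codes bit for bit, so the only loss comes from the quantization step. On smooth residual fields the differenced codes tend to cluster near zero, which is what makes them cheaper to entropy-code.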
In practice
- Apply LBRC for 30-60% compression-ratio gains over GAE.
- Use NGLR for a further 10-40% gain over LBRC, enough to outperform SZ in the high-fidelity regime.
- Target block-level NRMSE 10^-6 to 10^-4 for high fidelity.
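As a rough illustration of what a block-level NRMSE target means, the sketch below checks one block against a threshold. It assumes NRMSE is the RMSE divided by the value range (max minus min) of the original block; the summary does not state which normalization the paper uses, so treat that choice, and the helper names `nrmse` and `meets_target`, as assumptions.

```python
import numpy as np

def nrmse(original, reconstructed):
    """RMSE normalized by the value range of the original block.
    Assumption: range normalization; other conventions exist."""
    original = np.asarray(original, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    rmse = np.sqrt(np.mean((original - reconstructed) ** 2))
    value_range = original.max() - original.min()
    if value_range == 0:
        raise ValueError("constant block: range-normalized NRMSE is undefined")
    return float(rmse / value_range)

def meets_target(original, reconstructed, target):
    """True if the block's NRMSE is at or below the requested target."""
    return nrmse(original, reconstructed) <= target

if __name__ == "__main__":
    block = np.arange(8.0).reshape(2, 2, 2)   # toy 2x2x2 block, value range 7
    recon = block + 7e-5                      # constant reconstruction error
    print(nrmse(block, recon))
    print(meets_target(block, recon, 1e-4), meets_target(block, recon, 1e-6))
```

In a block-wise pipeline, a check like this would run per block after decoding, with residual corrections added only where the target is missed.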
Topics
- Learned Compression
- Residual Modeling
- High-Fidelity Data
- Scientific Simulation Data
- Lossy Compression Algorithms
- NRMSE
Best for: AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.