Residual Modeling for High-Fidelity Learned Compression of Scientific Data
Summary
Residual Modeling is a new approach to high-fidelity learned compression of scientific data. It addresses the inefficiency of existing Guaranteed Autoencoder (GAE) methods at tight accuracy targets (NRMSE between $10^{-6}$ and $10^{-4}$), where GAE's per-block SVD/PCA-style residual correction comes to dominate the bit rate and erodes the advantage of the learned base model. The work introduces two complementary residual coders: Lorenzo-Based Residual Coding (LBRC) and Neural-Guided Lorenzo Residual Coding (NGLR). LBRC is a training-free pipeline that adaptively quantizes the learned residual and losslessly encodes it using 3D Lorenzo differencing and bit-plane coding. NGLR extends LBRC with a causal neural bias predictor, whose 0.94 MB of weights are included in the bitstream, to further reduce the entropy of the residual codes while keeping decoding deterministic. Across the E3SM, JHTDB, and ERA5 datasets, LBRC improves compression ratio (CR) over GAE by 30–60% and is competitive with SZ. NGLR raises CR by a further 10–40% over LBRC, outperforming SZ in the evaluated high-fidelity regime.
Key takeaway
Machine learning engineers and research scientists building high-fidelity lossy compressors for scientific data should re-evaluate their residual correction strategy. If your application demands per-block NRMSE targets between $10^{-6}$ and $10^{-4}$, traditional GAE-style SVD/PCA correction will likely become the rate bottleneck. LBRC offers a training-free, efficient alternative that is competitive with SZ. For higher compression ratios, adopt NGLR, which adds neural guidance to residual prediction at the price of a collection-adaptive training cost.
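To make the per-block NRMSE target concrete, here is a minimal sketch of how such a target could be checked and turned into a quantization step. It assumes NRMSE is the RMSE divided by the block's value range, which this summary does not confirm, and `conservative_step` is a hypothetical helper based on a simple worst-case bound, not the adaptive rule used by LBRC.

```python
import numpy as np

def nrmse(original, reconstructed):
    """RMSE normalized by the value range of the original block (assumed definition)."""
    value_range = float(original.max() - original.min())  # must be non-zero
    rmse = float(np.sqrt(np.mean((original - reconstructed) ** 2)))
    return rmse / value_range

def conservative_step(original, target_nrmse):
    """A uniform step that guarantees the target, since RMSE <= max error <= step / 2."""
    value_range = float(original.max() - original.min())
    return 2.0 * target_nrmse * value_range

# Demo on a synthetic block; rounding to a grid stands in for residual quantization.
rng = np.random.default_rng(0)
block = rng.normal(size=(16, 16, 16))
step = conservative_step(block, target_nrmse=1e-5)
reconstructed = np.rint(block / step) * step
achieved = nrmse(block, reconstructed)
```

Because the bound is worst-case, the achieved NRMSE is normally well below the target, so a real coder could afford a larger step; that slack is what an adaptive quantizer would try to recover.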
Key insights
High-fidelity learned compression requires specialized residual coding to maintain efficiency, not generic global correction.
Principles
- High-fidelity compression bottlenecks shift to residual correction.
- Learned residuals demand specialized, local predictive coding.
- Separate fidelity control from learned entropy reduction.
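The last principle can be illustrated with a small sketch: once the residual has been quantized, the error is fixed, and any deterministic causal predictor only changes how compressible the integer codes are. The example is deliberately simplified to 1D, and `trend_bias` is a hypothetical hand-written stand-in for a learned bias predictor, not the network used by NGLR.

```python
import numpy as np

def encode(q, bias_fn):
    """Code each integer sample against (previous sample + bias), using only past samples."""
    codes = np.empty_like(q)
    for i in range(len(q)):
        prev = q[i - 1] if i > 0 else 0
        codes[i] = q[i] - (prev + bias_fn(q[:i]))
    return codes

def decode(codes, bias_fn):
    """Invert encode() by replaying the same predictor on already-decoded samples."""
    q = np.empty_like(codes)
    for i in range(len(codes)):
        prev = q[i - 1] if i > 0 else 0
        q[i] = codes[i] + prev + bias_fn(q[:i])
    return q

def no_bias(past):
    """Plain previous-sample prediction."""
    return 0

def trend_bias(past):
    """Hypothetical bias predictor: assume the last increment repeats."""
    return int(past[-1] - past[-2]) if len(past) >= 2 else 0

# Fidelity is decided here, by the quantization step alone.
step = 1e-3
signal = np.linspace(0.0, 1.0, 200) ** 2
q = np.rint(signal / step).astype(np.int64)

codes_plain = encode(q, no_bias)
codes_guided = encode(q, trend_bias)
same_output = np.array_equal(decode(codes_plain, no_bias), decode(codes_guided, trend_bias))
```

Both decoders return exactly the same integers, so the reconstruction error stays within `step / 2` regardless of the predictor; only the magnitude of the codes handed to the entropy coder differs.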
Method
LBRC adaptively quantizes the learned residual, then applies 3D Lorenzo differencing, zigzag mapping, bit-plane coding, and entropy coding. NGLR adds a causal neural bias predictor that refines the Lorenzo estimates.
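The stages named above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the quantizer uses a fixed step rather than an adaptive one, "zigzag mapping" is taken to mean the usual signed-to-non-negative integer mapping, the final entropy coder and the NGLR bias predictor are omitted, and all function names are hypothetical.

```python
import numpy as np

def quantize(residual, step):
    """Uniform scalar quantization; |residual - q * step| <= step / 2."""
    return np.rint(residual / step).astype(np.int64)

def lorenzo3d_forward(q):
    """3D Lorenzo prediction error, computed as a first-order difference along each axis."""
    d = q
    for axis in range(3):
        d = np.diff(d, axis=axis, prepend=0)  # zero boundary, shape preserved
    return d

def lorenzo3d_inverse(d):
    """Undo lorenzo3d_forward with a cumulative sum along each axis."""
    q = d
    for axis in range(3):
        q = np.cumsum(q, axis=axis)
    return q

def zigzag(d):
    """Map signed ints to non-negative ints: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return np.where(d >= 0, 2 * d, -2 * d - 1)

def unzigzag(u):
    """Invert zigzag()."""
    return np.where(u % 2 == 0, u // 2, -(u + 1) // 2)

def bit_planes(u):
    """Split non-negative codes into binary planes, most significant plane first."""
    nbits = max(int(u.max()).bit_length(), 1)
    return [(u >> b) & 1 for b in reversed(range(nbits))]

# Demo: a smooth synthetic field stands in for the learned model's residual.
rng = np.random.default_rng(0)
residual = np.cumsum(rng.normal(scale=1e-3, size=(8, 8, 8)), axis=2)
step = 1e-4
q = quantize(residual, step)
codes = zigzag(lorenzo3d_forward(q))
planes = bit_planes(codes)  # these planes would be passed to an entropy coder
q_decoded = lorenzo3d_inverse(unzigzag(codes))
```

Everything after `quantize` is lossless, so `q_decoded` equals `q` exactly and the reconstruction `q_decoded * step` differs from the residual by at most `step / 2`.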
In practice
- Use LBRC for training-free, high-fidelity scientific data compression.
- Apply NGLR for a higher compression ratio, accepting the cost of training the neural predictor.
- Integrate into CAESAR-V or similar learned block compressors.
Topics
- Lossy Compression
- Scientific Data
- Residual Coding
- Neural Compression
- Lorenzo Prediction
- Error-Bounded Compression
Best for: AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.