GB-LSR: A Fast Local Spectral Image Representation with a Single Global Bandwidth for Continuous Reconstruction and Super-Resolution

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital, Artificial Intelligence & Machine Learning, Computer Vision & Pattern Recognition · Depth: Expert, quick

Summary

GB-LSR (Global-Bandwidth Local Spectral Representation) is a fixed-grid local spectral representation for continuous image reconstruction and super-resolution. It partitions an image into non-overlapping square patches and represents each patch by the coefficients of a truncated Fourier basis, predicted from shared convolutional-encoder features. A single trainable scalar bandwidth is shared across all patches and all images, so that reconstruction cost remains independent of image size. In native-reconstruction benchmarks, the main GB-LSR variant beats amortized re-implementations of LIIF, LTE, and WIRE by 2.8-3.6 dB PSNR and improves LPIPS by 0.11-0.15, at approximately one-quarter of the slowest baseline's inference cost. For arbitrary-scale super-resolution (ASR), it reaches competitive PSNR-Y while running 1.44x faster than LIIF-RDN and 3.25x faster than LTE-SwinIR at ×4. Further speedups and memory reductions are possible by optimizing the encoder and the averaging techniques.
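To put the reported PSNR margins in perspective, the short sketch below converts a PSNR gain into the corresponding reduction in mean squared error. It relies only on the standard definition PSNR = 10 log10(peak² / MSE); the function names are illustrative and do not come from the paper.

```python
import math

def psnr(mse, peak=1.0):
    """Standard PSNR in dB for a given mean squared error and peak signal value."""
    return 10.0 * math.log10(peak ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks when PSNR rises by gain_db (same peak value)."""
    return 10.0 ** (-gain_db / 10.0)

# The 2.8-3.6 dB range quoted in the summary, expressed as MSE ratios.
low = mse_ratio_for_gain(2.8)   # roughly 0.52
high = mse_ratio_for_gain(3.6)  # roughly 0.44
print(f"MSE falls to {low:.2f}x - {high:.2f}x of the baseline's")
```

In other words, a 2.8-3.6 dB advantage means the reconstruction error is roughly halved relative to the baseline, assuming both are measured against the same peak value.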

Key takeaway

Machine learning engineers building continuous image reconstruction or super-resolution models should evaluate GB-LSR as an efficient alternative. In native reconstruction, its main variant beats amortized LIIF, LTE, and WIRE re-implementations by 2.8-3.6 dB PSNR and 0.11-0.15 LPIPS. In arbitrary-scale super-resolution, it is competitive in PSNR-Y while running 1.44x faster than LIIF-RDN and 3.25x faster than LTE-SwinIR at ×4. Consider this fixed-grid local spectral representation when inference cost is a constraint.

Key insights

A single global scalar bandwidth, shared across all patches and images, is enough for a local spectral representation to deliver strong reconstruction quality and competitive super-resolution quality at low inference cost.

Principles

Method

GB-LSR partitions images into patches, predicts Fourier basis coefficients from shared convolutional-encoder features, and uses a single trainable global scalar bandwidth for continuous reconstruction.
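The reconstruction step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not say how the bandwidth enters the basis, so the sketch assumes it scales the integer frequencies; it uses a real cos/sin basis; and it takes the per-patch coefficients as given instead of predicting them with a convolutional encoder. The names `patch_basis`, `reconstruct`, `K`, and `scale` are hypothetical.

```python
import math

def patch_basis(u, v, K, bandwidth):
    """Cos/sin features of a truncated Fourier basis at local coords (u, v).

    Assumption: the global scalar bandwidth multiplies every integer frequency.
    Frequencies run over -K..K on each axis; the set is redundant but simple.
    """
    feats = []
    for kx in range(-K, K + 1):
        for ky in range(-K, K + 1):
            phase = 2.0 * math.pi * bandwidth * (kx * u + ky * v)
            feats.append(math.cos(phase))
            feats.append(math.sin(phase))
    return feats

def reconstruct(coeffs, patch_size, scale, K, bandwidth):
    """Render a grid of patches at patch_size * scale samples per patch side.

    coeffs[gy][gx] holds the 2 * (2K + 1)**2 coefficients of one patch
    (in GB-LSR these would come from the encoder; here they are inputs).
    """
    n = patch_size * scale
    # Sample centers in local patch coordinates, in [-0.5, 0.5).
    centers = [(i + 0.5) / n - 0.5 for i in range(n)]
    # The basis depends only on local coordinates and the shared bandwidth,
    # so it is computed once and reused for every patch.
    basis = [[patch_basis(u, v, K, bandwidth) for v in centers] for u in centers]
    rows, cols = len(coeffs), len(coeffs[0])
    image = [[0.0] * (cols * n) for _ in range(rows * n)]
    for gy in range(rows):
        for gx in range(cols):
            c = coeffs[gy][gx]
            for i in range(n):
                for j in range(n):
                    image[gy * n + i][gx * n + j] = sum(
                        a * b for a, b in zip(c, basis[i][j])
                    )
    return image

# Usage: a 2 x 3 grid of patches whose only nonzero coefficient is the DC term.
K = 1
dc = [0.0] * (2 * (2 * K + 1) ** 2)
dc[2 * (K * (2 * K + 1) + K)] = 0.5  # cosine coefficient at frequency (0, 0)
coeffs = [[list(dc) for _ in range(3)] for _ in range(2)]
native = reconstruct(coeffs, patch_size=4, scale=1, K=K, bandwidth=1.0)
upsampled = reconstruct(coeffs, patch_size=4, scale=2, K=K, bandwidth=1.0)
```

Because the same coefficients can be evaluated at any sampling density, changing `scale` yields a higher-resolution rendering without re-running whatever produced the coefficients. A practical version would vectorize the inner loops with an array library.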

In practice

Topics

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.