LESSViT: Robust Hyperspectral Representation Learning under Spectral Configuration Shift

· Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, quick

Summary

LESSViT, or Low-rank Efficient Spatial-Spectral ViT, is a new sensor-flexible architecture designed to improve the robustness and generalization of hyperspectral imagery (HSI) models across different sensors. Traditional Vision Transformer (ViT) methods struggle with varying wavelength coverage, band sampling, and channel dimensionality, often failing to generalize when spectral configurations shift. LESSViT addresses this by using LESS Attention, a structured low-rank factorization that models joint spatial-spectral interactions through separable spatial and spectral components. This reduces the computational complexity of full spatial-spectral attention from O(N^2 C^2) to O(rNC). The architecture also features channel-agnostic patch embedding and wavelength-aware positional encoding for flexible spectral inputs. For efficient pretraining, LESSViT incorporates HyperMAE, a hyperspectral masked autoencoder with decoupled spatial-spectral masking and hierarchical channel sampling. Experiments on the SpectralEarth benchmark confirm LESSViT's improved robustness under spectral shifts while maintaining competitive in-distribution performance.
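To make the complexity claim concrete, the short sketch below counts the work implied by each expression. It assumes N is the number of spatial tokens, C the number of spectral channels, and r the rank of the factorization; the summary does not define these symbols, and the sizes used here are hypothetical, not taken from the paper.

```python
# Illustrative operation counts only; not the authors' implementation.

def joint_attention_cost(n_spatial, n_channels):
    """Pairwise scores when every (patch, band) token attends to every other."""
    tokens = n_spatial * n_channels
    return tokens * tokens  # N^2 * C^2

def low_rank_cost(n_spatial, n_channels, rank):
    """Cost term quoted for the low-rank factorization: r * N * C."""
    return rank * n_spatial * n_channels

# Hypothetical sizes: a 14x14 patch grid, 200 bands, rank 8.
N, C, r = 196, 200, 8
full = joint_attention_cost(N, C)
low = low_rank_cost(N, C, r)
print(full, low, full // low)  # 1536640000 313600 4900
```

The ratio between the two counts is NC/r, so the saving grows with both image size and band count.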

Key takeaway

For research scientists developing hyperspectral imagery models, LESSViT targets a specific failure mode: models that stop generalizing when the spectral configuration changes between sensors. You should consider integrating its LESS Attention mechanism and HyperMAE pretraining approach to build models that generalize more effectively across sensors. The architecture addresses the trade-off between efficiency and expressiveness in joint spatial-spectral attention, making HSI representation learning more scalable and adaptable.

Key insights

LESSViT enhances hyperspectral image model generalization across sensors via efficient low-rank spatial-spectral attention.

Method

LESSViT uses LESS Attention for joint spatial-spectral modeling, channel-agnostic patch embedding, and wavelength-aware positional encoding. It employs HyperMAE with decoupled spatial-spectral masking and hierarchical channel sampling for pretraining.
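Two of these components can be sketched in a few lines. This is a minimal illustration of one possible reading of the summary, not the paper's method. The function names, the sinusoidal form of the encoding, and the rule that a token is hidden when either its patch or its channel is masked are all assumptions made here.

```python
import math
import random

def wavelength_encoding(wavelength_nm, dim=8, max_period=10000.0):
    """Sinusoidal code computed from a band's centre wavelength (assumed to be
    in nanometres) rather than its channel index, so the same physical band
    gets the same code on any sensor."""
    half = dim // 2
    enc = []
    for i in range(half):
        freq = 1.0 / (max_period ** (i / half))
        enc.append(math.sin(wavelength_nm * freq))
        enc.append(math.cos(wavelength_nm * freq))
    return enc

def decoupled_mask(n_patches, n_channels, spatial_ratio, spectral_ratio, seed=0):
    """Draw the spatial and spectral masks independently (hypothetical scheme).
    Returns a patches x channels grid where True means the token is hidden."""
    rng = random.Random(seed)
    masked_patches = set(rng.sample(range(n_patches), int(n_patches * spatial_ratio)))
    masked_channels = set(rng.sample(range(n_channels), int(n_channels * spectral_ratio)))
    return [
        [(p in masked_patches) or (c in masked_channels) for c in range(n_channels)]
        for p in range(n_patches)
    ]

# Usage: encode a 550 nm band, then mask 75% of 16 patches and 50% of 10 bands.
code = wavelength_encoding(550.0, dim=8)
mask = decoupled_mask(16, 10, spatial_ratio=0.75, spectral_ratio=0.5)
visible = sum(not hidden for row in mask for hidden in row)
print(len(code), visible)  # 8 20
```

Because the encoding depends only on wavelength, a sensor with a different number or ordering of bands still maps each band to a consistent position, which is the property the summary attributes to wavelength-aware positional encoding.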

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.