GMBFormer: An NDVI-Guided Global Memory Bank Transformer for Urban Green-Space Extraction from Ultra-High-Resolution Imagery
Summary
GMBFormer is a SegFormer-based framework for urban green-space extraction from ultra-high-resolution (UHR) imagery. It addresses two limitations of existing approaches: patch-by-patch processing and direct injection of the Normalized Difference Vegetation Index (NDVI). It replaces adjacency-driven feature propagation with selective, similarity-driven prototype retrieval. The framework decouples NDVI from the visual input and uses it as a physics-informed gate that admits high-confidence vegetation descriptors into a compact global memory bank, which is updated by momentum; only the RGB channels enter the backbone. During training and inference, the current patch queries the stored prototypes through memory-mediated cross-attention. GMBFormer achieved mean intersection over union (mIoU)/mean Dice (mDice) scores of 89.25%/94.31%, 92.17%/95.92%, and 83.72%/90.86% on a Chengdu UHR dataset and two ISPRS Potsdam settings, outperforming the SegFormer-B4 baseline.
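For readers less familiar with the two reported metrics, the following is a minimal sketch of how mIoU and mDice can be computed from label maps. The function name `iou_and_dice` and the choice to average over classes present in a single pair of maps are assumptions made for illustration; the paper's exact averaging convention is not stated in this summary.

```python
import numpy as np

def iou_and_dice(pred, target, num_classes):
    """Return (mIoU, mDice) averaged over the classes that occur in pred or target."""
    ious, dices = [], []
    for c in range(num_classes):
        p, t = pred == c, target == c
        inter = np.logical_and(p, t).sum()
        union = np.logical_or(p, t).sum()
        total = p.sum() + t.sum()
        if total == 0:
            continue  # class absent from both maps, so it is skipped
        ious.append(inter / union)
        dices.append(2 * inter / total)
    return float(np.mean(ious)), float(np.mean(dices))

# Toy 2x2 label maps: 1 = green space, 0 = background.
pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
miou, mdice = iou_and_dice(pred, target, num_classes=2)
```

Dice is never lower than IoU for the same prediction, which is consistent with the mDice figures above exceeding the paired mIoU figures.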
Key takeaway
Computer vision engineers building semantic segmentation models for ultra-high-resolution imagery, particularly in remote sensing, should consider GMBFormer's approach. Its decoupled NDVI gating and global memory bank address the limited semantic reuse of patch-by-patch processing and improve accuracy over the SegFormer-B4 baseline. The method integrates a physical index without mixing its role with that of the visual features, which may help on similar datasets where such an index is available.
Key insights
GMBFormer improves urban green-space extraction by decoupling NDVI from the visual backbone and using a global memory bank for cross-patch semantic reuse.
Principles
- Patch-by-patch processing limits semantic reuse among similar patterns.
- Direct NDVI injection blurs the roles of visual appearance and physical confidence.
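To make the second principle concrete, here is a minimal sketch of NDVI used as a gate rather than as an input channel. The NDVI formula, (NIR - Red) / (NIR + Red), is standard; the function names and the 0.3 threshold are illustrative assumptions and are not values taken from the paper.

```python
import numpy as np

def ndvi(nir, red, eps=1e-6):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red); eps avoids division by zero."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return (nir - red) / (nir + red + eps)

def vegetation_gate(nir, red, threshold=0.3):
    """Boolean mask of pixels treated as high-confidence vegetation.

    The threshold is a hypothetical value chosen for this example.
    """
    return ndvi(nir, red) > threshold

# Toy 2x2 bands: the left column is vegetation-like (NIR much brighter than red).
nir = np.array([[200, 50], [180, 90]], dtype=np.uint8)
red = np.array([[50, 60], [60, 100]], dtype=np.uint8)
gate = vegetation_gate(nir, red)
```

In this arrangement the mask only decides which descriptors are trusted; the NDVI values themselves are never concatenated with the RGB channels.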
Method
The backbone processes only the RGB channels to model visual appearance. NDVI acts as a gate that admits high-confidence vegetation descriptors into a momentum-updated global memory bank. Each patch then queries the stored prototypes via cross-attention.
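The three steps above can be sketched in NumPy as follows. This is a simplified illustration under several assumptions, not the authors' implementation: the class name `MemoryBank`, the nearest-prototype assignment rule, the momentum of 0.9, the bank size, and the single-head attention without learned projections are all hypothetical choices.

```python
import numpy as np

class MemoryBank:
    """Toy global prototype bank with gated momentum writes and attention reads."""

    def __init__(self, num_prototypes, dim, momentum=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.prototypes = self._normalize(rng.standard_normal((num_prototypes, dim)))
        self.momentum = momentum  # assumed value, not taken from the paper

    @staticmethod
    def _normalize(x):
        return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

    def update(self, features, gate):
        """Write step. features: (N, dim); gate: (N,) boolean NDVI mask."""
        admitted = self._normalize(features[gate])
        if admitted.shape[0] == 0:
            return  # nothing passed the gate, so the bank is left unchanged
        # Assign each admitted descriptor to its most similar prototype.
        nearest = np.argmax(admitted @ self.prototypes.T, axis=1)
        for k in np.unique(nearest):
            mean_k = admitted[nearest == k].mean(axis=0)
            blended = self.momentum * self.prototypes[k] + (1 - self.momentum) * mean_k
            self.prototypes[k] = self._normalize(blended)

    def read(self, queries):
        """Read step. Scaled dot-product attention of queries over prototypes."""
        scores = queries @ self.prototypes.T / np.sqrt(queries.shape[-1])
        scores -= scores.max(axis=1, keepdims=True)  # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=1, keepdims=True)
        return queries + weights @ self.prototypes  # residual fusion

# Random stand-ins for RGB-backbone features and an NDVI gate.
rng = np.random.default_rng(1)
bank = MemoryBank(num_prototypes=8, dim=16)
patch_features = rng.standard_normal((32, 16))
gate = rng.random(32) > 0.5
bank.update(patch_features, gate)
fused = bank.read(patch_features)
```

Because the bank persists across patches, a descriptor written while processing one patch can be retrieved by a visually similar patch elsewhere in the image, regardless of spatial adjacency.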
In practice
- Implement a global memory bank for cross-patch semantic reuse.
- Decouple physical indices from visual backbones in segmentation.
Topics
- Urban Green-Space Extraction
- Semantic Segmentation
- Transformers
- NDVI
- Memory Networks
- Ultra-High-Resolution Imagery
Best for: AI Scientist, Computer Vision Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.