GMBFormer: An NDVI-Guided Global Memory Bank Transformer for Urban Green-Space Extraction from Ultra-High-Resolution Imagery

2026-06-04 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Environmental Science & Earth Systems · Depth: Expert, quick

Summary

GMBFormer is a SegFormer-based framework for urban green-space extraction from ultra-high-resolution (UHR) imagery. It addresses two limitations: patch-by-patch processing and direct injection of the Normalized Difference Vegetation Index (NDVI). It replaces adjacency-driven feature propagation with selective, similarity-driven prototype retrieval. NDVI is decoupled from the visual stream and used as a physics-informed gate: only RGB channels enter the backbone, while NDVI admits high-confidence vegetation descriptors into a compact global memory bank that is updated with momentum. During both training and inference, the current patch queries the stored prototypes through memory-mediated cross-attention. GMBFormer reports mean intersection over union (mIoU)/mean Dice (mDice) scores of 89.25%/94.31%, 92.17%/95.92%, and 83.72%/90.86% on a Chengdu UHR dataset and two ISPRS Potsdam settings, respectively, outperforming the SegFormer-B4 baseline.
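The reported scores are standard segmentation overlap metrics. As a minimal sketch: per-class IoU is TP / (TP + FP + FN) and per-class Dice is 2TP / (2TP + FP + FN), so for any single class Dice = 2·IoU / (1 + IoU); mIoU and mDice average these over classes. The helper names below are hypothetical, and the summary does not state the paper's averaging protocol (dataset-level accumulation vs. per-image averaging, which classes are included), so this only illustrates the definitions.

```python
from typing import List, Tuple


def confusion_counts(pred: List[int], target: List[int], cls: int) -> Tuple[int, int, int]:
    """Return (tp, fp, fn) for one class over flattened label maps."""
    tp = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
    fp = sum(1 for p, t in zip(pred, target) if p == cls and t != cls)
    fn = sum(1 for p, t in zip(pred, target) if p != cls and t == cls)
    return tp, fp, fn


def miou_mdice(pred: List[int], target: List[int], num_classes: int) -> Tuple[float, float]:
    """Mean IoU and mean Dice over classes present in pred or target (hypothetical helper)."""
    ious, dices = [], []
    for c in range(num_classes):
        tp, fp, fn = confusion_counts(pred, target, c)
        if tp + fp + fn == 0:
            continue  # class absent from both maps: skip rather than divide by zero
        ious.append(tp / (tp + fp + fn))
        dices.append(2 * tp / (2 * tp + fp + fn))
    if not ious:
        raise ValueError("no classes present in pred or target")
    return sum(ious) / len(ious), sum(dices) / len(dices)


# Toy binary example: 1 = green space, 0 = background.
pred = [1, 1, 0, 0, 1, 0, 0, 0]
target = [1, 0, 0, 0, 1, 1, 0, 0]
m_iou, m_dice = miou_mdice(pred, target, num_classes=2)
print(f"mIoU={m_iou:.4f} mDice={m_dice:.4f}")  # mIoU=0.5833 mDice=0.7333
```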

Key takeaway

For Computer Vision Engineers developing semantic segmentation models for ultra-high-resolution imagery, particularly in remote sensing, GMBFormer's approach is worth considering. Its decoupled NDVI gating and global memory bank are designed to address the limits of patch-by-patch processing by enabling semantic reuse across patches, and the reported results show accuracy gains over a SegFormer-B4 baseline. The method offers a way to integrate physical indices without conflating them with visual features, which may carry over to other complex datasets.

Key insights

GMBFormer improves urban green-space extraction by decoupling NDVI and using a global memory bank for semantic reuse.

Principles

Patch-by-patch processing limits semantic reuse across similar patterns that appear in different patches.
Direct NDVI injection blurs the roles of visual appearance and physical confidence.
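To make the contrast concrete, here is a small sketch that assumes a near-infrared (NIR) band is available (ISPRS Potsdam provides one; the summary does not say how NDVI is obtained for the Chengdu data). NDVI = (NIR - Red) / (NIR + Red). Direct injection stacks NDVI as a fourth input channel next to RGB, so the backbone must treat a physical index like appearance; the decoupled variant keeps the backbone input RGB-only and turns NDVI into a separate confidence gate. The threshold tau = 0.3, the function names, and the reflectance values are hypothetical and illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Pixel:
    r: float
    g: float
    b: float
    nir: float  # assumed near-infrared band


def ndvi(nir: float, red: float) -> float:
    """NDVI = (NIR - Red) / (NIR + Red); returns 0.0 when both bands are zero."""
    denom = nir + red
    return (nir - red) / denom if denom != 0 else 0.0


def direct_injection(px: Pixel) -> List[float]:
    """Baseline style: NDVI stacked as a 4th backbone input channel."""
    return [px.r, px.g, px.b, ndvi(px.nir, px.r)]


def decoupled(px: Pixel, tau: float = 0.3) -> Tuple[List[float], bool]:
    """Decoupled style: RGB-only backbone input plus a separate NDVI gate (tau is hypothetical)."""
    return [px.r, px.g, px.b], ndvi(px.nir, px.r) >= tau


veg = Pixel(r=0.1, g=0.3, b=0.1, nir=0.5)    # illustrative vegetation-like reflectance
road = Pixel(r=0.3, g=0.3, b=0.3, nir=0.35)  # illustrative impervious-surface reflectance
print(direct_injection(veg))  # 4 values reach the backbone
print(decoupled(veg))         # RGB for the backbone, gate open
print(decoupled(road))        # RGB for the backbone, gate closed
```

In the decoupled form the gate only decides which descriptors are trusted as vegetation; it never alters what the backbone sees.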

Method

Only RGB channels enter the backbone to model visual appearance; NDVI acts as a gate that admits high-confidence vegetation descriptors into a momentum-updated global memory bank; each patch then queries the stored prototypes via cross-attention.
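The three steps can be sketched in plain Python. This is a hypothetical illustration, not the authors' implementation: the class name, slot count, momentum value, gate threshold, and the rule that an admitted descriptor updates its most similar prototype are all assumptions, and the learned query/key/value projections of real cross-attention are omitted so that the prototypes serve directly as keys and values.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na > 0 and nb > 0 else 0.0


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


class GlobalMemoryBank:
    """Hypothetical sketch: K prototype slots shared across all patches of an image."""

    def __init__(self, init_prototypes: List[Vector], momentum: float = 0.9):
        self.prototypes = [list(p) for p in init_prototypes]
        self.momentum = momentum  # assumed value; not given in the summary

    def update(self, descriptors: List[Vector], ndvi_conf: List[float], tau: float = 0.5) -> int:
        """NDVI-gated momentum update; returns how many descriptors were admitted."""
        admitted = 0
        for x, conf in zip(descriptors, ndvi_conf):
            if conf < tau:
                continue  # gate closed: low vegetation confidence, bank unchanged
            # Assumption: write into the most similar slot (cosine similarity).
            k = max(range(len(self.prototypes)), key=lambda i: cosine(self.prototypes[i], x))
            m = self.momentum
            self.prototypes[k] = [m * p + (1 - m) * xi for p, xi in zip(self.prototypes[k], x)]
            admitted += 1
        return admitted

    def retrieve(self, query: Vector) -> Vector:
        """Single-head cross-attention: query from the patch, keys/values from the bank."""
        d = len(query)
        scores = [dot(query, p) / math.sqrt(d) for p in self.prototypes]
        weights = softmax(scores)
        context = [sum(w * p[j] for w, p in zip(weights, self.prototypes)) for j in range(d)]
        return [q + c for q, c in zip(query, context)]  # residual fusion with the patch feature


bank = GlobalMemoryBank([[1.0, 0.0], [0.0, 1.0]], momentum=0.9)
n = bank.update([[0.8, 0.2], [0.1, 0.9]], ndvi_conf=[0.7, 0.2])  # only the first passes the gate
print(n, bank.prototypes)
print(bank.retrieve([1.0, 0.0]))  # a later patch reads the shared prototypes
```

Because the bank persists across patches, a prototype written while processing one patch can be retrieved by any later patch regardless of spatial adjacency, which is the similarity-driven reuse the method describes.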

In practice

Implement a global memory bank for cross-patch semantic reuse.
Decouple physical indices from visual backbones in segmentation.
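At the data-pipeline level, both practices start with how a UHR image is tiled and how its bands are routed. The sketch below assumes a hypothetical (R, G, B, NIR) band order stored as nested lists; each patch yields an RGB stream for the backbone and a per-pixel NDVI map reserved for gating, so the physical index never enters the visual encoder. Edge patches are simply cropped smaller rather than padded, another simplifying assumption, and the function names are illustrative.

```python
from typing import Iterator, List, Tuple

Pixel = Tuple[float, float, float, float]  # assumed band order: (R, G, B, NIR)


def iter_patches(image: List[List[Pixel]], size: int) -> Iterator[Tuple[int, int, List[List[Pixel]]]]:
    """Yield (top, left, tile) in raster order; edge tiles may be smaller than size x size."""
    h, w = len(image), len(image[0])
    for top in range(0, h, size):
        for left in range(0, w, size):
            tile = [row[left:left + size] for row in image[top:top + size]]
            yield top, left, tile


def split_streams(tile: List[List[Pixel]]) -> Tuple[List[List[Tuple[float, float, float]]], List[List[float]]]:
    """Route RGB to the visual stream and NDVI to the gating stream."""
    rgb = [[(r, g, b) for (r, g, b, _nir) in row] for row in tile]
    ndvi = [[(nir - r) / (nir + r) if nir + r else 0.0 for (r, _g, _b, nir) in row] for row in tile]
    return rgb, ndvi


image = [[(0.1, 0.2, 0.1, 0.5)] * 4 for _ in range(4)]  # toy 4x4 stand-in for a UHR image
for top, left, tile in iter_patches(image, size=2):
    rgb, ndvi_map = split_streams(tile)
    # rgb -> visual backbone; ndvi_map -> memory-bank gate (both consumers hypothetical)
    print(top, left, len(rgb), len(rgb[0]), round(ndvi_map[0][0], 3))
```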

Topics

Urban Green-Space Extraction
Semantic Segmentation
Transformers
NDVI
Memory Networks
Ultra-High-Resolution Imagery

Best for: AI Scientist, Computer Vision Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.