MemOVCD: Training-Free Open-Vocabulary Change Detection via Cross-Temporal Memory Reasoning and Global-Local Adaptive Rectification
Summary
MemOVCD is a training-free open-vocabulary change detection framework for bi-temporal remote sensing images, a task that requires identifying semantic changes without a predefined set of categories. Existing methods often process the two timestamps independently or let them interact only at the final comparison step, which leads to weak temporal coupling and fragmented change regions in high-resolution images. MemOVCD addresses these limitations by reformulating change detection as a two-frame tracking problem and using weighted bidirectional propagation to aggregate semantic evidence across time. It also constructs histogram-aligned transition frames that smooth abrupt appearance changes and stabilize memory propagation. Finally, a global-local adaptive rectification strategy fuses local-view and global-view predictions, improving spatial consistency while retaining fine-grained details. Experiments on five benchmarks demonstrate MemOVCD's effectiveness and generalization across diverse open-vocabulary settings for two change detection tasks.
Key takeaway
For research scientists developing open-vocabulary change detection systems, MemOVCD offers a training-free approach. Consider adding cross-temporal memory reasoning and global-local adaptive rectification to your pipelines to improve semantic change identification and spatial consistency, especially for high-resolution remote sensing imagery and diverse open-vocabulary settings.
Key insights
MemOVCD improves open-vocabulary change detection by integrating cross-temporal memory reasoning and global-local adaptive rectification.
Principles
- Reformulate change detection as two-frame tracking.
- Aggregate semantic evidence bidirectionally across time.
- Smooth abrupt changes with transition frames.
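The bidirectional aggregation principle can be illustrated with a minimal NumPy sketch. It assumes a tracker has already produced, for each timestamp, a direct probability map for the queried class and a map propagated from the other timestamp's memory. The fixed weight `w_self`, the 0.5 threshold, and the function names are hypothetical choices for illustration, not MemOVCD's published formulation.

```python
import numpy as np

def fuse_evidence(p_self: np.ndarray, p_prop: np.ndarray, w_self: float = 0.6) -> np.ndarray:
    """Convex combination of a frame's own prediction and evidence propagated
    from the other frame. Hypothetical fixed weighting; the paper's weights may differ."""
    if not 0.0 <= w_self <= 1.0:
        raise ValueError("w_self must lie in [0, 1]")
    return w_self * p_self + (1.0 - w_self) * p_prop

def bidirectional_change_mask(p1, p2, p2_to_1, p1_to_2, w_self=0.6, thresh=0.5):
    """Binary change mask from fused per-timestamp class probabilities.

    p1, p2   : direct probabilities of the queried class at t1 and t2
    p2_to_1  : evidence propagated backward from t2 onto t1 (assumed tracker output)
    p1_to_2  : evidence propagated forward from t1 onto t2 (assumed tracker output)
    """
    f1 = fuse_evidence(p1, p2_to_1, w_self)  # t1 map refined with backward evidence
    f2 = fuse_evidence(p2, p1_to_2, w_self)  # t2 map refined with forward evidence
    m1 = f1 >= thresh
    m2 = f2 >= thresh
    # A pixel is marked as changed if the class is present at exactly one timestamp.
    return m1 != m2

# Usage: two pixels; the second one shows the class appearing at t2.
mask = bidirectional_change_mask(
    p1=np.array([[0.9, 0.1]]),
    p2=np.array([[0.9, 0.8]]),
    p2_to_1=np.array([[0.8, 0.2]]),
    p1_to_2=np.array([[0.85, 0.7]]),
)
print(mask)
```

Weighting each frame's own prediction against propagated evidence lets a confident observation at one timestamp correct a weak one at the other, which is the intuition behind coupling the timestamps before, rather than only at, the final comparison.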
Method
MemOVCD uses weighted bidirectional propagation for cross-temporal memory reasoning, constructs histogram-aligned transition frames, and applies a global-local adaptive rectification strategy to fuse predictions for improved spatial consistency.
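Global-local adaptive rectification can be sketched as a per-pixel, confidence-weighted fusion of two probability maps. This sketch assumes the local map was predicted on full-resolution tiles and already stitched, while the global map was predicted on a downsampled view of the whole scene. The nearest-neighbour upsampling and the confidence-proportional weights are illustrative assumptions, not the paper's exact rule.

```python
import numpy as np

def upsample_nearest(p: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a 2-D map by an integer factor."""
    return np.repeat(np.repeat(p, factor, axis=0), factor, axis=1)

def confidence(p: np.ndarray) -> np.ndarray:
    """Map a probability to [0, 1]: 0 at p = 0.5, 1 at p = 0 or p = 1."""
    return np.abs(p - 0.5) * 2.0

def rectify(p_local: np.ndarray, p_global_small: np.ndarray, factor: int, eps: float = 1e-6) -> np.ndarray:
    """Adaptive fusion of local (fine detail) and global (scene context) predictions.

    Hypothetical scheme: weights are proportional to each view's confidence, so
    the global view dominates where the local view is uncertain, and vice versa.
    """
    p_global = upsample_nearest(p_global_small, factor)
    if p_global.shape != p_local.shape:
        raise ValueError("upsampled global map must match the local map shape")
    c_local = confidence(p_local)
    c_global = confidence(p_global)
    w_local = (c_local + eps) / (c_local + c_global + 2 * eps)
    # Convex combination: the output always lies between the two input probabilities.
    return w_local * p_local + (1.0 - w_local) * p_global

# Usage: the top-left local pixel is maximally uncertain (0.5), so the fused
# value there is pulled almost entirely to the global prediction (0.9).
p_local = np.array([[0.5, 1.0], [0.9, 0.1]])
p_global_small = np.array([[0.9]])
print(rectify(p_local, p_global_small, factor=2))
```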
In practice
- Apply two-frame tracking to bi-temporal images.
- Use histogram alignment for temporal smoothing.
- Combine global and local predictions.
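Histogram alignment for temporal smoothing can be sketched with classic CDF-based histogram matching, the same idea implemented by scikit-image's `skimage.exposure.match_histograms`. Here the transition frame is assumed to be the t1 image re-coloured with t2's per-channel intensity distribution, inserted between the two real frames before tracking. The three-frame ordering and the helper names are hypothetical; the paper's exact construction may differ.

```python
import numpy as np

def match_channel(source: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Map each source intensity to the reference intensity at the same CDF level."""
    src_vals, src_inv, src_counts = np.unique(source.ravel(), return_inverse=True, return_counts=True)
    ref_vals, ref_counts = np.unique(reference.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    matched_vals = np.interp(src_cdf, ref_cdf, ref_vals)
    return matched_vals[src_inv].reshape(source.shape)

def transition_frame(img_t1: np.ndarray, img_t2: np.ndarray) -> np.ndarray:
    """t1 content with t2's per-channel histogram (assumed construction)."""
    if img_t1.ndim != 3 or img_t2.ndim != 3 or img_t1.shape[2] != img_t2.shape[2]:
        raise ValueError("expected HxWxC images with matching channel counts")
    channels = [match_channel(img_t1[..., c], img_t2[..., c]) for c in range(img_t1.shape[2])]
    return np.stack(channels, axis=-1)

def build_sequence(img_t1: np.ndarray, img_t2: np.ndarray) -> list:
    """Hypothetical tracker input: t1 -> histogram-aligned transition -> t2."""
    return [
        img_t1.astype(np.float64),
        transition_frame(img_t1, img_t2),
        img_t2.astype(np.float64),
    ]

# Usage: a 2x2 single-channel toy pair; the transition frame keeps t1's layout
# but takes on t2's intensity range.
img_t1 = np.array([[0.0, 1.0], [2.0, 3.0]])[..., None]
img_t2 = np.array([[10.0, 20.0], [30.0, 40.0]])[..., None]
seq = build_sequence(img_t1, img_t2)
print(seq[1][..., 0])
```

Because the transition frame shares t1's spatial content but t2's radiometry, a memory-based tracker sees two smaller steps (appearance shift, then content change) instead of one abrupt jump.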
Topics
- Open-Vocabulary Change Detection
- MemOVCD
- Cross-Temporal Memory Reasoning
- Global-Local Adaptive Rectification
- Remote Sensing Images
Best for: Research Scientist, AI Scientist, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.