MemOVCD: Training-Free Open-Vocabulary Change Detection via Cross-Temporal Memory Reasoning and Global-Local Adaptive Rectification

· Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, quick

Summary

MemOVCD is a training-free open-vocabulary change detection framework for bi-temporal remote sensing images that identifies semantic changes without predefined categories. Existing methods often process the two timestamps independently or let them interact only at the final comparison step, which leads to insufficient temporal coupling and fragmented change regions in high-resolution images. MemOVCD addresses these limitations by reformulating change detection as a two-frame tracking problem and using weighted bidirectional propagation to aggregate semantic evidence across time. It also uses histogram-aligned transition frames to smooth abrupt appearance changes and stabilize memory propagation. In addition, a global-local adaptive rectification strategy fuses local-view and global-view predictions to improve spatial consistency while retaining fine-grained details. Experiments on five benchmarks covering two change detection tasks support MemOVCD's effectiveness and its generalization across diverse open-vocabulary settings.
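
To make the histogram-alignment idea concrete, here is a minimal sketch using classic per-channel histogram matching in NumPy. It is not the paper's implementation: the summary does not say how the transition frames are built, so the function names `match_histogram` and `transition_frame`, and the choice to restyle the first image with the second image's intensity distribution, are assumptions made for illustration.

```python
import numpy as np


def match_histogram(source: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Remap one channel of `source` so its intensity CDF follows `reference`."""
    src_values, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True
    )
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)

    # Empirical cumulative distributions of both channels.
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size

    # For each source quantile, look up the reference intensity at that quantile.
    mapped = np.interp(src_cdf, ref_cdf, ref_values)
    return mapped[src_idx].reshape(source.shape)


def transition_frame(img_t1: np.ndarray, img_t2: np.ndarray) -> np.ndarray:
    """Hypothetical transition frame: img_t1 content with img_t2 channel statistics.

    Both inputs are assumed to be H x W x C arrays with the same channel count.
    """
    channels = [
        match_histogram(img_t1[..., c], img_t2[..., c])
        for c in range(img_t1.shape[-1])
    ]
    return np.stack(channels, axis=-1)
```

A frame produced this way keeps the spatial layout of the first acquisition while adopting the radiometry of the second, which is one plausible way to soften an abrupt appearance jump between the two timestamps before propagating memory across them.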

Key takeaway

For research scientists developing open-vocabulary change detection systems, MemOVCD offers a training-free alternative. Consider integrating cross-temporal memory reasoning and global-local adaptive rectification into your models to improve semantic change identification and spatial consistency, especially when working with high-resolution remote sensing imagery and diverse open-vocabulary scenarios.

Key insights

MemOVCD improves open-vocabulary change detection by integrating cross-temporal memory reasoning and global-local adaptive rectification.

Method

MemOVCD uses weighted bidirectional propagation for cross-temporal memory reasoning, constructs histogram-aligned transition frames, and applies a global-local adaptive rectification strategy to fuse predictions for improved spatial consistency.
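
The fusion step can be illustrated with a small sketch that blends a local-view and a global-view change-probability map pixel by pixel. The summary does not give the paper's rectification rule, so everything here is an assumption: the function name `fuse_global_local`, the confidence heuristic (distance of a probability from 0.5), and the premise that both maps have already been resampled to the same resolution.

```python
import numpy as np


def fuse_global_local(
    local_prob: np.ndarray, global_prob: np.ndarray, eps: float = 1e-6
) -> np.ndarray:
    """Blend two change-probability maps, trusting the more confident one per pixel.

    Assumed heuristic: confidence = 2 * |p - 0.5|, so p = 0.5 has zero
    confidence and p = 0 or p = 1 has full confidence.
    """
    local_conf = np.abs(local_prob - 0.5) * 2.0
    global_conf = np.abs(global_prob - 0.5) * 2.0

    # Per-pixel weight of the local view; eps avoids division by zero
    # and yields an even 0.5 / 0.5 split when both views are undecided.
    w_local = (local_conf + eps) / (local_conf + global_conf + 2.0 * eps)
    return w_local * local_prob + (1.0 - w_local) * global_prob


# Usage: the local view is confident at the first pixel, the global view at the second.
local_map = np.array([[0.9, 0.5]])
global_map = np.array([[0.5, 0.1]])
fused = fuse_global_local(local_map, global_map)
```

Because the weights form a convex combination, the fused value at each pixel always lies between the two input values, so fine local detail is retained where the local view is decisive and the global view fills in where it is not.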

Best for: Research Scientist, AI Scientist, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.