Road Maps as Free Geometric Priors: Weather-Invariant Drone Geo-Localization with GeoFuse

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

GeoFuse is a cross-modal fusion framework for drone-view geo-localization under adverse weather such as rain, snow, and fog. Weather degrades drone images and widens the domain gap between drone and satellite views, which traditional methods handle poorly. GeoFuse pairs geo-tagged satellite imagery with readily available road map data, whose geometric layout cues (road networks, building footprints) do not change with the weather. The framework augments the existing University-1652 and DenseUAV benchmarks with geo-aligned road maps and uses a flexible fusion module that combines satellite and road map features through token-level and channel-level interactions. A lightweight dynamic gating mechanism weights each modality's contribution per instance. Class-level cross-view contrastive learning then aligns weather-degraded drone features with the fused satellite-roadmap representations, yielding Recall@1 gains of +3.46% on University-1652 and +23.18% on DenseUAV.
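
For readers less familiar with the metric, Recall@1 is the fraction of drone queries whose single best-matching gallery item is the correct location. The sketch below is only an illustration of that definition, not the benchmarks' official evaluation code: it assumes cosine similarity over embedding vectors, and the function names and toy data are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def recall_at_1(query_feats, query_labels, gallery_feats, gallery_labels):
    """Fraction of queries whose most similar gallery item shares their label."""
    gallery = [l2_normalize(g) for g in gallery_feats]
    hits = 0
    for feat, label in zip(query_feats, query_labels):
        q = l2_normalize(feat)
        # On unit vectors, cosine similarity reduces to a dot product.
        sims = [sum(a * b for a, b in zip(q, g)) for g in gallery]
        best = max(range(len(sims)), key=sims.__getitem__)
        hits += gallery_labels[best] == label
    return hits / len(query_labels)


# Toy data (made up): three drone queries against a gallery of three locations.
gallery_feats = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
gallery_labels = ["loc_a", "loc_b", "loc_c"]
query_feats = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]
query_labels = ["loc_a", "loc_b", "loc_c"]

score = recall_at_1(query_feats, query_labels, gallery_feats, gallery_labels)
```

In this toy case the third query lands nearest to `loc_b` rather than `loc_c`, so two of three queries are retrieved correctly.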

Key takeaway

For research scientists developing drone geo-localization systems, GeoFuse demonstrates that incorporating free, weather-invariant road map data can substantially improve accuracy under challenging atmospheric conditions. You should consider augmenting your existing datasets with geo-aligned road maps and explore cross-modal fusion architectures to enhance the robustness of your models against environmental degradations.

Key insights

Integrating weather-invariant road map data with satellite imagery significantly enhances drone geo-localization accuracy in adverse conditions.

Principles

Method

GeoFuse combines satellite and road map features via token-level and channel-level interactions, using dynamic gating and class-level cross-view contrastive learning for robust alignment.
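
The per-instance gating idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not specify the form of the gate, so the code assumes a single linear layer over the concatenated satellite and road map features followed by a softmax over the two modalities. It omits the token-level and channel-level interactions entirely, and `gated_fuse`, `gate_weights`, and `gate_bias` are hypothetical names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fuse(sat_feat, road_feat, gate_weights, gate_bias):
    """Fuse two same-length feature vectors with per-instance modality weights.

    gate_weights holds two rows of length 2 * len(sat_feat) and gate_bias two
    floats; both stand in for learned parameters (assumed, not from the source).
    """
    joint = sat_feat + road_feat  # list concatenation: [satellite; road map]
    logits = [
        sum(w * x for w, x in zip(row, joint)) + b
        for row, b in zip(gate_weights, gate_bias)
    ]
    g_sat, g_road = softmax(logits)  # the two weights sum to 1 for this instance
    fused = [g_sat * s + g_road * r for s, r in zip(sat_feat, road_feat)]
    return fused, (g_sat, g_road)


sat = [0.5, -1.0, 2.0]
road = [1.0, 0.0, -1.0]
zero_gate = [[0.0] * 6, [0.0] * 6]

# With an uninformative gate, both modalities contribute equally.
fused_even, gates_even = gated_fuse(sat, road, zero_gate, [0.0, 0.0])

# A bias toward the first logit shifts weight to the satellite features.
fused_sat, gates_sat = gated_fuse(sat, road, zero_gate, [2.0, 0.0])
```

Because the gate depends on the input features, a trained version could in principle lean on whichever modality is more informative for a given location, which is the role the summary attributes to dynamic gating.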

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.