DAMAGESCOPE: How We Built a Dual-Encoder Damage Detection Model That Works Through Clouds
Summary
DAMAGESCOPE is a dual-encoder model designed for detecting structural damage from cloud-covered satellite imagery, addressing challenges like cloud occlusion, noisy ground truth labels, and inference speed. The model employs a dual-encoder architecture, combining EfficientNet-B3 for RGB visual features and a modified ResNet-34 for auxiliary spectral/contextual features. Its core innovation is Adaptive Cross-Modal Attention Fusion, which dynamically weights each encoder's contribution based on input quality, enabling 78% accuracy even under full cloud occlusion. To handle inconsistent crowdsourced labels from the BRIGHT dataset, DAMAGESCOPE utilizes Independent Bayesian Classifier Combination (IBCC) for probabilistic label aggregation. This approach also achieves a 111x speedup over standard single-encoder baselines through architectural efficiency and inference optimization. Future development, DAMAGESCOPE V2, targets edge deployment and integration with the larger xBD dataset.
Key takeaway
For Machine Learning Engineers building robust models with imperfect real-world data, DAMAGESCOPE demonstrates that architectural innovation, like adaptive cross-modal attention fusion, can significantly improve performance under degraded conditions. You should consider dynamic weighting of multimodal inputs and rigorous label preprocessing, such as Bayesian aggregation, to overcome data limitations and achieve operational speed, even with constrained datasets.
Key insights
Adaptive cross-modal attention fusion enables robust damage detection from satellite imagery despite cloud occlusion.
Principles
- Architecture can compensate for limited data.
- Label quality is upstream of everything.
- Rejected ideas can find future viability.
Method
A dual-encoder architecture (EfficientNet-B3, Modified ResNet-34) with adaptive cross-modal attention fusion, preprocessed with Independent Bayesian Classifier Combination (IBCC) for noisy labels.
In practice
- Implement dual encoders for multimodal inputs.
- Dynamically weight sensor contributions based on quality.
- Use Bayesian methods for noisy crowdsourced labels.
Topics
- Damage Detection
- Satellite Imagery
- Dual-Encoder Models
- Adaptive Attention
- Noisy Labels
- IBCC
Code references
Best for: AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning on Medium.