DAMAGESCOPE: How We Built a Dual-Encoder Damage Detection Model That Works Through Clouds

2026-06-23 · Source: Machine Learning on Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Robotics & Autonomous Systems · Depth: Advanced, short

Summary

DAMAGESCOPE is a dual-encoder model for detecting structural damage in cloud-covered satellite imagery. It targets three problems: cloud occlusion, noisy ground-truth labels, and inference speed. The architecture pairs EfficientNet-B3 for RGB visual features with a modified ResNet-34 for auxiliary spectral and contextual features. Its core innovation, Adaptive Cross-Modal Attention Fusion, dynamically weights each encoder's contribution based on input quality, which yields a reported 78% accuracy even under full cloud occlusion. To handle inconsistent crowdsourced labels in the BRIGHT dataset, DAMAGESCOPE uses Independent Bayesian Classifier Combination (IBCC) for probabilistic label aggregation. The model also reports a 111x speedup over standard single-encoder baselines, attributed to architectural efficiency and inference optimization. The planned successor, DAMAGESCOPE V2, targets edge deployment and integration with the larger xBD dataset.

Key takeaway

For machine learning engineers building robust models on imperfect real-world data, DAMAGESCOPE demonstrates that architectural innovation, such as adaptive cross-modal attention fusion, can significantly improve performance under degraded conditions. Consider dynamically weighting multimodal inputs and rigorously preprocessing labels, for example with Bayesian aggregation, to overcome data limitations and reach operational speed even with constrained datasets.

Key insights

Adaptive cross-modal attention fusion enables robust damage detection from satellite imagery despite cloud occlusion.

Principles

Architecture can compensate for limited data.
Label quality is upstream of everything.
Rejected ideas can find future viability.

Method

A dual-encoder architecture (EfficientNet-B3 and a modified ResNet-34) with adaptive cross-modal attention fusion, trained on crowdsourced labels that are first aggregated with Independent Bayesian Classifier Combination (IBCC) to reduce label noise.
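
The fusion step can be pictured as a quality-dependent gate over the two encoders' outputs. The sketch below is illustrative only and is not the authors' implementation: it assumes each encoder has already produced a same-length feature vector and that a scalar quality score per modality is available (in a real model these scores would come from learned layers, and attention would operate on feature maps rather than flat vectors). The names `adaptive_fusion`, `rgb_quality`, `aux_quality`, and `temperature` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a short list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fusion(rgb_feat, aux_feat, rgb_quality, aux_quality, temperature=1.0):
    """Fuse two feature vectors with weights driven by per-modality quality.

    rgb_quality / aux_quality are assumed scalar scores (higher = more
    trustworthy input), e.g. low for the RGB branch when clouds dominate.
    Returns the fused vector and the (rgb, aux) weights that were used.
    """
    if len(rgb_feat) != len(aux_feat):
        raise ValueError("feature vectors must have the same length")
    w_rgb, w_aux = softmax([rgb_quality / temperature, aux_quality / temperature])
    fused = [w_rgb * r + w_aux * a for r, a in zip(rgb_feat, aux_feat)]
    return fused, (w_rgb, w_aux)


# Clear scene: both branches are trusted equally.
clear_fused, clear_weights = adaptive_fusion([1.0, 2.0], [3.0, 4.0], 0.0, 0.0)

# Heavily clouded scene: the RGB branch is down-weighted.
cloudy_fused, cloudy_weights = adaptive_fusion([1.0, 2.0], [3.0, 4.0], -5.0, 5.0)
```

With equal quality scores the result is a plain average of the two feature vectors. As the RGB score drops, the fused vector approaches the auxiliary encoder's features, which is the behaviour the summary attributes to the model under occlusion.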

In practice

Implement dual encoders for multimodal inputs.
Dynamically weight sensor contributions based on quality.
Use Bayesian methods for noisy crowdsourced labels.
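
For the last point, the following is a minimal sketch of probabilistic label aggregation. It is not full IBCC and not the article's code: it swaps in a simplified Dawid-Skene-style EM loop with Dirichlet pseudo-counts on each annotator's confusion matrix, which shares IBCC's core idea (model each annotator's reliability, then infer a posterior over the true label) but skips the full Bayesian inference. The function `aggregate_labels` and its parameters `prior` and `diag_bonus` are hypothetical.

```python
import math


def _normalize(values):
    total = sum(values)
    return [v / total for v in values]


def aggregate_labels(votes, n_classes, n_iters=20, prior=1.0, diag_bonus=1.0):
    """Aggregate noisy votes into per-item class posteriors.

    votes: list of (item_id, annotator_id, label) with integer labels.
    prior: pseudo-count added to every cell (keeps probabilities non-zero).
    diag_bonus: extra pseudo-count on the diagonal, encoding a mild belief
                that annotators are right more often than wrong.
    Returns {item_id: [p(class 0), p(class 1), ...]}.
    """
    items = sorted({i for i, _, _ in votes})
    annotators = sorted({a for _, a, _ in votes})

    # Initialise posteriors from smoothed vote counts (soft majority vote).
    post = {i: [prior] * n_classes for i in items}
    for i, _, label in votes:
        post[i][label] += 1.0
    post = {i: _normalize(p) for i, p in post.items()}

    for _ in range(n_iters):
        # M-step: class proportions from the current posteriors.
        class_counts = [prior] * n_classes
        for p in post.values():
            for k in range(n_classes):
                class_counts[k] += p[k]
        class_prior = _normalize(class_counts)

        # M-step: per-annotator confusion matrices, rows = true class.
        conf = {
            a: [[prior + (diag_bonus if k == l else 0.0) for l in range(n_classes)]
                for k in range(n_classes)]
            for a in annotators
        }
        for i, a, label in votes:
            for k in range(n_classes):
                conf[a][k][label] += post[i][k]
        conf = {a: [_normalize(row) for row in m] for a, m in conf.items()}

        # E-step: recompute posteriors in log space for stability.
        logp = {i: [math.log(class_prior[k]) for k in range(n_classes)] for i in items}
        for i, a, label in votes:
            for k in range(n_classes):
                logp[i][k] += math.log(conf[a][k][label])
        post = {}
        for i, lp in logp.items():
            m = max(lp)
            post[i] = _normalize([math.exp(v - m) for v in lp])
    return post


# Toy data: annotators "A" and "B" agree; "C" disagrees on items 0 and 3.
truth = [0, 0, 0, 1, 1, 1]
votes = []
for item, t in enumerate(truth):
    votes.append((item, "A", t))
    votes.append((item, "B", t))
    votes.append((item, "C", 1 - t if item in (0, 3) else t))

posteriors = aggregate_labels(votes, n_classes=2)
```

The output is a probability per class for each item rather than a hard label, so a downstream training loop could use the posteriors as soft targets or drop items whose posterior stays close to uniform.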

Topics

Damage Detection
Satellite Imagery
Dual-Encoder Models
Adaptive Attention
Noisy Labels
IBCC

Code references

Tharun007-TK/disaster-detection

Best for: AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning on Medium.