Multi-modality Image Fusion under Adverse Weather: Mask-Guided Feature Restoration and Interaction

· Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

A mask-guided Multi-modality Image Fusion (MMIF) method targets image degradation caused by adverse weather, which disrupts feature representation and makes cross-modal complementarity harder to exploit. The approach introduces a "Pseudo Ground Truth" to simplify training and accelerate feature learning. A mask generation mechanism quantifies each modality's contribution by mapping the fused result back to the source images. A mask-guided cross-modal cross-attention mechanism then focuses selectively on informative features, which prevents overfitting to the "Pseudo Ground Truth" distribution. The method also combines mask-guided learning with a task-coupled degradation-aware learning strategy to balance feature restoration and feature interaction. According to the authors, experiments on synthetic and real-world datasets show better results than state-of-the-art approaches in visual quality, quantitative metrics, and downstream tasks. The source code is available on GitHub.

Key takeaway

For computer vision engineers building perception systems that must stay robust in adverse weather, this mask-guided multi-modality image fusion method reports gains over state-of-the-art approaches in visual quality, quantitative metrics, and downstream tasks. Its "Pseudo Ground Truth" and selective attention mechanisms address image degradation and strengthen cross-modal complementarity. Consider evaluating these mask-guided strategies in challenging environments, particularly since the source code is publicly available.

Key insights

Mask-guided fusion with pseudo ground truth and attention improves multi-modality image processing under adverse weather.

Principles

Method

The method uses "Pseudo Ground Truth" for training, generates masks based on fused-to-source image mapping, and applies mask-guided cross-modal cross-attention. It balances feature restoration and interaction via mask-guided and task-coupled degradation-aware learning strategies.
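
To make the two mask-related steps concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names contribution_masks and mask_guided_cross_attention are hypothetical, the exponential similarity used to build the masks is an assumed stand-in for the paper's fused-to-source mapping, and biasing attention scores with the log of the mask is one common way to make attention mask-guided.

```python
import numpy as np


def contribution_masks(fused, ir, vis, eps=1e-8):
    """Soft per-pixel masks estimating how closely the fused image follows each source.

    Assumption: similarity is exp(-|fused - source|), normalized across the
    two modalities. The paper's actual mapping may differ.
    """
    sim_ir = np.exp(-np.abs(fused - ir))
    sim_vis = np.exp(-np.abs(fused - vis))
    total = sim_ir + sim_vis + eps
    return sim_ir / total, sim_vis / total


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def mask_guided_cross_attention(q_feats, kv_feats, kv_mask, eps=1e-8):
    """Cross-attention from one modality (queries) to the other (keys/values).

    q_feats:  (N, d) query tokens from modality A
    kv_feats: (M, d) key/value tokens from modality B
    kv_mask:  (M,) contribution weights in [0, 1] for modality B tokens

    Assumption: the mask enters as an additive log-bias on the attention
    scores, so tokens with low contribution receive little attention.
    """
    d = q_feats.shape[-1]
    scores = q_feats @ kv_feats.T / np.sqrt(d)        # (N, M) scaled dot products
    scores = scores + np.log(kv_mask + eps)[None, :]  # suppress low-contribution tokens
    attn = softmax(scores, axis=-1)                   # each row sums to 1
    return attn @ kv_feats, attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ir, vis = rng.random((4, 4)), rng.random((4, 4))
    fused = 0.5 * (ir + vis)                          # placeholder fused image
    m_ir, m_vis = contribution_masks(fused, ir, vis)

    q = rng.random((16, 8))                           # toy tokens, 16 pixels x 8 channels
    kv = rng.random((16, 8))
    out, attn = mask_guided_cross_attention(q, kv, m_vis.reshape(-1))
    print(out.shape, attn.shape)
```

In this sketch a mask value near zero drives the corresponding attention weight toward zero, while a mask value of one leaves the score unchanged, so the mask acts as a soft gate rather than a hard cutoff.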

Best for: Research Scientist, AI Scientist, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.