Multi-modality Image Fusion under Adverse Weather: Mask-Guided Feature Restoration and Interaction

2026-06-25 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

A mask-guided Multi-modality Image Fusion (MMIF) method targets image degradation caused by adverse weather, which corrupts feature representations and makes cross-modal complementarity harder to exploit. The approach introduces a "Pseudo Ground Truth" to simplify training and accelerate feature learning. A mask generation mechanism quantifies each modality's contribution by mapping the fused result back to the source images. A mask-guided cross-modal cross-attention mechanism then focuses selectively on informative features, which keeps the model from overfitting to the "Pseudo Ground Truth" distribution. The method also combines mask-guided learning with a task-coupled degradation-aware learning strategy to balance feature restoration and feature interaction. In experiments on synthetic and real-world datasets, the authors report better results than state-of-the-art approaches in visual quality, quantitative metrics, and downstream tasks. The source code is available on GitHub (ixilai/AMG-Fuse).

Key takeaway

For computer vision engineers building perception systems that must stay robust in adverse weather, this mask-guided multi-modality image fusion method is reported to outperform state-of-the-art approaches in visual quality, quantitative metrics, and downstream tasks. Its "Pseudo Ground Truth" and selective attention mechanisms address image degradation while preserving cross-modal complementarity. Consider evaluating these mask-guided strategies in challenging environments, especially since the code is open source.

Key insights

Mask-guided fusion with pseudo ground truth and attention improves multi-modality image processing under adverse weather.

Principles

Quantify modality contribution via mask generation.
Balance feature restoration and interaction.
Simplify training with pseudo ground truth.
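
The first principle can be illustrated with a minimal sketch. The summary does not state how the masks are computed, so everything below is an assumption for illustration, not the authors' formulation: each pixel's contribution is taken as a softmax over the negative absolute difference between the fused image and each source. The function name `contribution_masks`, the visible/infrared pairing, and the `temperature` parameter are hypothetical.

```python
import numpy as np

def contribution_masks(fused, visible, infrared, temperature=0.1):
    """Hypothetical per-pixel contribution masks (illustrative assumption only)."""
    # Per-pixel distance between the fused result and each source image
    d_vis = np.abs(fused - visible)
    d_ir = np.abs(fused - infrared)
    # Softmax over negative distances: the closer source gets the larger weight
    logits = np.stack([-d_vis, -d_ir]) / temperature
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=0, keepdims=True)
    return weights[0], weights[1]  # masks sum to 1 at every pixel
```

Under this assumption, a pixel where the fused image matches the visible source far more closely than the infrared one receives a visible-mask value near 1, and a pixel equally close to both receives 0.5 in each mask.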

Method

The method uses "Pseudo Ground Truth" for training, generates masks based on fused-to-source image mapping, and applies mask-guided cross-modal cross-attention. It balances feature restoration and interaction via mask-guided and task-coupled degradation-aware learning strategies.
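
The mask-guided cross-modal cross-attention step can be sketched as follows. This is a simplified illustration under stated assumptions, not the published architecture: queries come from one modality, keys and values from the other, and a per-token mask in [0, 1] is added to the attention logits as a log-bias so that low-contribution tokens are suppressed. Learned projection matrices are omitted, and the names `mask_guided_cross_attention`, `context_mask`, and `eps` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mask_guided_cross_attention(query_feats, context_feats, context_mask, eps=1e-6):
    """Illustrative mask-biased cross-attention (assumption, not the paper's layer).

    query_feats:   (Nq, d) tokens from modality A
    context_feats: (Nk, d) tokens from modality B (used as keys and values)
    context_mask:  (Nk,)   contribution weights in [0, 1] for modality B tokens
    """
    d = query_feats.shape[-1]
    # Scaled dot-product scores between A queries and B keys
    scores = query_feats @ context_feats.T / np.sqrt(d)   # (Nq, Nk)
    # Log-bias: tokens with mask near 0 get a large negative logit
    scores = scores + np.log(context_mask + eps)          # broadcasts over queries
    attn = softmax(scores, axis=-1)
    return attn @ context_feats, attn
```

A soft log-bias is used here rather than a hard cut-off so that the mask can stay continuous; whether the actual method uses a soft or hard mask is not stated in this summary.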

In practice

Enhance autonomous driving perception.
Improve surveillance in poor visibility.
Apply mask-guided attention in fusion.

Topics

Multi-modality Image Fusion
Adverse Weather Perception
Mask-Guided Learning
Cross-Modal Attention
Feature Restoration
Computer Vision

Code references

ixilai/AMG-Fuse

Best for: Research Scientist, AI Scientist, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.