CMDS-AD: Cross-Modal Dual-Stream Decoupling for Few-Shot Anomaly Detection

2026-06-18 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, quick

Summary

CMDS-AD (Cross-Modal Dual-Stream Anomaly Detection) targets few-shot anomaly detection in multi-modal settings. In this setting, scarce training data and the spatially uniform feature processing of existing methods cause cross-modal misalignment and high false-positive rates. CMDS-AD uses a LoRA-guided diffusion model to generate diverse RGB samples, which mitigates extreme data scarcity. For 3D normal augmentation, a pre-trained diffusion model acts as a non-linear low-pass filter that extracts low-frequency normal representations from RGB inputs. These form an auxiliary stream of robust structural templates, which helps the uncompressed real stream, which keeps the high-frequency detail, isolate micro-defects. A Coordinate-Aware Hierarchical Feature Mapper aligns semantics across modalities, and a multiplicative scoring mechanism filters modality-specific noise. In the 1-shot setting, CMDS-AD achieves absolute gains of 5.7 and 2.0 percentage points in I-AUROC and AUPRO on MVTec 3D-AD, and 7.7 and 5.6 points on Eyecandies, setting new state-of-the-art results on both datasets.
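The summary does not describe how the diffusion model is fine-tuned beyond "LoRA-guided", so the following is only a minimal sketch of the standard LoRA reparameterization such adapters rely on. A frozen weight W is augmented with a trainable low-rank update, giving y = W x + (alpha / r) B A x. The class and helper names (`LoRALinear`, `matvec`) are hypothetical, and which layers of the diffusion model CMDS-AD adapts is an open assumption.

```python
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


class LoRALinear:
    """Hypothetical sketch: frozen linear layer W plus a low-rank update (alpha / r) * B @ A.

    Which layers of the diffusion model CMDS-AD adapts is not specified here.
    """

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # frozen base weight, shape d_out x d_in
        self.scale = alpha / rank     # usual LoRA scaling factor
        # A (r x d_in) gets a small random init; B (d_out x r) starts at zero,
        # so the adapted layer is initially identical to the base layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        # Compute B(Ax) without ever forming the full d_out x d_in product B @ A.
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        # Only A and B are trained: r * (d_in + d_out) values.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


layer = LoRALinear([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]], rank=2, alpha=4.0)
print(layer.forward([1.0, 1.0, 1.0]))  # [1.0, 2.0]: B is zero, so output equals W x
print(layer.trainable_params())        # 10 = 2 * (3 + 2)
```

The appeal for few-shot augmentation is the parameter count. For a 768 x 768 projection with r = 4, the adapter trains 4 * (768 + 768) = 6,144 values instead of 589,824. That makes it feasible to specialize a large generator on a handful of normal images.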

Key takeaway

For machine learning engineers building few-shot anomaly detection systems, particularly with multi-modal inputs, CMDS-AD offers a robust way to cope with data scarcity and reduce false positives. Consider its dual-stream decoupling strategy, which separates structural and defect signals, and its use of diffusion models for data augmentation and normal estimation. In the 1-shot setting, the method improves both I-AUROC and AUPRO on MVTec 3D-AD and Eyecandies.

Key insights

CMDS-AD uses dual-stream decoupling and diffusion models to enhance few-shot multi-modal anomaly detection by separating structural and defect signals.

Principles

Decouple low-frequency structures from high-frequency defects.
Leverage diffusion models for data augmentation and normal estimation.
Filter modality-specific noise with multiplicative scoring.
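The first principle, decoupling low-frequency structure from high-frequency defects, can be illustrated with a much simpler stand-in. CMDS-AD uses a pre-trained diffusion model as a non-linear low-pass filter. The sketch below instead swaps in a plain linear box blur (an assumption made purely for illustration) to split a toy depth map into a low-frequency template and a high-frequency residual where a micro-defect stands out. The function names `box_blur` and `decouple` are hypothetical.

```python
def box_blur(img, radius):
    """Linear low-pass filter: mean over a (2r+1) x (2r+1) window, edges clamped."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def decouple(img, radius=2):
    """Split a map into a low-frequency template and a high-frequency residual."""
    low = box_blur(img, radius)
    high = [[v - l for v, l in zip(row, low_row)] for row, low_row in zip(img, low)]
    return low, high


# Toy 11 x 11 depth map: a smooth ramp with a one-pixel pit (the "micro-defect").
H, W = 11, 11
depth = [[0.1 * x for x in range(W)] for _ in range(H)]
depth[5][5] -= 1.0

low, high = decouple(depth, radius=2)
peak = max(((y, x) for y in range(H) for x in range(W)),
           key=lambda p: abs(high[p[0]][p[1]]))
print(peak)  # (5, 5): the pit dominates the high-frequency residual
```

A linear blur also leaks the defect into the template: here the pit lowers the local mean by 1.0 / 25 = 0.04. A plausible motivation for a learned, non-linear filter is that it can yield a cleaner structural template, though the summary does not state this explicitly.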

Method

CMDS-AD generates RGB samples via LoRA-guided diffusion and estimates low-frequency normals with a pre-trained diffusion model. It then aligns cross-modal semantics with a coordinate-aware hierarchical feature mapper and scores anomalies with a multiplicative cross-modal score.
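The summary does not give the exact scoring formula. As a hedged illustration of why a multiplicative score filters modality-specific noise, the sketch compares generic additive and multiplicative fusion of two per-pixel anomaly maps, one from RGB and one from 3D geometry. The min-max normalization, the threshold, and all function names are assumptions, not the paper's definitions.

```python
def normalize(scores):
    """Min-max normalize a flat list of anomaly scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_additive(rgb, geo):
    """Average of normalized per-pixel scores: one strong modality is enough."""
    return [(a + b) / 2 for a, b in zip(normalize(rgb), normalize(geo))]


def fuse_multiplicative(rgb, geo):
    """Product of normalized per-pixel scores: both modalities must agree."""
    return [a * b for a, b in zip(normalize(rgb), normalize(geo))]


# Five pixels: 0 = real defect (high in both), 1 = RGB-only glare,
# 2 = depth-only sensor noise, 3 and 4 = normal surface.
rgb = [1.0, 1.0, 0.0, 0.0, 0.2]
geo = [1.0, 0.0, 1.0, 0.0, 0.1]
threshold = 0.3

additive_hits = [i for i, s in enumerate(fuse_additive(rgb, geo)) if s > threshold]
product_hits = [i for i, s in enumerate(fuse_multiplicative(rgb, geo)) if s > threshold]
print(additive_hits)  # [0, 1, 2]: both single-modality artifacts become false positives
print(product_hits)   # [0]: only the pixel flagged by both modalities survives
```

The product rewards cross-modal agreement, which is what suppresses glare or sensor noise that appears in only one stream. The same property also attenuates a genuine defect visible in only one modality. How CMDS-AD balances this trade-off is not described in the summary.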

In practice

Apply dual-stream processing for defect isolation.
Use diffusion models for synthetic normal data generation.
Implement coordinate-aware feature mapping for multi-modal alignment.
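To make "coordinate-aware" concrete, here is a toy, hypothetical sketch of cross-modal mapping from RGB patch features to 3D patch features. The residual between predicted and observed 3D features serves as an anomaly signal. A nearest-neighbour lookup over normal patches stands in for the learned mapper, and the multi-scale ("hierarchical") part is not shown. The position encoding and all names (`coord_encode`, `with_coords`, `cross_modal_residual`) are illustrative assumptions. The point is that appending patch coordinates lets two patches that look identical in RGB map to different geometry.

```python
import math


def coord_encode(row, col, height, width):
    """Hypothetical encoding: patch position normalized to [-1, 1] on each axis."""
    y = 2.0 * row / (height - 1) - 1.0
    x = 2.0 * col / (width - 1) - 1.0
    return [y, x]


def with_coords(feat, row, col, height, width):
    """Append the position encoding to a patch feature vector."""
    return feat + coord_encode(row, col, height, width)


def dist2(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def cross_modal_residual(rgb_key, observed_geo, bank):
    """Predict the 3D feature from the nearest normal RGB key; return the mismatch.

    `bank` holds (rgb_key, geo_feature) pairs taken from normal samples. A
    nearest-neighbour lookup stands in for a learned feature mapper.
    """
    _, predicted = min(bank, key=lambda kv: dist2(kv[0], rgb_key))
    return math.sqrt(dist2(predicted, observed_geo))


# 2 x 2 patch grid from one normal sample. Patches in the same row look
# identical in RGB but differ in geometry (flat on the left, curved on the right).
H, W = 2, 2
normal = {(0, 0): ([0.5], [0.0]), (0, 1): ([0.5], [1.0]),
          (1, 0): ([0.2], [0.0]), (1, 1): ([0.2], [1.0])}
plain_bank = list(normal.values())
coord_bank = [(with_coords(rgb, r, c, H, W), geo)
              for (r, c), (rgb, geo) in normal.items()]

# Query: the normal patch at (0, 1), whose true geometry is [1.0].
print(cross_modal_residual([0.5], [1.0], plain_bank))  # 1.0: matched the wrong patch
print(cross_modal_residual(with_coords([0.5], 0, 1, H, W), [1.0], coord_bank))  # 0.0
```

Without coordinates, the normal patch is mapped to the geometry of its look-alike and produces a false positive. With coordinates, the lookup resolves the ambiguity. This mirrors, in miniature, the spatially uniform processing problem the summary attributes to earlier methods.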

Topics

Few-Shot Anomaly Detection
Multi-Modal Anomaly Detection
Diffusion Models
Cross-Modal Alignment
Computer Vision
MVTec 3D-AD

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.