CMDS-AD: Cross-Modal Dual-Stream Decoupling for Few-Shot Anomaly Detection
Summary
CMDS-AD, a Cross-Modal Dual-Stream Anomaly Detection framework, addresses few-shot anomaly detection, particularly in multi-modal settings. In that regime, limited training data and the spatially uniform feature processing of existing methods lead to cross-modal misalignment and high false-positive rates. The framework employs a LoRA-guided diffusion model to generate diverse RGB samples, mitigating extreme data scarcity. For 3D normal augmentation, a pre-trained diffusion model functions as a non-linear low-pass filter, extracting low-frequency normal representations from RGB inputs. The result is an auxiliary stream that supplies robust structural templates and helps the uncompressed real stream isolate micro-defects. Further enhancements include a Coordinate-Aware Hierarchical Feature Mapper for semantic alignment and a multiplicative scoring mechanism to filter modality-specific noise. Under a 1-shot setting, CMDS-AD achieved absolute gains of 5.7% (I-AUROC) and 2.0% (AUPRO) on MVTec 3D-AD, and 7.7% (I-AUROC) and 5.6% (AUPRO) on EyeCandies, establishing new performance benchmarks.
Key takeaway
For Machine Learning Engineers developing few-shot anomaly detection systems, particularly with multi-modal inputs, CMDS-AD offers a way to overcome data scarcity and reduce false positives. Consider its dual-stream decoupling strategy, which separates structural and defect signals, and its use of diffusion models for data augmentation and normal estimation. In the reported 1-shot results, the method yields absolute I-AUROC gains of 5.7% on MVTec 3D-AD and 7.7% on EyeCandies.
Key insights
CMDS-AD uses dual-stream decoupling and diffusion models to enhance few-shot multi-modal anomaly detection by separating structural and defect signals.
Principles
- Decouple low-frequency structures from high-frequency defects.
- Leverage diffusion models for data augmentation and normal estimation.
- Filter modality-specific noise with multiplicative scoring.
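The third principle can be illustrated with a minimal sketch. It assumes two per-pixel anomaly maps, one from the RGB branch and one from the 3D branch, that are min-max normalized and then multiplied element-wise. The summary does not give the exact scoring formula, so the function names (`minmax`, `fuse_multiplicative`) and the normalization step are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def minmax(x, eps=1e-8):
    """Rescale a map to roughly [0, 1]; eps guards against flat maps."""
    x = np.asarray(x, dtype=np.float64)
    return (x - x.min()) / (x.max() - x.min() + eps)

def fuse_multiplicative(rgb_map, geo_map):
    """Element-wise product of two normalized anomaly maps (assumed form).

    A pixel keeps a high score only if BOTH modalities flag it, so a
    response that appears in a single modality is suppressed.
    """
    return minmax(rgb_map) * minmax(geo_map)

# Toy maps: the RGB branch fires at (0, 1) and (1, 1),
# the 3D branch fires only at (1, 1).
rgb_map = np.array([[0.1, 0.9, 0.1],
                    [0.1, 0.8, 0.1]])
geo_map = np.array([[0.1, 0.1, 0.1],
                    [0.1, 0.9, 0.1]])

fused = fuse_multiplicative(rgb_map, geo_map)
print(fused.round(3))
```

In this toy case the RGB-only response at (0, 1) is zeroed out, while the location flagged by both branches keeps a high score. An additive fusion would have preserved the RGB-only response, which is the kind of modality-specific noise the multiplicative form is meant to filter.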
Method
CMDS-AD generates RGB samples via LoRA-guided diffusion, estimates low-frequency normals from RGB inputs using a pre-trained diffusion model, aligns cross-modal semantics with a Coordinate-Aware Hierarchical Feature Mapper, and scores anomalies with a multiplicative mechanism that filters modality-specific noise.
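The decoupling idea behind the two streams can be sketched on a toy surface. The paper uses a pre-trained diffusion model as a non-linear low-pass filter; the sketch below swaps in a plain box blur as a linear stand-in, purely to show how a smooth structural template lets a small defect stand out in the residual. `box_blur` and `decouple` are hypothetical helper names.

```python
import numpy as np

def box_blur(x, k=5):
    """Mean filter with edge padding; a linear stand-in for the low-pass step."""
    pad = k // 2
    padded = np.pad(x, pad, mode="edge")
    out = np.zeros_like(x, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return out / (k * k)

def decouple(real):
    """Split a map into a smooth template and a high-frequency residual."""
    low = box_blur(real)   # auxiliary stream: low-frequency structure
    high = real - low      # what the template cannot explain
    return low, high

# Toy surface: a smooth ramp with a single-pixel defect at (16, 16).
yy, xx = np.mgrid[0:32, 0:32]
surface = 0.01 * xx.astype(np.float64)
surface[16, 16] += 1.0

low, high = decouple(surface)
peak = np.unravel_index(np.abs(high).argmax(), high.shape)
print(tuple(int(i) for i in peak))
```

The ramp itself passes through the blur almost unchanged, so the residual is near zero on defect-free regions and peaks at the defect. A diffusion-based filter plays the same role in the paper, but with a learned, non-linear notion of what counts as normal structure.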
In practice
- Apply dual-stream processing for defect isolation.
- Use diffusion models for synthetic normal data generation.
- Implement coordinate-aware feature mapping for multi-modal alignment.
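One common way to make a feature mapper coordinate-aware is to append normalized position channels to each feature map before mapping, in the style of CoordConv. The summary does not describe the internals of the Coordinate-Aware Hierarchical Feature Mapper, so the sketch below is an assumption about the general technique rather than the paper's design; `add_coord_channels` is a hypothetical helper.

```python
import numpy as np

def add_coord_channels(feat):
    """Append normalized y and x coordinate channels to a (C, H, W) map.

    Coordinates run from -1 to 1 along each axis, so a downstream mapper
    can condition on where a feature sits instead of treating every
    location identically.
    """
    c, h, w = feat.shape
    ys = np.linspace(-1.0, 1.0, h)
    xs = np.linspace(-1.0, 1.0, w)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return np.concatenate([feat, yy[None], xx[None]], axis=0)

feat = np.zeros((8, 4, 6))      # toy feature map: 8 channels, 4 x 6 grid
out = add_coord_channels(feat)
print(out.shape)
```

For a hierarchical mapper, the same helper could be applied at each feature resolution, since the normalized coordinates stay comparable across scales.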
Topics
- Few-Shot Anomaly Detection
- Multi-Modal Anomaly Detection
- Diffusion Models
- Cross-Modal Alignment
- Computer Vision
- MVTec 3D-AD
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.