Toward Training-Free Zero-Shot Anomaly Detection in 3D Medical Images: A Batch-Based Approach Using 2D Foundation Models
Summary
CS3F is a training-free, batch-based framework for zero-shot anomaly detection (ZSAD) in 3D medical images that reuses existing 2D foundation models. It targets two obstacles common in medical imaging: heterogeneous acquisition protocols and scarce annotated data. The framework decomposes each 3D volume into slices along multiple anatomical axes and encodes the slices with a 2D vision transformer. The slice features are then pooled into localized volumetric tokens, and each token's anomaly score is derived from its mutual similarity to tokens from other subjects. Pooling can attenuate the signal of small focal lesions, so the authors introduce a coarse-to-fine tokenization strategy to mitigate this. CS3F was evaluated on brain MRI (metastases, glioma, and stroke) and validated on lung CT, showing that frozen 2D foundation models can localize anomalies in 3D medical images. The benefit of fine tokenization varies with lesion contrast and imaging modality.
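The multi-axis slicing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. `encode_slice` is a hypothetical stand-in for a frozen 2D vision transformer: it returns the per-patch mean and standard deviation instead of learned embeddings. The patch size of 4 is arbitrary. The three array axes are assumed to map onto the anatomical planes of a volume in standard orientation.

```python
import numpy as np

def encode_slice(slice_2d: np.ndarray, patch: int = 4) -> np.ndarray:
    # HYPOTHETICAL stand-in for a frozen 2D ViT: one feature vector per
    # non-overlapping patch (here just patch mean and std), shape (h//p, w//p, 2).
    h, w = slice_2d.shape
    hp, wp = h // patch, w // patch
    patches = slice_2d[: hp * patch, : wp * patch].reshape(hp, patch, wp, patch)
    return np.stack([patches.mean(axis=(1, 3)), patches.std(axis=(1, 3))], axis=-1)

def encode_along_axis(volume: np.ndarray, axis: int, patch: int = 4) -> np.ndarray:
    # Treat `axis` as the slicing direction, encode every 2D slice, then restack
    # so the slice index returns to its original position in the array.
    moved = np.moveaxis(volume, axis, 0)
    feats = np.stack([encode_slice(s, patch) for s in moved])
    return np.moveaxis(feats, 0, axis)

# ASSUMPTION: array axes 0, 1, 2 correspond to the three anatomical planes.
volume = np.random.default_rng(0).normal(size=(16, 32, 24))  # toy (D, H, W) volume
per_axis = {axis: encode_along_axis(volume, axis) for axis in range(3)}
for axis, feats in per_axis.items():
    print(axis, feats.shape)
```

Each axis yields a feature volume that keeps full resolution along its slicing direction and patch resolution in-plane. This summary does not describe how CS3F aligns or fuses the per-axis views, so the sketch shows no fusion step.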
Key takeaway
For AI Scientists and Research Scientists developing anomaly detection for 3D medical imaging, CS3F offers a training-free, zero-shot approach that repurposes readily available 2D foundation models. It directly addresses the scarcity of annotated 3D data and the diversity of clinical scenarios. Evaluate the coarse-to-fine tokenization strategy carefully: its benefit for anomaly localization depends on the lesion contrast and imaging modality of your application.
Key insights
CS3F enables training-free zero-shot anomaly detection in 3D medical images by adapting 2D foundation models.
Principles
- Cross-subject mutual similarity identifies anomalies.
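- Frozen 2D vision transformers can be adapted to 3D volumes by slicing along multiple axes.
- Coarse-to-fine tokenization can improve focal lesion detection, depending on lesion contrast and imaging modality.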
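Cross-subject mutual similarity can be illustrated with a small scoring function. This summary does not give CS3F's exact formula, so the rule below is an assumption in the spirit of mutual-scoring methods. Each token's score is its cosine distance to the nearest token of every other subject in the batch, averaged over those subjects. `mutual_anomaly_scores` is a hypothetical name, and the toy tokens stand in for pooled ViT features.

```python
import numpy as np

def mutual_anomaly_scores(tokens: np.ndarray) -> np.ndarray:
    # tokens: (N subjects, T tokens, C features) -> scores: (N, T).
    # ASSUMED rule: average over other subjects of the cosine distance
    # to that subject's nearest token. Higher score = more anomalous.
    n = tokens.shape[0]
    if n < 2:
        raise ValueError("need at least two subjects in the batch")
    norms = np.linalg.norm(tokens, axis=-1, keepdims=True)
    unit = tokens / np.maximum(norms, 1e-12)  # L2-normalize each token
    scores = np.zeros(tokens.shape[:2])
    for i in range(n):
        per_ref = []
        for j in range(n):
            if j == i:
                continue  # never score a subject against itself
            sim = unit[i] @ unit[j].T               # (T, T) cosine similarities
            per_ref.append(1.0 - sim.max(axis=1))  # distance to nearest token of j
        scores[i] = np.mean(per_ref, axis=0)
    return scores

rng = np.random.default_rng(0)
base = 1.0 + np.abs(rng.normal(size=(1, 10, 8)))   # shared "normal" tokens, all positive
tokens = base + 0.01 * rng.normal(size=(5, 10, 8))  # 5 subjects with small variation
tokens[2, 7] = -tokens[2, 7]                         # inject one abnormal token
scores = mutual_anomaly_scores(tokens)
print(np.unravel_index(scores.argmax(), scores.shape))
```

The nearest neighbor is searched over all tokens of each reference subject, so this version does not require the volumes to be registered. This summary does not state whether CS3F instead compares spatially corresponding tokens.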
Method
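Decompose each 3D volume into slices along multiple anatomical axes and encode the slices with a frozen 2D vision transformer. Pool the features into localized volumetric tokens, then score each token by its mutual similarity to tokens from other subjects in the batch. A coarse-to-fine tokenization strategy limits the attenuation of focal lesion signals.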
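The motivation for coarse-to-fine tokenization can be shown numerically. The summary only says "pooling", so the sketch below assumes average pooling over non-overlapping cubes. It uses a toy one-channel feature volume containing a 2x2x2-voxel lesion, and `pool_tokens` is a hypothetical helper. The sketch does not model how CS3F combines coarse and fine tokens. It only shows why coarse tokens dilute a focal signal.

```python
import numpy as np

def pool_tokens(feat: np.ndarray, size: int) -> np.ndarray:
    # ASSUMPTION: average pooling over non-overlapping cubes of edge `size`.
    # (D, H, W, C) -> (D//size, H//size, W//size, C); leftover voxels are dropped.
    d, h, w, c = feat.shape
    d2, h2, w2 = d // size, h // size, w // size
    trimmed = feat[: d2 * size, : h2 * size, : w2 * size]
    return trimmed.reshape(d2, size, h2, size, w2, size, c).mean(axis=(1, 3, 5))

feat = np.zeros((16, 16, 16, 1))
feat[4:6, 4:6, 4:6] = 1.0          # toy 2x2x2-voxel focal "lesion" response
coarse = pool_tokens(feat, 8)      # 2x2x2 grid of coarse tokens
fine = pool_tokens(feat, 2)        # 8x8x8 grid of fine tokens
print(coarse.shape, coarse.max())  # coarse max is 8/512 = 0.015625
print(fine.shape, fine.max())      # fine max is 1.0
```

An 8-voxel cube dilutes the lesion to 8/512 of its value, while 2-voxel tokens keep it intact. Finer tokens also mean many more tokens to compare across subjects: 512 per volume here versus 8.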
In practice
- Reuse frozen 2D vision transformers for 3D medical image analysis.
- Use cross-subject similarity for ZSAD in heterogeneous data.
- Evaluate coarse-to-fine tokenization for subtle lesions.
Topics
- Zero-shot Anomaly Detection
- 3D Medical Imaging
- 2D Foundation Models
- Vision Transformers
- Brain MRI
- Lung CT
Best for: AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.