Benchmarking transferability of SSL pretraining to same and different modality segmentation tasks
Summary
A study benchmarked nine self-supervised learning (SSL) methods across four pretext-task families, pretraining them on 10,412 3D CT scans (1.89 million 2D axial slices). Each pretrained Swin Transformer encoder was integrated into a SwinUNETR-style segmentation network and fine-tuned on nine public segmentation tasks, including large abdominal organs, head-and-neck structures, and tumors from CT and MRI. Performance was evaluated using the Dice similarity coefficient (DSC), convergence speed, cross-modality transferability (CT-to-MRI), and feature-reuse patterns. Self-distilled masked image transformer (SMIT) achieved the highest overall segmentation accuracy, fastest fine-tuning convergence, and smallest few-shot-to-many-shot performance gap, demonstrating superior data efficiency and consistent feature reuse.
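As a reminder of what the headline metric measures, the Dice similarity coefficient for one label is twice the overlap between predicted and reference voxels, divided by the total number of voxels carrying that label in either mask. The sketch below is a minimal illustration on flattened label lists, not the study's evaluation code; the function name `dice_coefficient` and the convention of returning 1.0 when both masks are empty are assumptions made here.

```python
def dice_coefficient(pred, target, label=1):
    """Dice similarity coefficient for one label over two flattened masks."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of voxels")
    # Count voxels carrying the label in each mask and in both at once.
    pred_count = sum(1 for p in pred if p == label)
    target_count = sum(1 for t in target if t == label)
    overlap = sum(1 for p, t in zip(pred, target) if p == label and t == label)
    if pred_count + target_count == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * overlap / (pred_count + target_count)


# Toy masks (illustrative values only, not data from the study).
pred = [0, 1, 1, 1, 0, 0]
target = [0, 1, 1, 0, 1, 0]
score = dice_coefficient(pred, target)
print(round(score, 3))
```

A DSC of 1.0 means the two masks agree exactly for that label, and 0.0 means they do not overlap at all.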
Key takeaway
Research scientists developing medical image segmentation models, especially with limited labeled data, should prioritize self-distilled masked image transformer (SMIT) pretraining. In this benchmark SMIT showed the best data efficiency and the fastest fine-tuning convergence, making it a strong candidate for high accuracy and consistent feature reuse across anatomical structures and across modalities such as CT and MRI.
Key insights
SMIT, which combines masked image modeling (MIM) with self-distillation, excels in transfer learning for medical image segmentation.
Principles
- MIM and self-distillation outperform contrastive learning.
- SSL choice matters most with limited annotation budgets.
Method
Pretrain Swin Transformer encoders with SSL on 3D CT scans, then fine-tune within a SwinUNETR-style network for segmentation tasks.
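The hand-off between the two stages can be pictured as copying only the encoder parameters from the pretraining checkpoint into the segmentation network, while the decoder keeps its fresh initialization. The sketch below uses plain dictionaries in place of real tensors so it runs without a deep learning framework; the helper `init_encoder_from_pretrained`, the `encoder.` prefix, and all parameter names are hypothetical and are not taken from the study or from any particular library.

```python
def init_encoder_from_pretrained(seg_weights, pretrained_weights, prefix="encoder."):
    """Return segmentation weights with matching encoder entries replaced.

    Both arguments map parameter names to values. In a real framework the
    values would be tensors; plain lists stand in for them here.
    """
    merged = dict(seg_weights)  # leave the caller's dictionary untouched
    transferred = []
    for name, value in pretrained_weights.items():
        target_name = prefix + name
        # Copy only parameters the segmentation network actually has, so
        # SSL-only heads from pretraining are skipped.
        if target_name in merged:
            merged[target_name] = value
            transferred.append(target_name)
    return merged, transferred


# Hypothetical checkpoints with made-up names and values.
pretrained = {
    "stage1.weight": [0.5, -0.2],
    "stage2.weight": [0.1, 0.3],
    "ssl_head.weight": [9.9],  # pretext-task head, not used for segmentation
}
segmentation = {
    "encoder.stage1.weight": [0.0, 0.0],
    "encoder.stage2.weight": [0.0, 0.0],
    "decoder.up1.weight": [0.0],  # decoder is trained from scratch
}

merged, transferred = init_encoder_from_pretrained(segmentation, pretrained)
print(sorted(transferred))
```

After this step the whole network, encoder and decoder, would be fine-tuned on the labeled segmentation task.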
In practice
- Prioritize SMIT for medical image segmentation.
- Focus on SSL method selection for few-shot scenarios.
Topics
- Self-Supervised Learning
- Image Segmentation
- Swin Transformer
- Masked Image Modeling
- SMIT
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.