Parameter-Efficient Adaptation of SAM 3 for Automated ITV Generation from 4DCT Images
Summary
A new lightweight framework enhances automated Internal Target Volume (ITV) generation from four-dimensional computed tomography (4DCT) images, addressing limitations of current phase-isolated contouring workflows. This framework applies parameter-efficient fine-tuning, specifically low-rank adaptation (LoRA), to the Segment Anything Model 3 (SAM 3) using only seven annotated 3D CT volumes to align its text-prompted segmentation with medical imaging. It integrates a hard negative mining strategy to improve boundary discrimination in low-contrast thoracic regions. During inference, phase-wise predictions are refined via phase-coherent temporal filtering and spatial connectivity analysis, effectively suppressing transient artifacts by leveraging the continuous nature of respiratory motion. Experiments on pulmonary and cardiac structures achieved median Dice scores of 0.968 and 0.910, with 95th-percentile Hausdorff distances of 0.998 mm and 2.931 mm, respectively. The solution eliminates severe false-positive predictions inherent in unadapted SAM 3 zero-shot inference, retaining over 95% of full-data accuracy while being trainable on a single consumer-grade GPU.
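The Dice scores quoted above measure volumetric overlap between a predicted mask and a reference contour. As a minimal sketch, assuming binary masks flattened to sequences of 0/1 (the function name `dice_score` and the empty-mask convention are illustrative assumptions, not taken from the paper), the metric can be computed as:

```python
def dice_score(pred, truth):
    """Dice overlap between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Sum of the two foreground sizes.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


pred = [0, 1, 1, 1, 0, 0]
truth = [0, 1, 1, 0, 0, 0]
print(dice_score(pred, truth))  # 2 * 2 / (3 + 2) = 0.8
```

A Dice score of 1.0 means perfect overlap. The 95th-percentile Hausdorff distance reported alongside it is a boundary-distance metric and is not covered by this sketch.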
Key takeaway
For computer vision engineers developing automated contouring solutions in adaptive radiotherapy, this framework offers an efficient path to robust Internal Target Volume generation. Fine-tuning SAM 3 with LoRA on as few as seven annotated 3D CT volumes retains over 95% of full-data accuracy and can run on a single consumer-grade GPU. Consider integrating temporal filtering and hard negative mining to suppress false positives and improve boundary precision in challenging low-contrast regions.
Key insights
Parameter-efficient fine-tuning of SAM 3, combined with temporally coherent post-processing, automates ITV generation from 4DCT images using minimal annotated data.
Principles
- Leverage temporal coherence in 4DCT to suppress artifacts.
- Parameter-efficient fine-tuning adapts large models with limited data.
- Hard negative mining enhances low-contrast boundary discrimination.
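The summary does not say how hard negatives are selected, so the following is only a generic sketch of the idea: among background voxels, keep the ones the model most confidently mistakes for foreground and emphasise them in the loss. The function name `mine_hard_negatives`, the per-voxel probability input, and the top-k rule are all assumptions made for illustration.

```python
def mine_hard_negatives(probs, labels, k):
    """Return indices of the k background voxels with the highest
    predicted foreground probability (the "hardest" negatives)."""
    negatives = [i for i, y in enumerate(labels) if y == 0]
    # Most confidently wrong background voxels come first.
    negatives.sort(key=lambda i: probs[i], reverse=True)
    return negatives[:k]


# Hypothetical per-voxel foreground probabilities and reference labels.
probs = [0.9, 0.1, 0.7, 0.4, 0.8, 0.05]
labels = [1, 0, 0, 0, 1, 0]
hard = mine_hard_negatives(probs, labels, k=2)
print(hard)  # [2, 3]
```

In a training loop, the returned indices would typically receive extra loss weight or be oversampled, which pushes the model to separate the target from look-alike tissue near low-contrast boundaries.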
Method
Fine-tune SAM 3 via LoRA with text prompts and hard negative mining using seven 3D CT volumes. Refine phase-wise predictions with temporal filtering and spatial connectivity analysis.
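To illustrate the low-rank adaptation step in isolation, here is a dependency-free sketch of a single LoRA-adapted linear layer. It is not the SAM 3 integration: the class `LoRALinear`, the helper `matvec`, and the rank and scaling values are hypothetical, and a real setup would use a deep learning framework. The frozen weight W is left untouched while two small matrices A and B add the update (alpha / r) * B A.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # frozen pretrained weight
        self.scale = alpha / rank
        # Standard LoRA init: A is small random, B is zero, so the
        # adapter contributes nothing before training starts.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_parameters(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Toy 8 x 8 identity weight standing in for a pretrained projection.
W = [[1.0 if i == j else 0.0 for j in range(8)] for i in range(8)]
layer = LoRALinear(W, rank=1)
x = [float(i) for i in range(1, 9)]
print(layer.forward(x) == matvec(W, x))   # True: zero-initialised B is a no-op
print(layer.trainable_parameters())       # 16, versus 64 weights in W
```

Only A and B would be updated during fine-tuning, which is why the trainable parameter count grows with the rank rather than with the full weight size.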
In practice
- Adapt SAM 3 with LoRA using few annotated 3D CT volumes.
- Implement hard negative mining for low-contrast thoracic regions.
- Apply phase-coherent temporal filtering to 4DCT segmentations.
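The post-processing steps listed above can be sketched on toy 2D masks. This is an illustrative reading, not the paper's implementation: it assumes temporal filtering is a per-voxel majority vote over cyclically neighbouring phases, that connectivity analysis keeps the largest 4-connected region, and that the ITV is the union of the cleaned phase masks (a common construction in 4DCT planning). All function names are hypothetical.

```python
from collections import deque


def temporal_filter(phases, window=1):
    """Per-voxel majority vote over cyclically neighbouring phases."""
    n = len(phases)
    rows, cols = len(phases[0]), len(phases[0][0])
    votes_needed = window + 1  # strict majority of 2 * window + 1 phases
    filtered = []
    for t in range(n):
        # The phase index wraps around because breathing is periodic.
        neighbours = [phases[(t + d) % n] for d in range(-window, window + 1)]
        filtered.append([
            [1 if sum(m[r][c] for m in neighbours) >= votes_needed else 0
             for c in range(cols)]
            for r in range(rows)
        ])
    return filtered


def largest_component(grid):
    """Keep only the largest 4-connected foreground region of a 0/1 grid."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, component = deque([(r, c)]), []
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    out = [[0] * cols for _ in range(rows)]
    for y, x in best:
        out[y][x] = 1
    return out


def union_itv(phases):
    """ITV as the voxel-wise union of all phase masks."""
    rows, cols = len(phases[0]), len(phases[0][0])
    return [[1 if any(m[r][c] for m in phases) else 0 for c in range(cols)]
            for r in range(rows)]


def make_mask(on, rows=3, cols=5):
    return [[1 if (r, c) in on else 0 for c in range(cols)] for r in range(rows)]


phases = [
    make_mask({(1, 1), (1, 2)}),
    make_mask({(1, 1), (1, 2), (0, 4)}),  # (0, 4) is a one-phase artifact
    make_mask({(1, 2), (1, 3)}),
    make_mask({(1, 2), (1, 3)}),
]
cleaned = [largest_component(m) for m in temporal_filter(phases)]
itv = union_itv(cleaned)
print(itv)  # [[0, 0, 0, 0, 0], [0, 1, 1, 1, 0], [0, 0, 0, 0, 0]]
```

The artifact that appears in a single phase is voted out, while the target's motion across phases is preserved in the union. A majority vote can also trim voxels that are genuinely occupied in only one phase, so the window size is a trade-off in this sketch.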
Topics
- Parameter-Efficient Fine-tuning
- Segment Anything Model 3
- 4DCT Imaging
- Internal Target Volume
- Low-Rank Adaptation
- Medical Image Segmentation
- Adaptive Radiotherapy
Best for: AI Scientist, Research Scientist, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.