Parameter-Efficient Adaptation of SAM 3 for Automated ITV Generation from 4DCT Images

· Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision & Pattern Recognition, Medical Imaging AI · Depth: Expert, quick

Summary

A lightweight framework automates Internal Target Volume (ITV) generation from four-dimensional computed tomography (4DCT) images, addressing the limitations of current workflows that contour each respiratory phase in isolation. It applies parameter-efficient fine-tuning, specifically low-rank adaptation (LoRA), to the Segment Anything Model 3 (SAM 3), using only seven annotated 3D CT volumes to adapt the model's text-prompted segmentation to medical images. A hard negative mining strategy improves boundary discrimination in low-contrast thoracic regions. At inference, phase-wise predictions are refined by phase-coherent temporal filtering and spatial connectivity analysis, which exploit the continuity of respiratory motion to suppress transient artifacts. On pulmonary and cardiac structures, the method achieved median Dice scores of 0.968 and 0.910 and 95th-percentile Hausdorff distances of 0.998 mm and 2.931 mm, respectively. It eliminates the severe false-positive predictions seen in zero-shot inference with unadapted SAM 3 and retains over 95% of full-data accuracy while remaining trainable on a single consumer-grade GPU.
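For readers less familiar with the two reported metrics, the following is a minimal sketch of how a Dice score and a 95th-percentile Hausdorff distance (HD95) can be computed for a pair of binary 3D masks. It is not the paper's evaluation code. The function names, the 6-neighbour surface definition, and the pooled-percentile convention for HD95 are assumptions made here for illustration; other toolkits take the maximum of the two directed percentiles instead.

```python
import numpy as np
from scipy import ndimage


def dice_score(pred, truth):
    """Dice overlap between two binary masks (1.0 means identical)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return 2.0 * np.logical_and(pred, truth).sum() / denom


def surface(mask):
    """Foreground voxels that disappear after one erosion step."""
    return mask & ~ndimage.binary_erosion(mask)


def hd95(pred, truth, spacing=(1.0, 1.0, 1.0)):
    """95th percentile of pooled symmetric surface distances.

    `spacing` is the voxel size per axis (for example in mm), so the
    result is expressed in physical units.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    if not pred.any() or not truth.any():
        raise ValueError("HD95 is undefined when a mask is empty")
    s_pred, s_truth = surface(pred), surface(truth)
    # Distance from every voxel to the nearest surface voxel of each mask.
    to_truth = ndimage.distance_transform_edt(~s_truth, sampling=spacing)
    to_pred = ndimage.distance_transform_edt(~s_pred, sampling=spacing)
    pooled = np.concatenate([to_truth[s_pred], to_pred[s_truth]])
    return float(np.percentile(pooled, 95))
```

Passing the scanner's voxel spacing matters: the millimetre values quoted above only make sense if distances are measured in physical rather than voxel units.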

Key takeaway

For computer vision engineers building automated contouring for adaptive radiotherapy, this framework offers an efficient path to robust Internal Target Volume generation. Fine-tuning SAM 3 with LoRA on as few as seven annotated 3D CT volumes is reported to retain over 95% of full-data accuracy, and training fits on a single consumer-grade GPU. Consider adding hard negative mining to sharpen boundaries in low-contrast regions and temporal filtering to suppress the severe false positives seen in zero-shot SAM 3.

Key insights

Parameter-efficient fine-tuning of SAM 3 with temporal coherence effectively automates ITV generation from 4DCT images using minimal data.
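To make "parameter-efficient" concrete, here is a small NumPy sketch of the general LoRA idea: a frozen weight matrix W is augmented with a trainable low-rank update (alpha / r) * B @ A. The class name, rank, alpha, and initialisation scale are illustrative assumptions; the summary does not state which SAM 3 layers are adapted or which hyperparameters the authors used.

```python
import numpy as np


class LoRALinear:
    """Frozen linear weight plus a trainable low-rank update.

    Effective weight: W + (alpha / rank) * B @ A
    Only A and B would be updated during fine-tuning.
    """

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                                 # frozen
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))    # trainable
        self.B = np.zeros((d_out, rank))                     # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        # Base projection plus the low-rank path; with B == 0 the layer
        # reproduces the pretrained output exactly.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self):
        """Fold the update into one matrix for inference."""
        return self.weight + self.scale * self.B @ self.A

    def trainable_fraction(self):
        lora = self.A.size + self.B.size
        return lora / (lora + self.weight.size)
```

For a 1024 x 1024 layer at rank 4, the trainable parameters are under 1% of the total, which is the property that makes training on a single consumer-grade GPU plausible.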

Method

Fine-tune SAM 3 via LoRA with text prompts and hard negative mining using seven 3D CT volumes. Refine phase-wise predictions with temporal filtering and spatial connectivity analysis.
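The inference-time refinement can be sketched as follows, under stated assumptions. The summary does not give the exact filter, so this example substitutes a cyclic majority vote across neighbouring respiratory phases for the temporal filtering, a minimum-size connected-component filter for the spatial connectivity analysis, and the usual convention that the ITV is the union of the per-phase masks. All function names, the window size, and the voxel threshold are hypothetical.

```python
import numpy as np
from scipy import ndimage


def temporal_filter(masks, window=3):
    """Cyclic majority vote over neighbouring phases.

    masks: boolean array of shape (phases, z, y, x). The phase axis is
    treated as circular because breathing is periodic.
    """
    masks = np.asarray(masks, dtype=bool)
    half = window // 2
    votes = np.zeros(masks.shape, dtype=np.int32)
    for offset in range(-half, half + 1):
        votes += np.roll(masks, offset, axis=0)
    return votes * 2 > window


def remove_small_components(mask, min_voxels=10):
    """Drop connected components smaller than `min_voxels`."""
    labels, count = ndimage.label(mask)
    if count == 0:
        return np.zeros(mask.shape, dtype=bool)
    sizes = np.bincount(labels.ravel())
    keep = sizes >= min_voxels
    keep[0] = False  # label 0 is background
    return keep[labels]


def build_itv(masks, window=3, min_voxels=10):
    """Filter each phase, then take the union over all phases."""
    smoothed = temporal_filter(masks, window)
    cleaned = np.stack(
        [remove_small_components(m, min_voxels) for m in smoothed]
    )
    return cleaned.any(axis=0)
```

A prediction that appears in only one phase is voted out, while a structure that moves gradually between phases survives, which is the intuition behind using respiratory continuity to suppress transient artifacts.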

Best for: AI Scientist, Research Scientist, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.