Parameter-Efficient Adaptation of SAM 3 for Automated ITV Generation from 4DCT Images

2026-06-14 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision & Pattern Recognition, Medical Imaging AI · Depth: Expert, quick

Summary

A new lightweight framework enhances automated Internal Target Volume (ITV) generation from four-dimensional computed tomography (4DCT) images, addressing limitations of current phase-isolated contouring workflows. This framework applies parameter-efficient fine-tuning, specifically low-rank adaptation (LoRA), to the Segment Anything Model 3 (SAM 3) using only seven annotated 3D CT volumes to align its text-prompted segmentation with medical imaging. It integrates a hard negative mining strategy to improve boundary discrimination in low-contrast thoracic regions. During inference, phase-wise predictions are refined via phase-coherent temporal filtering and spatial connectivity analysis, effectively suppressing transient artifacts by leveraging the continuous nature of respiratory motion. Experiments on pulmonary and cardiac structures achieved median Dice scores of 0.968 and 0.910, with 95th-percentile Hausdorff distances of 0.998 mm and 2.931 mm, respectively. The solution eliminates severe false-positive predictions inherent in unadapted SAM 3 zero-shot inference, retaining over 95% of full-data accuracy while being trainable on a single consumer-grade GPU.
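For readers less familiar with the overlap metric quoted above, here is a minimal sketch of the Dice coefficient between two binary masks. The function name and the empty-mask convention are assumptions for illustration, not taken from the original article.

```python
import numpy as np


def dice_score(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks.

    Assumption: two empty masks are treated as a perfect match (1.0).
    """
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    denom = int(pred.sum()) + int(truth.sum())
    if denom == 0:
        return 1.0
    overlap = int(np.logical_and(pred, truth).sum())
    return 2.0 * overlap / denom
```

A score of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they share no voxels.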

Key takeaway

For computer vision engineers developing automated contouring solutions in adaptive radiotherapy, this framework offers an efficient path to robust Internal Target Volume generation. Fine-tuning SAM 3 with LoRA on as few as seven annotated 3D CT volumes retains over 95% of full-data accuracy and fits on a single consumer-grade GPU. Consider adding hard negative mining to sharpen boundaries in low-contrast regions, and phase-coherent temporal filtering to suppress transient false positives.

Key insights

Parameter-efficient fine-tuning of SAM 3, combined with temporal-coherence post-processing across respiratory phases, automates ITV generation from 4DCT images using minimal annotated data.

Principles

Leverage temporal coherence in 4DCT to suppress artifacts.
Parameter-efficient fine-tuning adapts large models with limited data.
Hard negative mining enhances low-contrast boundary discrimination.
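The summary does not say how the hard negatives are selected. As a rough illustration of the general idea only, the sketch below uses a common generic variant: average the binary cross-entropy over all foreground voxels plus only the highest-loss background voxels. The function name, the `neg_ratio` parameter, and the top-k selection rule are assumptions, not the authors' method.

```python
import numpy as np


def hard_negative_bce(prob, target, neg_ratio=3.0, eps=1e-7):
    """Mean BCE over all positives plus the hardest negatives only.

    prob      : predicted foreground probabilities, any shape
    target    : binary ground-truth mask, same shape
    neg_ratio : hypothetical knob, negatives kept per positive
    """
    prob = np.clip(np.asarray(prob, dtype=float).ravel(), eps, 1.0 - eps)
    target = np.asarray(target).ravel().astype(bool)

    # Per-voxel binary cross-entropy.
    loss = -np.where(target, np.log(prob), np.log(1.0 - prob))
    pos_loss = loss[target]
    neg_loss = loss[~target]

    # Keep the k background voxels the model is most wrong about.
    k = min(neg_loss.size, max(1, int(neg_ratio * max(pos_loss.size, 1))))
    hardest = np.sort(neg_loss)[::-1][:k]

    count = pos_loss.size + hardest.size
    if count == 0:
        return 0.0
    return float((pos_loss.sum() + hardest.sum()) / count)
```

Concentrating the loss on confidently wrong background voxels is one way to push the model to separate a structure from look-alike neighbouring tissue, which is where low-contrast boundaries tend to fail.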

Method

Fine-tune SAM 3 via LoRA on seven annotated 3D CT volumes, using text prompts and hard negative mining. At inference, refine the phase-wise predictions with phase-coherent temporal filtering and spatial connectivity analysis.
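To make the LoRA step concrete, here is a minimal NumPy sketch of a single low-rank-adapted linear layer in the standard formulation: the pretrained weight W stays frozen and only the two small matrices A and B are trained. The class name, rank, alpha, and initialisation scale are illustrative assumptions. The summary does not state which SAM 3 layers are adapted or which hyperparameters the authors used.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update.

    Effective weight: W + (alpha / rank) * B @ A
    """

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                               # frozen, (d_out, d_in)
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))  # trainable
        self.B = np.zeros((d_out, rank))                   # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        # Base projection plus the scaled low-rank correction.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        return self.A.size + self.B.size
```

Because B starts at zero, the adapted layer initially reproduces the pretrained output exactly. For a 64-by-32 weight at rank 4, the trainable count is 4 × (32 + 64) = 384, against 2,048 frozen weights; that gap is what makes fine-tuning on a handful of volumes and a single GPU plausible.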

In practice

Adapt SAM 3 with LoRA using few annotated 3D CT volumes.
Implement hard negative mining for low-contrast thoracic regions.
Apply phase-coherent temporal filtering to 4DCT segmentations.
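The temporal refinement can be sketched as follows. The summary does not specify the exact filter, so this assumes a simple cyclic majority vote over each phase and its two neighbours, followed by the conventional ITV definition as the union of the per-phase masks. Function names, the window size, and `min_votes` are hypothetical, and the spatial connectivity step is not shown.

```python
import numpy as np


def temporal_majority_filter(masks, min_votes=2):
    """Keep a voxel in phase t only if it is set in at least `min_votes`
    of phases t-1, t, t+1.

    masks: boolean array of shape (phases, z, y, x). The breathing cycle
    is treated as periodic, so the first and last phases are neighbours.
    """
    masks = np.asarray(masks).astype(bool)
    votes = (
        masks.astype(np.uint8)
        + np.roll(masks, 1, axis=0)
        + np.roll(masks, -1, axis=0)
    )
    return votes >= min_votes


def itv_from_phases(masks):
    """ITV as the union of the structure's masks over all phases."""
    return np.any(np.asarray(masks).astype(bool), axis=0)


# Usage: a stable 2x2 structure plus a one-phase false positive.
phases = np.zeros((10, 1, 4, 4), dtype=bool)
phases[:, 0, 1:3, 1:3] = True   # present in every phase
phases[4, 0, 0, 0] = True       # transient artifact in phase 4 only
itv = itv_from_phases(temporal_majority_filter(phases))
```

A voxel that appears in a single phase cannot be explained by continuous respiratory motion, so the vote removes it before the union. Without filtering, the union would carry every such artifact into the ITV. A window this narrow can also trim voxels at a fast-moving boundary, so the vote threshold is a trade-off worth validating against reference contours.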

Topics

Parameter-Efficient Fine-tuning
Segment Anything Model 3
4DCT Imaging
Internal Target Volume
Low-Rank Adaptation
Medical Image Segmentation
Adaptive Radiotherapy

Best for: AI Scientist, Research Scientist, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.