Align then Refine: Text-Guided 3D Prostate Lesion Segmentation
Summary
This paper introduces a multi-encoder U-Net for automated 3D segmentation of prostate lesions from biparametric MRI (bp-MRI). It targets two difficulties: integrating cross-modal information and capturing fine-grained lesion semantics. The model adds three components: an alignment loss that raises text-image similarity in the foreground so that lesion semantics are grounded, a heatmap loss that calibrates the similarity map and suppresses background activations, and a confidence-gated multi-head cross-attention refiner that makes localized boundary edits only in high-confidence regions. A phase-scheduled training regime stabilizes the joint optimization of these components. On the PI-CAI dataset the method reaches Dice 0.7326 and NSD 0.7541, ahead of nnU-Net (Dice 0.7192, NSD 0.7320) and several transformer- and SAM-based baselines, which the authors report as a new state of the art.
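For readers less familiar with the metric, the Dice score quoted above is the standard overlap measure $2|A \cap B| / (|A| + |B|)$ between a predicted and a reference mask. The snippet below is a minimal NumPy sketch on toy binary volumes; the function name `dice_score` and the empty-mask convention are illustrative assumptions, not taken from the paper's evaluation code. NSD additionally needs surface distances and a tolerance, and is not shown here.

```python
import numpy as np


def dice_score(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice overlap between two binary masks of identical shape."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return float(2.0 * intersection / denom)


# Toy 3D example: an 8-voxel prediction fully inside a 12-voxel reference.
pred = np.zeros((4, 4, 4), dtype=np.uint8)
target = np.zeros((4, 4, 4), dtype=np.uint8)
pred[1:3, 1:3, 1:3] = 1
target[1:3, 1:3, 1:4] = 1
print(dice_score(pred, target))  # 2 * 8 / (8 + 12) = 0.8
```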
Key takeaway
For Computer Vision Engineers developing medical image segmentation models, this research shows that combining text-guided semantic alignment with confidence-gated refinement, coordinated through phase-scheduled training, improves 3D prostate lesion segmentation over a strong nnU-Net baseline (Dice 0.7326 vs. 0.7192 on PI-CAI). Consider a similar multi-stage, text-guided approach to improve both accuracy and interpretability in your volumetric segmentation tasks, especially where fine-grained boundary precision is critical.
Key insights
Text-guided, phase-scheduled training improves 3D prostate lesion segmentation by integrating semantic alignment and confidence-aware refinement.
Principles
- Decouple semantic grounding from boundary correction.
- Calibrate similarity maps to suppress background activations.
- Apply localized edits only in high-confidence regions.
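The first two principles can be illustrated with a small sketch. It assumes a cosine-similarity map between bottleneck voxel features and a single text embedding, an alignment term that pushes mean foreground similarity towards 1, and a heatmap term written as binary cross-entropy against the lesion mask. These loss forms, the tensor shapes, and all function names are assumptions made for illustration; the paper's exact definitions may differ.

```python
import numpy as np


def similarity_map(feats: np.ndarray, text_emb: np.ndarray,
                   temperature: float = 1.0) -> np.ndarray:
    """Cosine similarity of each voxel feature with the text embedding.

    feats: (C, D, H, W) image features; text_emb: (C,). Returns (D, H, W).
    """
    f = feats / (np.linalg.norm(feats, axis=0, keepdims=True) + 1e-8)
    t = text_emb / (np.linalg.norm(text_emb) + 1e-8)
    return np.einsum("cdhw,c->dhw", f, t) / temperature


def alignment_loss(sim: np.ndarray, fg_mask: np.ndarray) -> float:
    """Assumed form: 1 minus the mean similarity over foreground voxels."""
    fg = fg_mask.astype(bool)
    if not fg.any():
        return 0.0  # no lesion in this volume, nothing to align
    return float(1.0 - sim[fg].mean())


def heatmap_loss(sim: np.ndarray, fg_mask: np.ndarray) -> float:
    """Assumed form: BCE-with-logits between the similarity map and the mask.

    Penalizes high similarity on background voxels as well as low
    similarity on foreground voxels.
    """
    y = fg_mask.astype(np.float64)
    # Numerically stable BCE with logits: max(x, 0) - x*y + log(1 + exp(-|x|))
    loss = np.maximum(sim, 0.0) - sim * y + np.log1p(np.exp(-np.abs(sim)))
    return float(loss.mean())
```

Keeping the two terms separate mirrors the split described above: one term only looks at the foreground, the other shapes the whole map, background included.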
Method
A multi-encoder U-Net uses a bottleneck similarity head for text-image alignment via alignment and heatmap losses, followed by a final-stage, confidence-gated cross-attention refiner, all optimized with a three-phase training curriculum.
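One way to organize a three-phase curriculum like the one described above is a small schedule that switches loss weights and the refiner on by epoch. The sketch below is only an illustration: the phase names, epoch boundaries, loss weights, and the idea that the refiner is enabled in the last phase are all hypothetical choices, since this summary does not give the paper's actual schedule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseConfig:
    name: str
    w_seg: float         # weight of the segmentation loss
    w_align: float       # weight of the alignment loss
    w_heatmap: float     # weight of the heatmap loss
    train_refiner: bool  # whether the gated refiner is active and trained


# Hypothetical schedule: (first epoch of phase, configuration).
PHASES = [
    (0, PhaseConfig("warmup", 1.0, 0.0, 0.0, False)),
    (50, PhaseConfig("align", 1.0, 0.5, 0.5, False)),
    (150, PhaseConfig("refine", 1.0, 0.5, 0.5, True)),
]


def phase_for_epoch(epoch: int) -> PhaseConfig:
    """Return the configuration of the latest phase that has started."""
    current = PHASES[0][1]
    for start, cfg in PHASES:
        if epoch >= start:
            current = cfg
    return current


def total_loss(cfg: PhaseConfig, seg: float, align: float, heatmap: float) -> float:
    """Weighted sum of the loss terms for the current phase."""
    return cfg.w_seg * seg + cfg.w_align * align + cfg.w_heatmap * heatmap
```

In a training loop, `phase_for_epoch(epoch)` would be queried once per epoch and its flags used to weight the losses and to freeze or unfreeze the refiner.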
In practice
- Use "prostate lesion" as a text prompt for BiomedCLIP.
- Set similarity head temperature to 1.
- Configure gating hyperparameters $\tau=0.35$ and $\alpha=0.25$.
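The settings above can be collected in a small configuration, together with a sketch of how a confidence gate might use them. This summary does not define $\tau$ and $\alpha$, so the code assumes that $\tau$ is a threshold on a per-voxel confidence map in $[0, 1]$ and that $\alpha$ scales the refiner's residual edit to the coarse logits; both readings, and the names `gated_refine`, `coarse_logits`, `delta_logits`, and `confidence`, are hypothetical.

```python
import numpy as np

# Values reported in the summary above.
TEXT_PROMPT = "prostate lesion"  # prompt passed to the BiomedCLIP text encoder
SIM_TEMPERATURE = 1.0            # similarity head temperature
TAU = 0.35                       # gating hyperparameter tau
ALPHA = 0.25                     # gating hyperparameter alpha


def gated_refine(coarse_logits: np.ndarray, delta_logits: np.ndarray,
                 confidence: np.ndarray, tau: float = TAU,
                 alpha: float = ALPHA) -> np.ndarray:
    """Apply a scaled residual edit only where confidence exceeds tau.

    Assumed interpretation: voxels with confidence <= tau keep the coarse
    prediction unchanged; elsewhere the refiner output is added with
    weight alpha.
    """
    gate = (confidence > tau).astype(coarse_logits.dtype)
    return coarse_logits + alpha * gate * delta_logits
```

Under this reading, a small $\alpha$ keeps the refiner from overwriting the coarse segmentation, while the gate confines its edits to regions the model is already confident about.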
Topics
- Prostate Lesion Segmentation
- Biparametric MRI
- Multi-encoder U-Net
- Text-Guided Segmentation
- Cross-Attention Refinement
Best for: Computer Vision Engineer, AI Scientist, Research Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.