Evidential learning driven Breast Tumor Segmentation with Stage-divided Vision-Language Interaction

2026-03-13 · Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Health & Medical Research · Depth: Expert, extended

Summary

Researchers propose TextBCS, a text-guided breast tumor segmentation model designed to improve the accuracy of breast cancer detection in Magnetic Resonance Imaging (MRI). Breast cancer remains a leading cause of death among women, and conventional deep learning methods struggle with the low contrast and blurred boundaries typical of MRI scans. TextBCS addresses these challenges with two components. A Stage-divided Vision-Language Interaction (SVLI) module exchanges information between visual and text features at each down-sampling stage. An Evidential Learning (EL) strategy quantifies segmentation uncertainty at blurred boundaries by modeling segmentation probabilities with a variational Dirichlet distribution, which improves boundary precision. In experiments on the public Duke-Breast-Cancer-MRI dataset, TextBCS outperforms existing UNet-based and text-guided segmentation networks, with a 2.19% increase in Dice score over TransUNet and 1.05% over MGCA.
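The Dice score used for these comparisons measures the overlap between a predicted mask and a reference mask: twice the intersection divided by the sum of the two mask sizes. A minimal sketch follows, assuming binary masks flattened into 0/1 sequences. The function name `dice_score` and the convention that two empty masks score 1.0 are illustrative choices, not details taken from the paper.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")
    # Pixels marked as tumor in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks agree perfectly.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy masks: the prediction marks two pixels, the reference marks one of them.
example = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
```

Here `example` is 2 * 1 / (2 + 1), or about 0.667. Because Dice is a ratio on a 0 to 1 scale, reported gains are best read alongside the baseline score they are measured against.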

Key takeaway

For AI scientists developing medical image segmentation models, TextBCS offers a framework for breast tumor segmentation that pairs text guidance with explicit uncertainty estimates. Consider integrating text prompts through a stage-divided vision-language interaction module and adding evidential learning to quantify and reduce segmentation uncertainty, especially at ambiguous boundaries. In the reported experiments this approach outperforms UNet-based and earlier text-guided baselines, but it depends on precise text prompts: ambiguous prompts can lead to segmentation errors. Future work should explore fine-grained text prompts and more diverse training data to improve generalization.

Key insights

TextBCS enhances breast tumor segmentation in MRI by combining text prompts with evidential learning to address low contrast and blurred boundaries.

Principles

Text prompts guide models to specific regions.
Evidential learning quantifies segmentation uncertainty.
Stage-divided interaction improves multimodal fusion.
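The second principle can be made concrete with the formulation commonly used in evidential deep learning, in which a network outputs non-negative evidence per class and that evidence parameterizes a Dirichlet distribution. The sketch below assumes that common formulation (evidence from a softplus, alpha = evidence + 1, uncertainty = K / sum(alpha)); the summary does not state which exact parameterization or loss TextBCS uses, so treat the function names and formulas as illustrative.

```python
import math


def softplus(x):
    """Numerically stable softplus, mapping a logit to non-negative evidence."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dirichlet_from_logits(logits):
    """Return (expected class probabilities, uncertainty) for one pixel.

    Assumed formulation, not confirmed by the source:
    alpha_k = softplus(logit_k) + 1, S = sum(alpha),
    p_k = alpha_k / S, u = K / S.
    """
    alphas = [softplus(z) + 1.0 for z in logits]
    strength = sum(alphas)  # Dirichlet strength S
    probs = [a / strength for a in alphas]
    uncertainty = len(alphas) / strength  # equals 1.0 when there is no evidence
    return probs, uncertainty


# Two hypothetical pixels with (background, tumor) logits.
confident_probs, confident_u = dirichlet_from_logits([8.0, -8.0])
boundary_probs, boundary_u = dirichlet_from_logits([-8.0, -8.0])
```

A pixel with strong evidence for one class yields a low uncertainty, while a pixel with little evidence for any class, as might occur at a blurred boundary, yields an uncertainty close to 1. That is the signal a training strategy can use to focus on ambiguous regions.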

Method

TextBCS uses a Stage-divided Vision-Language Interaction (SVLI) module for cross-attention and alignment between image and text features at each down-sampling stage, combined with Evidential Learning (EL) to estimate pixel-level uncertainty using a variational Dirichlet distribution.
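To illustrate what mutual cross-attention between image and text features means, the sketch below implements a single two-way exchange in plain Python. It is deliberately simplified and is not the SVLI module: it assumes a single attention head, no learned query/key/value projections, a residual addition, and it omits the alignment step entirely. The names `cross_attention` and `mutual_interaction` are hypothetical.

```python
import math


def softmax(scores):
    """Softmax over a list of scores, shifted by the max for stability."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention where each query attends over keys_values.

    Simplification: the same vectors serve as keys and values, and there are
    no learned projections.
    """
    dim = len(keys_values[0])
    scale = math.sqrt(dim)
    attended = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys_values]
        weights = softmax(scores)
        attended.append(
            [sum(w * kv[d] for w, kv in zip(weights, keys_values)) for d in range(dim)]
        )
    return attended


def mutual_interaction(image_tokens, text_tokens):
    """One stage of two-way exchange with a residual add on both modalities."""
    image_context = cross_attention(image_tokens, text_tokens)
    text_context = cross_attention(text_tokens, image_tokens)
    new_image = [[a + b for a, b in zip(tok, ctx)]
                 for tok, ctx in zip(image_tokens, image_context)]
    new_text = [[a + b for a, b in zip(tok, ctx)]
                for tok, ctx in zip(text_tokens, text_context)]
    return new_image, new_text


# Toy features: three image tokens and one text token, each of dimension 2.
image = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text = [[0.5, 0.5]]
new_image, new_text = mutual_interaction(image, text)
```

In a stage-divided design, an exchange like this would be repeated after each down-sampling stage of the encoder, so that text guidance reaches both coarse and fine image features rather than being injected once.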

In practice

Use multimodal LLMs for automated text prompt generation.
Extract text prompts from existing radiological reports.
Deploy LLMs internally rather than through external services to protect patient privacy.
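As a rough illustration of extracting text prompts from existing reports without sending data to an external service, the sketch below selects the sentences of a report that mention tumor-related keywords. Everything in it is an assumption made for illustration: the keyword list, the function name `extract_prompt`, and the sample report text are invented, and the source does not describe the prompt format TextBCS expects.

```python
import re

# Hypothetical keyword list; a real deployment would need clinically validated terms.
TUMOR_KEYWORDS = ("mass", "lesion", "tumor", "enhancement")


def extract_prompt(report, keywords=TUMOR_KEYWORDS):
    """Return the report sentences that mention any keyword, joined as one prompt."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", report) if s.strip()]
    selected = [
        s for s in sentences
        if any(re.search(rf"\b{re.escape(k)}\b", s, re.IGNORECASE) for k in keywords)
    ]
    return " ".join(selected)


# Invented example text, not taken from any real report or dataset.
report = (
    "Breast MRI performed with contrast. "
    "An irregular mass is seen in the upper outer left breast. "
    "No axillary adenopathy."
)
prompt = extract_prompt(report)
```

Because this runs entirely locally, it fits the privacy guidance above. Whole-word keyword matching is brittle (it misses plurals, synonyms, and negated findings), so an internally hosted language model would likely be needed for anything beyond a first pass.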

Topics

Breast Tumor Segmentation
Text-Guided Learning
Evidential Learning
Vision-Language Interaction
Medical MRI

Best for: Computer Vision Engineer, AI Scientist, Research Scientist, AI Researcher, Deep Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.