BreastGPT: A Multimodal Large Language Model for the Full Spectrum of Breast Cancer Clinical Routine

2026-06-03 · Source: Takara TLDR - Daily AI Papers · Field: Health & Wellbeing — Artificial Intelligence & Machine Learning, Medical Specialties & Subspecialties, Medical Devices & Health Technology · Depth: Expert, quick

Summary

BreastGPT, a new multimodal large language model (MLLM), has been developed to support the entire breast cancer clinical workflow, encompassing screening, diagnosis, and treatment planning. Existing medical MLLMs often struggle with data scarcity and limited versatility across diverse tasks and imaging modalities. To overcome these challenges, the researchers introduced BreastStage, a comprehensive workflow-aligned breast imaging instruction corpus containing 1.86 million instruction-following pairs derived from 17 sub-datasets, 5 imaging modalities, and 136 task templates. A held-out split, BreastStage-Bench, serves as a benchmark for evaluating multimodal reasoning. BreastGPT itself features a dual-branch visual encoder and concept-preserving token compression, enabling it to process both standard radiology and gigapixel pathology images. On BreastStage-Bench, BreastGPT achieved 75.66% closed-ended accuracy and an 89.92% open-ended score, surpassing other general-purpose and medical-specific MLLMs. This performance underscores the importance of workflow-aligned data and cross-scale visual modeling for clinically relevant medical MLLMs.

Key takeaway

AI scientists developing medical MLLMs for oncology should prioritize workflow-aligned data and cross-scale visual modeling. Resources such as BreastStage and BreastStage-Bench may improve model performance across the breast cancer care continuum. Consider a dual-branch visual encoder to handle imaging scales that range from standard radiology to gigapixel pathology.

Key insights

Workflow-aligned data and cross-scale visual modeling are critical for clinically grounded medical MLLMs.

Principles

Clinical MLLMs need workflow-aligned data.
Cross-scale visual modeling is critical.
Multimodal reasoning spans screening, diagnosis, treatment.

Method

BreastGPT uses a dual-branch visual encoder and concept-preserving token compression to bridge the scale gap between radiology and gigapixel pathology images, trained on 1.86M instruction-following pairs.
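
To make the scale gap concrete, the sketch below counts patch tokens for two image sizes and then shrinks a token sequence by average pooling. It is only an illustration under stated assumptions: the 16-pixel patch size, the two image resolutions, and the pooling scheme are hypothetical choices, and plain mean pooling stands in for BreastGPT's concept-preserving compression, whose details are not given in this summary.

```python
import math


def patch_token_count(height_px, width_px, patch_px=16):
    """Count non-overlapping patch tokens, padding partial patches at the edges.

    patch_px=16 is an assumed value, not taken from the paper.
    """
    return math.ceil(height_px / patch_px) * math.ceil(width_px / patch_px)


def mean_pool_tokens(tokens, group_size):
    """Compress a token sequence by averaging each run of `group_size` vectors.

    A generic stand-in for token compression; it is NOT the
    concept-preserving method described for BreastGPT.
    """
    pooled = []
    for start in range(0, len(tokens), group_size):
        group = tokens[start:start + group_size]
        dim = len(group[0])
        pooled.append(
            [sum(vec[d] for vec in group) / len(group) for d in range(dim)]
        )
    return pooled


# Hypothetical image sizes chosen only to show the order-of-magnitude gap.
radiology_tokens = patch_token_count(1024, 1024)        # 4096
pathology_tokens = patch_token_count(100_000, 100_000)  # 39062500

if __name__ == "__main__":
    print(radiology_tokens, pathology_tokens)
    print(mean_pool_tokens([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], group_size=2))
```

Under these assumptions the whole-slide image yields roughly 9,500 times as many tokens as the radiology image, which is why some form of compression or a separate high-resolution branch is needed before the tokens reach the language model.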

In practice

Use the BreastStage corpus for MLLM training.
Evaluate MLLMs with BreastStage-Bench.
Implement dual-branch visual encoders.
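
As a rough illustration of the evaluation step, closed-ended accuracy can be sketched as exact-match scoring over a set of question-answer pairs. The scoring protocol of BreastStage-Bench is not described in this summary, so the normalization rule, the function names, and the toy answers below are all assumptions made for the example.

```python
def normalize(answer):
    """Lowercase and collapse whitespace so trivial formatting differences do not count as errors."""
    return " ".join(answer.strip().lower().split())


def closed_ended_accuracy(predictions, references):
    """Return exact-match accuracy as a percentage.

    Assumes one prediction per reference; exact match after normalization
    is an assumed scoring rule, not the benchmark's documented one.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one example is required")
    correct = sum(
        normalize(pred) == normalize(ref)
        for pred, ref in zip(predictions, references)
    )
    return 100.0 * correct / len(references)


if __name__ == "__main__":
    # Toy answers, invented for illustration only.
    preds = ["B", " malignant ", "yes"]
    refs = ["b", "benign", "Yes"]
    print(f"{closed_ended_accuracy(preds, refs):.2f}%")
```

Open-ended answers would need a different scorer, such as a rubric or a model-based judge, which this sketch does not cover.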

Topics

Breast Cancer
Multimodal LLMs
Medical Imaging
Clinical Workflow AI
Gigapixel Pathology
BreastStage Dataset

Best for: Computer Vision Engineer, AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.