BreastGPT: A Multimodal Large Language Model for the Full Spectrum of Breast Cancer Clinical Routine
Summary
BreastGPT is a multimodal large language model (MLLM) designed to support the entire breast cancer clinical workflow: screening, diagnosis, and treatment planning. Existing medical MLLMs often struggle with data scarcity and with limited versatility across tasks and imaging modalities. To address this, the researchers introduce BreastStage, a workflow-aligned breast imaging instruction corpus containing 1.86 million instruction-following pairs derived from 17 sub-datasets, 5 imaging modalities, and 136 task templates. A held-out split, BreastStage-Bench, serves as a benchmark for evaluating multimodal reasoning. BreastGPT itself combines a dual-branch visual encoder with concept-preserving token compression, which lets it process both standard radiology images and gigapixel pathology images. On BreastStage-Bench, BreastGPT achieves 75.66% closed-ended accuracy and an 89.92% open-ended score, outperforming the general-purpose and medical-specific MLLMs it is compared against. These results suggest that workflow-aligned data and cross-scale visual modeling are important for clinically relevant medical MLLMs.
Key takeaway
AI scientists developing medical MLLMs for oncology should prioritize workflow-aligned data and cross-scale visual modeling. Resources such as BreastStage and BreastStage-Bench may help train and measure models across the breast cancer care continuum. Consider a dual-branch visual encoder to handle imaging scales that range from standard radiology to gigapixel pathology.
Key insights
Workflow-aligned data and cross-scale visual modeling are critical for clinically grounded medical MLLMs.
Principles
- Clinical MLLMs need workflow-aligned data.
- Cross-scale visual modeling is critical.
- Multimodal reasoning spans screening, diagnosis, and treatment planning.
Method
BreastGPT uses a dual-branch visual encoder and concept-preserving token compression to bridge the scale gap between standard radiology images and gigapixel pathology images. The model is trained on 1.86M instruction-following pairs.
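This summary does not describe how the encoder or the compression step works internally, so the following is only a rough sketch of the general idea: keep the short token sequence from a standard-resolution image as it is, and shrink the very long sequence from a gigapixel slide to a fixed budget before both reach the language model. The compression shown here is plain average pooling over consecutive tokens, which is a simple stand-in and not the concept-preserving method from the paper. The function names, the `budget` parameter, and the token representation are all assumptions made for illustration.

```python
from typing import List

Token = List[float]  # one visual token as a plain feature vector (assumed layout)


def average_pool_tokens(tokens: List[Token], budget: int) -> List[Token]:
    """Merge consecutive tokens so that at most `budget` tokens remain.

    Stand-in for the paper's concept-preserving compression, which this
    summary does not specify.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    n = len(tokens)
    if n <= budget:
        # Already short enough: return a copy unchanged.
        return [list(t) for t in tokens]
    pooled: List[Token] = []
    for g in range(budget):
        # Split the sequence into `budget` contiguous, non-empty groups.
        start = g * n // budget
        end = (g + 1) * n // budget
        group = tokens[start:end]
        dim = len(group[0])
        pooled.append([sum(t[d] for t in group) / len(group) for d in range(dim)])
    return pooled


def build_visual_sequence(
    low_res_tokens: List[Token], high_res_tokens: List[Token], budget: int
) -> List[Token]:
    """Hypothetical two-branch merge: pass the low-resolution branch through
    untouched and compress only the high-resolution branch."""
    return [list(t) for t in low_res_tokens] + average_pool_tokens(high_res_tokens, budget)


if __name__ == "__main__":
    radiology = [[0.0, 1.0], [1.0, 1.0]]
    pathology = [[float(i), 1.0] for i in range(10)]
    sequence = build_visual_sequence(radiology, pathology, budget=4)
    print(len(sequence))  # 2 radiology tokens + 4 pooled pathology tokens
```

The point of the sketch is the asymmetry: only the branch that produces far too many tokens is compressed, so the sequence length seen by the language model stays bounded regardless of slide size.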
In practice
- Use the BreastStage corpus for MLLM training.
- Evaluate MLLMs on BreastStage-Bench.
- Implement dual-branch visual encoders.
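For the evaluation step in the list above, a closed-ended accuracy figure like the one reported for BreastGPT can be computed along the following lines. This is a minimal sketch that assumes exact-match scoring after light text normalization; the actual BreastStage-Bench scoring protocol is not described in this summary, and the function names and sample answers are hypothetical.

```python
from typing import Sequence


def normalize(answer: str) -> str:
    """Lowercase, trim, drop trailing periods, and collapse inner whitespace."""
    return " ".join(answer.lower().strip().rstrip(".").split())


def closed_ended_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Percentage of predictions that match their reference after normalization.

    Exact match is an assumption here, not the benchmark's documented metric.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot score an empty evaluation set")
    correct = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


if __name__ == "__main__":
    # Made-up answers for illustration only.
    preds = ["Malignant.", "benign", "B"]
    refs = ["malignant", "Malignant", "b"]
    print(f"{closed_ended_accuracy(preds, refs):.2f}%")
```

Open-ended answers generally need a different scorer, such as a rubric or a judge model, so the exact-match approach here applies only to questions with a fixed answer set.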
Topics
- Breast Cancer
- Multimodal LLMs
- Medical Imaging
- Clinical Workflow AI
- Gigapixel Pathology
- BreastStage Dataset
Best for: Computer Vision Engineer, AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.