A Chain-of-thought Reasoning Breast Ultrasound Dataset Covering All Histopathology Categories
Summary
The BUS-CoT dataset, released in February 2026, provides 11,439 breast ultrasound images from 11,850 lesions across 4,838 patients, specifically designed for chain-of-thought (CoT) reasoning analysis in AI development. This dataset is notable for covering all 99 WHO-defined histopathology categories, addressing a critical limitation in existing public benchmarks regarding data scale and annotation richness. A high-quality subset of 5,163 lesion-focused images, annotated and verified by experienced radiologists, is included for model training and evaluation. The dataset also features reasoning processes constructed from observation, feature, diagnosis, and pathology labels, aiming to foster robust AI systems capable of handling rare clinical cases that are often error-prone. The data and associated Python 3.11+ code, based on PyTorch 2.4.0+cu121 and ModelScope SWIFT 2.0+, are publicly available on Figshare and Zenodo.
Key takeaway
For research scientists and computer vision engineers developing diagnostic AI, the BUS-CoT dataset offers an unprecedented resource for training models on a comprehensive range of breast lesion histopathology. You should integrate this dataset to build AI systems that not only classify but also explain their reasoning, particularly for rare or challenging cases, thereby improving diagnostic accuracy and interpretability in clinical settings.
Key insights
BUS-CoT is a large, richly annotated breast ultrasound dataset enabling AI development for chain-of-thought reasoning across all histopathology types.
Principles
- Comprehensive histopathology coverage improves AI robustness for rare cases.
- Chain-of-thought annotations enhance AI reasoning capabilities.
Method
The dataset constructs reasoning processes by linking observation, feature, diagnosis, and pathology labels, annotated and verified by expert radiologists, to facilitate CoT analysis in AI models.
In practice
- Train vision-language models using PyTorch and ModelScope SWIFT.
- Develop AI for rare disease detection using the full histopathology range.
Topics
- Breast Ultrasound
- Chain-of-Thought Reasoning
- Medical Imaging Datasets
- AI Diagnosis
- Histopathology
Best for: Computer Vision Engineer, Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine learning : nature.com subject feeds.