A Chain-of-thought Reasoning Breast Ultrasound Dataset Covering All Histopathology Categories

· Source: Machine learning : nature.com subject feeds · Field: Health & Wellbeing — Artificial Intelligence & Machine Learning, Health & Medical Research, Medical Specialties & Subspecialties · Depth: Expert, extended

Summary

The BUS-CoT dataset, released in February 2026, provides 11,439 breast ultrasound images of 11,850 lesions from 4,838 patients and is designed for chain-of-thought (CoT) reasoning analysis in AI development. It covers all 99 WHO-defined histopathology categories, addressing the limited data scale and annotation richness of existing public benchmarks. A high-quality subset of 5,163 lesion-focused images, annotated and verified by experienced radiologists, is included for model training and evaluation. Each reasoning process is constructed from observation, feature, diagnosis, and pathology labels, with the aim of supporting robust AI systems that can handle rare clinical cases, where errors are more likely. The data and the associated code (Python 3.11+, PyTorch 2.4.0+cu121, ModelScope SWIFT 2.0+) are publicly available on Figshare and Zenodo.

Key takeaway

For research scientists and computer vision engineers developing diagnostic AI, BUS-CoT offers training and evaluation data that spans the full range of breast lesion histopathology, including rare types that smaller public benchmarks lack. Consider using it to build models that explain their reasoning as well as classify, particularly for rare or challenging cases, with the goal of improving diagnostic accuracy and interpretability in clinical settings.

Key insights

BUS-CoT is a large, richly annotated breast ultrasound dataset that supports the development of AI models with chain-of-thought reasoning across all histopathology types.

Method

Reasoning processes are constructed by linking observation, feature, diagnosis, and pathology labels in sequence, giving AI models an explicit chain to learn from and be evaluated against. Experienced radiologists annotated and verified the high-quality subset.
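The linking step can be illustrated with a minimal Python sketch. It is not the dataset's actual schema or the authors' code: the `LesionAnnotation` class, its field names, the `build_reasoning_chain` helper, and the example values are all hypothetical, chosen only to show how four label stages might be joined into one ordered reasoning text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LesionAnnotation:
    """Hypothetical record; field names are assumptions, not the BUS-CoT schema."""
    observation: str            # free-text description of what is visible
    features: dict[str, str]    # named imaging features and their values
    diagnosis: str              # imaging-level assessment
    pathology: str              # histopathology label


def build_reasoning_chain(ann: LesionAnnotation) -> str:
    """Join the four label stages into one numbered chain-of-thought text."""
    feature_text = "; ".join(f"{name}: {value}" for name, value in ann.features.items())
    stages = [
        f"Observation: {ann.observation}",
        f"Features: {feature_text}",
        f"Diagnosis: {ann.diagnosis}",
        f"Pathology: {ann.pathology}",
    ]
    # Number the stages so the order observation -> pathology is explicit.
    return "\n".join(f"Step {i}. {stage}" for i, stage in enumerate(stages, start=1))


# Made-up placeholder values for illustration only; not taken from the dataset.
example = LesionAnnotation(
    observation="hypoechoic mass in the upper outer quadrant",
    features={"shape": "irregular", "margin": "spiculated"},
    diagnosis="suspicious for malignancy",
    pathology="invasive ductal carcinoma",
)
chain = build_reasoning_chain(example)
print(chain)
```

A text of this form could serve as the target response when fine-tuning a vision-language model, with the ultrasound image as input. The actual annotation format should be taken from the released files on Figshare and Zenodo.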

Best for: Computer Vision Engineer, Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine learning : nature.com subject feeds.