RARE disease detection from Capsule Endoscopic Videos based on Vision Transformers
Summary
A recent study submitted on March 16, 2026, details a deep learning approach for multi-label classification of RARE diseases from capsule endoscopic videos (CEV). The researchers fine-tuned a Google Vision Transformer (ViT) with a batch size of 16 and 224x224 pixel resolution for this task. The model was trained to classify 17 distinct labels, including anatomical locations like mouth, esophagus, and stomach, as well as pathological findings such as active bleeding, angiectasia, erosion, polyp, and ulcer. On a test dataset comprising three videos, the system achieved an overall mean Average Precision (mAP) of 0.0205 at an Intersection over Union (IoU) threshold of 0.5, and 0.0196 at an IoU threshold of 0.95.
Key takeaway
For computer vision engineers developing diagnostic tools for gastroenterology, this work applies a Vision Transformer to identifying multiple rare diseases in capsule endoscopic videos. Consider fine-tuning a pre-trained ViT model for similar multi-label classification tasks in medical imaging, especially when dealing with diverse pathological findings. Report both mAP@0.5 and mAP@0.95 so that results cover a lenient and a strict IoU threshold.
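The summary does not say how IoU is defined for this task. One plausible reading, assumed here purely for illustration, is a temporal IoU between a predicted and a ground-truth frame interval for a given label. The sketch below shows why the same prediction can count as correct at a threshold of 0.5 and as a miss at 0.95. The function names and the frame ranges are hypothetical, not taken from the study.

```python
def temporal_iou(pred, truth):
    """IoU of two inclusive frame intervals, each given as (start, end)."""
    p_start, p_end = pred
    t_start, t_end = truth
    # Overlap length in frames; zero when the intervals do not intersect.
    inter = max(0, min(p_end, t_end) - max(p_start, t_start) + 1)
    union = (p_end - p_start + 1) + (t_end - t_start + 1) - inter
    return inter / union


def is_true_positive(pred, truth, threshold):
    """A prediction counts as a hit only if its IoU reaches the threshold."""
    return temporal_iou(pred, truth) >= threshold


# Hypothetical example: a 100-frame ground-truth event and a prediction
# that starts 10 frames late (assumed values, not data from the study).
truth = (100, 199)
pred = (110, 199)

iou = temporal_iou(pred, truth)
hit_at_050 = is_true_positive(pred, truth, 0.5)
hit_at_095 = is_true_positive(pred, truth, 0.95)
print(f"IoU={iou:.2f} hit@0.5={hit_at_050} hit@0.95={hit_at_095}")
```

Under this assumption, average precision is computed per label from the hits at each threshold and then averaged over labels to give mAP, so the stricter threshold can only keep or lower the score.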
Key insights
Vision Transformers can be fine-tuned for multi-label classification of gastrointestinal diseases from capsule endoscopic videos.
Principles
- ViT models are adaptable for medical image analysis.
- Multi-label classification addresses diverse pathologies.
Method
The method fine-tunes a Google Vision Transformer (ViT), with a batch size of 16 and 224x224 pixel resolution, on capsule endoscopic videos to classify 17 gastrointestinal labels.
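The summary does not describe the classification head or the loss function. A common setup for multi-label tasks, assumed here, is one independent sigmoid per label trained with binary cross-entropy, so that several labels can be active at once. The sketch below uses only the eight labels named in the summary (the full set of 17 is not listed there); the logits and the 0.5 decision threshold are hypothetical.

```python
import math

# Only the labels named in the summary; the remaining labels are not listed there.
LABELS = ["mouth", "esophagus", "stomach", "active bleeding",
          "angiectasia", "erosion", "polyp", "ulcer"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, threshold=0.5):
    """Apply an independent sigmoid per label and keep those at or above threshold."""
    probs = [sigmoid(z) for z in logits]
    return [name for name, p in zip(LABELS, probs) if p >= threshold]


def bce_loss(logits, targets):
    """Mean binary cross-entropy over labels (targets are 0 or 1)."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total / len(logits)


# Hypothetical model outputs for one frame showing stomach and erosion.
logits = [-4.0, -3.0, 2.5, -2.0, -1.0, 1.2, -3.5, -0.5]
targets = [0, 0, 1, 0, 0, 1, 0, 0]

predicted = predict_labels(logits)
loss = bce_loss(logits, targets)
print(predicted, round(loss, 4))
```

Unlike a softmax head, this design lets an anatomical location and a pathological finding be predicted for the same frame, which matches the mixed label set described above.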
In practice
- Apply ViT for medical video analysis.
- Consider multi-label classification for complex diagnoses.
Topics
- Vision Transformers
- Capsule Endoscopy
- Medical Image Classification
- Multi-label Classification
- Gastrointestinal Disease Detection
Best for: Computer Vision Engineer, Research Scientist, AI Researcher, AI Scientist, Deep Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.