SegMoTE: Token-Level Mixture of Experts for Medical Image Segmentation

2026-06-08 · Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Medical Imaging · Depth: Expert, extended

Summary

SegMoTE is an efficient, adaptive framework for medical image segmentation that extends the Segment Anything Model (SAM). It addresses challenges in medical imaging, such as modality heterogeneity and high annotation costs, by introducing a token-level Mixture of Experts (MoTE) mechanism and Progressive Prompt Tokenization (PPT). SegMoTE preserves SAM's zero-shot generalization and efficient inference while adding only 17M learnable parameters, approximately 1.4% of SAM's total. It was trained on MedSeg-HQ, a curated dataset of 0.15M high-quality masks, which is less than 1% of the size of existing large-scale datasets. The model achieves state-of-the-art performance, improving on the second-best approach by 1% to 6% across diverse in-domain and out-of-domain medical imaging tasks, and by 7% on the ISLES dataset.

Key takeaway

For AI scientists and computer vision engineers building medical image segmentation systems, SegMoTE offers a path to stronger performance with far less training data and few added parameters. Consider integrating token-level Mixture of Experts and Progressive Prompt Tokenization into your SAM-based workflows. The approach supports robust, modality-adaptive segmentation and interaction-free inference for binary tasks, even when only a small set of high-quality annotations is available.

Key insights

SegMoTE adapts SAM for medical image segmentation using token-level experts and progressive prompting, achieving high performance with minimal data.

Principles

Modality-adaptive experts enhance generalization.
High-quality data trumps large-scale data.
Progressive prompting reduces annotation burden.

Method

SegMoTE freezes the SAM encoder, uses MoTE to select and update expert tokens dynamically, and employs Progressive Prompt Tokenization for automatic segmentation, with a load-balancing loss applied during training.
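
The routing and balancing ideas named above can be illustrated with a minimal sketch. It assumes top-k softmax gating and a Switch-Transformer-style balance term; this summary does not specify SegMoTE's actual router or loss, so treat it as a generic token-level MoE illustration rather than the authors' method. The names route_tokens and load_balancing_loss are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_tokens(router_logits, top_k=1):
    """Pick the top_k experts per token and renormalize their gate weights.

    router_logits holds one list of per-expert scores for each token.
    Returns (assignments, probs); assignments[t] is a list of
    (expert_index, gate_weight) pairs for token t, best expert first.
    """
    assignments, probs = [], []
    for logits in router_logits:
        p = softmax(logits)
        chosen = sorted(range(len(p)), key=lambda i: p[i], reverse=True)[:top_k]
        norm = sum(p[i] for i in chosen)
        assignments.append([(i, p[i] / norm) for i in chosen])
        probs.append(p)
    return assignments, probs


def load_balancing_loss(assignments, probs, num_experts):
    """Assumed Switch-Transformer-style auxiliary loss: N * sum_i f_i * P_i.

    f_i is the fraction of tokens whose first choice is expert i;
    P_i is the mean router probability assigned to expert i.
    The value is 1.0 when f and P are both uniform and approaches N
    when every token collapses onto a single expert.
    """
    num_tokens = len(assignments)
    f = [0.0] * num_experts
    for picks in assignments:
        f[picks[0][0]] += 1.0 / num_tokens
    P = [sum(p[i] for p in probs) / num_tokens for i in range(num_experts)]
    return num_experts * sum(fi * pi for fi, pi in zip(f, P))


# Usage: 16 tokens with random router scores over 4 hypothetical experts.
random.seed(0)
num_experts = 4
logits = [[random.gauss(0.0, 1.0) for _ in range(num_experts)] for _ in range(16)]
assignments, probs = route_tokens(logits, top_k=2)
aux_loss = load_balancing_loss(assignments, probs, num_experts)
print(aux_loss)
```

In a real adapter this term would be scaled by a small coefficient and added to the segmentation loss, so that the router is discouraged from sending every token to the same expert while the frozen encoder stays untouched.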

In practice

Use token-level MoE for multimodal adaptation.
Curate high-quality datasets over raw scale.
Implement progressive prompting for automation.

Topics

Medical Image Segmentation
Segment Anything Model
Mixture of Experts
Token-Level Experts
Progressive Prompt Tokenization
MedSeg-HQ Dataset

Best for: AI Scientist, Computer Vision Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.