Mask to Concept: Auto-Promptable SAM3 via Efficient Test-Time Concept Embedding Search for Few-Shot Annotation

· Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision, Health & Medical Research · Depth: Expert, quick

Summary

Mask to Concept (M2C) is a framework that adapts the SAM3 foundation segmentation model to few-shot medical annotation without external modules, parameter retraining, or manual text engineering. M2C initializes a learnable concept embedding, uses it to prompt segmentation, and iteratively updates the embedding by gradient-based minimization of the concept segmentation error, while SAM3's architecture stays frozen. M2C also includes a Hybrid Uncertainty Estimation (HUE) module. HUE computes prediction entropy and maps concept predictions to box prompts, then measures the inconsistency between concept prompting and geometric (box) prompting; samples with high uncertainty are flagged for human correction. The corrected masks feed back into M2C, forming a self-enhancing annotation loop. According to the authors, experiments on medical segmentation benchmarks show state-of-the-art few-shot segmentation performance and high annotation efficiency.
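The test-time embedding search can be illustrated with a deliberately small sketch. It does not use SAM3: `frozen_segmenter` is a hypothetical stand-in that scores each pixel by the dot product of a fixed feature vector with the concept embedding, and the loss is plain binary cross-entropy, which is an assumption rather than the paper's stated objective. The function names, learning rate, and step count are likewise illustrative. The only point shown is that the embedding is the sole quantity being optimized.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def frozen_segmenter(features, concept):
    """Toy stand-in for a frozen, promptable segmenter (NOT SAM3).

    Each pixel's foreground probability is sigmoid(<pixel feature, concept>).
    The features never change; only `concept` is updated by the search.
    """
    return [sigmoid(sum(f * c for f, c in zip(feat, concept))) for feat in features]


def bce_loss(probs, mask, eps=1e-7):
    # Mean binary cross-entropy between predictions and the support mask.
    total = 0.0
    for p, y in zip(probs, mask):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(mask)


def search_concept(features, mask, steps=200, lr=0.5):
    """Gradient descent on the concept embedding only."""
    dim = len(features[0])
    concept = [0.0] * dim  # learnable concept embedding, zero-initialized
    history = []
    for _ in range(steps):
        probs = frozen_segmenter(features, concept)
        history.append(bce_loss(probs, mask))
        # Analytic gradient of the mean BCE w.r.t. the embedding:
        # mean over pixels of (p - y) * feature.
        grad = [0.0] * dim
        for feat, p, y in zip(features, probs, mask):
            for d in range(dim):
                grad[d] += (p - y) * feat[d] / len(mask)
        concept = [c - lr * g for c, g in zip(concept, grad)]
    return concept, history


# Synthetic "support image": 64 pixels with 4-d features.
# The support mask marks pixels whose first feature is positive.
rng = random.Random(0)
features = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(64)]
mask = [1 if feat[0] > 0 else 0 for feat in features]

concept, history = search_concept(features, mask)
pred = [1 if p > 0.5 else 0 for p in frozen_segmenter(features, concept)]
accuracy = sum(int(a == b) for a, b in zip(pred, mask)) / len(mask)
print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}, accuracy: {accuracy:.2f}")
```

In a real setting the stand-in would be replaced by a forward pass through the frozen model with automatic differentiation, but the structure of the loop (prompt, compare against the support mask, update the embedding) would remain the same.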

Key takeaway

For machine learning engineers scaling medical data annotation, M2C offers a way to adapt SAM3 for few-shot annotation without retraining model weights or hand-crafting text prompts. Its Hybrid Uncertainty Estimation loop routes only the most uncertain predictions to human experts, which reduces expert effort while, per the reported results, reaching state-of-the-art performance in medical image labeling.

Key insights

M2C adapts SAM3 for few-shot medical annotation by searching for transferable visual concepts within its frozen architecture.

Method

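M2C initializes a learnable concept embedding, prompts SAM3 with it to produce a segmentation, and updates the embedding by gradient descent on the segmentation error while SAM3's weights stay frozen. HUE flags uncertain samples for human correction, and the corrected masks are fed back into the search.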
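A minimal sketch of how an uncertainty score in the spirit of HUE might be computed follows. The summary only states that HUE uses prediction entropy and the inconsistency between concept-prompted and box-prompted predictions; the weighted sum, the use of 1 − IoU as the inconsistency measure, the `weight` parameter, and `FLAG_THRESHOLD` are all assumptions made for illustration. The box-prompted mask is passed in as data here, whereas in practice it would come from prompting the model with the box returned by `mask_to_box`.

```python
import math


def mean_binary_entropy(probs, eps=1e-7):
    """Mean per-pixel binary entropy in bits, so the result lies in [0, 1]."""
    total = 0.0
    count = 0
    for row in probs:
        for p in row:
            p = min(max(p, eps), 1.0 - eps)
            total -= p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p)
            count += 1
    return total / count


def mask_to_box(mask):
    """Tight box (row_min, col_min, row_max, col_max) of a binary mask, or None if empty."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))


def iou(mask_a, mask_b):
    """Intersection over union of two binary masks (1.0 if both are empty)."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    return inter / union if union else 1.0


def hybrid_uncertainty(concept_probs, box_mask, weight=0.5):
    """Assumed combination: weighted sum of entropy and (1 - IoU) inconsistency."""
    concept_mask = [[1 if p > 0.5 else 0 for p in row] for row in concept_probs]
    entropy = mean_binary_entropy(concept_probs)
    inconsistency = 1.0 - iou(concept_mask, box_mask)
    return weight * entropy + (1.0 - weight) * inconsistency


FLAG_THRESHOLD = 0.3  # hypothetical cut-off for sending a sample to a human

# Confident prediction whose box-prompted mask agrees with it.
confident_probs = [[0.01, 0.99, 0.99], [0.01, 0.99, 0.99], [0.01, 0.01, 0.01]]
agreeing_box_mask = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]

# Hesitant prediction whose box-prompted mask covers only half of it.
hesitant_probs = [[0.45, 0.55, 0.55], [0.45, 0.55, 0.55], [0.45, 0.45, 0.45]]
disagreeing_box_mask = [[0, 1, 1], [0, 0, 0], [0, 0, 0]]

box = mask_to_box(agreeing_box_mask)
low = hybrid_uncertainty(confident_probs, agreeing_box_mask)
high = hybrid_uncertainty(hesitant_probs, disagreeing_box_mask)
flagged = [score > FLAG_THRESHOLD for score in (low, high)]
```

With these toy inputs only the second sample exceeds the threshold, so it alone would be routed to an annotator for correction.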

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.