ARTEMIS: Agent-guided Reliability-aware Temporal Mask Evolution for Imperfectly Supervised Video Polyp Segmentation
Summary
ARTEMIS is a unified framework for imperfectly supervised video polyp segmentation (VPS), a setting made difficult by weak contrast, motion blur, and sparse pixel-level guidance. SAM2 can generate initial dense masks from weak annotations (points, scribbles) or in a semi-supervised setup, but using those masks directly as pseudo-labels often yields geometry-degraded masks and underuses temporal consistency. ARTEMIS addresses both problems in three stages. First, it initializes coarse masks. Second, a debate-and-judge vision-language agent selects reliable temporal anchors, which are propagated bidirectionally with SAM2 to refine unreliable or unlabeled frames. Third, the segmenter is trained with temporal reliability-aware robust learning, which combines reliability-guided reference selection, a Reference Prototype Transport Module, and a reliability-aware robust loss. In experiments on the SUN-SEG and CVC-ClinicDB-612 datasets, ARTEMIS is reported to achieve leading performance across scribble, point, and limited-label settings.
Key takeaway
For computer vision engineers building medical video segmentation models with limited labels, ARTEMIS outlines a practical recipe: assess how reliable each pseudo-labeled frame is, propagate masks from the trusted frames, and down-weight the remaining noisy labels during training. The approach is most relevant when supervision comes from weak annotations (points or scribbles) or a small labeled subset. A code release is upcoming.
Key insights
ARTEMIS improves imperfectly supervised video polyp segmentation by integrating agent-guided reliability and temporal mask evolution.
Principles
- Reliability assessment improves weak supervision.
- Temporal consistency refines sparse labels.
- Robust learning down-weights noisy data.
Method
ARTEMIS initializes masks, uses a vision-language agent to select reliable temporal anchors, propagates them bidirectionally with SAM2, and trains with reliability-aware robust learning.
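The anchor selection and bidirectional propagation steps above can be pictured as a scheduling problem. The following is a minimal sketch, not the authors' implementation. The per-frame reliability scores, the 0.8 threshold, and the function names `select_anchors` and `propagation_plan` are all hypothetical. In ARTEMIS the reliability judgment comes from the debate-and-judge agent and the mask propagation itself is done by SAM2; neither is reproduced here.

```python
from typing import List, Optional, Tuple


def select_anchors(scores: List[float], threshold: float = 0.8) -> List[int]:
    """Return indices of frames whose (hypothetical) reliability score
    meets the threshold. The threshold value is an assumption."""
    return [i for i, s in enumerate(scores) if s >= threshold]


def propagation_plan(
    num_frames: int, anchors: List[int]
) -> List[Tuple[int, Optional[int], Optional[int]]]:
    """For each non-anchor frame, find the nearest anchor before and after it.

    Returns (frame, previous_anchor, next_anchor) tuples. Either anchor is
    None when the frame lies before the first or after the last anchor.
    """
    anchor_set = set(anchors)
    plan = []
    for t in range(num_frames):
        if t in anchor_set:
            continue  # anchors keep their own masks
        prev_a = max((a for a in anchors if a < t), default=None)
        next_a = min((a for a in anchors if a > t), default=None)
        plan.append((t, prev_a, next_a))
    return plan


# Toy reliability scores for a six-frame clip (illustrative values only).
scores = [0.95, 0.40, 0.55, 0.90, 0.30, 0.20]
anchors = select_anchors(scores)
plan = propagation_plan(len(scores), anchors)
```

In this toy clip, frames 0 and 3 become anchors. Frames 1 and 2 can be refined from both directions, while frames 4 and 5 only have an earlier anchor to propagate from.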
In practice
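- Use SAM2 to generate initial coarse masks from weak annotations (points or scribbles).
- Use a vision-language agent to select reliable anchor frames.
- Train with a reliability-aware robust loss to limit the effect of noisy pseudo-labels.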
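To illustrate the last point, one simple way to down-weight noisy pseudo-labels is to scale each frame's loss by a reliability weight. This is a minimal sketch under stated assumptions, not the loss defined in the paper: the choice of binary cross-entropy, the per-frame weights in [0, 1], and the name `reliability_weighted_loss` are all hypothetical, and masks are flattened to short lists of pixel probabilities for brevity.

```python
import math
from typing import List


def bce(pred: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one pixel, with clamping for stability."""
    p = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def reliability_weighted_loss(
    preds: List[List[float]],
    pseudo_labels: List[List[float]],
    reliabilities: List[float],
) -> float:
    """Reliability-weighted mean of per-frame BCE.

    Frames with low (hypothetical) reliability contribute less to the loss;
    a weight of 0 removes a frame entirely.
    """
    total, weight_sum = 0.0, 0.0
    for frame_pred, frame_label, r in zip(preds, pseudo_labels, reliabilities):
        frame_loss = sum(
            bce(p, y) for p, y in zip(frame_pred, frame_label)
        ) / len(frame_pred)
        total += r * frame_loss
        weight_sum += r
    return total / weight_sum if weight_sum > 0 else 0.0


# Two toy frames of two pixels each; the second pseudo-label disagrees
# with the prediction and is treated as unreliable.
preds = [[0.9, 0.1], [0.2, 0.8]]
labels = [[1.0, 0.0], [1.0, 0.0]]
weighted = reliability_weighted_loss(preds, labels, [1.0, 0.0])
uniform = reliability_weighted_loss(preds, labels, [1.0, 1.0])
```

With the second frame's weight set to 0, the loss depends only on the first frame, so a suspect pseudo-label no longer pulls the model toward it. With uniform weights, the disagreeing frame dominates the loss.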
Topics
- Video Polyp Segmentation
- Imperfect Supervision
- Temporal Consistency
- Reliability-aware Learning
- Vision-Language Agents
- SAM2
Best for: AI Scientist, Computer Vision Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.