SAM3 Self-Distillation for Fine-Grained GOOSE 2D Semantic Segmentation
Summary
A 4th-place entry in the ICRA 2026 GOOSE 2D Fine-Grained Semantic Segmentation Challenge achieved a composite mean Intersection-over-Union (mIoU) of 69.73% on the 1,815-image test set. The model adapts the Segment Anything Model 3 (SAM3) image encoder with a lightweight decoder. Key contributions include a novel self-distillation scheme that re-uses SAM3, prompted with ground-truth boxes, as a teacher for classes where it outperforms the custom model. Additionally, an image-level multi-scale test-time augmentation method was introduced, which restores multi-scale inference for fixed-input-size models by rescaling the image itself. An aggressive photometric distortion technique, transplanted from a winning 2025 GOOSE 2D entry, was identified as the single largest source of performance improvement.
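Two decisions in this entry rest on per-class IoU. The headline metric is mIoU, and the SAM3 teacher is only used for classes where it beats the custom model. The sketch below is a minimal illustration that assumes NumPy label maps with an ignore index of 255. That ignore value is an assumption, since GOOSE's actual convention is not stated here. The helper names `confusion_matrix`, `per_class_iou` and `select_teacher_classes` are hypothetical. Comparing teacher and student on a held-out split is one plausible selection rule, not necessarily the authors' procedure.

```python
import numpy as np

def confusion_matrix(pred, gt, num_classes, ignore_index=255):
    """Accumulate a num_classes x num_classes confusion matrix (rows = GT, cols = prediction)."""
    valid = gt != ignore_index  # assumed ignore label; pixels with it are skipped
    idx = gt[valid].astype(np.int64) * num_classes + pred[valid].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def per_class_iou(cm):
    """IoU_c = TP_c / (TP_c + FP_c + FN_c); NaN for classes absent from both GT and prediction."""
    tp = np.diag(cm).astype(np.float64)
    denom = cm.sum(axis=0) + cm.sum(axis=1) - tp  # predicted + actual - overlap
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, tp / denom, np.nan)

def select_teacher_classes(student_cm, teacher_cm, margin=0.0):
    """Hypothetical rule: class ids where the teacher's IoU beats the student's by more than `margin`."""
    s_iou = per_class_iou(student_cm)
    t_iou = per_class_iou(teacher_cm)
    better = t_iou > s_iou + margin  # comparisons involving NaN evaluate to False
    return np.flatnonzero(better)
```

mIoU is then `np.nanmean(per_class_iou(cm))`. Classes absent from both ground truth and prediction are NaN and drop out of the mean. A `margin` above zero keeps the student's own predictions for classes where the teacher is only marginally better.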
Key takeaway
For machine learning engineers working on fine-grained semantic segmentation, consider a self-distillation strategy in which a foundation model such as SAM3, prompted with ground-truth boxes, acts as a teacher for the classes where it outperforms your own model. Image-level multi-scale test-time augmentation is also worth exploring, because it restores multi-scale inference for fixed-input-size models by rescaling the image rather than changing the model. Finally, aggressive photometric distortion, which this entry borrowed from a winning 2025 GOOSE 2D pipeline and found to be its single largest source of improvement, can deliver substantial gains.
Key insights
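Self-distillation with SAM3 as a teacher, image-level multi-scale test-time augmentation, and aggressive photometric distortion together produced a 69.73% composite mIoU and 4th place in fine-grained GOOSE 2D semantic segmentation. Photometric distortion contributed the largest single gain.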
Principles
- Re-use a powerful foundation model as a teacher for the classes where it outperforms your model.
- Rescale the image itself to enable multi-scale inference with fixed-input-size models.
- Aggressive photometric distortion can yield large gains.
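The summary does not spell out the distillation loss, so the following is only one plausible formulation. It combines a standard cross-entropy on ground-truth labels with a KL term that pulls the student toward the teacher's per-pixel class probabilities. The KL term is applied only on pixels whose ground-truth class is among the teacher-selected classes. The sketch assumes the box-prompted SAM3 masks have already been converted into a `(C, H, W)` probability map. That conversion, the weight `alpha`, and the function names are hypothetical.

```python
import numpy as np

def log_softmax(logits, axis=0):
    """Numerically stable log-softmax over the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def self_distill_loss(student_logits, teacher_probs, gt, teacher_classes, alpha=0.5, ignore_index=255):
    """Hypothetical loss: CE on ground truth + alpha * KL(teacher || student) on teacher-selected classes.

    student_logits: (C, H, W) raw scores from the student decoder.
    teacher_probs:  (C, H, W) per-pixel class probabilities, assumed precomputed from box-prompted SAM3.
    gt:             (H, W) integer labels; ignore_index marks unlabelled pixels.
    """
    log_p = log_softmax(student_logits, axis=0)
    valid = gt != ignore_index
    safe_gt = np.where(valid, gt, 0)  # placeholder index for ignored pixels
    # Hard-label cross-entropy, averaged over labelled pixels.
    nll = -np.take_along_axis(log_p, safe_gt[None], axis=0)[0]
    ce = nll[valid].mean() if valid.any() else 0.0
    # Soft-label term only where the GT class is one the teacher handles better.
    distill_mask = valid & np.isin(gt, teacher_classes)
    if not distill_mask.any():
        return float(ce)
    eps = 1e-8  # avoids log(0) for one-hot teacher probabilities
    kl = (teacher_probs * (np.log(teacher_probs + eps) - log_p)).sum(axis=0)
    return float(ce + alpha * kl[distill_mask].mean())
```

Restricting the soft term to selected classes means the teacher only shapes the student where the per-class comparison favoured it. Elsewhere, training reduces to ordinary supervised cross-entropy.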
Method
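The approach adapts the SAM3 image encoder with a lightweight decoder. It trains this model with aggressive photometric distortion and a self-distillation scheme in which SAM3, prompted with ground-truth boxes, serves as the teacher for classes where it outperforms the custom model. At inference, it applies image-level multi-scale test-time augmentation.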
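Image-level multi-scale TTA can be sketched as follows, assuming a model that only accepts `win x win` inputs and returns `(C, win, win)` logits. At each scale, the image itself is resized and covered with overlapping fixed-size windows. The stitched logits are resized back to the original resolution before the softmax probabilities are averaged. Several details are assumptions rather than the entry's published settings: the scale set, the window stride, and the nearest-neighbour resize. Nearest-neighbour is used only to stay dependency-free; bilinear interpolation would be the usual choice.

```python
import numpy as np

def resize_nearest(arr, out_h, out_w):
    """Nearest-neighbour resize of a (..., H, W) array; a stand-in for bilinear interpolation."""
    in_h, in_w = arr.shape[-2:]
    rows = np.minimum((np.arange(out_h) + 0.5) * in_h / out_h, in_h - 1).astype(np.int64)
    cols = np.minimum((np.arange(out_w) + 0.5) * in_w / out_w, in_w - 1).astype(np.int64)
    return arr[..., rows[:, None], cols[None, :]]

def sliding_window_logits(image, model, num_classes, win, stride):
    """Run a fixed-input-size model over overlapping win x win crops; average logits where crops overlap."""
    _, h, w = image.shape
    padded = np.pad(image, ((0, 0), (0, max(win - h, 0)), (0, max(win - w, 0))))
    ph, pw = padded.shape[1:]
    logits = np.zeros((num_classes, ph, pw))
    counts = np.zeros((ph, pw))
    ys = list(range(0, ph - win + 1, stride))
    xs = list(range(0, pw - win + 1, stride))
    if ys[-1] != ph - win:  # make sure the bottom edge is covered
        ys.append(ph - win)
    if xs[-1] != pw - win:  # make sure the right edge is covered
        xs.append(pw - win)
    for y in ys:
        for x in xs:
            logits[:, y:y + win, x:x + win] += model(padded[:, y:y + win, x:x + win])
            counts[y:y + win, x:x + win] += 1
    return (logits / counts)[:, :h, :w]  # drop the padding

def image_level_multiscale_tta(image, model, num_classes, win, scales=(0.75, 1.0, 1.25), stride=None):
    """Rescale the image (not the model input size), infer at each scale, and average probabilities."""
    _, h, w = image.shape
    stride = stride or max(1, win * 2 // 3)  # hypothetical default overlap
    probs = np.zeros((num_classes, h, w))
    for s in scales:
        scaled = resize_nearest(image, max(1, round(h * s)), max(1, round(w * s)))
        logits = sliding_window_logits(scaled, model, num_classes, win, stride)
        logits = resize_nearest(logits, h, w)  # back to the original resolution
        e = np.exp(logits - logits.max(axis=0, keepdims=True))
        probs += e / e.sum(axis=0, keepdims=True)
    return probs / len(scales)
```

The model input size never changes. Upscaling the image therefore makes each window cover a smaller part of the scene, which gives finer detail. Downscaling makes each window cover more context. Together, these recover the effect that resizing the network input would otherwise provide.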
In practice
- Apply SAM3 as a self-distillation teacher.
- Implement image-level multi-scale TTA.
- Experiment with aggressive photometric distortions.
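A photometric distortion transform in this spirit can be sketched as random brightness, contrast, saturation and hue jitter, each applied with probability `p`. The ranges below are hypothetical "aggressive" values chosen for illustration and deliberately wider than typical defaults. The exact settings of the 2025 winning entry are not given in this summary. Hue is rotated in YIQ space rather than HSV to keep the sketch short, and the function name is hypothetical.

```python
import numpy as np

# RGB -> YIQ transform (NTSC); rotating the (I, Q) chroma plane shifts hue.
_RGB2YIQ = np.array([[0.299, 0.587, 0.114],
                     [0.596, -0.274, -0.322],
                     [0.211, -0.523, 0.312]])
_YIQ2RGB = np.linalg.inv(_RGB2YIQ)

def photometric_distortion(img, rng, brightness=64.0, contrast=(0.3, 1.7),
                           saturation=(0.3, 1.7), hue_deg=36.0, p=0.5):
    """Randomly jitter brightness, contrast, saturation and hue of an (H, W, 3) RGB image in [0, 255].

    All ranges are hypothetical 'aggressive' settings for illustration only.
    """
    out = img.astype(np.float64)
    if rng.random() < p:  # additive brightness shift
        out = out + rng.uniform(-brightness, brightness)
    if rng.random() < p:  # multiplicative contrast
        out = out * rng.uniform(*contrast)
    if rng.random() < p:  # saturation: blend each pixel toward its luma
        luma = (out @ _RGB2YIQ[0])[..., None]
        out = luma + (out - luma) * rng.uniform(*saturation)
    if rng.random() < p:  # hue: rotate chroma in YIQ space, then map back to RGB
        theta = np.deg2rad(rng.uniform(-hue_deg, hue_deg))
        rot = np.array([[1.0, 0.0, 0.0],
                        [0.0, np.cos(theta), -np.sin(theta)],
                        [0.0, np.sin(theta), np.cos(theta)]])
        out = out @ (_YIQ2RGB @ rot @ _RGB2YIQ).T
    return np.clip(out, 0.0, 255.0)
```

Only pixel intensities change, so the segmentation mask needs no matching transform. This is what makes photometric jitter cheap to apply aggressively during training.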
Topics
- Semantic Segmentation
- SAM3
- Self-Distillation
- Test-Time Augmentation
- Photometric Distortion
- Computer Vision Challenges
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.