SAM3 Self-Distillation for Fine-Grained GOOSE 2D Semantic Segmentation

2026-06-18 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, quick

Summary

A fourth-place entry in the ICRA 2026 GOOSE 2D Fine-Grained Semantic Segmentation Challenge achieved a composite mean Intersection-over-Union (mIoU) of 69.73% on the 1,815-image test set. The model pairs the Segment Anything Model 3 (SAM3) image encoder with a lightweight decoder. Its first contribution is a self-distillation scheme that re-uses SAM3, prompted with ground-truth boxes, as a teacher for the classes where SAM3 outperforms the custom model. Its second is an image-level multi-scale test-time augmentation (TTA) method, which restores multi-scale inference for fixed-input-size models by rescaling the image itself. An aggressive photometric distortion technique, transplanted from a winning 2025 GOOSE 2D entry, was identified as the single largest source of performance improvement.
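
For readers less familiar with the metric, plain mIoU can be sketched as below. This is a minimal illustration, not the challenge's evaluation code: how the composite score is assembled is not described here, and the `ignore_index` value of 255 and both function names are assumptions made for the example.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count (target, pred) pairs; rows are ground truth, columns are predictions.

    ignore_index=255 is an assumed convention, not taken from the challenge.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    valid = target != ignore_index
    idx = target[valid].astype(np.int64) * num_classes + pred[valid].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def mean_iou(conf):
    """Average IoU over classes that appear in the prediction or the ground truth."""
    tp = np.diag(conf).astype(np.float64)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    present = union > 0
    if not present.any():
        return float("nan")
    return float((tp[present] / union[present]).mean())
```

In a real evaluation the confusion matrix would be accumulated over every test image before `mean_iou` is called once at the end, since averaging per-image scores gives a different number.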

Key takeaway

Machine learning engineers optimizing fine-grained semantic segmentation should consider a self-distillation strategy in which a powerful foundation model such as SAM3 acts as the teacher. Image-level multi-scale test-time augmentation is also worth exploring, since it restores multi-scale inference without complex model changes. Aggressive photometric distortion, even when borrowed from a prior challenge winner, can deliver substantial gains.

Key insights

Self-distillation using SAM3 as a teacher, combined with image-level multi-scale augmentation and aggressive photometric distortion, significantly boosts fine-grained semantic segmentation.

Principles

Re-use powerful foundation models as teachers.
Image-level scaling enables multi-scale inference.
Aggressive photometric distortion can yield large gains.
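
The first principle can be made concrete with a class-selective distillation sketch. The summary says only that SAM3, prompted with ground-truth boxes, teaches the classes where it beats the custom model; the per-class IoU comparison, the soft cross-entropy loss, and every name below are assumptions made for illustration, not the authors' formulation.

```python
import numpy as np

def teacher_classes(teacher_iou, student_iou):
    """Indices of classes where the teacher's validation IoU exceeds the student's."""
    return np.flatnonzero(np.asarray(teacher_iou) > np.asarray(student_iou))

def distillation_loss(student_logits, teacher_probs, teacher_pred, classes):
    """Soft cross-entropy against the teacher, restricted to selected classes.

    student_logits: (H, W, C) raw scores from the student.
    teacher_probs:  (H, W, C) teacher class probabilities (hypothetical input).
    teacher_pred:   (H, W) teacher label map, e.g. from box-prompted masks.
    classes:        class indices returned by teacher_classes.
    """
    mask = np.isin(teacher_pred, classes)
    if not mask.any():
        return 0.0
    z = student_logits[mask]
    z = z - z.max(axis=-1, keepdims=True)          # stabilise the softmax
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return float(-(teacher_probs[mask] * log_p).sum(axis=-1).mean())
```

Pixels that the teacher assigns to classes where the student is already better contribute nothing, so the student is pulled toward the teacher only where the teacher is expected to help.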

Method

The approach adapts the SAM3 image encoder with a lightweight decoder, trains it with a self-distillation scheme in which SAM3 serves as the teacher, and applies image-level multi-scale test-time augmentation at inference.
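
One way image-level multi-scale TTA might look is sketched below. The summary states only that the image itself is rescaled so a fixed-input-size model can still see several scales; the tiling strategy, the scale set, the nearest-neighbour resize (bilinear would be more usual in practice), and the `model` callable that maps a `(tile, tile, 3)` array to `(tile, tile, num_classes)` logits are all assumptions.

```python
import numpy as np

def resize_nearest(arr, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, C) array; a stand-in for bilinear."""
    h, w = arr.shape[:2]
    rows = np.minimum((np.arange(out_h) * h / out_h).astype(int), h - 1)
    cols = np.minimum((np.arange(out_w) * w / out_w).astype(int), w - 1)
    return arr[rows][:, cols]

def tile_starts(length, tile):
    """Offsets of fixed-size tiles covering `length`; the last tile sits flush with the edge."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile, tile))
    starts.append(length - tile)
    return starts

def predict_fixed(model, image, tile):
    """Run a fixed-input-size model over an image of any size using padded tiles."""
    h, w = image.shape[:2]
    pad_h, pad_w = max(tile - h, 0), max(tile - w, 0)
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="edge")
    ph, pw = padded.shape[:2]
    logits_sum, counts = None, np.zeros((ph, pw, 1))
    for y in tile_starts(ph, tile):
        for x in tile_starts(pw, tile):
            out = model(padded[y:y + tile, x:x + tile])
            if logits_sum is None:
                logits_sum = np.zeros((ph, pw, out.shape[-1]))
            logits_sum[y:y + tile, x:x + tile] += out
            counts[y:y + tile, x:x + tile] += 1   # average overlapping tiles
    return (logits_sum / counts)[:h, :w]

def multiscale_tta(model, image, tile, scales=(0.75, 1.0, 1.25)):
    """Rescale the image, predict at each scale, and average logits at full resolution."""
    h, w = image.shape[:2]
    acc = 0.0
    for s in scales:
        sh, sw = max(int(round(h * s)), 1), max(int(round(w * s)), 1)
        logits = predict_fixed(model, resize_nearest(image, sh, sw), tile)
        acc = acc + resize_nearest(logits, h, w)
    return acc / len(scales)
```

The model is never modified: only the image changes size, and each scale's logits are mapped back to the original resolution before averaging.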

In practice

Apply SAM3 as a self-distillation teacher.
Implement image-level multi-scale TTA.
Experiment with aggressive photometric distortions.
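
As a starting point for the third item, a generic photometric distortion can be sketched as follows. The exact operations and ranges used by the 2025 entry are not given here, so the brightness, contrast, and saturation ranges below are hypothetical values chosen to be deliberately wide, and the function name is illustrative.

```python
import numpy as np

def photometric_distortion(image, rng, brightness=64.0,
                           contrast=(0.4, 1.6), saturation=(0.4, 1.6), p=0.5):
    """Randomly jitter brightness, contrast, and saturation of an (H, W, 3) uint8 image.

    All ranges are hypothetical; tune them on a validation split.
    """
    img = image.astype(np.float32)
    if rng.random() < p:                       # additive brightness shift
        img += rng.uniform(-brightness, brightness)
    if rng.random() < p:                       # contrast scaling around the mean
        mean = img.mean()
        img = (img - mean) * rng.uniform(*contrast) + mean
    if rng.random() < p:                       # saturation: blend with grayscale
        gray = img @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
        img = gray[..., None] + (img - gray[..., None]) * rng.uniform(*saturation)
    return np.clip(img, 0, 255).astype(np.uint8)
```

Because only pixel values change, the segmentation label map is left untouched, which makes this kind of augmentation easy to add to an existing training pipeline.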

Topics

Semantic Segmentation
SAM3
Self-Distillation
Test-Time Augmentation
Photometric Distortion
Computer Vision Challenges

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.