Learning with Semantic Priors: Stabilizing Point-Supervised Infrared Small Target Detection via Hierarchical Knowledge Distillation

· Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, medium

Summary

The paper proposes a hierarchical knowledge distillation framework, driven by a Vision Foundation Model (VFM), to stabilize point-supervised Infrared Small Target Detection (ISTD). Lightweight CNN detectors carry too little semantic information, so under point supervision they produce noisy pseudo-masks and optimize unstably. The framework casts point-supervised learning as a bilevel optimization: an inner loop adapts a VFM-embedded teacher on reweighted samples, and an outer loop transfers validation-guided knowledge to a lightweight student. It also introduces Semantic-Conditioned Affine Modulation (SCAM), which injects VFM semantics into CNN features, and a dynamic collaborative learning strategy with cluster-level sample reweighting to improve robustness. Experiments across multiple ISTD backbones show consistent improvements in detection accuracy and training stability.
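To make the idea of cluster-level sample reweighting more concrete, here is a minimal sketch. The summary does not say how the weights are computed, so everything below is an assumption for illustration: the function name `cluster_level_weights`, the use of a per-sample validation-style loss, and the softmax over negative mean cluster loss are hypothetical choices, not the paper's formulation.

```python
import numpy as np

def cluster_level_weights(sample_loss, cluster_id, temperature=1.0):
    """Assign one weight per cluster and broadcast it to the cluster's samples.

    Assumption (not from the paper): clusters with a lower mean loss are
    treated as more reliable and receive a larger weight via a softmax.
    """
    sample_loss = np.asarray(sample_loss, dtype=float)
    cluster_id = np.asarray(cluster_id)

    clusters = np.unique(cluster_id)  # sorted unique cluster labels
    mean_loss = np.array([sample_loss[cluster_id == c].mean() for c in clusters])

    # Softmax over negative mean loss, shifted for numerical stability.
    logits = -mean_loss / temperature
    logits -= logits.max()
    cluster_w = np.exp(logits)
    cluster_w /= cluster_w.sum()

    # Map each sample back to the weight of its cluster.
    return cluster_w[np.searchsorted(clusters, cluster_id)]

# Two clusters: the first has clean (low-loss) samples, the second noisy ones.
weights = cluster_level_weights([0.1, 0.2, 2.0, 2.2], [0, 0, 1, 1])
```

Weighting whole clusters rather than individual samples means a single outlier loss cannot swing its own weight on its own. In this sketch that is the intuition behind operating at cluster level.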

Key takeaway

For research scientists building lightweight CNN detectors for ISTD with limited annotations, this framework targets two failure modes of point supervision: noisy pseudo-masks and unstable optimization. The reported results indicate that hierarchical VFM-driven knowledge distillation, combined with Semantic-Conditioned Affine Modulation (SCAM), yields consistent gains in detection accuracy and training stability across multiple ISTD backbones.

Key insights

Hierarchical knowledge distillation with VFM semantics stabilizes point-supervised infrared small target detection.

Method

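The method uses bilevel optimization with a VFM-embedded teacher and a lightweight student: the inner loop adapts the teacher on reweighted samples, and the outer loop transfers validation-guided knowledge to the student. It incorporates Semantic-Conditioned Affine Modulation (SCAM) and dynamic collaborative learning with cluster-level sample reweighting.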
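As a rough illustration of what injecting semantics through affine modulation can look like, the sketch below applies a per-channel scale and shift, predicted from a semantic vector, to a CNN feature map. This is a generic FiLM-style formulation offered as an assumption. The summary does not specify SCAM's actual design, and the function name, tensor shapes, linear projections, and residual `(1 + gamma)` form are all hypothetical.

```python
import numpy as np

def semantic_affine_modulation(cnn_feat, sem_vec, w_gamma, b_gamma, w_beta, b_beta):
    """Modulate CNN features with a scale and shift predicted from semantics.

    cnn_feat: (C, H, W) feature map from the lightweight CNN (assumed shape).
    sem_vec:  (D,) semantic embedding, e.g. a pooled VFM feature (assumption).
    w_gamma, w_beta: (C, D) projection matrices; b_gamma, b_beta: (C,) biases.
    """
    gamma = w_gamma @ sem_vec + b_gamma  # per-channel scale offset, shape (C,)
    beta = w_beta @ sem_vec + b_beta     # per-channel shift, shape (C,)
    # The residual form (1 + gamma) leaves features unchanged when gamma = beta = 0.
    return (1.0 + gamma)[:, None, None] * cnn_feat + beta[:, None, None]

rng = np.random.default_rng(0)
C, H, W, D = 4, 8, 8, 16
feat = rng.normal(size=(C, H, W))
sem = rng.normal(size=D)
out = semantic_affine_modulation(
    feat, sem,
    0.01 * rng.normal(size=(C, D)), np.zeros(C),
    0.01 * rng.normal(size=(C, D)), np.zeros(C),
)
```

Because the modulation reduces to the identity when the predicted scale and shift are zero, a layer like this can be added to an existing backbone without disturbing its features at initialization, provided the projections start at zero.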

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.