ViCrop-Det: Spatial Attention Entropy Guided Cropping for Training-Free Small-Object Detection

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Computer Vision) · Depth: Expert, quick

Summary

ViCrop-Det is a training-free inference framework that improves small-object detection in Transformer-based detectors by addressing spatial heterogeneity. Inspired by the use of attention entropy in anomaly segmentation, it introduces adaptive shrinkage of the spatial trust region. The framework treats the detection decoder's cross-attention distribution as an endogenous probe and uses Spatial Attention Entropy (SAE) to measure local spatial ambiguity. This lets ViCrop-Det route a fixed computational budget to regions with both high target saliency and high cognitive uncertainty, recovering fine-grained features without architectural modifications. On VisDrone and DOTA-v1.5, it adds +1 to +3 mAP@50 to RT-DETR-R50 and Deformable DETR at a 20-23% latency overhead. On MS COCO, it improves $AP_{S}$ while keeping $AP_{M}$ and $AP_{L}$ stable.
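
The summary does not give the SAE formula. One natural reading, stated here as an assumption rather than the paper's definition, is the Shannon entropy of the cross-attention weights renormalized over a spatial region $R$, where $a_i$ (a symbol introduced here for illustration) is the attention weight at location $i$:

$$
\mathrm{SAE}(R) = -\sum_{i \in R} p_i \log p_i, \qquad p_i = \frac{a_i}{\sum_{j \in R} a_j}
$$

Under this reading, attention concentrated on a single location gives $\mathrm{SAE}(R) = 0$ (no ambiguity), while attention spread uniformly over $R$ gives the maximum value $\log |R|$ (highest ambiguity).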

Key takeaway

For research scientists developing or deploying Transformer-based object detection models, ViCrop-Det offers a training-free way to improve small-object detection: the reported gain is +1 to +3 mAP@50 at a 20-23% latency overhead. Consider integrating this adaptive spatial cropping technique to recover fine-grained features, especially in dense scenes, when retraining or architectural changes are not an option.

Key insights

ViCrop-Det improves small-object detection by adaptively cropping regions using spatial attention entropy, without retraining.

Method

ViCrop-Det computes Spatial Attention Entropy (SAE) from the detection decoder's cross-attention distribution. SAE then guides dynamic spatial routing: the trust region shrinks around the selected areas so that localized observations from them can be injected.
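
The routing idea can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a single 2D cross-attention map already aggregated over queries and heads, treats SAE as the Shannon entropy of the renormalized attention inside each grid cell, and scores cells by the product of saliency and normalized entropy. The function names, the fixed grid, and the product scoring rule are all hypothetical choices made for this example.

```python
import numpy as np


def spatial_attention_entropy(attn, eps=1e-12):
    """Shannon entropy (in nats) of a non-negative attention map.

    The map is renormalized to sum to 1 before the entropy is taken.
    Assumed definition; the summary does not state the exact formula.
    """
    p = np.asarray(attn, dtype=np.float64).ravel()
    p = p / (p.sum() + eps)
    return float(-(p * np.log(p + eps)).sum())


def select_crops(attn, grid=4, budget=2):
    """Pick the `budget` grid cells with the highest saliency * entropy.

    Returns boxes as (y0, x0, y1, x1) in attention-map coordinates.
    The grid partition and the product score are illustrative assumptions.
    """
    attn = np.asarray(attn, dtype=np.float64)
    h, w = attn.shape
    ch, cw = h // grid, w // grid
    if ch * cw < 2:
        raise ValueError("each grid cell must cover at least two locations")
    total = attn.sum() + 1e-12
    max_entropy = np.log(ch * cw)  # entropy of a uniform cell

    scored = []
    for i in range(grid):
        for j in range(grid):
            y0, x0 = i * ch, j * cw
            cell = attn[y0:y0 + ch, x0:x0 + cw]
            saliency = cell.sum() / total  # share of attention mass
            ambiguity = spatial_attention_entropy(cell) / max_entropy
            scored.append((saliency * ambiguity, (y0, x0, y0 + ch, x0 + cw)))

    # Fixed budget: only the top-scoring cells get a second, cropped pass.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [box for _, box in scored[:budget]]
```

In a full pipeline, the returned boxes would be mapped back to image coordinates, cropped, passed through the same detector again, and merged with the full-image detections. Those steps depend on the specific detector and are left out here.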

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.