ViCrop-Det: Spatial Attention Entropy Guided Cropping for Training-Free Small-Object Detection
Summary
ViCrop-Det is a training-free inference framework that improves small-object detection in Transformer-based detectors by addressing spatial heterogeneity. It introduces adaptive spatial trust region shrinkage, inspired by the use of attention entropy in anomaly segmentation. The framework treats the detection decoder's cross-attention distribution as an endogenous probe and uses Spatial Attention Entropy (SAE) to measure local spatial ambiguity. This lets ViCrop-Det dynamically route a fixed computational budget to regions that combine high target saliency with high uncertainty, recovering fine-grained features without architectural modifications. On VisDrone and DOTA-v1.5, it adds 1 to 3 points of mAP@50 to RT-DETR-R50 and Deformable DETR at a 20-23% latency overhead. On MS COCO, it improves $AP_{S}$ while keeping $AP_{M}$ and $AP_{L}$ stable.
Key takeaway
For research scientists developing or deploying Transformer-based object detection models, ViCrop-Det offers a training-free way to improve small-object detection, with reported gains of 1 to 3 mAP@50. Consider integrating this adaptive spatial cropping technique to recover fine-grained features, especially in dense scenes, without architectural changes and at a reported latency overhead of 20-23%.
Key insights
ViCrop-Det improves small-object detection by adaptively cropping regions using spatial attention entropy, without retraining.
Principles
- Spatial heterogeneity degrades local features.
- Attention entropy can guide spatial routing.
- Dynamic cropping resolves spatial ambiguity.
Method
ViCrop-Det computes Spatial Attention Entropy (SAE) from the detection decoder's cross-attention distribution. SAE then guides dynamic spatial routing: the trust region is shrunk around the selected regions so that localized observations can be injected.
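To make the idea concrete, here is a minimal sketch of an entropy score over cross-attention maps and a budgeted selection step. It is not the paper's implementation: the function names, the normalization by log(H*W), and the `saliency * entropy` ranking rule are all assumptions made for illustration, and the attention maps are assumed to be available as a NumPy array of shape (num_queries, H, W).

```python
import numpy as np

def spatial_attention_entropy(attn, eps=1e-12):
    """Normalized Shannon entropy of each query's attention over space.

    attn: non-negative array of shape (num_queries, H, W).
    Returns values in [0, 1]; 1 means the attention is spread uniformly
    (high spatial ambiguity), 0 means it is concentrated on one location.
    """
    flat = attn.reshape(attn.shape[0], -1).astype(np.float64)
    # Renormalize each map so it is a probability distribution.
    p = flat / (flat.sum(axis=1, keepdims=True) + eps)
    entropy = -(p * np.log(p + eps)).sum(axis=1)
    # Divide by the maximum possible entropy, log(H * W).
    return entropy / np.log(p.shape[1])

def select_queries(attn, saliency, budget=2):
    """Pick `budget` queries to re-examine (hypothetical ranking rule).

    saliency: array of shape (num_queries,), e.g. detection confidence.
    Assumption: rank by saliency * entropy, so a query must be both
    salient and ambiguous to receive part of the fixed budget.
    """
    score = saliency * spatial_attention_entropy(attn)
    return np.argsort(-score)[:budget]
```

A query whose attention is spread evenly over the feature map scores near 1, while a sharply peaked query scores near 0, so only salient but diffuse queries are routed to a crop under this assumed rule.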
In practice
- Enhance small-object detection in existing DETR models.
- Apply attention entropy for spatial ambiguity assessment.
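Any cropping-based approach applied to an existing detector needs to map detections from a crop back into full-image coordinates. The sketch below shows that bookkeeping step only; the helper name, the xyxy box convention, and the assumption that each crop is resized to a fixed detector input size are illustrative choices, not details confirmed by the source.

```python
def crop_boxes_to_image(boxes, crop_xyxy, crop_input_size):
    """Map boxes predicted on a resized crop back to full-image pixels.

    boxes: iterable of (x0, y0, x1, y1) in the resized crop's pixel space.
    crop_xyxy: (x0, y0, x1, y1) of the crop window in the full image.
    crop_input_size: (width, height) the crop was resized to (assumed).
    """
    cx0, cy0, cx1, cy1 = crop_xyxy
    in_w, in_h = crop_input_size
    # Scale factors from resized-crop pixels to full-image pixels.
    sx = (cx1 - cx0) / in_w
    sy = (cy1 - cy0) / in_h
    return [
        (cx0 + bx0 * sx, cy0 + by0 * sy, cx0 + bx1 * sx, cy0 + by1 * sy)
        for bx0, by0, bx1, by1 in boxes
    ]
```

How the remapped boxes are combined with the full-image detections is not described in this summary, so that step is left out here.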
Topics
- ViCrop-Det
- Small-Object Detection
- Spatial Attention Entropy
- Training-Free Inference
- Adaptive Spatial Routing
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.