A Large-Scale Study on the Accuracy vs Cost Trade-offs of Training and Evaluation Settings in Fine-Grained Image Recognition
Summary
This large-scale study runs over 2000 experiments to examine accuracy-versus-cost trade-offs in fine-grained image recognition (FGIR) across training and evaluation settings. It covers 9 pretrained backbones and 17 datasets, and it looks at factors beyond backbone selection alone. A key finding is that data augmentation is highly effective for fine-grained training. The study extends Counterfactual Attention Learning (CAL), a method that uses data-aware cropping and masking augmentations, by adding cross-image discriminative region mixing. It also proposes an efficient evaluation-only variant that skips the forward pass on discriminative crops, a step that CAL and similar FGIR methods typically run at inference. This variant keeps accuracy competitive while significantly reducing inference cost. The results indicate that data-aware augmentations applied during training alone can achieve high accuracy without requiring crops during inference.
Key takeaway
For AI engineers and research scientists optimizing fine-grained image recognition models: prioritize data-aware augmentations during training. This approach can yield high accuracy while substantially reducing inference cost, because the forward pass on discriminative crops is no longer needed during evaluation. Consider the proposed evaluation-only variant when you need competitive accuracy at lower inference cost.
Key insights
Data-aware augmentations during training make it possible to drop the discriminative-crop forward pass at inference, which significantly reduces FGIR inference cost while maintaining high accuracy.
Principles
- Data augmentation is crucial for fine-grained training.
- Inference cost depends on evaluation settings as well as the backbone; skipping the forward pass on discriminative crops reduces it.
Method
The study extends Counterfactual Attention Learning (CAL) with cross-image discriminative region mixing and proposes an evaluation-only variant that skips discriminative crop forward passes.
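To make cross-image discriminative region mixing more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the summary does not say how regions are selected or pasted, so the thresholded bounding box, the same-location paste, and the names `attention_bbox` and `mix_discriminative_region` are all assumptions made for illustration. Images and attention maps are represented as nested lists of equal size.

```python
def attention_bbox(attn, threshold=0.5):
    """Bounding box (top, left, bottom, right) of cells >= threshold * max(attn).

    Assumption: attention values are non-negative; bottom/right are exclusive.
    """
    cutoff = threshold * max(max(row) for row in attn)
    rows = [r for r, row in enumerate(attn) if any(v >= cutoff for v in row)]
    cols = [c for c in range(len(attn[0])) if any(row[c] >= cutoff for row in attn)]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def mix_discriminative_region(target, source, source_attn, threshold=0.5):
    """Paste the high-attention region of `source` into a copy of `target`.

    Hypothetical illustration of cross-image region mixing; returns the mixed
    image and the fraction of its area that now comes from `source`.
    """
    top, left, bottom, right = attention_bbox(source_attn, threshold)
    mixed = [row[:] for row in target]  # copy so the input is not modified
    for r in range(top, bottom):
        mixed[r][left:right] = source[r][left:right]
    area = (bottom - top) * (right - left)
    return mixed, area / (len(target) * len(target[0]))


# Toy 4x4 example: attention peaks in the central 2x2 block of the source.
target = [[0] * 4 for _ in range(4)]
source = [[1] * 4 for _ in range(4)]
source_attn = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 1.0, 0.0],
    [0.0, 0.8, 0.7, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
mixed, source_fraction = mix_discriminative_region(target, source, source_attn)
```

In this toy case the central 2x2 block of `mixed` comes from `source`, so `source_fraction` is 0.25. How labels are combined for the mixed image is not stated in this summary, so the sketch leaves that step out.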
In practice
- Use data-aware augmentations during FGIR training.
- Consider evaluation-only variants for cost-efficient inference.
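The cost difference between the two evaluation settings can be sketched by counting forward passes. The code below is a toy illustration under stated assumptions, not the paper's code: `CountingModel` is a hypothetical stand-in for a backbone, the crop box is passed in directly rather than derived from attention maps, and averaging the full-image and crop logits is an assumed fusion rule.

```python
class CountingModel:
    """Hypothetical stand-in for a backbone that counts forward passes."""

    def __init__(self):
        self.forward_calls = 0

    def __call__(self, image):
        self.forward_calls += 1
        total = sum(sum(row) for row in image)
        return [total, -total]  # toy two-class logits


def crop(image, bbox):
    """Cut out rows top..bottom and columns left..right (exclusive ends)."""
    top, left, bottom, right = bbox
    return [row[left:right] for row in image[top:bottom]]


def predict_with_crop(model, image, bbox):
    """Crop-based evaluation: one pass on the full image, one on the crop."""
    full_logits = model(image)
    crop_logits = model(crop(image, bbox))
    # Assumption: the two predictions are fused by simple averaging.
    return [(a + b) / 2 for a, b in zip(full_logits, crop_logits)]


def predict_single_pass(model, image):
    """Evaluation-only variant: a single pass on the full image."""
    return model(image)


image = [[1, 2], [3, 4]]

two_pass_model = CountingModel()
two_pass_logits = predict_with_crop(two_pass_model, image, (0, 0, 1, 1))

single_pass_model = CountingModel()
single_pass_logits = predict_single_pass(single_pass_model, image)
```

Here `two_pass_model.forward_calls` ends at 2 and `single_pass_model.forward_calls` at 1. If crops are resized to the full input resolution, the crop-based setting roughly doubles backbone compute per image, which is the cost the evaluation-only variant avoids.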
Topics
- Fine-Grained Image Recognition
- Accuracy-Cost Trade-offs
- Data Augmentation
- Counterfactual Attention Learning
- Inference Cost Reduction
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.