A Large-Scale Study on the Accuracy vs Cost Trade-offs of Training and Evaluation Settings in Fine-Grained Image Recognition

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Advanced, quick

Summary

A large-scale study of more than 2000 experiments examines accuracy-versus-cost trade-offs in fine-grained image recognition (FGIR) across training and evaluation settings. It covers 9 pretrained backbones and 17 datasets and looks beyond backbone selection alone. A key finding is that data augmentation is effective for fine-grained training. The study extends Counterfactual Attention Learning (CAL), a method that uses data-aware cropping and masking augmentations, by adding cross-image discriminative region mixing. It also proposes an efficient evaluation-only variant that keeps accuracy competitive while cutting inference cost: it drops the forward pass on discriminative crops that CAL and similar FGIR methods normally run. The results indicate that data-aware augmentations applied only during training can reach high accuracy without crops at inference.
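To make the inference saving concrete, here is a minimal Python sketch that counts forward passes with and without a discriminative crop. Everything in it is illustrative rather than taken from the paper: `CountingBackbone` is a hypothetical stand-in for a real network, the crop box is passed in by hand (in a real pipeline it would presumably come from the model's attention), and averaging the two predictions is an assumed combination rule.

```python
from typing import List, Tuple

Image = List[List[float]]          # toy grayscale image as rows of pixels
Box = Tuple[int, int, int, int]    # (top, left, bottom, right), end-exclusive


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image covered by box."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


class CountingBackbone:
    """Hypothetical stand-in for a network; it only counts forward passes."""

    def __init__(self, num_classes: int = 3) -> None:
        self.num_classes = num_classes
        self.forward_passes = 0

    def __call__(self, image: Image) -> List[float]:
        self.forward_passes += 1
        total = sum(sum(row) for row in image)
        # Toy "logits": a deterministic function of the pixel sum.
        return [total * (k + 1) for k in range(self.num_classes)]


def predict_with_crop(model: CountingBackbone, image: Image, box: Box) -> List[float]:
    """Two-pass inference: full image plus a crop, averaged (assumed rule)."""
    full = model(image)
    part = model(crop(image, box))
    return [(a + b) / 2 for a, b in zip(full, part)]


def predict_full_only(model: CountingBackbone, image: Image) -> List[float]:
    """Evaluation-only variant: a single pass on the full image, no crop."""
    return model(image)
```

Calling `predict_with_crop` on one image increments `forward_passes` twice, while `predict_full_only` increments it once, so under these assumptions the evaluation-only path runs half as many backbone passes per image.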

Key takeaway

For AI engineers and research scientists optimizing fine-grained image recognition models: prioritize data-aware augmentations during training. They can deliver competitive accuracy while substantially reducing inference cost, because discriminative crops are no longer needed at evaluation. Consider benchmarking the proposed evaluation-only variant against a crop-based baseline on your own data to confirm the accuracy and cost trade-off.

Key insights

Data-aware augmentations during training can significantly reduce FGIR inference costs while maintaining high accuracy.

Principles

Method

The study extends Counterfactual Attention Learning (CAL) with cross-image discriminative region mixing and proposes an evaluation-only variant that skips discriminative crop forward passes.
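The summary does not spell out how cross-image discriminative region mixing works, so the following Python sketch is only one plausible reading. It assumes a CutMix-style paste: a bounding box is taken around the high-attention cells of a source image, that region is copied into a target image at the same location, and the pasted area fraction is returned as a label weight. The helper names `attention_box` and `mix_regions`, the 0.5 threshold, and the area-based weight are all hypothetical choices, not the authors' method.

```python
from typing import List, Tuple

Grid = List[List[float]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


def attention_box(attention: Grid, threshold: float = 0.5) -> Box:
    """Bounding box of cells with attention >= threshold * max.

    Assumes non-negative attention at the same resolution as the image.
    """
    peak = max(max(row) for row in attention)
    hits = [
        (r, c)
        for r, row in enumerate(attention)
        for c, value in enumerate(row)
        if value >= threshold * peak
    ]
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return min(rows), min(cols), max(rows) + 1, max(cols) + 1


def mix_regions(target: Grid, source: Grid, box: Box) -> Tuple[Grid, float]:
    """Paste source's box region into a copy of target at the same location.

    Returns the mixed image and the fraction of pixels taken from source,
    which could serve as the label weight for source's class (assumption).
    """
    top, left, bottom, right = box
    mixed = [row[:] for row in target]  # copy so target is left unchanged
    for r in range(top, bottom):
        mixed[r][left:right] = source[r][left:right]
    area = (bottom - top) * (right - left)
    total = len(target) * len(target[0])
    return mixed, area / total
```

In a training loop, the box would be computed from the source image's attention map, and the returned weight would blend the two images' labels. Because the mixing happens only at training time, it adds nothing to inference cost.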

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.