Class-Specific Branch Attention for Mitigating Gradient Interference under Class Imbalance

Source: cs.AI updates on arXiv.org · Field: Technology & Digital: Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, extended

Summary

Class-Specific Branch Attention (CSBA) addresses performance degradation in deep neural networks trained under severe class imbalance, specifically targeting inter-class gradient interference within shared representations. The authors identify that gradients from majority classes suppress minority-class learning, and they introduce a diagnostic framework, including a Gradient Conflict Matrix, that quantifies this interference as the pairwise cosine similarity between class gradients. CSBA is a lightweight modification for multi-branch convolutional architectures: it gives each branch its own channel reweighting, which reduces gradient coupling and promotes implicit feature decoupling. Empirically, CSBA raised the F1 score for the minority Physical-Damage class from 0.261 to 0.522, doubling it (a 100% relative gain), while maintaining overall accuracy. On CIFAR-10-LT, Macro-F1 improved from 0.595 to 0.655. The cost is a 32.6% parameter overhead, increasing model size from 1.35M to 1.79M parameters.
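
The Gradient Conflict Matrix can be illustrated with a short sketch. It assumes each class's gradient with respect to the shared parameters has already been computed and flattened into a vector (for example, by back-propagating that class's loss on its own); how the paper batches or aggregates these gradients is not specified in this summary. The function names and the toy gradient values are hypothetical.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def gradient_conflict_matrix(class_grads: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Pairwise cosine similarity between per-class gradients of the shared parameters.

    class_grads maps a class name to its flattened gradient vector (hypothetical input format).
    """
    names = list(class_grads)
    return {a: {b: cosine(class_grads[a], class_grads[b]) for b in names} for a in names}


# Toy example with made-up gradients for two classes.
grads = {"majority": [1.0, 0.5, 0.0], "minority": [-1.0, 0.2, 0.3]}
gcm = gradient_conflict_matrix(grads)
print(gcm["majority"]["minority"])  # negative: the two gradients point in conflicting directions
```

A negative off-diagonal entry means a step that lowers one class's loss tends to raise the other's, which is the kind of interference the matrix is meant to expose.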

Key takeaway

Machine learning engineers building classifiers for imbalanced data, particularly with multi-branch architectures and critical minority classes, should consider Class-Specific Branch Attention (CSBA). This architectural modification directly mitigates inter-class gradient interference, a pathology that data resampling and loss reweighting do not fully address. In the reported experiments, CSBA substantially improved minority-class F1 for a 32.6% increase in parameters, a modest cost relative to the gain.

Key insights

Mitigating inter-class gradient interference in shared representations significantly improves minority-class learning under severe imbalance.

Principles

Method

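For each branch, Class-Specific Branch Attention (CSBA) passes globally average-pooled features through a branch-specific two-layer MLP to produce a channel attention vector, then applies that vector to the shared feature map by element-wise (Hadamard) multiplication, with each channel weight broadcast over that channel's spatial positions.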
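
A minimal sketch of this computation, written in dependency-free Python so the shapes stay explicit. Several details are assumptions not stated in the summary: a ReLU between the two MLP layers, a sigmoid gate on the output (as in squeeze-and-excitation blocks), a reduction ratio of 4, random initialisation, and that each branch consumes its own reweighted copy of the shared map. The names `BranchAttention` and `csba_forward` are hypothetical, not the authors' code.

```python
import math
import random
from typing import List

Matrix = List[List[float]]
FeatureMap = List[Matrix]  # C channels, each an H x W grid


def global_avg_pool(x: FeatureMap) -> List[float]:
    """Average each channel over its spatial positions -> length-C descriptor."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def linear(w: Matrix, b: List[float], v: List[float]) -> List[float]:
    """Dense layer: w @ v + b."""
    return [sum(wij * vj for wij, vj in zip(row, v)) + bi for row, bi in zip(w, b)]


class BranchAttention:
    """One branch's two-layer MLP: C -> C // reduction -> C (ReLU, then sigmoid; assumed)."""

    def __init__(self, channels: int, reduction: int = 4, seed: int = 0):
        rng = random.Random(seed)
        hidden = max(1, channels // reduction)
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(channels)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(channels)]
        self.b2 = [0.0] * channels

    def weights(self, pooled: List[float]) -> List[float]:
        """Map the pooled descriptor to per-channel gates in (0, 1)."""
        h = [max(0.0, z) for z in linear(self.w1, self.b1, pooled)]
        return [1.0 / (1.0 + math.exp(-z)) for z in linear(self.w2, self.b2, h)]


def csba_forward(x: FeatureMap, branches: List[BranchAttention]) -> List[FeatureMap]:
    """Return one channel-reweighted copy of the shared feature map per branch."""
    pooled = global_avg_pool(x)  # pooled once from the shared features
    outputs = []
    for branch in branches:
        a = branch.weights(pooled)  # branch-specific channel attention vector
        # Hadamard product: gate a_c scales every spatial position of channel c.
        outputs.append([[[a_c * v for v in row] for row in ch] for ch, a_c in zip(x, a)])
    return outputs


# Toy usage: 8 channels, 3x3 spatial map, 3 branches (all values hypothetical).
x = [[[float(c + i + j) for j in range(3)] for i in range(3)] for c in range(8)]
outs = csba_forward(x, [BranchAttention(8, reduction=4, seed=s) for s in range(3)])
print(len(outs), len(outs[0]), len(outs[0][0]), len(outs[0][0][0]))  # 3 8 3 3
```

In a real model the MLP weights would be learned jointly with the rest of the network in a deep learning framework. Because each branch owns its MLP, each branch can emphasise different channels of the same shared map, the property the summary links to reduced gradient coupling; it is also where the extra parameters come from, since every added branch brings its own MLP.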

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.