QG-MIL: A Gated Transformer Aggregator for Domain-Agnostic Multiple Instance Learning in Medical Imaging

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Medical Devices & Health Technology) · Depth: Expert, quick

Summary

QG-MIL is a gated transformer aggregator for Multiple Instance Learning (MIL) in medical imaging. It targets attention concentration, where an attention-based aggregator places most of its weight on a few instances, which often leads to overconfident and unstable predictions. The architecture combines four components: RMSNorm-based pre-normalization, per-head QK normalization, fine-grained attention output gating, and SwiGLU-style feed-forward modules. Together these stabilize training and distribute attention more evenly across instances, without auxiliary losses or multi-stage regularization. Across six benchmarks covering whole-slide pathology and cell-level hematology, QG-MIL variants consistently outperformed leading baselines, with an average gain of +6.1 points in mean macro F1, along with consistent cross-domain performance and reduced variance.

Key takeaway

For machine learning engineers building medical imaging diagnostics, QG-MIL addresses a common failure mode of attention-based MIL. If your current models produce overconfident or unstable predictions, consider replacing the aggregator with QG-MIL's gated transformer. The reported +6.1 mean macro F1 improvement across six pathology and hematology benchmarks, together with more distributed attention and reduced variance, suggests it may improve diagnostic reliability across domains.
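One way to check whether an existing aggregator concentrates its attention is to measure the normalized entropy of its per-bag attention weights. The sketch below is a generic diagnostic, not a metric taken from the QG-MIL paper; the function name and the interpretation of the score are assumptions for illustration.

```python
import math

def normalized_attention_entropy(weights):
    """Entropy of an attention distribution divided by log(N).

    Hypothetical diagnostic (not from the paper): 1.0 means attention is
    uniform over the N instances, 0.0 means all weight is on one instance.
    """
    n = len(weights)
    if n < 2:
        return 0.0  # a single-instance bag has no spread to measure
    total = sum(weights)
    probs = [w / total for w in weights]  # renormalize defensively
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(n)

uniform_score = normalized_attention_entropy([0.25, 0.25, 0.25, 0.25])
peaked_score = normalized_attention_entropy([1.0, 0.0, 0.0, 0.0])
```

Scores that stay close to 0 across many bags would indicate that the aggregator relies on very few instances, which is the behavior the summary associates with overconfident predictions.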

Key insights

QG-MIL stabilizes attention in MIL aggregators for medical imaging, improving prediction stability and cross-domain performance.

Principles

Method

QG-MIL integrates RMSNorm pre-normalization, per-head QK normalization, fine-grained attention output gating, and SwiGLU-style feed-forward modules to stabilize attention.
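To make the four components concrete, here is a minimal NumPy sketch of a single block that combines them. It is an illustration under stated assumptions, not the authors' implementation: the parameter names, the sigmoid form of the gate, the attention scaling, the random initialization, and the mean-pooling of instances into a bag embedding are all hypothetical choices.

```python
import numpy as np

def rms_norm(x, gain, eps=1e-6):
    # Divide by the root-mean-square over the last axis (no mean subtraction).
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps) * gain

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def silu(x):
    return x * sigmoid(x)

def init_params(dim, heads, hidden, rng):
    # Hypothetical parameter layout, randomly initialized for the demo.
    w = lambda *shape: rng.normal(0.0, shape[0] ** -0.5, size=shape)
    head_dim = dim // heads
    return {
        "heads": heads,
        "norm1": np.ones(dim), "norm2": np.ones(dim),
        "wq": w(dim, dim), "wk": w(dim, dim), "wv": w(dim, dim),
        "wg": w(dim, dim), "wo": w(dim, dim),
        "q_gain": np.ones(head_dim), "k_gain": np.ones(head_dim),
        "w_gate": w(dim, hidden), "w_up": w(dim, hidden),
        "w_down": w(hidden, dim),
    }

def gated_block(x, p):
    """x: (num_instances, dim) bag of instance embeddings."""
    n, dim = x.shape
    h = p["heads"]
    d = dim // h
    split = lambda t: t.reshape(n, h, d).transpose(1, 0, 2)  # -> (h, n, d)

    z = rms_norm(x, p["norm1"])                      # RMSNorm pre-normalization
    q = rms_norm(split(z @ p["wq"]), p["q_gain"])    # per-head QK normalization
    k = rms_norm(split(z @ p["wk"]), p["k_gain"])
    v = split(z @ p["wv"])
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))  # (h, n, n)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, dim)
    out = out * sigmoid(z @ p["wg"])                 # elementwise output gate (assumed form)
    x = x + out @ p["wo"]                            # residual connection

    z = rms_norm(x, p["norm2"])
    x = x + (silu(z @ p["w_gate"]) * (z @ p["w_up"])) @ p["w_down"]  # SwiGLU-style FFN
    return x, attn

rng = np.random.default_rng(0)
params = init_params(dim=32, heads=4, hidden=64, rng=rng)
bag = rng.normal(size=(10, 32))        # 10 instance embeddings of width 32
out, attn = gated_block(bag, params)
bag_embedding = out.mean(axis=0)       # assumed pooling into a bag-level vector
```

Normalizing queries and keys per head bounds the attention logits, and the gate lets each output channel be attenuated independently; both are plausible routes to the more distributed attention described above, though the paper's exact formulation may differ.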

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.