Towards Accurate and Calibrated Classification: Regularizing Cross-Entropy From A Generative Perspective

· Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

Researchers from the University of Pennsylvania introduce Generative Cross-Entropy (GCE), a loss function designed to improve both predictive accuracy and calibration in deep neural networks (DNNs). Modern DNNs are often overconfident because they overfit the negative log-likelihood, and existing remedies such as focal loss improve calibration but typically reduce accuracy. GCE instead maximizes the class-conditional likelihood $p(x \mid y)$, which is equivalent to cross-entropy augmented with a class-level confidence regularizer. The loss is strictly proper under mild conditions and, when combined with adaptive piecewise temperature scaling (ATS), achieves calibration competitive with focal-loss variants without sacrificing accuracy. Experiments on CIFAR-10/100, Tiny-ImageNet, and a medical imaging benchmark (Tau PET AV1451) show consistent improvements in both metrics, especially in long-tailed scenarios, at negligible computational overhead compared to standard cross-entropy.
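
To make "calibration" concrete, the sketch below computes the expected calibration error (ECE), a commonly used calibration metric. This summary does not state which metric the paper reports, so the choice of ECE, the function name `expected_calibration_error`, and the 10 equal-width bins are assumptions made for illustration only.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Illustrative ECE: weighted average gap between confidence and accuracy.

    probs  : (N, C) array of predicted class probabilities
    labels : (N,) array of integer class labels
    n_bins : number of equal-width confidence bins (an assumed default)
    """
    confidences = probs.max(axis=1)            # top predicted probability
    predictions = probs.argmax(axis=1)         # predicted class
    correct = (predictions == labels).astype(float)

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lower, upper in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lower) & (confidences <= upper)
        if in_bin.any():
            # Gap between empirical accuracy and mean confidence in this bin,
            # weighted by the fraction of samples that fall in the bin.
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece
```

An overconfident model, for example one that predicts with confidence 0.85 but is right only half the time, gets a large ECE, while a model whose confidence matches its accuracy scores close to zero.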

Key takeaway

AI engineers deploying deep learning models in high-stakes applications such as medical diagnosis or autonomous driving should consider integrating Generative Cross-Entropy (GCE) into their training pipelines. GCE offers a principled way to obtain better-calibrated confidence estimates and higher predictive accuracy at the same time, avoiding the trade-off typically seen with focal-loss variants. Combined with adaptive temperature scaling, this can make model confidence more trustworthy and downstream decisions more reliable.

Key insights

Generative Cross-Entropy improves DNN accuracy and calibration by regularizing class-level confidence from a generative perspective.

Method

GCE reformulates the training objective to maximize $p(x \mid y)$, which is equivalent to cross-entropy plus a class-level confidence regularizer. It is complemented by adaptive piecewise temperature scaling (ATS) for post-hoc calibration.
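
One way to see where a class-level regularizer can come from, reading the objective as the class-conditional likelihood $p(x \mid y)$, is Bayes' rule: $\log p(x \mid y) = \log p(y \mid x) + \log p(x) - \log p(y)$. The term $\log p(x)$ does not depend on the model, so minimizing $-\log p(x \mid y)$ leaves the usual cross-entropy term plus $\log p(y)$. The sketch below assumes that $p(y)$ is estimated as the model's average predicted probability for class $y$ over the batch. That estimator, the function names, and the use of NumPy are illustrative assumptions, not the paper's confirmed formulation.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def generative_ce_sketch(logits, labels):
    """Hypothetical batch estimate of -log p(x|y), dropping the constant log p(x).

    logits : (N, C) array of unnormalized class scores
    labels : (N,) array of integer class labels
    Returns (total, cross_entropy, regularizer), each averaged over the batch.
    """
    log_probs = log_softmax(logits)            # log p(y = c | x_i), shape (N, C)
    n = logits.shape[0]

    # Cross-entropy term: -log p(y_i | x_i)
    ce = -log_probs[np.arange(n), labels]

    # Assumed estimator: log p(y = c) ~ log mean_j p(y = c | x_j),
    # computed with a log-sum-exp shift for stability.
    col_max = log_probs.max(axis=0)
    log_class_conf = col_max + np.log(np.exp(log_probs - col_max).mean(axis=0))

    # Class-level confidence regularizer: log p(y_i). Minimizing it discourages
    # uniformly high confidence in a class across the whole batch.
    reg = log_class_conf[labels]

    return (ce + reg).mean(), ce.mean(), reg.mean()
```

With uniform predictions over three classes, the cross-entropy term equals $\log 3$ and the regularizer equals $-\log 3$, so the total is zero. In a real training loop the same expression would be written in an autodiff framework, and whether the class average is taken per batch or over a running estimate is a design choice this summary does not specify.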

Best for: AI Engineer, Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.