Quantifying Explanation Consistency: The C-Score Metric for CAM-Based Explainability in Medical Image Classification

· Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Medical AI & Imaging · Depth: Expert, quick

Summary

The C-Score (Consistency Score) is a proposed metric that quantifies the intra-class explanation reproducibility of Class Activation Mapping (CAM) methods in medical image classification. Existing evaluation frameworks focus on localization fidelity against radiologist annotations. The C-Score is instead confidence-weighted and annotation-free: it measures whether a model applies consistent spatial reasoning to the same pathology across different patients, using intensity-emphasized pairwise soft IoU over correctly classified instances. The study evaluated six CAM techniques (GradCAM, GradCAM++, LayerCAM, EigenCAM, ScoreCAM, MS GradCAM++) across three CNN architectures (DenseNet201, InceptionV3, ResNet50V2) over thirty training epochs on the Kermany chest X-ray dataset. It identified three mechanisms by which AUC and explanation consistency dissociate, and it showed that the C-Score can serve as an early warning of model instability: the metric detected ScoreCAM deterioration on ResNet50V2 one full checkpoint before a catastrophic AUC collapse.

Key takeaway

If you develop medical imaging classifiers, track the C-Score in your evaluation pipeline alongside AUC. It can signal model instability before classification metrics do, and it can inform architecture-specific clinical deployment recommendations based on explanation quality, not just predictive accuracy.
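One way to act on the early-warning idea is to compare the C-Score between consecutive training checkpoints and flag sharp drops. The sketch below is illustrative only: the function name `flag_consistency_drops`, the relative-drop rule, and the 20% default threshold are assumptions made for this example and are not taken from the study.

```python
from typing import List, Sequence


def flag_consistency_drops(c_scores: Sequence[float], rel_drop: float = 0.2) -> List[int]:
    """Return indices of checkpoints whose score fell sharply versus the previous one.

    c_scores: one consistency score per checkpoint, in training order.
    rel_drop: hypothetical threshold; a checkpoint is flagged when its score is
              more than this fraction below the preceding checkpoint's score.
    """
    flagged = []
    for k in range(1, len(c_scores)):
        prev, cur = c_scores[k - 1], c_scores[k]
        # Skip non-positive baselines to avoid dividing by zero.
        if prev > 0 and (prev - cur) / prev > rel_drop:
            flagged.append(k)
    return flagged


# Synthetic values for illustration only; these are not results from the study.
history = [0.60, 0.62, 0.40, 0.38]
print(flag_consistency_drops(history))
```

A flagged checkpoint would be a prompt to inspect the model before relying on its AUC alone; a suitable threshold would need to be tuned per architecture and CAM method.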

Key insights

The C-Score quantifies explanation consistency in CAM methods, revealing model instability beyond classification metrics.

Principles

Method

The C-Score quantifies intra-class explanation reproducibility using confidence-weighted, annotation-free, intensity-emphasized pairwise soft IoU across correctly classified instances.
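The description above can be sketched in code. This summary does not give the exact formula, so the following is a minimal interpretation under stated assumptions, not the authors' implementation: soft IoU is taken as the sum of element-wise minima over the sum of element-wise maxima, intensity emphasis is modeled by raising heatmaps (assumed normalized to [0, 1]) to a power `gamma`, and each pair is weighted by the product of the two prediction confidences. The names `soft_iou`, `c_score_sketch`, and `gamma` are hypothetical.

```python
import numpy as np


def soft_iou(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> float:
    """Soft IoU of two non-negative heatmaps: sum(min) / sum(max)."""
    return float(np.minimum(a, b).sum() / (np.maximum(a, b).sum() + eps))


def c_score_sketch(heatmaps, confidences, correct, gamma: float = 2.0) -> float:
    """Confidence-weighted mean pairwise soft IoU for one class.

    heatmaps:    CAM heatmaps of equal shape, values assumed in [0, 1].
    confidences: model confidence for each instance's predicted class.
    correct:     booleans; only correctly classified instances are compared.
    gamma:       assumed intensity-emphasis exponent (values > 1 favor strong activations).
    """
    heatmaps = np.asarray(heatmaps, dtype=float)
    confidences = np.asarray(confidences, dtype=float)
    keep = np.asarray(correct, dtype=bool)

    maps = heatmaps[keep] ** gamma  # emphasize high-intensity regions
    conf = confidences[keep]
    n = len(maps)
    if n < 2:
        return float("nan")  # no pairs to compare

    num = den = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            w = conf[i] * conf[j]  # assumed pair weight
            num += w * soft_iou(maps[i], maps[j])
            den += w
    return num / den if den > 0 else float("nan")
```

Under these assumptions, identical heatmaps score close to 1 and spatially disjoint heatmaps score 0. No radiologist annotations enter the computation, which matches the annotation-free property described above.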

In practice

Topics

Best for: AI Scientist, Research Scientist, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.