Trusting Right Predictions for Wrong Reasons: A LIME Based Analysis of Deep Learning Interpretability in Lung Cancer Diagnosis

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision, Medical Imaging Analysis · Depth: Expert, quick

Summary

A study of deep learning interpretability in lung cancer diagnosis compared a Convolutional Neural Network (CNN), a pretrained ResNet50, and a Vision Transformer (ViT) trained on the IQ-OTH/NCCD lung cancer CT dataset. All three models achieved strong classification performance (ResNet50 98.61% accuracy, CNN 97.91%, ViT 93.75%, alongside ROC-AUC scores of 0.99), and their decision-making was then analyzed with Local Interpretable Model-Agnostic Explanations (LIME). Prediction correlations exceeded 0.99 across model pairs, yet LIME explanation correlations remained below 0.26, indicating that the models relied on substantially different image regions to reach the same predictions. In misclassified samples, attention consistently fell outside the lung parenchyma, whereas correct predictions focused within the lung regions. The conclusion is that prediction agreement does not imply consistent reasoning.

Key takeaway

AI scientists developing deep learning models for lung cancer diagnosis should validate model interpretability independently of predictive performance. Relying solely on high accuracy or prediction agreement risks deploying models that make correct predictions for clinically irrelevant reasons. Use tools such as LIME to inspect the specific image regions a model relies on, and check that its reasoning aligns with medical expertise and centers on relevant anatomical structures such as the lung parenchyma.
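One way to turn "does the explanation focus on the lung parenchyma?" into a number is to measure how much of the positive explanation weight falls inside a lung mask. The sketch below is illustrative only and is not taken from the study: the function name `fraction_inside_mask`, the per-pixel attribution grid, and the binary lung mask are all assumptions, and the summary does not say how the authors delineated lung regions.

```python
def fraction_inside_mask(attribution, mask):
    """Share of positive attribution weight that falls on masked pixels.

    attribution: 2D list of floats (hypothetical per-pixel LIME weights).
    mask:        2D list of 0/1 values (hypothetical lung segmentation).
    Returns a value in [0, 1], or NaN if there is no positive weight.
    """
    if len(attribution) != len(mask):
        raise ValueError("attribution and mask must have the same height")

    total = 0.0
    inside = 0.0
    for attr_row, mask_row in zip(attribution, mask):
        if len(attr_row) != len(mask_row):
            raise ValueError("attribution and mask must have the same width")
        for weight, in_lung in zip(attr_row, mask_row):
            if weight > 0:  # only evidence *for* the predicted class
                total += weight
                if in_lung:
                    inside += weight

    return inside / total if total > 0 else float("nan")


# Toy 2x2 example: three quarters of the positive weight is inside the mask.
toy_attribution = [[0.5, 0.0], [0.25, 0.25]]
toy_mask = [[1, 0], [0, 1]]
score = fraction_inside_mask(toy_attribution, toy_mask)
```

A low score on a correctly classified scan would flag a prediction worth reviewing, since the model may be keying on structures outside the lungs. Any threshold for "low" would need to be chosen with clinical input.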

Key insights

Deep learning models can reach similarly high predictive accuracy in medical imaging while relying on different image regions, so interpretability must be validated independently of predictive performance.

Principles

Method

The study applied Local Interpretable Model-Agnostic Explanations (LIME) to CNN, ResNet50, and ViT models. A dual-correlation framework measured both prediction and explanation agreement.
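The dual-correlation idea can be sketched as follows: for each pair of models, compute one correlation over their predictions and a second over their explanation maps, then compare the two. This is a minimal sketch under stated assumptions, not the authors' implementation. The summary does not specify the correlation measure or how per-sample values were aggregated, so Pearson correlation and a simple mean over samples are assumed here, and the names `pearson` and `dual_correlation` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if constant)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def dual_correlation(predictions, explanations):
    """Compare prediction agreement with explanation agreement per model pair.

    predictions:  {model_name: [predicted probability per sample]}
    explanations: {model_name: [flattened attribution map per sample]}
    Returns {(model_a, model_b): (prediction_r, mean_explanation_r)}.
    """
    results = {}
    for a, b in combinations(sorted(predictions), 2):
        prediction_r = pearson(predictions[a], predictions[b])
        # Correlate the two models' maps sample by sample, then average,
        # skipping samples where a map is constant (NaN correlation).
        per_sample = [
            pearson(map_a, map_b)
            for map_a, map_b in zip(explanations[a], explanations[b])
        ]
        valid = [r for r in per_sample if r == r]
        explanation_r = sum(valid) / len(valid) if valid else float("nan")
        results[(a, b)] = (prediction_r, explanation_r)
    return results


# Toy data: identical predictions, but each model highlights a different pixel.
toy_predictions = {"cnn": [0.9, 0.1, 0.8], "vit": [0.9, 0.1, 0.8]}
toy_explanations = {
    "cnn": [[1.0, 0.0, 0.0, 0.0]] * 3,
    "vit": [[0.0, 0.0, 0.0, 1.0]] * 3,
}
agreement = dual_correlation(toy_predictions, toy_explanations)
```

In the toy data the two models agree perfectly on predictions yet disagree on which pixel matters, which is the same pattern the study reports at scale (prediction correlations above 0.99, explanation correlations below 0.26).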

Topics

Best for: AI Scientist, Computer Vision Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.