Trusting Right Predictions for Wrong Reasons: A LIME Based Analysis of Deep Learning Interpretability in Lung Cancer Diagnosis
Summary
A study of deep learning interpretability in lung cancer diagnosis compared a Convolutional Neural Network (CNN), a pretrained ResNet50, and a Vision Transformer (ViT), all trained on the IQ-OTH/NCCD lung cancer CT dataset. All three models achieved strong classification performance (ResNet50 98.61% accuracy, CNN 97.91%, ViT 93.75%, alongside ROC-AUC scores of 0.99), and their decision-making processes were then analyzed using Local Interpretable Model-Agnostic Explanations (LIME). Prediction correlations exceeded 0.99 across model pairs, yet LIME explanation correlations remained below 0.26, indicating that the models relied on substantially different image regions to reach the same predictions. Misclassified samples consistently showed attention outside the lung parenchyma, whereas correct predictions focused within lung regions. The result shows that prediction agreement does not imply consistent reasoning.
Key takeaway
If you develop deep learning models for lung cancer diagnosis, validate model interpretability independently of predictive performance. Relying solely on high accuracy or prediction agreement risks deploying models that make correct predictions for clinically irrelevant reasons. Use tools like LIME to scrutinize the specific image regions your models rely on, and check that their reasoning aligns with medical expertise and focuses on relevant anatomical structures such as the lung parenchyma.
Key insights
Deep learning models can reach similarly high predictive accuracy in medical imaging while relying on different image evidence, so interpretability needs to be validated independently of accuracy.
Principles
- Prediction agreement does not imply reasoning consistency.
- Interpretability evaluation is an independent validation criterion.
- Model attention outside lung parenchyma correlates with misclassification.
Method
The study applied Local Interpretable Model-Agnostic Explanations (LIME) to CNN, ResNet50, and ViT models. A dual-correlation framework measured both prediction and explanation agreement.
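The dual-correlation idea can be illustrated with a minimal sketch. This is not the study's code: it assumes Pearson correlation as the agreement measure, assumes each model yields one probability per sample for a given class, and assumes each LIME explanation has been converted to a 2D weight map of the same shape. The helper names (`pearson`, `flatten`, `dual_correlation`) and the toy values are hypothetical.

```python
from math import sqrt
from typing import List, Sequence, Tuple

Grid = Sequence[Sequence[float]]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def flatten(grid: Grid) -> List[float]:
    """Flatten a 2D weight map into a 1D list, row by row."""
    return [value for row in grid for value in row]


def dual_correlation(
    probs_a: Sequence[float],
    probs_b: Sequence[float],
    maps_a: Sequence[Grid],
    maps_b: Sequence[Grid],
) -> Tuple[float, float]:
    """Return (prediction correlation, mean explanation correlation).

    probs_*: one predicted probability per sample from each model (assumed).
    maps_*:  one 2D explanation weight map per sample from each model (assumed).
    """
    prediction_r = pearson(probs_a, probs_b)
    per_sample = [
        pearson(flatten(map_a), flatten(map_b))
        for map_a, map_b in zip(maps_a, maps_b)
    ]
    explanation_r = sum(per_sample) / len(per_sample)
    return prediction_r, explanation_r


# Toy data only: two models that agree on outputs but not on regions.
toy_probs_a = [0.9, 0.1, 0.8, 0.2]
toy_probs_b = [0.95, 0.05, 0.85, 0.15]
toy_maps_a = [[[1, 0], [0, 0]], [[0, 1], [0, 0]]]
toy_maps_b = [[[0, 0], [0, 1]], [[0, 1], [0, 0]]]
print(dual_correlation(toy_probs_a, toy_probs_b, toy_maps_a, toy_maps_b))
```

A large gap between the two returned values is the pattern the study reports: near-identical predictions supported by weakly related explanations.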
In practice
- Use LIME to analyze model decision regions.
- Validate model reasoning beyond accuracy metrics.
- Examine attention maps for clinical relevance.
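One way to make the clinical-relevance check concrete is to measure how much of an explanation's positive weight falls inside a lung segmentation mask. The sketch below assumes you already have a per-pixel weight map derived from a LIME explanation and a binary lung mask of the same shape; the function name `weight_inside_mask` and the idea of scoring by weight fraction are illustrative assumptions, not the study's procedure.

```python
from typing import Sequence

Grid = Sequence[Sequence[float]]


def weight_inside_mask(weight_map: Grid, lung_mask: Grid) -> float:
    """Fraction of positive explanation weight that lies inside the lung mask.

    weight_map: 2D per-pixel explanation weights (assumed input format).
    lung_mask:  2D mask of the same shape, truthy where a pixel is lung tissue.
    Returns 0.0 when the map has no positive weight at all.
    """
    if len(weight_map) != len(lung_mask):
        raise ValueError("weight map and mask must have the same number of rows")
    inside = 0.0
    total = 0.0
    for weight_row, mask_row in zip(weight_map, lung_mask):
        if len(weight_row) != len(mask_row):
            raise ValueError("weight map and mask rows must have the same length")
        for weight, is_lung in zip(weight_row, mask_row):
            if weight > 0:  # only evidence supporting the prediction is counted
                total += weight
                if is_lung:
                    inside += weight
    return inside / total if total > 0 else 0.0


# Toy 2x2 example: the mask marks the top-left and bottom-right pixels as lung.
toy_weights = [[0.5, 0.0], [0.25, 0.25]]
toy_mask = [[1, 0], [0, 1]]
print(weight_inside_mask(toy_weights, toy_mask))
```

A low fraction on a correctly classified scan is a prompt for manual review, since it suggests the prediction rests on regions outside the lung parenchyma. Any threshold for flagging would need to be chosen with clinical input.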
Topics
- Lung Cancer Diagnosis
- Deep Learning Interpretability
- LIME Explanations
- Medical Imaging AI
- Convolutional Neural Networks
- Vision Transformers
Best for: AI Scientist, Computer Vision Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.