When SHAP and LIME Fail: Lessons from Predicting Quality in the Automotive Industry
Summary
This article examines how two popular machine learning explainability tools, SHAP and LIME, fail when used to predict material and composition defects in the automotive industry. Using the UCI Automobile dataset as a reproducible example, it shows that both tools can produce misleading, unstable, or incorrect explanations. The analysis finds that LIME explanations can change from run to run for the same instance, that SHAP can misallocate credit among highly correlated features, and that global SHAP plots can obscure critical part-specific patterns. Unlike predictions, explanations have no ground truth to check against, so their accuracy is hard to validate and errors can lead to flawed engineering decisions.
Key takeaway
Machine Learning Engineers and Data Scientists working on quality prediction in manufacturing should approach explainability tools with skepticism. Do not treat SHAP or LIME outputs as definitive answers without rigorous validation. Implement stability checks for LIME, analyze feature correlations before interpreting SHAP, and always segment your data to reveal nuanced patterns. If explainability is a hard requirement, consider a simpler, more interpretable model: an understandable model with slightly lower accuracy is often more useful than a black box with unreliable explanations.
Key insights
Explainability tools like SHAP and LIME have critical failure modes that can lead to misleading insights, especially with correlated data.
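A toy calculation may help show why correlated data is a problem for Shapley-based attribution. The sketch below computes exact Shapley values for a three-feature game in which two features carry the same information. The feature names and the value function are assumptions chosen for illustration; this is neither the SHAP library nor the article's model.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal
    contribution over every possible ordering of the players."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = set()
        for p in order:
            before = value(frozenset(coalition))
            coalition.add(p)
            after = value(frozenset(coalition))
            totals[p] += after - before
    return {p: totals[p] / len(orderings) for p in players}

def toy_value(coalition):
    # Assumed toy game: the output reaches 1.0 as soon as EITHER of two
    # redundant features is known; the third feature contributes nothing.
    if "engine_size" in coalition or "horsepower" in coalition:
        return 1.0
    return 0.0

phi = shapley_values(["engine_size", "horsepower", "num_doors"], toy_value)
print(phi)  # {'engine_size': 0.5, 'horsepower': 0.5, 'num_doors': 0.0}
```

Each redundant feature receives half the credit, even though either one alone would fully explain the output. Read in isolation, each looks only half as important as the shared signal really is.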
Principles
- Explanation stability is crucial for actionable insights.
- Correlated features distort SHAP credit distribution.
- Global explanations can hide segment-specific patterns.
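The third principle can be illustrated with a small sketch that compares a global mean absolute attribution with the same statistic computed per segment. The segment labels, the feature name, and the attribution values are made-up placeholders, not results from the article; in practice the attributions would come from whatever explainer is in use.

```python
from collections import defaultdict

def mean_abs_attribution(rows, feature):
    """Mean |attribution| for one feature, overall and per segment.

    rows: iterable of (segment_label, {feature_name: attribution}) pairs.
    """
    overall = []
    by_segment = defaultdict(list)
    for segment, attributions in rows:
        value = abs(attributions[feature])
        overall.append(value)
        by_segment[segment].append(value)
    global_mean = sum(overall) / len(overall)
    segment_means = {s: sum(v) / len(v) for s, v in by_segment.items()}
    return global_mean, segment_means

# Made-up attributions: the feature barely matters for 8 rows of part_A
# but dominates for 2 rows of part_B.
rows = [("part_A", {"feature_x": 0.02})] * 8 + [("part_B", {"feature_x": 0.60})] * 2

global_mean, segment_means = mean_abs_attribution(rows, "feature_x")
print(round(global_mean, 3))                                 # 0.136
print({s: round(m, 3) for s, m in segment_means.items()})    # {'part_A': 0.02, 'part_B': 0.6}
```

The global figure sits far below the part_B value, so a global ranking could place the feature well down the list even though it drives the predictions for that segment.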
Method
The article demonstrates failure modes by applying SHAP, LIME, and built-in feature importance to the UCI Automobile dataset, comparing their outputs and analyzing their behavior under specific conditions like feature correlation and repeated LIME runs.
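One simple way to compare the outputs of several methods is to check how much their top-ranked features overlap. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the scores are placeholder numbers rather than values from the article or from any real SHAP, LIME, or model run.

```python
def top_k(scores, k=3):
    """Names of the k features with the largest absolute score."""
    return sorted(scores, key=lambda name: abs(scores[name]), reverse=True)[:k]

def agreement(method_scores, k=3):
    """Features present in every method's top-k, plus each method's top-k list."""
    tops = {method: top_k(scores, k) for method, scores in method_scores.items()}
    shared = set.intersection(*(set(names) for names in tops.values()))
    return shared, tops

# Placeholder importance scores for four hypothetical features.
method_scores = {
    "shap":    {"f1": 0.40, "f2": 0.35, "f3": 0.05, "f4": 0.01},
    "lime":    {"f1": 0.30, "f2": -0.02, "f3": 0.25, "f4": 0.03},
    "builtin": {"f1": 0.50, "f2": 0.30, "f3": 0.15, "f4": 0.05},
}

shared, tops = agreement(method_scores, k=2)
print(shared)  # {'f1'}
print(tops)    # {'shap': ['f1', 'f2'], 'lime': ['f1', 'f3'], 'builtin': ['f1', 'f2']}
```

Features that all methods agree on are safer to act on. A feature ranked highly by only one method, like f3 here, is a prompt for further investigation rather than a conclusion.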
In practice
- Triangulate explanations across multiple methods.
- Stability-test LIME: rerun it on the same instance and expect the same top feature in at least 70% of runs.
- Check feature correlations before interpreting SHAP, treating pairs with |r| > 0.7 as highly correlated.
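The stability and correlation checks in this list can be sketched with the standard library alone. In the sketch below, `explain_once` is a hypothetical callable that would wrap a single LIME run and return the name of its top-ranked feature; the stub explainer, column names, and data are placeholders for illustration. The 70% and 0.7 thresholds are the ones quoted above.

```python
from collections import Counter
from itertools import cycle
from math import sqrt

def top_feature_consistency(explain_once, n_runs=20):
    """Call explain_once() n_runs times; return the most common
    top feature and the share of runs in which it came first."""
    tops = Counter(explain_once() for _ in range(n_runs))
    feature, count = tops.most_common(1)[0]
    return feature, count / n_runs

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / (sd_x * sd_y)

def correlated_pairs(columns, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| above threshold."""
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(columns[a], columns[b])
            if abs(r) > threshold:
                flagged.append((a, b, r))
    return flagged

# Stand-in for a real explainer: cycles through fixed answers so that
# feature_a wins 15 of 20 runs.
answers = cycle(["feature_a", "feature_a", "feature_a", "feature_b"])
feature, share = top_feature_consistency(lambda: next(answers), n_runs=20)
is_stable = share >= 0.70

# Placeholder columns: x1 and x2 move together, x3 does not.
columns = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 10.5],
    "x3": [5, 1, 4, 2, 3],
}
flagged = correlated_pairs(columns)
print(feature, share, is_stable)            # feature_a 0.75 True
print([(a, b) for a, b, _ in flagged])      # [('x1', 'x2')]
```

An explanation that fails the stability check, or that involves features from a flagged pair, is best treated as a hypothesis to test rather than a finding.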
Topics
- Machine Learning Explainability
- SHAP
- LIME
- Feature Importance
- Automotive Quality Prediction
Best for: Machine Learning Engineer, Data Scientist, AI Operations Specialist
Editorial summary, takeaway, and curation by AIssential. Original article published by Towards AI - Medium.