Metamorphic Testing with the Rashomon Set: Explanation Faithfulness in Machine Learning
Summary
A new framework leverages metamorphic testing to assess the faithfulness of machine learning model explanations, addressing the "Rashomon effect" where models with comparable predictive performance produce divergent feature-based explanations. This approach formalizes expected consistency between model behavior and feature attributions through five distinct metamorphic relations, eliminating the need for ground-truth labels. The framework was demonstrated on two tabular regression datasets using two common post-hoc explainers, SHAP and LIME. This model-agnostic tool provides a practical method for identifying accurate models that also offer reliable and trustworthy explanations, enhancing confidence in explainable AI systems.
Key takeaway
For machine learning engineers evaluating model explainability, this framework offers a way to assess explanation faithfulness without ground-truth labels. Integrate metamorphic testing with the five proposed relations into your model selection pipeline to check that explanations from methods such as SHAP or LIME are consistent with how the model actually behaves. This matters most when deploying models where interpretability is critical.
Key insights
Metamorphic testing with the Rashomon Set assesses ML explanation faithfulness without ground truth.
Principles
- Models with similar performance can have divergent explanations.
- Explanation faithfulness can be tested without ground truth.
- Metamorphic relations formalize the consistency expected between model behavior and feature attributions.
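To make the first principle concrete, here is a minimal sketch of how two models can reach similar accuracy while attributing importance to different features. Everything in it is an assumption for illustration: the synthetic data, the helper names (`fit_single_feature`, `mean_abs_attribution`), and the simple linear attribution, which stands in for an explainer such as SHAP or LIME and is not taken from the original article.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)      # near-duplicate of x1
X = np.column_stack([x1, x2])
y = 2.0 * x1 + 0.1 * rng.normal(size=n)  # target depends on x1 only

def fit_single_feature(X, y, j):
    """Least-squares fit that uses only column j; other weights stay zero."""
    w = np.zeros(X.shape[1])
    w[j] = (X[:, j] @ y) / (X[:, j] @ X[:, j])
    return w

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

def mean_abs_attribution(w, X):
    # For a linear model, w_j * (x_j - mean_j) is feature j's contribution
    # relative to the average prediction (illustrative stand-in explainer).
    return np.mean(np.abs(w * (X - X.mean(axis=0))), axis=0)

w_a = fit_single_feature(X, y, 0)   # model A relies on x1
w_b = fit_single_feature(X, y, 1)   # model B relies on x2

mse_a, mse_b = mse(w_a, X, y), mse(w_b, X, y)
attr_a = mean_abs_attribution(w_a, X)
attr_b = mean_abs_attribution(w_b, X)

print("MSE A / B:", mse_a, mse_b)
print("attribution A:", attr_a)
print("attribution B:", attr_b)
```

Both errors are small relative to the variance of `y`, yet model A places all importance on the first feature and model B on the second. Predictive performance alone cannot tell such models apart, which is the gap that explanation-level testing is meant to fill.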
Method
The framework applies metamorphic testing through five relations, each formalizing a consistency property expected between model behavior and the feature importances attributed by post-hoc explanation methods.
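The summary does not spell out the five relations, so the sketch below uses a hypothetical one purely to show the shape of a metamorphic test: if an explainer assigns near-zero attribution to a feature, perturbing that feature should leave the prediction essentially unchanged. The function name `null_feature_relation`, the tolerance, the shift size, and the toy model and explainers are all assumptions, not the authors' definitions.

```python
import numpy as np

def null_feature_relation(predict, explain, X, tol=1e-6, shift=1.0):
    """Hypothetical relation: features with ~zero attribution must not
    affect predictions. Returns the fraction of test cases that pass."""
    attributions = explain(X)             # shape (n_samples, n_features)
    base = predict(X)                     # source outputs
    passed, total = 0, 0
    for j in range(X.shape[1]):
        rows = np.abs(attributions[:, j]) <= tol
        if not rows.any():
            continue
        X_follow = X[rows].copy()
        X_follow[:, j] += shift           # follow-up input: perturb feature j only
        changed = np.abs(predict(X_follow) - base[rows]) > tol
        passed += int((~changed).sum())
        total += int(rows.sum())
    return passed / total if total else 1.0

# Toy model that ignores its second feature (illustrative only).
weights = np.array([3.0, 0.0])
predict = lambda X: X @ weights

# A faithful explainer, and an unfaithful one that swaps the two columns.
faithful = lambda X: weights * (X - X.mean(axis=0))
unfaithful = lambda X: faithful(X)[:, ::-1]

X = np.random.default_rng(1).normal(size=(200, 2))
print(null_feature_relation(predict, faithful, X))
print(null_feature_relation(predict, unfaithful, X))
```

The check never touches target values: it compares the model's outputs on source and follow-up inputs against what the explanation implies, which is how this style of testing can work without ground truth. In practice `explain` would wrap a real explainer such as SHAP or LIME, and the tolerance would need tuning because their attributions are rarely exactly zero.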
In practice
- Apply to tabular regression datasets.
- Evaluate SHAP and LIME explainers.
- Select models with reliable explanations.
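One way the selection step could look, as a rough sketch: keep the candidate models whose error is within a tolerance of the best one, then prefer the candidate whose explanations pass the most metamorphic checks. The helper `select_model`, the `epsilon` threshold, and the candidate values are hypothetical placeholders, not results or procedures from the article.

```python
def select_model(candidates, epsilon):
    """Pick the candidate with the highest relation pass rate among those
    whose error is within `epsilon` of the best error (illustrative rule)."""
    best_error = min(c["error"] for c in candidates)
    near_best = [c for c in candidates if c["error"] <= best_error + epsilon]
    return max(near_best, key=lambda c: c["pass_rate"])

# Placeholder values for illustration only, not measured results.
candidates = [
    {"name": "model_a", "error": 0.100, "pass_rate": 0.62},
    {"name": "model_b", "error": 0.104, "pass_rate": 0.97},
    {"name": "model_c", "error": 0.180, "pass_rate": 0.99},
]

chosen = select_model(candidates, epsilon=0.01)
print(chosen["name"])
```

Here `model_c` is excluded for its higher error despite its pass rate, and `model_b` is preferred over the marginally more accurate `model_a` because its explanations are more consistent with its behavior.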
Topics
- Metamorphic Testing
- Explanation Faithfulness
- Rashomon Effect
- Explainable AI
- SHAP
- LIME
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Ethicist
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.