Explainable Outlier Detection for Interval-valued Data
Summary
The paper proposes a framework for explainable outlier detection in interval-valued data, a complex data type common in aggregated datasets. Building on the robust Interval Minimum Covariance Determinant (IMCD) estimator, the approach uses the Shapley value to explain why an observation is outlying. It derives a closed-form expression for the Shapley value of the squared robust Interval-Mahalanobis distance, so explanations can be computed efficiently. The method decomposes each variable's contribution into parts coming from the centers, the ranges, and cross-terms of the interval observations, which gives a fine-grained interpretation. It also connects the explanations to cellwise outliers and extends them to a Shapley interaction index that captures pairwise variable effects. The method is demonstrated on two real-world datasets, Cars and Spotify, with cutoff values such as 0.9 and 0.95 used to flag outliers.
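For orientation, the classical (non-interval) version of such a closed form is sketched below. It assumes the common value function in which variables outside a coalition S are set to the location estimate mu; the paper's interval-valued derivation builds on IMCD estimates and may use a different parametrization.

```latex
\begin{align*}
z &= x - \mu, \qquad A = \Sigma^{-1}, \qquad D^2(x) = z^\top A z, \\
v(S) &= \sum_{k \in S} \sum_{l \in S} z_k A_{kl} z_l
  \quad (\text{coordinates outside } S \text{ set to } \mu), \\
\phi_j &= z_j \, (A z)_j = z_j \sum_{l=1}^{p} A_{jl} z_l,
  \qquad \sum_{j=1}^{p} \phi_j = D^2(x).
\end{align*}
```

Each cross-term enters v(S) only when both of its variables are in S, so the Shapley value splits it equally between them; that equal split is what makes the formula closed-form.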
Key takeaway
For data scientists working with complex interval-valued datasets, this framework explains *why* specific observations are flagged as outliers. Instead of stopping at detection, you can pinpoint which variables, or combinations of variables, drive atypical behavior, which supports better-informed decisions and targeted data quality improvements. Consider using the AIDA R package to apply this explainability in your interval data analysis workflows.
Key insights
Shapley values explain interval-valued data outliers by decomposing their robust Mahalanobis distance contributions.
Principles
- Negative Shapley values indicate a stabilizing effect: the variable lowers the observation's squared distance instead of raising it.
- Shapley interaction index quantifies joint variable contributions to outlyingness.
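A two-variable toy example illustrates both principles. It uses the classical squared Mahalanobis distance with hypothetical numbers (location fixed at zero, correlation 0.9) rather than the paper's interval setting or real IMCD estimates. Because the variables are strongly correlated, an observation that is high on both is less atypical than its marginal deviations suggest, so the second variable receives a negative Shapley value and the pairwise interaction is negative.

```python
import numpy as np

# Hypothetical location and scatter; in practice these would come from a
# robust estimator such as (I)MCD.
mu = np.zeros(2)
sigma = np.array([[1.0, 0.9],
                  [0.9, 1.0]])
A = np.linalg.inv(sigma)

x = np.array([2.0, 1.5])
z = x - mu
d2 = z @ A @ z  # squared Mahalanobis distance

# Closed-form Shapley values: phi_j = z_j * (A z)_j
phi = z * (A @ z)

def value(coalition):
    """v(S): squared distance with variables outside S set to the location."""
    zs = np.where(np.isin(np.arange(2), list(coalition)), z, 0.0)
    return zs @ A @ zs

# With only two variables, the Shapley interaction index for the pair (0, 1)
# reduces to the mixed difference v({0,1}) - v({0}) - v({1}) + v({}).
interaction = value({0, 1}) - value({0}) - value({1}) + value(set())

print("D^2:", round(d2, 3))
print("Shapley values:", phi.round(3))
print("interaction:", round(interaction, 3))
```

Here the second variable's Shapley value is about -2.37 even though x2 lies 1.5 units above the location: given how high x1 is, a high x2 is what the correlation predicts, so it pulls the distance down.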
Method
Derive a closed-form Shapley value for squared robust Interval-Mahalanobis distance, decomposing contributions into centers, ranges, and cross-terms, then extend to a Shapley interaction index.
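A minimal NumPy sketch of a center/range/cross-term decomposition follows. It rests on hypothetical assumptions: each interval observation is represented by the stacked vector of its centers and ranges (half-ranges would work identically), `mu` and `sigma` are already-estimated robust location and scatter for that stacked vector (IMCD itself is not implemented here), and each interval variable owns both its center and its range coordinate as a single Shapley player. The function names are illustrative and are not the AIDA package's API. A brute-force enumeration over all coalitions checks the closed form.

```python
import itertools
import math
import numpy as np

def interval_shapley(centers, ranges, mu, sigma):
    """Closed-form Shapley values of D^2 = z^T A z for p interval variables.

    z stacks (centers, ranges) minus mu; player j owns coordinates j and p + j.
    Returns the per-variable center, range and cross parts and their sum.
    """
    p = len(centers)
    A = np.linalg.inv(sigma)
    z = np.concatenate([centers, ranges]) - mu
    zc, zr = z[:p], z[p:]
    A_cc, A_cr = A[:p, :p], A[:p, p:]
    A_rc, A_rr = A[p:, :p], A[p:, p:]
    center = zc * (A_cc @ zc)                     # center-center part
    range_part = zr * (A_rr @ zr)                 # range-range part
    cross = zc * (A_cr @ zr) + zr * (A_rc @ zc)   # center-range cross part
    return center, range_part, cross, center + range_part + cross

def brute_force_shapley(centers, ranges, mu, sigma):
    """Exact Shapley values by enumerating all coalitions (exponential in p)."""
    p = len(centers)
    A = np.linalg.inv(sigma)
    z = np.concatenate([centers, ranges]) - mu

    def value(S):
        mask = np.zeros(2 * p, dtype=bool)
        for j in S:
            mask[j] = mask[p + j] = True   # a player brings its center and range
        zs = np.where(mask, z, 0.0)        # absent players sit at the location
        return zs @ A @ zs

    phi = np.zeros(p)
    for j in range(p):
        others = [i for i in range(p) if i != j]
        for k in range(p):
            w = math.factorial(k) * math.factorial(p - k - 1) / math.factorial(p)
            for S in itertools.combinations(others, k):
                phi[j] += w * (value(S + (j,)) - value(S))
    return phi

# Hypothetical example: p = 3 interval variables, random positive-definite scatter.
gen = np.random.default_rng(0)
p = 3
B = gen.normal(size=(2 * p, 2 * p))
sigma = B @ B.T + 2 * p * np.eye(2 * p)
mu = np.zeros(2 * p)
centers = np.array([3.0, -1.0, 0.5])
ranges = np.array([0.2, 2.5, 1.0])

center, range_part, cross, phi = interval_shapley(centers, ranges, mu, sigma)
phi_bf = brute_force_shapley(centers, ranges, mu, sigma)
z = np.concatenate([centers, ranges]) - mu
d2 = z @ np.linalg.inv(sigma) @ z

print("Shapley values:", phi.round(3))
print("matches brute force:", np.allclose(phi, phi_bf))
print("sum equals D^2:", np.isclose(phi.sum(), d2))
```

The closed form costs one matrix-vector product, while the brute-force check grows as 2^p coalitions, which is why a closed form matters once p is more than a handful of variables.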
In practice
- Identify cellwise outliers by analyzing variable-specific Shapley values.
- Use the Shapley interaction index to uncover joint variable effects on outlyingness.
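Under the same hypothetical setup (stacked centers and ranges, each variable owning both coordinates, absent variables set to the location), the pairwise interaction index also has a closed form. Because the squared distance is a quadratic form, the mixed difference v(S with i and j) - v(S with i) - v(S with j) + v(S) does not depend on S, so the standard Shapley interaction index for variables i and j equals twice the sum of z_k A_kl z_l over the coordinates of i and j. The paper may normalize the index differently. The sketch builds that p x p matrix, recovers the Shapley values as its row sums, and ranks variables by contribution as a simple screening aid for cellwise inspection; the ranking is an illustrative heuristic, not the paper's cellwise criterion.

```python
import numpy as np

def pair_blocks(centers, ranges, mu, sigma):
    """P[i, j] = sum of z_k A_kl z_l over k in coords(i), l in coords(j)."""
    p = len(centers)
    A = np.linalg.inv(sigma)
    z = np.concatenate([centers, ranges]) - mu
    M = np.outer(z, z) * A                      # M[k, l] = z_k A_kl z_l
    return M[:p, :p] + M[:p, p:] + M[p:, :p] + M[p:, p:], z, A

def value(S, z, A, p):
    """v(S): squared distance with variables outside S set to the location."""
    mask = np.zeros(2 * p, dtype=bool)
    for j in S:
        mask[j] = mask[p + j] = True
    zs = np.where(mask, z, 0.0)
    return zs @ A @ zs

gen = np.random.default_rng(1)
p = 4
B = gen.normal(size=(2 * p, 2 * p))
sigma = B @ B.T + 2 * p * np.eye(2 * p)   # hypothetical robust scatter
mu = np.zeros(2 * p)                      # hypothetical robust location
centers = np.array([0.1, 4.0, -0.3, 3.5])
ranges = np.array([0.5, 0.4, 3.0, 0.6])

P, z, A = pair_blocks(centers, ranges, mu, sigma)
phi = P.sum(axis=1)                       # Shapley value of each interval variable
interaction = 2 * P                       # Shapley interaction index for i != j
np.fill_diagonal(interaction, np.nan)     # the diagonal is not a pairwise effect
d2 = z @ A @ z

# Sanity check of the closed form against the mixed difference at S = {}.
check = value((0, 1), z, A, p) - value((0,), z, A, p) - value((1,), z, A, p)
print("closed form matches mixed difference:", np.isclose(interaction[0, 1], check))

# Illustrative screening: variables ordered by their contribution to D^2.
order = np.argsort(-phi)
print("Shapley values:", phi.round(3))
print("variables by contribution:", order)
print("interaction matrix:\n", np.round(interaction, 3))
```

Large positive off-diagonal entries point to pairs that are jointly more atypical than their individual contributions suggest; negative entries point to pairs whose joint values are more consistent with the estimated correlation structure.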
Topics
- Explainable AI
- Outlier Detection
- Interval-valued Data
- Shapley Value
- Symbolic Data Analysis
- Robust Statistics
Best for: Research Scientist, AI Scientist, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.