Explainable Outlier Detection for Interval-valued Data

Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

This work introduces a framework for explainable outlier detection in interval-valued data, a data structure that commonly arises in aggregated datasets. Building on the robust Interval Minimum Covariance Determinant (IMCD) estimator, it uses the Shapley value to explain why an observation is outlying. The authors derive a closed-form expression for the Shapley value of the squared robust Interval-Mahalanobis distance, so explanations can be computed efficiently instead of by enumerating every coalition of variables. The resulting decomposition splits each contribution into parts attributable to the centers, the ranges, and the cross-terms of the interval observations, which gives a fine-grained interpretation. The method also connects to cellwise outliers and extends to a Shapley interaction index that captures pairwise variable effects. Its utility is demonstrated on two real-world datasets, Cars and Spotify, with cutoff values such as 0.9 and 0.95 used to flag outliers.
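As a minimal sketch of why a closed form is possible, the Python below handles ordinary point-valued data rather than intervals. It assumes the coalition value v(S) is the squared Mahalanobis distance with every coordinate outside S replaced by the location estimate. This makes v(S) a quadratic form, so the Shapley value of variable j reduces to phi_j = (x_j - mu_j) * [inv(Sigma) (x - mu)]_j, and the values sum to the full squared distance. The function names and numbers are hypothetical, the location and scatter merely stand in for robust estimates, and the brute-force enumeration is included only to check the closed form.

```python
import itertools
import math

import numpy as np


def shapley_closed_form(x, mu, sigma):
    """Closed-form Shapley values of the squared Mahalanobis distance.

    Assumes v(S) = squared distance with coordinates outside S replaced by mu,
    which gives phi_j = (x_j - mu_j) * [inv(sigma) @ (x - mu)]_j.
    """
    a = x - mu
    omega = np.linalg.inv(sigma)
    return a * (omega @ a)


def shapley_brute_force(x, mu, sigma):
    """Reference check: weighted average of marginal contributions over all coalitions."""
    p = len(x)
    a = x - mu
    omega = np.linalg.inv(sigma)

    def v(coalition):
        # Squared distance when only the coordinates in `coalition` deviate from mu.
        idx = list(coalition)
        if not idx:
            return 0.0
        return float(a[idx] @ omega[np.ix_(idx, idx)] @ a[idx])

    phi = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            weight = math.factorial(size) * math.factorial(p - size - 1) / math.factorial(p)
            for s in itertools.combinations(others, size):
                phi[j] += weight * (v(s + (j,)) - v(s))
    return phi


# Hypothetical robust estimates (stand-ins for IMCD output) and one observation.
mu = np.array([0.0, 1.0, -0.5])
sigma = np.array([[2.0, 0.6, 0.2],
                  [0.6, 1.0, -0.3],
                  [0.2, -0.3, 1.5]])
x = np.array([3.0, -1.0, 2.0])

phi = shapley_closed_form(x, mu, sigma)
d2 = float((x - mu) @ np.linalg.inv(sigma) @ (x - mu))
print("Shapley values:", phi)
print("sum vs d^2:", phi.sum(), d2)
print("matches brute force:", np.allclose(phi, shapley_brute_force(x, mu, sigma)))
```

Once the inverse scatter is available, the closed form costs a single matrix-vector product per observation, whereas exact enumeration grows exponentially with the number of variables.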

Key takeaway

For data scientists working with interval-valued datasets, this framework explains *why* specific observations are flagged as outliers. Instead of stopping at detection, you can identify which variables, or pairs of variables, drive the atypical behavior, which supports better-informed decisions and targeted data-quality fixes. Consider the AIDA R package to apply this explainability in your interval-data analysis workflows.

Key insights

Shapley values explain outliers in interval-valued data by decomposing the squared robust Interval-Mahalanobis distance into per-variable contributions.

Principles

Method

Derive a closed-form Shapley value for the squared robust Interval-Mahalanobis distance, decompose each contribution into center, range, and cross-term parts, then extend the approach to a Shapley interaction index.
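The decomposition can be sketched under a simplifying assumption that may differ from the paper's exact IMCD-based distance. Each interval [l_j, u_j] is encoded by its center c_j = (l_j + u_j)/2 and half-range r_j = (u_j - l_j)/2, and the squared distance is an ordinary quadratic form on the stacked vector z = (c, r), with `loc` and `scatter` standing in for IMCD estimates. Each interval variable (its center and range together) is treated as one Shapley player, and absent variables are replaced by `loc`. Under these assumptions, the Shapley value of variable j is the sum of row j of the block matrix b, which splits into center, range, and cross parts. Because the value function is quadratic, the pairwise Shapley interaction index is the same for every coalition and equals 2 * b[j, k]. The function `interval_shapley` and all numbers are hypothetical.

```python
import numpy as np


def interval_shapley(lower, upper, loc, scatter):
    """Hypothetical sketch of a center/range/cross Shapley decomposition.

    Interval variable j is encoded by center c_j and half-range r_j, stacked as
    z = (c_1, ..., c_p, r_1, ..., r_p). `loc` (length 2p) and `scatter` (2p x 2p)
    stand in for robust IMCD-style estimates of that stacked vector.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    p = lower.size
    z = np.concatenate([(lower + upper) / 2, (upper - lower) / 2])
    a = z - loc
    omega = np.linalg.inv(scatter)
    a_c, a_r = a[:p], a[p:]
    o_cc, o_cr = omega[:p, :p], omega[:p, p:]
    o_rc, o_rr = omega[p:, :p], omega[p:, p:]

    # Per-variable Shapley value split into its three sources.
    center = a_c * (o_cc @ a_c)
    range_ = a_r * (o_rr @ a_r)
    cross = a_c * (o_cr @ a_r) + a_r * (o_rc @ a_c)
    phi = center + range_ + cross

    # b[j, k]: block of the quadratic form linking variables j and k.
    b = (np.outer(a_c, a_c) * o_cc + np.outer(a_c, a_r) * o_cr
         + np.outer(a_r, a_c) * o_rc + np.outer(a_r, a_r) * o_rr)
    # Quadratic value function => pairwise interaction index is 2 * b[j, k].
    interaction = 2 * b
    np.fill_diagonal(interaction, 0.0)  # interactions are defined only for j != k

    d2 = float(a @ omega @ a)
    return {"center": center, "range": range_, "cross": cross, "phi": phi,
            "b": b, "interaction": interaction, "d2": d2}


# Two interval variables; location and scatter are made-up stand-ins.
loc = np.array([1.0, 2.0, 0.5, 0.5])  # (c_1, c_2, r_1, r_2)
scatter = np.array([[1.0, 0.3, 0.2, 0.0],
                    [0.3, 1.0, 0.0, 0.1],
                    [0.2, 0.0, 0.5, 0.05],
                    [0.0, 0.1, 0.05, 0.5]])
res = interval_shapley(lower=[2.0, 0.5], upper=[5.0, 2.0], loc=loc, scatter=scatter)
print("phi:", res["phi"], "sum:", res["phi"].sum(), "d2:", res["d2"])
print("interaction[0, 1]:", res["interaction"][0, 1])
```

Summing `center`, `range`, and `cross` recovers `phi`, and `phi` sums to `d2` (the efficiency property). Each `phi[j]` also equals `b[j, j]` plus half of row j of `interaction`, which shows how the pairwise index splits contributions shared between two variables.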

Best for: Research Scientist, AI Scientist, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.