What Is the Prediction Actually Made Of?
Summary
This post examines the limitations of SHAP (SHapley Additive exPlanations) for retrieval-based machine learning models, particularly in the context of tabular in-context learning (ICL). It argues that while SHAP provides additive feature attributions, it fails to explain the underlying mechanism of how predictions are formed in models like k-nearest neighbors (kNN), TabPFN, and TabICL. Using soft kNN as a transparent teaching model, the author demonstrates four critical aspects SHAP misses: neighbor identity, label disagreement, neighbor switching versus structural effects, and local data density/extrapolation. The article introduces an "inner explanation" framework that focuses on the weighted neighborhood of training examples, providing metrics like Neff (effective neighbor count), Δy (label dispersion), and Δx (local radius) to offer a more complete understanding of model predictions and their stability.
Key takeaway
For Machine Learning Engineers building or deploying retrieval-based models, relying solely on SHAP for interpretability can be misleading. Add an "inner explanation" by analyzing neighbor weights, effective neighbor count (Neff), label dispersion (Δy), and local radius (Δx) to understand prediction stability and confidence. This two-layer approach gives a more robust picture of model behavior, especially when a prediction is an accidental average of disagreeing neighbors or rests on sparse data, and it supports better model debugging and risk assessment.
Key insights
SHAP alone is insufficient for explaining retrieval-based model predictions, which require an "inner explanation" of neighbor influence.
Principles
- Prediction is an average over weighted training examples.
- SHAP explains query features, not underlying retrieval mechanisms.
- In soft kNN, each neighbor's weight is the exact local influence of its label on the prediction.
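To make these principles concrete, here is a minimal soft kNN sketch. It is an illustration under stated assumptions, not the original article's code: the softmax over negative squared Euclidean distance, the `temperature` parameter, and the toy data are all hypothetical choices.

```python
import numpy as np

def soft_knn_weights(X_train, x_query, temperature=1.0):
    """Return one normalized weight per training row (assumed softmax kernel)."""
    # Squared Euclidean distance from the query to every training row.
    d2 = np.sum((X_train - x_query) ** 2, axis=1)
    # Softmax over negative distances; shifting by the minimum keeps exp() stable.
    logits = -(d2 - d2.min()) / temperature
    w = np.exp(logits)
    return w / w.sum()

def soft_knn_predict(X_train, y_train, x_query, temperature=1.0):
    """Prediction is the weighted average of training labels."""
    w = soft_knn_weights(X_train, x_query, temperature)
    return float(w @ y_train), w

# Toy data (hypothetical): three nearby points and one far-away point.
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
y_train = np.array([1.0, 2.0, 3.0, 10.0])
x_query = np.array([0.2, 0.1])

y_hat, w = soft_knn_predict(X_train, y_train, x_query, temperature=0.5)
print("prediction:", y_hat)
print("neighbor weights:", w)
```

Because the prediction is linear in the labels, raising the label of training row `i` by some amount moves the prediction by exactly `w[i]` times that amount, which is the sense in which the weights are exact label influences.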
Method
The proposed workflow starts with the inner explanation (neighbor weights, Neff, Δy, Δx), then adds SHAP as an outer feature summary, and finally correlates the SHAP values with neighborhood-formation scores.
In practice
- Use Neff to gauge effective neighbor count.
- Monitor Δy for label disagreement in neighborhoods.
- Assess Δx for query support by nearby data.
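The three checks above can be sketched in a few lines. This summary does not give the article's exact formulas, so the definitions below are assumptions: Neff as the inverse sum of squared weights, Δy as the weighted standard deviation of neighbor labels, and Δx as the weighted mean distance to the query. The function name and example values are hypothetical.

```python
import numpy as np

def neighborhood_diagnostics(w, y_train, dists):
    """Summarize a weighted neighborhood (assumed definitions, see lead-in)."""
    w = np.asarray(w, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    dists = np.asarray(dists, dtype=float)

    # Neff: 1 when one neighbor holds all the weight, n when weights are uniform.
    n_eff = 1.0 / np.sum(w ** 2)
    # Delta-y: weighted spread of labels around the weighted-average prediction.
    y_hat = np.sum(w * y_train)
    delta_y = np.sqrt(np.sum(w * (y_train - y_hat) ** 2))
    # Delta-x: weighted mean distance from the query to its neighbors.
    delta_x = np.sum(w * dists)
    return {"n_eff": float(n_eff), "delta_y": float(delta_y), "delta_x": float(delta_x)}

# Hypothetical neighborhood: normalized weights, labels, and query distances.
report = neighborhood_diagnostics(
    w=[0.7, 0.2, 0.1],
    y_train=[1.0, 3.0, 2.0],
    dists=[0.2, 0.9, 1.5],
)
print(report)
```

Read together, a low Neff suggests the prediction hinges on very few examples, a high Δy suggests the neighbors disagree, and a high Δx suggests the query sits far from the data that supports it.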
Topics
- SHAP Feature Attribution
- Soft kNN Explanations
- Tabular In-Context Learning
- Model Interpretability
- Prediction Stability
Best for: AI Scientist, Machine Learning Engineer, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Agus’s Substack.