Path-Sampled Integrated Gradients
Summary
Path-Sampled Integrated Gradients (PS-IG) is a framework for feature attribution in deep neural networks that targets two limitations of standard Integrated Gradients (IG): dependence on the choice of baseline and susceptibility to gradient noise. Rather than relying on a single reference point or on baselines drawn from the whole dataset, PS-IG computes the expected attribution over baselines sampled along a linear interpolation path. The framework is shown to be mathematically equivalent to Path-Weighted Integrated Gradients (PWIG) when the weighting function equals the cumulative distribution function (CDF) of the sampling density. This equivalence permits a deterministic Riemann-sum evaluation, which improves the error convergence rate from O(m^(-1/2)) to O(m^(-1)) for smooth models. PS-IG also acts as a variance-reducing filter: it strictly lowers attribution variance, scaling it by a factor of 1/3 under uniform sampling, while preserving key axiomatic properties such as linearity and implementation invariance.
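The equivalence with PWIG can be sketched in a few lines. The notation here is an assumption, since the summary does not fix one: x is the input, x' a reference point, b(t) = x' + t(x - x') a baseline on the straight line between them with t drawn from a density p on [0, 1], and P the CDF of p. The original article may state the result in a different form.

```latex
\begin{align*}
\mathrm{IG}_i\big(x; b(t)\big)
  &= \big(x_i - b_i(t)\big) \int_0^1 \partial_i f\big(b(t) + \alpha\,(x - b(t))\big)\, d\alpha \\
  &= (x_i - x'_i) \int_t^1 \partial_i f\big(x' + s\,(x - x')\big)\, ds
  && \text{(substitute } s = t + \alpha(1 - t)\text{)} \\[4pt]
\mathrm{PSIG}_i(x)
  &= \mathbb{E}_{t \sim p}\big[\mathrm{IG}_i\big(x; b(t)\big)\big]
   = (x_i - x'_i) \int_0^1 p(t) \int_t^1 \partial_i f\big(x' + s\,(x - x')\big)\, ds\, dt \\
  &= (x_i - x'_i) \int_0^1 P(s)\, \partial_i f\big(x' + s\,(x - x')\big)\, ds
  && \text{(swap the order of integration)}
\end{align*}
```

The last line is a path-weighted integral whose weight at position s is P(s), the probability that the sampled baseline lies at or before s on the path.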
Key takeaway
For research scientists developing or applying interpretability methods, PS-IG offers a robust and computationally efficient alternative to standard Integrated Gradients. You should consider adopting PS-IG to mitigate issues of baseline sensitivity and gradient noise, as its deterministic implementation provides faster convergence and significantly reduced attribution variance without additional computational overhead. This can lead to more reliable and trustworthy explanations for deep learning model predictions.
Key insights
PS-IG unifies stochastic baseline sampling with deterministic path weighting for robust, efficient feature attribution.
Principles
- Stochastic sampling can be deterministically weighted.
- Weighting functions reduce gradient noise variance.
- Completeness axiom adapts for stochastic baselines.
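The last two principles can be written out as short formulas. This is a sketch under assumed notation and an assumed noise model, neither taken from the original article: b(t) is a baseline at position t on the path with t drawn from density p with CDF P, Δ_i is the i-th component of the input minus the reference point, g_k is the true gradient at the k-th of m path nodes s_k, and ε_k is independent zero-mean noise with variance σ² added to each gradient evaluation.

```latex
% Completeness: each per-baseline IG sums to f(x) - f(b(t)); take the expectation.
\begin{equation*}
\sum_i \mathrm{PSIG}_i(x)
  = \mathbb{E}_{t \sim p}\big[f(x) - f(b(t))\big]
  = f(x) - \mathbb{E}_{t \sim p}\big[f(b(t))\big]
\end{equation*}

% Variance of a weighted Riemann sum with weights w_k = w(s_k).
\begin{equation*}
\operatorname{Var}\!\left[\frac{\Delta_i}{m} \sum_{k=1}^{m} w_k\,(g_k + \varepsilon_k)\right]
  = \frac{\Delta_i^2 \sigma^2}{m^2} \sum_{k=1}^{m} w_k^2
  \approx \frac{\Delta_i^2 \sigma^2}{m} \int_0^1 w(s)^2\, ds
\end{equation*}

% Plain IG has w(s) = 1; uniform sampling gives w(s) = P(s) = s.
\begin{equation*}
\int_0^1 1\, ds = 1,
\qquad
\int_0^1 s^2\, ds = \tfrac{1}{3}
\end{equation*}
```

Under this noise model the uniform-sampling weights pass one third of the noise variance that plain IG passes, which is consistent with the factor of 1/3 quoted in the summary.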
Method
PS-IG computes expected IG over baselines sampled along a linear path, then converts this expectation into a deterministic, weighted path integral using the sampling density's CDF for efficient calculation.
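The two routes described above, averaging IG over sampled baselines and evaluating one CDF-weighted path integral, can be compared on a toy model. This is a minimal sketch, not the authors' implementation: the function names `weighted_ig`, `ps_ig_sampled`, and `ps_ig_deterministic`, the reference point `ref`, the midpoint rule, and the toy model f(x) = sum(x^3) are all illustrative assumptions.

```python
import numpy as np

def weighted_ig(grad_fn, x, ref, m=64, weight_fn=None):
    """Midpoint Riemann sum for IG from `ref` to `x`, optionally path-weighted."""
    s = (np.arange(m) + 0.5) / m                       # m midpoint nodes in (0, 1)
    w = np.ones(m) if weight_fn is None else weight_fn(s)
    delta = x - ref
    grads = np.stack([grad_fn(ref + si * delta) for si in s])  # shape (m, d)
    return delta * (w[:, None] * grads).mean(axis=0)

def ps_ig_sampled(grad_fn, x, ref, n_baselines=2000, m=32, seed=0):
    """Stochastic route: average plain IG over baselines drawn uniformly on the path."""
    rng = np.random.default_rng(seed)
    total = np.zeros_like(x, dtype=float)
    for t in rng.uniform(0.0, 1.0, size=n_baselines):
        baseline = ref + t * (x - ref)                 # a point on the line ref -> x
        total += weighted_ig(grad_fn, x, baseline, m=m)
    return total / n_baselines

def ps_ig_deterministic(grad_fn, x, ref, m=64, cdf=lambda s: s):
    """Deterministic route: one path integral weighted by the sampling CDF.
    The default cdf(s) = s corresponds to uniform sampling."""
    return weighted_ig(grad_fn, x, ref, m=m, weight_fn=cdf)

# Hypothetical toy model f(x) = sum(x**3), whose gradient is 3 * x**2.
grad_f = lambda z: 3.0 * z**2
x = np.array([1.0, -2.0, 0.5])
ref = np.zeros(3)

sampled = ps_ig_sampled(grad_f, x, ref)
deterministic = ps_ig_deterministic(grad_f, x, ref)
print("sampled:      ", sampled)
print("deterministic:", deterministic)
print("closed form:  ", 0.75 * x**3)   # for this toy model, PS-IG_i = (3/4) * x_i**3
```

The sampled route here spends 2000 x 32 gradient evaluations and the deterministic route 64, yet both should land near the same closed-form values. With a real network, `grad_fn` would be replaced by an autodiff gradient of the model output with respect to the input.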
In practice
- Use PS-IG for more stable attribution maps.
- Apply the deterministic Riemann sum for faster convergence.
- Consider uniform sampling, which scales attribution variance by a factor of 1/3.
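The variance and convergence claims behind these recommendations can be checked numerically on a small example. The sketch below rests on assumptions that do not come from the original article: gradient noise is modelled as independent unit-variance Gaussian noise at each path node, the toy model is the one-dimensional f(x) = x^3 with input 1 and reference point 0, and the Riemann sum uses right-endpoint nodes. The names `riemann_error` and `monte_carlo_rmse` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Part 1: noise filtering under uniform sampling, where the path weight is w(s) = s.
m, trials = 200, 4000
s = np.arange(1, m + 1) / m                  # right-endpoint nodes on (0, 1]
noise = rng.normal(size=(trials, m))         # assumed i.i.d. gradient noise per node
ig_noise = noise.mean(axis=1)                # noise after plain IG averaging (weights 1)
ps_ig_noise = (s * noise).mean(axis=1)       # noise after PS-IG averaging (weights s)
variance_ratio = ps_ig_noise.var() / ig_noise.var()

# Part 2: error versus number of gradient evaluations m.
path_grad = lambda u: 3.0 * u**2             # gradient of the toy f along the path
exact = 0.75                                 # integral of s * 3 s^2 over [0, 1]

def riemann_error(m):
    """Absolute error of the deterministic CDF-weighted Riemann sum."""
    s = np.arange(1, m + 1) / m
    return abs(np.mean(s * path_grad(s)) - exact)

def monte_carlo_rmse(m, repeats=2000):
    """RMSE of a stochastic estimate using m gradient evaluations:
    draw a baseline position t and an IG interpolation coefficient a per evaluation."""
    t = rng.uniform(size=(repeats, m))
    a = rng.uniform(size=(repeats, m))
    estimates = ((1.0 - t) * path_grad(t + a * (1.0 - t))).mean(axis=1)
    return np.sqrt(np.mean((estimates - exact) ** 2))

print("variance ratio (PS-IG / IG):", variance_ratio)   # expected to be close to 1/3
for n in (16, 64, 256):
    print(n, "riemann:", riemann_error(n), "monte carlo:", monte_carlo_rmse(n))
```

Going from 16 to 256 evaluations, the Riemann error should shrink by roughly 16x (rate m^-1) and the Monte Carlo error by roughly 4x (rate m^-1/2). Midpoint nodes typically converge faster still for smooth integrands; right endpoints are used here only to match the first-order rate quoted in the summary.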
Topics
- Path-Sampled Integrated Gradients
- Feature Attribution
- Integrated Gradients
- Variance Reduction
- Computational Efficiency
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.