MinShap: A Modified Shapley Value Approach for Feature Selection
Summary
MinShap is a feature selection algorithm that adapts Shapley values to settings with unknown non-linear relationships and dependent features. Where the classical Shapley value averages a feature's marginal contributions over feature permutations, MinShap takes the minimum marginal contribution across those permutations. The approach is theoretically grounded in the faithfulness assumption of directed acyclic graph (DAG) models and comes with a Type I error guarantee. In numerical simulations and real-world data experiments, MinShap outperforms existing feature selection algorithms such as LOCO, GCM, and Lasso in both accuracy and stability. The framework also includes related algorithms built on a multiple testing/p-value perspective, which improve performance in low-sample settings and have their own supporting theoretical guarantees.
Key takeaway
For research scientists and AI engineers doing feature selection in complex, non-linear models with highly dependent features, MinShap is an alternative to traditional methods that is worth evaluating. Consider it when accuracy and stability matter most: in the reported simulations and real-world experiments it outperforms LOCO, GCM, and Lasso on both. In data-scarce scenarios, look at the related multiple testing/p-value algorithms, which are designed for low-sample settings.
Key insights
MinShap adapts Shapley values for feature selection by focusing on minimum marginal contributions, improving accuracy and stability.
Principles
- Minimum marginal contribution enhances feature selection.
- Faithfulness assumption supports theoretical guarantees.
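To make the contrast between averaging and taking the minimum concrete, the two scores can be written side by side. The notation below is an assumption made for illustration and is not taken from the original article: v is a value function on feature subsets, Π is the set of all permutations of the p features, and P_j^π is the set of features that precede feature j in permutation π.

```latex
\phi_j^{\mathrm{Shap}} = \frac{1}{p!} \sum_{\pi \in \Pi} \Bigl[ v\bigl(P_j^{\pi} \cup \{j\}\bigr) - v\bigl(P_j^{\pi}\bigr) \Bigr],
\qquad
\phi_j^{\mathrm{Min}} = \min_{\pi \in \Pi} \Bigl[ v\bigl(P_j^{\pi} \cup \{j\}\bigr) - v\bigl(P_j^{\pi}\bigr) \Bigr]
```

Since the average of a set of numbers is never smaller than its minimum, the minimum-based score is at most the Shapley value under this notation. A feature scores high only if it adds value on top of every possible set of preceding features.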
Method
MinShap calculates the minimum marginal contribution across feature permutations, rather than the average, to identify relevant features.
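The difference between the average and the minimum can be illustrated with a small sketch. This is not the authors' implementation: the value function (negative held-out mean squared error of an ordinary least-squares fit), the helper names `value`, `shapley_and_min`, and `make_data`, and the synthetic data are all assumptions chosen for illustration. The sketch enumerates subsets rather than permutations, which gives the same minimum because every subset of the other features appears as the set of predecessors in some permutation. It is exhaustive, so it only suits a handful of features.

```python
from itertools import combinations
from math import factorial

import numpy as np


def value(X_tr, y_tr, X_te, y_te, subset):
    """Hypothetical value function v(S): negative held-out MSE of a
    least-squares fit (with intercept) on the features in `subset`."""
    A_tr = np.column_stack([np.ones(len(X_tr))] + [X_tr[:, k] for k in subset])
    A_te = np.column_stack([np.ones(len(X_te))] + [X_te[:, k] for k in subset])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return -float(np.mean((y_te - A_te @ coef) ** 2))


def shapley_and_min(X_tr, y_tr, X_te, y_te):
    """Return (Shapley value, minimum marginal contribution) per feature,
    computed exhaustively over all subsets of the other features."""
    p = X_tr.shape[1]
    cache = {}

    def v(subset):
        key = tuple(sorted(subset))
        if key not in cache:
            cache[key] = value(X_tr, y_tr, X_te, y_te, key)
        return cache[key]

    shap = np.zeros(p)
    min_contrib = np.full(p, np.inf)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            # Fraction of permutations in which a given subset of this size
            # is exactly the set of features preceding j.
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for S in combinations(others, size):
                gain = v(S + (j,)) - v(S)  # marginal contribution of j given S
                shap[j] += weight * gain
                min_contrib[j] = min(min_contrib[j], gain)
    return shap, min_contrib


def make_data(n=2000, seed=0):
    """Synthetic example: x0 drives y, x1 is a noisy copy of x0, x2 is noise."""
    rng = np.random.default_rng(seed)
    x0 = rng.normal(size=n)
    x1 = x0 + 0.5 * rng.normal(size=n)  # dependent on x0, not used in y
    x2 = rng.normal(size=n)
    y = 2.0 * x0 + 0.5 * rng.normal(size=n)
    return np.column_stack([x0, x1, x2]), y


if __name__ == "__main__":
    X_tr, y_tr = make_data(seed=0)
    X_te, y_te = make_data(seed=1)
    shap, min_contrib = shapley_and_min(X_tr, y_tr, X_te, y_te)
    for j in range(3):
        print(f"x{j}: shapley={shap[j]:.3f}  min={min_contrib[j]:.3f}")
```

In this toy setup the redundant feature `x1` should receive a substantial Shapley value, because it predicts `y` well whenever `x0` is absent, while its minimum marginal contribution stays close to zero, because it adds almost nothing once `x0` is included. How MinShap turns such scores into a selection rule with a Type I error guarantee is not described in this summary, so the sketch stops at the scores.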
In practice
- Apply MinShap for non-linear models with dependent features.
- Utilize MinShap's related algorithms in low-sample settings.
Topics
- MinShap
- Feature Selection
- Shapley Values
- Directed Acyclic Graphical Models
- Multiple Testing
Best for: AI Engineer, Research Scientist, AI Scientist, Data Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.