Aggregate Models, Not Explanations: Improving Feature Importance Estimation
Summary
A theoretical analysis shows that ensembling at the model level, that is, computing feature importance from the combined model, yields more accurate feature importance estimates than aggregating the explanations of individual models. The gain comes from reducing excess risk, the leading error term, and is largest for expressive models with slow convergence rates. The study validates these findings on the classical Friedman 1, G-function, and Ishigami benchmarks, using Multi-Layer Perceptron (MLP) and Random Forest (RF) architectures. In a real-world application predicting Body Mass Index (BMI) from UK Biobank proteomic data (n=46,382 participants), model-level ensembling of LightGBM models (R^2 = 0.62 +/- 0.001) identified key metabolic proteins such as FABP4, LEP, ADM, IGFBP-1, and IGFBP-2 more accurately than aggregating individual model importances.
Key takeaway
Research scientists who develop or deploy complex ML models for scientific discovery, especially in biomedical applications, should prioritize model-level ensembling when estimating feature importance. The strategy is particularly effective for LOCO and SAGE: it directly reduces model bias and yields more accurate and reliable feature rankings and selections, as demonstrated in the identification of a proteomic signature for BMI. To make the resulting insights more robust, consider bagging to mitigate sampling instability and voting to mitigate algorithmic stochasticity.
Key insights
Ensembling models directly improves feature importance estimation by reducing excess risk, especially for complex ML models.
Principles
- Model-level ensembling reduces excess risk more effectively than aggregating explanations.
- Excess risk is the primary driver of feature importance inaccuracy for complex models.
- Model diversity (lower correlation) enhances ensemble benefits.
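The first and third principles are consistent with a standard identity for squared-error loss, the ambiguity decomposition of Krogh and Vedelsby. It is shown here as background, not as the paper's own theorem. The symbols are notation introduced for this illustration: f_k are the K individual models, \bar f is their average, and y is the target.

```latex
\[
\bar f(x) = \frac{1}{K}\sum_{k=1}^{K} f_k(x), \qquad
\mathbb{E}\big[(y - \bar f(x))^2\big]
= \frac{1}{K}\sum_{k=1}^{K}\mathbb{E}\big[(y - f_k(x))^2\big]
- \frac{1}{K}\sum_{k=1}^{K}\mathbb{E}\big[(f_k(x) - \bar f(x))^2\big].
\]
```

The subtracted term is non-negative, so the averaged model's risk never exceeds the average risk of its members. The gap widens as the members disagree more, which is one way to read the benefit of lower correlation between models.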
Method
The proposed method involves training an ensemble of models (e.g., via bagging or voting) and then deriving feature importance from this aggregated ensemble, rather than averaging importance scores from individual sub-models.
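The difference between the two aggregation orders can be sketched for LOCO with squared-error loss. This is a minimal illustration under stated assumptions, not the paper's implementation: the base learner (`fit_predict`, ridge regression on random ReLU features), the toy data, and all function and variable names are hypothetical, and bagging is done with plain bootstrap resampling.

```python
import numpy as np

def fit_predict(X_tr, y_tr, X_te, rng, n_hidden=50, ridge=1e-2):
    """Hypothetical base learner: ridge regression on random ReLU features."""
    W = rng.normal(size=(X_tr.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H_tr = np.maximum(X_tr @ W + b, 0.0)
    H_te = np.maximum(X_te @ W + b, 0.0)
    coef = np.linalg.solve(H_tr.T @ H_tr + ridge * np.eye(n_hidden), H_tr.T @ y_tr)
    return H_te @ coef

def loco_two_ways(X_tr, y_tr, X_te, y_te, n_models=10, seed=0):
    """Return LOCO importances aggregated at the model level and at the explanation level."""
    rng = np.random.default_rng(seed)
    n, d = X_tr.shape
    # preds[k, 0]: member k trained on all features
    # preds[k, j + 1]: member k retrained with feature j left out
    preds = np.empty((n_models, d + 1, len(y_te)))
    for k in range(n_models):
        idx = rng.integers(0, n, size=n)  # bootstrap resample (bagging)
        Xb, yb = X_tr[idx], y_tr[idx]
        preds[k, 0] = fit_predict(Xb, yb, X_te, rng)
        for j in range(d):
            keep = np.delete(np.arange(d), j)
            preds[k, j + 1] = fit_predict(Xb[:, keep], yb, X_te[:, keep], rng)

    # Explanation-level: one LOCO score per member, then average the scores.
    member_risk = ((y_te - preds) ** 2).mean(axis=2)            # shape (K, d + 1)
    loco_explanations = (member_risk[:, 1:] - member_risk[:, :1]).mean(axis=0)

    # Model-level: average the predictions first, then compute a single LOCO score.
    ensemble_risk = ((y_te - preds.mean(axis=0)) ** 2).mean(axis=1)  # shape (d + 1,)
    loco_model = ensemble_risk[1:] - ensemble_risk[0]
    return loco_model, loco_explanations, ensemble_risk, member_risk

# Toy data (an assumption): features 0 and 1 matter, feature 2 is pure noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(600, 3))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + 0.1 * rng.normal(size=600)

loco_model, loco_expl, ens_risk, member_risk = loco_two_ways(
    X[:400], y[:400], X[400:], y[400:]
)
print("model-level LOCO:      ", np.round(loco_model, 3))
print("explanation-level LOCO:", np.round(loco_expl, 3))
```

Both variants reuse the same fitted members. Only the order of averaging and risk evaluation differs, so the comparison isolates the effect of the aggregation level.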
In practice
- Use model-level ensembling for LOCO and SAGE methods.
- Use bagging to counter sampling instability and voting to counter algorithmic stochasticity.
- Apply to high-dimensional biomedical data for robust biomarker discovery.
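The bagging and voting recommendations above can be sketched as two ways of building ensemble members. This is a hedged illustration that assumes "voting" means averaging models trained on the same data with different random seeds, while "bagging" trains on bootstrap resamples. The stochastic learner (`base_learner`), the toy data, and all names are hypothetical.

```python
import numpy as np

def base_learner(X_tr, y_tr, X_te, learner_seed, n_hidden=30, ridge=1e-2):
    """Hypothetical stochastic learner: ridge regression on random ReLU features."""
    rng = np.random.default_rng(learner_seed)
    W = rng.normal(size=(X_tr.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H_tr = np.maximum(X_tr @ W + b, 0.0)
    H_te = np.maximum(X_te @ W + b, 0.0)
    coef = np.linalg.solve(H_tr.T @ H_tr + ridge * np.eye(n_hidden), H_tr.T @ y_tr)
    return H_te @ coef

def ensemble_predictions(X_tr, y_tr, X_te, strategy, n_models=20, seed=0):
    """Return (averaged prediction, per-member predictions) for one strategy."""
    rng = np.random.default_rng(seed)
    n = len(y_tr)
    members = []
    for k in range(n_models):
        if strategy == "bagging":
            # Vary the training sample, keep the learner's own randomness fixed.
            idx = rng.integers(0, n, size=n)
            members.append(base_learner(X_tr[idx], y_tr[idx], X_te, learner_seed=0))
        elif strategy == "voting":
            # Keep the training sample fixed, vary the learner's random seed.
            members.append(base_learner(X_tr, y_tr, X_te, learner_seed=k))
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
    members = np.array(members)
    return members.mean(axis=0), members

# Toy regression data (an assumption made for this sketch).
rng = np.random.default_rng(7)
X = rng.normal(size=(500, 4))
y = X[:, 0] - 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=500)
X_tr, y_tr, X_te, y_te = X[:350], y[:350], X[350:], y[350:]

results = {}
for strategy in ("bagging", "voting"):
    avg_pred, members = ensemble_predictions(X_tr, y_tr, X_te, strategy)
    ensemble_mse = float(np.mean((y_te - avg_pred) ** 2))
    mean_member_mse = float(np.mean((y_te - members) ** 2))
    results[strategy] = (ensemble_mse, mean_member_mse)
    print(f"{strategy}: ensemble MSE={ensemble_mse:.4f}, mean member MSE={mean_member_mse:.4f}")
```

Feature importance would then be computed from `avg_pred`, the output of the aggregated model, not from each row of `members` separately.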
Topics
- Feature Importance Estimation
- Ensemble Learning
- Explainable AI
- Excess Risk
- Biomedical Applications
Best for: Research Scientist, AI Researcher, AI Scientist, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.