Aggregate Models, Not Explanations: Improving Feature Importance Estimation

· Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

A new theoretical analysis shows that ensembling at the model level, rather than aggregating the explanations of individual models, yields more accurate feature importance estimates. The gain comes from reducing excess risk, the leading term in the estimation error, and is largest for expressive models with slow convergence rates. The study validates these findings on the classical Friedman 1, G-function, and Ishigami benchmarks, using Multi-Layer Perceptron (MLP) and Random Forest (RF) architectures. In a real-world application, UK Biobank proteomic data (n=46,382 participants) were used to predict Body Mass Index (BMI). There, model-level ensembling of LightGBM models (R^2 of 0.62 +/- 0.001) identified key metabolic proteins such as FABP4, LEP, ADM, IGFBP-1, and IGFBP-2 more accurately than aggregating the importances of individual models.

Key takeaway

Research scientists who develop or deploy complex ML models for scientific discovery, especially in biomedical applications, should prioritize model-level ensembling when estimating feature importance. The strategy is particularly effective for LOCO and SAGE: it directly reduces model bias and yields more accurate and reliable feature rankings and selections, as demonstrated in proteomic signature identification for BMI. Consider bagging to mitigate sampling instability and voting to mitigate algorithmic stochasticity.

Key insights

Ensembling models directly improves feature importance estimation by reducing excess risk, especially for complex ML models.
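
One way to see why an averaged model can carry less risk than its members is the standard ambiguity decomposition for squared loss. The block below is a textbook identity offered as intuition, not the paper's derivation; the symbols f_m (individual models), \bar f (their average), M (ensemble size), and R (squared-error risk) are notation assumed here for illustration.

```latex
% Averaged predictor over M individual models
\bar f(x) = \frac{1}{M}\sum_{m=1}^{M} f_m(x)

% Pointwise identity for squared loss (ambiguity decomposition)
\bigl(y - \bar f(x)\bigr)^2
  = \frac{1}{M}\sum_{m=1}^{M}\bigl(y - f_m(x)\bigr)^2
  - \frac{1}{M}\sum_{m=1}^{M}\bigl(f_m(x) - \bar f(x)\bigr)^2

% Taking expectations over (X, Y), with R(f) = E[(Y - f(X))^2]
R(\bar f)
  = \frac{1}{M}\sum_{m=1}^{M} R(f_m)
  - \mathbb{E}\!\left[\frac{1}{M}\sum_{m=1}^{M}\bigl(f_m(X) - \bar f(X)\bigr)^2\right]
  \;\le\; \frac{1}{M}\sum_{m=1}^{M} R(f_m)
```

The identity only bounds the risk of the averaged model. An importance measure such as LOCO is a difference of two risks, so how the importance estimate itself changes is what the paper's analysis addresses.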

Method

The proposed method involves training an ensemble of models (e.g., via bagging or voting) and then deriving feature importance from this aggregated ensemble, rather than averaging importance scores from individual sub-models.
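
The contrast between the two aggregation orders can be sketched with LOCO (leave-one-covariate-out) on a Friedman 1 style dataset. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (fit_bagged_trees, loco_both_ways), the choice of scikit-learn decision trees, the tree depth, and the ensemble size are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def fit_bagged_trees(X, y, n_models, rng):
    """Fit n_models trees, each on a bootstrap resample of (X, y)."""
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))  # sample rows with replacement
        seed = int(rng.integers(0, 2**31 - 1))
        tree = DecisionTreeRegressor(max_depth=6, random_state=seed)
        models.append(tree.fit(X[idx], y[idx]))
    return models


def mse(pred, y):
    """Mean squared error of predictions against targets."""
    return float(np.mean((y - pred) ** 2))


def loco_both_ways(X_tr, y_tr, X_te, y_te, j, n_models=20, seed=0):
    """Return (model_level, explanation_level) LOCO estimates for feature j."""
    rng = np.random.default_rng(seed)
    keep = [k for k in range(X_tr.shape[1]) if k != j]

    # One ensemble sees all features, the other is refit without feature j.
    full = fit_bagged_trees(X_tr, y_tr, n_models, rng)
    reduced = fit_bagged_trees(X_tr[:, keep], y_tr, n_models, rng)

    # Rows are models, columns are test points.
    P_full = np.stack([m.predict(X_te) for m in full])
    P_red = np.stack([m.predict(X_te[:, keep]) for m in reduced])

    # Explanation-level: compute LOCO per sub-model pair, then average scores.
    per_model = [mse(pr, y_te) - mse(pf, y_te) for pf, pr in zip(P_full, P_red)]
    explanation_level = float(np.mean(per_model))

    # Model-level: average predictions first, then compute LOCO once.
    model_level = mse(P_red.mean(axis=0), y_te) - mse(P_full.mean(axis=0), y_te)
    return model_level, explanation_level


if __name__ == "__main__":
    X, y = make_friedman1(n_samples=600, n_features=10, noise=1.0, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    for j in range(X.shape[1]):
        ml, el = loco_both_ways(X_tr, y_tr, X_te, y_te, j)
        print(f"feature {j}: model-level={ml:.3f}  explanation-level={el:.3f}")
```

In make_friedman1 only the first five features enter the response, so the remaining columns give a quick sanity check: their LOCO values should sit near zero under either aggregation order. Swapping the bootstrap resampling for identical training data with different random seeds would turn the bagging sketch into a voting-style ensemble.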

Best for: Research Scientist, AI Researcher, AI Scientist, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.