EMA-FS: Accelerating GBDT Training via Gain-Informed Feature Screening
Summary
EMA-based Feature Screening (EMA-FS) is an algorithm-level optimization for accelerating Gradient Boosted Decision Tree (GBDT) training, particularly in LightGBM, which typically spends 65-70% of its training time on per-feature histogram construction. Unlike existing random feature subsampling methods, EMA-FS maintains an exponential moving average (EMA) of per-feature split gains and, after a warmup period, restricts histogram construction to the top-K features ranked by historical gain. This gain-informed selection keeps high-gain features, screens out low-gain ones, and remains compatible with LightGBM's histogram subtraction trick. Evaluated on datasets with 29 to 968 features, EMA-FS delivered its largest speedups on dense, moderate-to-high-dimensional data: 2.61x on a 500-feature synthetic benchmark and 1.45x on the 432-feature IEEE-CIS Fraud dataset, both at 30% retention. At 70% retention, it improved AUC by 0.11 points with a 1.34x speedup. A variant, Stochastic EMA-FS (S-EMA-FS), replaces the deterministic top-K cut with gain-weighted random sampling. Both are implemented in ~120 lines of C++ for LightGBM.
Key takeaway
For Machine Learning Engineers optimizing GBDT training, EMA-FS can cut training time without giving up accuracy in the reported experiments. If your LightGBM models are bottlenecked by histogram construction on dense, moderate-to-high-dimensional datasets, consider integrating this gain-informed feature screening. Reported speedups reach 2.61x at 30% retention, and at 70% retention AUC improved by 0.11 points alongside a 1.34x speedup.
Key insights
EMA-FS accelerates GBDT training by skipping histogram construction for features whose historical split gains are low.
Principles
- Feature utility varies; prioritize high-gain features.
- Informed feature selection outperforms random subsampling.
- Exponential moving averages track feature importance over time.
Method
EMA-FS maintains an EMA of per-feature split gains, then restricts histogram construction to the top-K features after a warmup period. S-EMA-FS uses gain-weighted random sampling.
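The method described above can be illustrated with a minimal Python sketch. It is not the authors' C++ implementation: the class name, the `decay`, `warmup_rounds`, and `retention` parameters, and their default values are all hypothetical, and the summary does not say how the EMA of a screened-out feature is updated, so the sketch simply folds in whatever per-feature gains the caller supplies.

```python
from typing import List


class EmaFeatureScreener:
    """Illustrative EMA-FS-style screener; names and defaults are hypothetical."""

    def __init__(self, num_features: int, retention: float = 0.3,
                 warmup_rounds: int = 10, decay: float = 0.9) -> None:
        self.ema = [0.0] * num_features          # running gain estimate per feature
        self.top_k = max(1, int(round(retention * num_features)))
        self.warmup_rounds = warmup_rounds       # rounds during which nothing is screened
        self.decay = decay                       # weight given to the previous EMA value
        self.rounds_seen = 0

    def update(self, round_gains: List[float]) -> None:
        """Fold one boosting round's per-feature split gains into the EMA."""
        for j, gain in enumerate(round_gains):
            self.ema[j] = self.decay * self.ema[j] + (1.0 - self.decay) * gain
        self.rounds_seen += 1

    def active_features(self) -> List[int]:
        """Return the feature indices whose histograms would be built next."""
        n = len(self.ema)
        if self.rounds_seen < self.warmup_rounds:
            return list(range(n))                # warmup: keep every feature
        ranked = sorted(range(n), key=lambda j: self.ema[j], reverse=True)
        return sorted(ranked[:self.top_k])       # top-K by historical gain
```

In a training loop, `active_features()` would be consulted before histogram construction in each round and `update()` called once the round's split gains are known.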
In practice
- Implement EMA-FS in LightGBM for GBDT speedups.
- Use S-EMA-FS to balance deterministic selection and randomness.
- Apply to dense, moderate-to-high-dimensional datasets.
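To make the S-EMA-FS recommendation concrete, here is a hedged Python sketch of gain-weighted random sampling. The summary says only that S-EMA-FS samples features with gain-based weights. Drawing without replacement with probability proportional to the EMA gain, and adding a small `epsilon` floor so zero-gain features can still be picked, are assumptions made for illustration, as is the function name.

```python
import random
from typing import List, Optional


def sample_features_by_gain(ema_gains: List[float], k: int,
                            rng: Optional[random.Random] = None,
                            epsilon: float = 1e-12) -> List[int]:
    """Draw k distinct feature indices, weighted by EMA gain (hypothetical helper)."""
    rng = rng or random.Random()
    candidates = list(range(len(ema_gains)))
    # Assumed weighting: proportional to non-negative gain plus a tiny floor,
    # so every remaining feature keeps a nonzero chance of selection.
    weights = [max(g, 0.0) + epsilon for g in ema_gains]
    chosen = []
    for _ in range(min(k, len(candidates))):
        pick = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        chosen.append(candidates.pop(pick))      # remove so it cannot be drawn twice
        weights.pop(pick)
    return sorted(chosen)
```

Compared with a deterministic top-K cut, this keeps some probability mass on currently low-ranked features, which is one way to read the balance between informed selection and randomness.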
Topics
- GBDT Training
- LightGBM
- Feature Screening
- Model Acceleration
- Exponential Moving Average
- Machine Learning Performance
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.