EMA-FS: Accelerating GBDT Training via Gain-Informed Feature Screening

Source: Machine Learning · Field: Technology & Digital, Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

EMA-based Feature Screening (EMA-FS) is an algorithm-level optimization designed to accelerate Gradient Boosted Decision Tree (GBDT) training, particularly for LightGBM, which typically spends 65-70% of its training time on per-feature histogram construction. Unlike existing random feature subsampling methods, EMA-FS maintains an exponential moving average (EMA) of per-feature split gains and, after a warmup, restricts histogram construction to the top-K features ranked by historical gain. This informed approach retains high-gain features while screening out low-gain ones, preserving compatibility with LightGBM's histogram subtraction trick. Evaluated on datasets with 29 to 968 features, EMA-FS achieved significant speedups on dense, moderate-to-high-dimensional data, including 2.61x on a 500-feature synthetic benchmark and 1.45x on the 432-feature IEEE-CIS Fraud dataset at 30% retention. At 70% retention, it improved AUC by 0.11 points with a 1.34x speedup. A variant, Stochastic EMA-FS (S-EMA-FS), introduces gain-weighted random sampling. Both are implemented in ~120 lines of C++ for LightGBM.
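
The compatibility with histogram subtraction mentioned above can be illustrated with a toy sketch. LightGBM builds the histogram for one child of a split and obtains the sibling's as parent minus child; this keeps working under screening as long as parent and child histograms cover the same active features. The summary does not say how EMA-FS guarantees this, so the fixed `active` set below is an assumption, the function names are hypothetical, and the histograms hold gradient sums only (real GBDT histograms also track Hessian sums).

```python
def build_histogram(rows, active_features, num_bins):
    """Accumulate gradient sums per (feature, bin), for active features only."""
    hist = {j: [0.0] * num_bins for j in active_features}
    for bins, grad in rows:
        for j in active_features:
            hist[j][bins[j]] += grad
    return hist


def subtract_histogram(parent, child):
    """Derive the sibling histogram as parent minus child, feature by feature."""
    return {j: [p - c for p, c in zip(parent[j], child[j])] for j in parent}


# Each row is (binned feature values, gradient). Feature 1 is screened out,
# so no histogram is ever built for it (assumed: same active set at every node).
rows = [((0, 2, 1), 0.5), ((1, 0, 1), -1.0), ((1, 1, 0), 2.0), ((0, 2, 0), 0.25)]
active = [0, 2]

parent = build_histogram(rows, active, num_bins=3)
left = build_histogram(rows[:1], active, num_bins=3)   # smaller child, built directly
right = subtract_histogram(parent, left)               # larger child, derived

# The derived sibling matches a histogram built directly from its rows.
assert right == build_histogram(rows[1:], active, num_bins=3)
```

The saving comes from the inner loop over `active_features`: screened-out features cost nothing in either the direct build or the subtraction.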

Key takeaway

For Machine Learning Engineers whose LightGBM training is bottlenecked by histogram construction on dense, moderate-to-high-dimensional datasets, gain-informed feature screening is worth evaluating. The reported speedups reach 2.61x on a 500-feature synthetic benchmark and 1.45x on IEEE-CIS Fraud at 30% retention, while the reported accuracy gain (AUC up 0.11 points, with a 1.34x speedup) comes from the more conservative 70% retention setting. The largest speedup and the accuracy gain should therefore not be assumed to occur at the same setting; retention is a parameter to tune per dataset.

Key insights

EMA-FS accelerates GBDT training by skipping histogram construction for features whose historical split gains are low.

Method

EMA-FS maintains an EMA of per-feature split gains, then restricts histogram construction to the top-K features after a warmup period. S-EMA-FS uses gain-weighted random sampling.
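
The two selection rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the ~120-line C++ implementation: the class name, the parameters `alpha`, `retention`, and `warmup_iters`, and their defaults are all hypothetical. The summary also does not specify how S-EMA-FS draws its sample, so the Efraimidis-Spirakis key trick used here for weighted sampling without replacement is a swapped-in choice, and the sketch assumes non-negative split gains. How the EMA of a screened-out feature is updated is likewise unspecified; here the caller decides what gain to pass.

```python
import random


class EmaFeatureScreener:
    """Hypothetical helper: tracks an EMA of per-feature split gain."""

    def __init__(self, num_features, retention=0.3, alpha=0.1,
                 warmup_iters=10, seed=0):
        self.ema = [0.0] * num_features
        self.k = max(1, round(retention * num_features))  # features kept
        self.alpha = alpha                                 # EMA smoothing factor
        self.warmup_iters = warmup_iters
        self.iteration = 0
        self.rng = random.Random(seed)

    def update(self, gains):
        """Fold one boosting iteration's per-feature split gains into the EMA."""
        for j, g in enumerate(gains):
            self.ema[j] = (1.0 - self.alpha) * self.ema[j] + self.alpha * g
        self.iteration += 1

    def top_k(self):
        """EMA-FS: all features during warmup, then the K highest-EMA features."""
        n = len(self.ema)
        if self.iteration < self.warmup_iters:
            return list(range(n))
        ranked = sorted(range(n), key=lambda j: self.ema[j], reverse=True)
        return sorted(ranked[:self.k])

    def sample_k(self, eps=1e-12):
        """S-EMA-FS-style draw: K distinct features, weighted by EMA gain."""
        n = len(self.ema)
        if self.iteration < self.warmup_iters:
            return list(range(n))
        # Key u ** (1 / w): a larger weight w tends to give a larger key.
        keys = [(self.rng.random() ** (1.0 / (max(self.ema[j], 0.0) + eps)), j)
                for j in range(n)]
        keys.sort(reverse=True)
        return sorted(j for _, j in keys[:self.k])


screener = EmaFeatureScreener(num_features=5, retention=0.4, alpha=0.5,
                              warmup_iters=2)
screener.update([10.0, 0.0, 3.0, 0.0, 1.0])
print(screener.top_k())   # still in warmup: [0, 1, 2, 3, 4]
screener.update([8.0, 0.0, 4.0, 0.0, 0.0])
print(screener.top_k())   # warmup over, K = 2: [0, 2]
```

The returned index list is what a trainer would pass to histogram construction for the next iteration. The stochastic variant gives low-EMA features a nonzero chance of being selected, whereas the deterministic rule always keeps the same top K for a given EMA state.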

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.