Gradient Regularized Newton Boosting Trees with Global Convergence

· Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, extended

Summary

This paper introduces Restricted Newton Descent to analyze the global convergence of Gradient Boosting Decision Trees (GBDTs), specifically focusing on vanilla Newton boosting and a novel gradient regularized Newton boosting scheme. GBDTs, including implementations like XGBoost and LightGBM, rely on Newton boosting, a second-order descent method, but its global convergence has been poorly understood. The authors prove that vanilla Newton boosting achieves a linear convergence rate for smooth, strongly convex losses under a Hessian-dominance condition. For general convex losses with Lipschitz Hessians, they extend a gradient regularized Newton (GRN) scheme, which minimally modifies the classical algorithm by adding an adaptive $\ell_{2}$-regularization term proportional to the square root of the gradient norm. This GRN scheme achieves a global $\mathcal{O}(\frac{1}{k^{2}})$ convergence rate, matching first-order boosting with Nesterov momentum, and demonstrates convergence in numerical experiments where vanilla Newton boosting may diverge, particularly with Charbonnier loss.

Key takeaway

For Research Scientists developing or deploying GBDT models, understanding the theoretical convergence guarantees is crucial. The Gradient Regularized Newton (GRN) scheme offers a robust alternative to vanilla Newton boosting, providing global $\mathcal{O}(\frac{1}{k^{2}})$ convergence for general convex losses. You should consider implementing GRN, which incurs negligible computational overhead, especially when working with loss functions prone to divergence or when seeking stronger theoretical guarantees for model stability and performance.

Key insights

A gradient-regularized Newton boosting scheme achieves global convergence for GBDTs, addressing prior theoretical gaps.

Principles

Method

The Gradient Regularized Newton (GRN) scheme modifies Newton boosting by adding an adaptive $\ell_{2}$-regularization term, $\lambda_{k}=\lambda_{\text{base}}+\sqrt{M\|g_{k}\|}$, to the objective function at each iteration.

In practice

Topics

Code references

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Student

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.