XGBoost for Tabular Time Series: A Multi-Stage Framework for Inventory Recovery Forecasting

Source: Towards AI - Medium · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics, E-commerce & Digital Commerce) · Depth: Intermediate, long

Summary

A multi-stage XGBoost framework addresses inventory recovery forecasting for a large e-commerce network, predicting the fraction of inventory that fails to sell at full price. The target variable is severely zero-inflated and bounded between 0 and 1, which motivates a two-stage pipeline. Stage 1 is an XGBoost binary classifier that predicts the probability of any recovery. Stage 2 is an XGBoost regressor, trained only on rows with non-zero recovery and a logit-transformed target, that estimates the magnitude of recovery. The approach relies on extensive feature engineering: a dense weekly grid, lagged recovery rates (e.g., 1, 2, 4, 8, 13, 26, 52 weeks), rolling window statistics (e.g., 4, 8, 13, 26, 52 weeks), and entity-level baseline statistics. The dataset spans 2022-2025 and contains over 6 million observations, with 2025 held out for testing. Hyperparameters are tuned with Optuna's TPE algorithm, and SHAP values are used for interpretability, decomposing each prediction into a long-run baseline component and a current-week deviation component.
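
The feature engineering described above can be sketched in pandas. This is a minimal illustration, not the article's code: the column names `entity_id`, `week`, and `recovery_rate` are hypothetical, missing weeks are assumed to mean zero recovery, and the expanding mean is only one possible reading of "entity-level baseline statistics".

```python
import pandas as pd

LAGS = (1, 2, 4, 8, 13, 26, 52)      # lag offsets in weeks, from the summary
WINDOWS = (4, 8, 13, 26, 52)         # rolling window lengths in weeks


def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Build lag, rolling, and baseline features on a dense weekly grid.

    Assumes (hypothetically) one row per (entity_id, week), with `week`
    holding week-start dates that all fall on the same weekday.
    """
    # Dense grid: every entity x every week, so shifting by k rows
    # within an entity is exactly k weeks.
    weeks = pd.date_range(df["week"].min(), df["week"].max(), freq="7D")
    grid = pd.MultiIndex.from_product(
        [df["entity_id"].unique(), weeks], names=["entity_id", "week"]
    )
    dense = (
        df.set_index(["entity_id", "week"])["recovery_rate"]
        .reindex(grid)
        .fillna(0.0)  # assumption: a missing week means no recovery
        .reset_index()
        .sort_values(["entity_id", "week"])
        .reset_index(drop=True)
    )

    by_entity = dense.groupby("entity_id")["recovery_rate"]
    for k in LAGS:
        dense[f"lag_{k}"] = by_entity.shift(k)

    # Shift by one week first so every window uses past values only.
    past = by_entity.shift(1)
    past_by_entity = past.groupby(dense["entity_id"])
    for w in WINDOWS:
        dense[f"roll_mean_{w}"] = past_by_entity.transform(
            lambda s, w=w: s.rolling(w, min_periods=1).mean()
        )

    # One leakage-safe baseline: mean of all earlier weeks for the entity.
    dense["entity_baseline"] = past_by_entity.transform(
        lambda s: s.expanding(min_periods=1).mean()
    )
    return dense
```

Shifting before rolling is the important design choice here: it keeps the current week's target out of its own features.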

Key takeaway

Data scientists building forecasting systems for zero-inflated, bounded targets should consider a multi-stage modeling approach. This strategy, exemplified by the two-stage XGBoost pipeline, improves accuracy and interpretability by handling each target characteristic separately: a classifier for the excess zeros and a regressor on a logit-transformed target for the bounded magnitude. Make sure feature engineering includes comprehensive temporal lags and rolling statistics, and always use temporal train-test splits to prevent data leakage and obtain a realistic performance estimate.
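
A temporal split of the kind recommended here can be as simple as a date cutoff. The sketch below assumes a hypothetical `week` column and uses 2025-01-01 as the cutoff, matching the summary's statement that 2025 is held out; the function name is illustrative.

```python
import pandas as pd


def temporal_split(df: pd.DataFrame, date_col: str = "week",
                   test_start: str = "2025-01-01"):
    """Split rows by time: everything before `test_start` is training data.

    A random split would let lag and rolling features from neighbouring
    weeks appear on both sides, which inflates test scores.
    """
    cutoff = pd.Timestamp(test_start)
    train = df[df[date_col] < cutoff]
    test = df[df[date_col] >= cutoff]
    return train, test
```

The same cutoff logic applies inside hyperparameter search: validation folds should also sit later in time than the data used to fit each trial.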

Key insights

Targets that are both zero-inflated and bounded call for multi-stage modeling: each stage handles one property of the distribution, which a single model fitted to the raw target cannot do cleanly.

Method

A two-stage XGBoost pipeline: a binary classifier first predicts whether any recovery occurs, then a regressor fitted to logit-transformed non-zero recovery rates predicts how large it is. Features include lags, rolling windows, and entity baselines.

Best for: Machine Learning Engineer, Data Scientist, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Towards AI - Medium.