XGBoost for Tabular Time Series: A Multi-Stage Framework for Inventory Recovery Forecasting
Summary
A multi-stage XGBoost framework addresses inventory recovery forecasting for a large e-commerce network, predicting the fraction of inventory that fails to sell at full price. The target variable is severely zero-inflated and bounded between 0 and 1, which motivates a two-stage pipeline. Stage 1 uses an XGBoost binary classifier to predict the probability of any recovery. Stage 2 uses an XGBoost regressor, trained only on rows with non-zero recovery and a logit-transformed target, to estimate the recovery magnitude. The approach relies on extensive feature engineering: a dense weekly grid, lagged recovery rates (e.g., 1, 2, 4, 8, 13, 26, 52 weeks), rolling window statistics (e.g., 4, 8, 13, 26, 52 weeks), and entity-level baseline statistics. The system was developed on a dataset of over 6 million observations spanning 2022-2025, with 2025 held out for testing. Hyperparameters are optimized with Optuna's TPE algorithm, and SHAP values are used for interpretability, decomposing predictions into a long-run baseline component and a current-week deviation component.
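The feature engineering described above (dense weekly grid, lags, rolling statistics) might look roughly like the following pandas sketch. The column names `entity_id`, `week`, and `recovery_rate` are hypothetical, and filling missing weeks with zero is an assumption rather than something the article confirms. Entity-level baseline statistics are omitted for brevity.

```python
import pandas as pd

LAGS = [1, 2, 4, 8, 13, 26, 52]   # weeks, as listed in the summary
WINDOWS = [4, 8, 13, 26, 52]      # weeks, as listed in the summary


def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Densify to one row per entity and week, then add lag and rolling features.

    Assumes hypothetical columns entity_id, week (datetime64), recovery_rate,
    with at most one row per (entity_id, week).
    """
    weeks = pd.date_range(df["week"].min(), df["week"].max(), freq="7D")
    grid = pd.MultiIndex.from_product(
        [df["entity_id"].unique(), weeks], names=["entity_id", "week"]
    )
    dense = (
        df.set_index(["entity_id", "week"])
        .reindex(grid)
        .reset_index()
        .sort_values(["entity_id", "week"])
        .reset_index(drop=True)
    )
    # Assumption: a week with no record means no recovery was observed.
    dense["recovery_rate"] = dense["recovery_rate"].fillna(0.0)

    by_entity = dense.groupby("entity_id")["recovery_rate"]
    for lag in LAGS:
        dense[f"lag_{lag}"] = by_entity.shift(lag)

    # Shift by one week before rolling so each window only sees past values.
    past = by_entity.shift(1)
    for window in WINDOWS:
        rolled = (
            past.groupby(dense["entity_id"])
            .rolling(window, min_periods=1)
            .mean()
            .reset_index(level=0, drop=True)
        )
        dense[f"roll_mean_{window}"] = rolled
    return dense
```

Shifting before rolling keeps the current week's target out of its own features, which is the same leakage concern that motivates the temporal train-test split.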
Key takeaway
Data scientists building forecasting systems for zero-inflated, bounded targets should consider a multi-stage modeling approach. This strategy, exemplified by the two-stage XGBoost pipeline, aims to improve accuracy and interpretability by addressing each target characteristic separately: a classifier handles the mass of zeros, and a regressor on the logit scale handles the bounded magnitude. Include comprehensive temporal lags and rolling statistics in your feature engineering, and always use temporal train-test splits to prevent data leakage and obtain a realistic performance estimate.
Key insights
Complex target distributions, such as zero-inflated and bounded targets, call for multi-stage modeling to forecast accurately.
Principles
- Stage models for zero-inflated, bounded targets.
- Engineer temporal features for XGBoost time series.
- Use temporal train-test splits for time series.
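The temporal split principle above can be sketched as a date cutoff rather than a random shuffle. The summary says 2025 was held out for testing, so the default cutoff below reflects that. The column name `week` is a hypothetical placeholder.

```python
import pandas as pd


def temporal_split(df: pd.DataFrame, date_col: str = "week",
                   test_start: str = "2025-01-01"):
    """Rows dated before test_start go to train, the rest go to test."""
    cutoff = pd.Timestamp(test_start)
    train = df[df[date_col] < cutoff]
    test = df[df[date_col] >= cutoff]
    return train, test
```

Because every training row precedes every test row, lag and rolling features computed from past weeks cannot leak information from the evaluation period.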
Method
A two-stage XGBoost pipeline: a binary classifier first predicts whether any recovery occurs, then a regressor trained on the non-zero rows estimates the logit-transformed recovery rate. Features include lags, rolling-window statistics, and entity-level baselines.
In practice
- Implement lags of 1, 2, 4, 8, 13, 26, and 52 weeks.
- Apply a logit transform to bounded targets.
- Use Optuna for hyperparameter tuning.
Topics
- XGBoost
- Time Series Forecasting
- Inventory Recovery
- Feature Engineering
- Multi-Stage Modeling
- SHAP Interpretability
Best for: Machine Learning Engineer, Data Scientist, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Towards AI - Medium.