Choosing ER Time Series Models (part 2): How to Fairly Compare ARIMA and XGBoost?
Summary
This article compares the performance of ARIMA and XGBoost models for forecasting emergency department patient arrivals, building on a previous analysis of ARIMA. The comparison focuses on ensuring fairness by engineering features for XGBoost to account for temporal dependencies, including lag features (lag 1 and lag 7), a 7-day rolling mean, and various calendar features (day of week, month, holidays, week of year). Both models were trained using an 80/20 temporal split of the data and evaluated on a hold-out test set using Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE). XGBoost showed a marginal improvement over ARIMA across all metrics, with MAE differing by 1 patient and MAPE by 1% on average.
Key takeaway
For data scientists comparing traditional time series models like ARIMA with machine learning models such as XGBoost: ensure a fair comparison by explicitly engineering temporal features for the ML model. Include lag features, rolling means, and calendar variables, and use a temporal train/test split. XGBoost may offer marginal gains, but consider whether a 1-patient difference in daily predictions gives hospital decision-makers enough operational value to justify the added model complexity.
Key insights
Fair comparison of time series models requires careful feature engineering and consistent evaluation metrics.
Principles
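- A temporal (chronological) train/test split keeps observations in time order, so the model is evaluated only on data that comes after its training period.
- Feature engineering (lags, rolling means, calendar variables) lets a general-purpose ML model such as XGBoost capture temporal dependencies.
- Evaluating with several metrics (MAE, RMSE, MAPE) gives a more complete picture of model performance than any single metric.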
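A minimal sketch of the temporal-split principle, assuming the observations are already sorted oldest first. The function name `temporal_split` and the synthetic arrival counts are illustrative assumptions, not taken from the original article.

```python
def temporal_split(rows, train_fraction=0.8):
    """Split chronologically ordered rows into (train, test) without shuffling.

    Hypothetical helper: the earliest `train_fraction` of rows become the
    training set and the remainder the hold-out test set.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cutoff = int(len(rows) * train_fraction)
    return rows[:cutoff], rows[cutoff:]


# Synthetic daily arrival counts, oldest first (made-up values for illustration).
daily_arrivals = [100 + (day % 7) * 3 for day in range(100)]

train, test = temporal_split(daily_arrivals)
print(len(train), len(test))
```

Because nothing is shuffled, every test observation is later in time than every training observation, which mirrors how a forecast would be used in practice.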
Method
The method adds lag and rolling-mean features to XGBoost, incorporates calendar features into both models, uses an 80/20 temporal train/test split, and evaluates both models on the hold-out test set with MAE, RMSE, and MAPE.
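The three evaluation metrics named above can be sketched in plain Python using their standard definitions. The function names and the sample numbers are illustrative assumptions; the original article may well compute them with a library instead.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average size of the errors, in patients."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error: like MAE, but penalizes large errors more."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Undefined when any actual value is 0 (division by zero).
    """
    ratios = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * ratios / len(actual)


# Made-up daily arrivals and forecasts, for illustration only.
actual = [100, 120, 80, 100]
predicted = [110, 114, 84, 100]

print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

Scoring both models on the same hold-out set with the same functions keeps the comparison consistent; MAE and RMSE are in patients per day, while MAPE is relative to the actual count.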
In practice
- Add lag features to capture autoregressive components.
- Include rolling averages to smooth short-term fluctuations.
- Incorporate calendar features for seasonal patterns.
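The steps above can be sketched with pandas, assuming a DataFrame with a daily `DatetimeIndex` and a target column. The column name `arrivals`, the helper `add_time_features`, and the synthetic data are hypothetical. Shifting by one day before taking the rolling mean is this sketch's own choice so that each row only sees past values. The holiday flag mentioned in the summary is omitted because the holiday calendar used is not given.

```python
import pandas as pd


def add_time_features(df, target="arrivals"):
    """Return a copy of df with lag, rolling-mean, and calendar columns.

    Assumes one row per day, sorted oldest first, indexed by a DatetimeIndex.
    """
    out = df.copy()

    # Lag features: the value 1 day ago and 7 days ago (autoregressive signal).
    out["lag_1"] = out[target].shift(1)
    out["lag_7"] = out[target].shift(7)

    # 7-day rolling mean of the PREVIOUS 7 days; shift(1) keeps today's value
    # out of the window (an assumption of this sketch, to avoid leakage).
    out["rolling_mean_7"] = out[target].shift(1).rolling(window=7).mean()

    # Calendar features for weekly and yearly seasonality.
    out["day_of_week"] = out.index.dayofweek  # Monday = 0
    out["month"] = out.index.month
    out["week_of_year"] = out.index.isocalendar().week.astype(int)

    # The first 7 rows lack a full history, so drop them.
    return out.dropna()


# Synthetic data for illustration only: 30 days of steadily rising arrivals.
dates = pd.date_range("2024-01-01", periods=30, freq="D")
df = pd.DataFrame({"arrivals": range(100, 130)}, index=dates)

features = add_time_features(df)
print(features.head())
```

The resulting columns can be passed to a tree-based model as ordinary tabular features, which is what lets it pick up the temporal structure that ARIMA models directly.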
Topics
- Emergency Department Forecasting
- Time Series Models
- ARIMA
- XGBoost
- Feature Engineering
Best for: Machine Learning Engineer, Data Scientist, AI Student
Editorial summary, takeaway, and curation by AIssential. Original article published by Towards AI - Medium.