Model selection with proper scoring rules on data sets of time series

· Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, extended

Summary

A study investigates model selection for probabilistic models on data sets of time series, with models compared by proper scoring rules. It shows that three common summary statistics (mean score, median score, and mean rank) can yield conflicting decisions because score distributions are skewed. The decisions of the three criteria converge as the test set length ($n_{te}$) grows, but for short test sets only the mean score reliably identifies the true model. The study illustrates this on intermittent time series, including the M5 competition dataset, by comparing Poisson and negative binomial distributions. Mean rank decisions are sensitive to $n_{te}$, particularly for quantile scores at high levels (e.g., $\text{QS}_{0.975}$, $\text{QS}_{0.995}$), and often select a misspecified model, whereas the mean scaled score remains robust across varying $n_{te}$ and scaling factors.
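For readers unfamiliar with the notation, $\text{QS}_{\alpha}$ denotes the quantile score at level $\alpha$. The summary does not reproduce the paper's definition, so the form below is an assumption: it is one common convention (the pinball loss with a factor of 2, which some authors omit), where $q$ is the predicted $\alpha$-quantile and $y$ the observation.

```latex
$$
\text{QS}_{\alpha}(q, y) = 2\,\bigl(\mathbb{1}\{y \le q\} - \alpha\bigr)\,(q - y)
$$
```

Under this convention the score is non-negative and lower is better. At $\alpha = 0.975$, an observation above the predicted quantile is penalized by $2 \cdot 0.975\,(y - q)$, while one below it is penalized by only $2 \cdot 0.025\,(q - y)$. A few exceedances can therefore dominate the scores at high levels, which is consistent with the skewness the summary describes.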

Key takeaway

When evaluating probabilistic time series models, prefer the mean scaled score over the mean rank, especially with short test sets or quantile scores at high levels such as $\text{QS}_{0.975}$ or $\text{QS}_{0.995}$. Conflicting results often stem from skewed score distributions, under which the mean rank can misidentify the best model. Validate the selection by checking that the chosen model stays the same under at least two different scaling factors.
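The robustness check recommended above can be sketched as follows. This is a minimal illustration, not the paper's procedure: the function names, the per-series scores, and the two sets of scaling factors are all hypothetical, and the summary does not say which scaling factors the paper uses.

```python
from statistics import mean


def mean_scaled_score(scores, scales):
    """Average of per-series scores, each divided by that series' scaling factor."""
    if len(scores) != len(scales):
        raise ValueError("scores and scales must have one entry per series")
    if any(c <= 0 for c in scales):
        # Intermittent series can produce zero-valued scales; handle them upstream.
        raise ValueError("scaling factors must be strictly positive")
    return mean(s / c for s, c in zip(scores, scales))


def select_model(scores_by_model, scales):
    """Return the model name with the lowest mean scaled score (lower is better)."""
    return min(
        scores_by_model,
        key=lambda m: mean_scaled_score(scores_by_model[m], scales),
    )


# Hypothetical per-series scores for two candidate models on three series.
scores_by_model = {
    "poisson": [0.8, 1.5, 4.0],
    "negbin": [0.7, 1.2, 3.0],
}

# Two hypothetical choices of per-series scaling factor.
scalings = {
    "scale_a": [0.5, 1.0, 2.0],
    "scale_b": [1.0, 2.0, 5.0],
}

choices = {name: select_model(scores_by_model, sc) for name, sc in scalings.items()}
robust = len(set(choices.values())) == 1  # True if every scaling picks the same model
```

If `robust` is false, the selection depends on the scaling factor and deserves a closer look before a model is chosen.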

Key insights

Skewed score distributions cause conflicting model selection outcomes, which makes the mean scaled score more reliable than the mean rank on data sets of time series.

Principles

Method

The paper compares mean score, median score, and mean rank for aggregating scores across multiple time series, analyzing their convergence and sensitivity to test set length ($n_{te}$) and scaling factors.
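The three aggregation criteria can be sketched in a few lines. The example below is illustrative only: the helper names and the score values are hypothetical, chosen so that one large score on a single series makes the distribution skewed. Lower scores are assumed to be better, and tied scores receive their average rank.

```python
from statistics import mean, median


def rank_row(values):
    """Rank the models on one series (1 = lowest score), averaging tied ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def aggregate(scores_by_model):
    """Compute mean score, median score, and mean rank for each model."""
    models = list(scores_by_model)
    n_series = len(scores_by_model[models[0]])
    ranks_per_series = [
        rank_row([scores_by_model[m][s] for m in models]) for s in range(n_series)
    ]
    return {
        m: {
            "mean_score": mean(scores_by_model[m]),
            "median_score": median(scores_by_model[m]),
            "mean_rank": mean(r[idx] for r in ranks_per_series),
        }
        for idx, m in enumerate(models)
    }


def select(summary, criterion):
    """Return the model that minimizes the given criterion."""
    return min(summary, key=lambda m: summary[m][criterion])


# Hypothetical scores on five series: "poisson" is slightly better on four
# series but much worse on the fifth.
scores_by_model = {
    "poisson": [0.9, 0.9, 0.9, 0.9, 10.0],
    "negbin": [1.0, 1.0, 1.0, 1.0, 2.0],
}

summary = aggregate(scores_by_model)
decisions = {c: select(summary, c) for c in ("mean_score", "median_score", "mean_rank")}
```

With these made-up numbers the mean score selects `negbin`, while the median score and the mean rank both select `poisson`. The median and the rank ignore how large the loss on the fifth series is, which is the kind of conflict the paper attributes to skewed score distributions.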

Best for: Research Scientist, AI Engineer, Machine Learning Engineer, Data Scientist, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.