FAME: Forecastability-Aware Mixture of Experts for Heterogeneous Time Series Forecasting
Summary
FAME is a sparse mixture-of-experts framework for forecasting heterogeneous time series in large-scale retail and industrial systems. It introduces forecastability-aware expert routing, which learns how the characteristics of a series determine which forecasting experts suit it. The system represents each time series with a multidimensional "forecastability fingerprint," mines expert-suitability targets from validation performance, and trains a cost-aware sparse router that activates a small, budgeted set of experts for each series. Evaluated on a production-scale vending-machine sales dataset from Shandong New Beiyang (SNBC) with over 5,000 machines and 60 million transactions, FAME Top-2 reduced Mean Squared Error (MSE) by 12.4% compared to LightGBM, the strongest single expert, while executing an average of 1.92 experts per series. The framework reframes heterogeneous sales forecasting as data mining of forecastability patterns and expert specialization rather than heuristic model selection.
Key takeaway
For machine learning engineers running large-scale time series forecasting systems, FAME shows that routing each series to a small set of specialized experts can improve accuracy while keeping inference cost bounded. Consider forecastability-aware expert routing in place of heuristic model selection: on the SNBC vending-machine dataset, FAME Top-2 cut MSE by 12.4% relative to LightGBM, the strongest single expert, while running an average of 1.92 experts per series. The resulting demand forecasts can feed inventory and replenishment planning.
Key insights
FAME uses forecastability fingerprints and a cost-aware sparse router to select a small set of suitable experts for each series in a heterogeneous collection.
Principles
- Heterogeneous series benefit from specialized experts.
- Forecastability patterns guide expert selection.
- Sparse routing reduces inference cost.
Method
FAME represents series with forecastability fingerprints, mines expert-suitability targets from validation, and trains a cost-aware sparse router to activate budgeted experts.
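The first two steps of the method can be illustrated with a minimal sketch. The summary does not list the fingerprint dimensions or the exact form of the suitability targets, so everything here is an assumption made for illustration: the four features (coefficient of variation, zero fraction, lag-1 and seasonal autocorrelation), the function names `fingerprint` and `suitability_targets`, and the softmax over negative validation error.

```python
import math
from statistics import mean, pstdev


def fingerprint(series, season=7):
    """Hypothetical fingerprint: a few summary statistics of one series.

    The feature set is an assumption, not the one used by FAME.
    """
    n = len(series)
    mu = mean(series)
    sd = pstdev(series)

    def autocorr(lag):
        # Sample autocorrelation at the given lag; 0.0 for constant series.
        if sd == 0 or lag >= n:
            return 0.0
        cov = sum((series[t] - mu) * (series[t - lag] - mu)
                  for t in range(lag, n)) / n
        return cov / (sd * sd)

    return {
        "cv": sd / mu if mu > 0 else 0.0,              # relative volatility
        "zero_frac": sum(1 for x in series if x == 0) / n,  # intermittency
        "acf1": autocorr(1),                           # short-term memory
        "acf_season": autocorr(season),                # seasonal regularity
    }


def suitability_targets(val_errors, temperature=1.0):
    """Map per-expert validation errors to soft targets that sum to 1.

    Assumed form: softmax over negative error, so lower error means a
    higher target. Errors should be on a comparable scale per series.
    """
    scores = {name: -err / temperature for name, err in val_errors.items()}
    top = max(scores.values())
    exps = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


# Toy series that alternates between no sales and five sales.
fp = fingerprint([0, 5] * 14, season=2)
# Made-up validation MSEs for three hypothetical experts.
targets = suitability_targets({"lightgbm": 1.0, "ets": 2.0, "naive": 4.0})
```

A router would then be trained to predict `targets` from `fp`; any multi-output classifier or small neural network could play that role in a sketch like this.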
In practice
- Compute a forecastability fingerprint for each series.
- Train sparse routers for expert selection.
- Integrate into replenishment-planning pipelines.
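The routing step in the list above can be sketched as a cost-penalised top-k selection. This is an illustration under stated assumptions, not FAME's router: the functions `route` and `combine`, the linear cost penalty `cost_weight`, and the `min_gain` threshold are all hypothetical. The threshold is one simple way a Top-2 budget could average fewer than two experts per series.

```python
def route(scores, costs, k=2, cost_weight=0.1, min_gain=0.05):
    """Select at most k experts for one series and return mixing weights.

    scores: router outputs per expert (assumed positive, e.g. probabilities).
    costs:  relative inference cost per expert (assumed known up front).
    """
    adjusted = {name: scores[name] - cost_weight * costs[name]
                for name in scores}
    ranked = sorted(adjusted, key=adjusted.get, reverse=True)

    # Always keep the best expert; add others only if they clear the threshold.
    chosen = [ranked[0]]
    for name in ranked[1:k]:
        if adjusted[name] >= min_gain:
            chosen.append(name)

    total = sum(scores[name] for name in chosen)
    return {name: scores[name] / total for name in chosen}


def combine(forecasts, weights):
    """Weighted average of the forecasts from the selected experts only."""
    names = list(weights)
    horizon = len(forecasts[names[0]])
    return [sum(weights[n] * forecasts[n][h] for n in names)
            for h in range(horizon)]


costs = {"lightgbm": 1.0, "ets": 0.5, "naive": 0.1}

# Series A: two experts are worth running.
w_a = route({"lightgbm": 0.6, "ets": 0.3, "naive": 0.1}, costs)
y_a = combine({"lightgbm": [9.0, 12.0], "ets": [6.0, 9.0]}, w_a)

# Series B: one expert dominates, so the second slot stays empty.
w_b = route({"lightgbm": 0.9, "ets": 0.06, "naive": 0.04}, costs)

avg_experts = (len(w_a) + len(w_b)) / 2
```

Only the experts named in the returned weights need to be executed for a series, which is where the inference savings of sparse routing come from.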
Topics
- Time Series Forecasting
- Mixture-of-Experts
- Forecastability Fingerprint
- Sparse Routing
- Retail Forecasting
- Demand Forecasting
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.