How Good Can Linear Models Be for Time-Series Forecasting?
Summary
A study investigates the potential of linear models for time-series forecasting, challenging the assumption that larger architectures are solely responsible for accuracy gains. Using Ridge regression as a testbed, researchers demonstrate that optimizing preprocessing techniques can close most of the performance gap at a significantly lower cost than scaling models. The research explored context length, local normalization, regularization, and augmentation across eight standard benchmarks. Key findings include that optimal lookback is highly series-specific and often non-monotonic with forecast horizon, with power-law exponents ranging from +0.46 on ETTm2 to -0.19 on Exchange and Traffic. Additionally, normalizing over a learned trailing fraction of the context is generally preferred, and the optimal degree of cross-series hyperparameter sharing varies widely. These optimized linear models surpassed previous linear forecasters and outperformed Transformer, MLP, and CNN baselines on six of eight benchmarks, also providing diagnostic insights into data structures.
Key takeaway
For Machine Learning Engineers developing time-series forecasting solutions, you should critically re-evaluate the necessity of complex deep learning models. Instead, focus on rigorously optimizing preprocessing techniques for simpler linear models like Ridge regression. This approach can yield superior accuracy, outperforming Transformer and MLP baselines on many datasets, while significantly reducing computational costs and offering better interpretability. Prioritize tuning context length, normalization strategies, and regularization before scaling model capacity.
Key insights
Tuning preprocessing for linear models can achieve competitive time-series forecasting accuracy against larger deep learning architectures.
Principles
- Optimal lookback varies by series, not always increasing with forecast horizon.
- Normalizing over a trailing context fraction is often preferred.
- Hyperparameter sharing across series should be empirically determined.
Method
Ridge regression was used to search context length, local normalization, regularization, and augmentation on eight benchmarks.
In practice
- Prioritize preprocessing optimization for linear time-series models.
- Test local normalization using a learned trailing context fraction.
- Determine optimal hyperparameter sharing per-series or across datasets.
Topics
- Time-Series Forecasting
- Linear Models
- Ridge Regression
- Preprocessing Optimization
- Hyperparameter Tuning
- Model Benchmarking
Best for: AI Engineer, AI Scientist, Machine Learning Engineer, Research Scientist
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.