How Good Can Linear Models Be for Time-Series Forecasting?

2026-06-25 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

A study investigates the potential of linear models for time-series forecasting, challenging the assumption that larger architectures are solely responsible for accuracy gains. Using Ridge regression as a testbed, researchers demonstrate that optimizing preprocessing techniques can close most of the performance gap at a significantly lower cost than scaling models. The research explored context length, local normalization, regularization, and augmentation across eight standard benchmarks. Key findings include that optimal lookback is highly series-specific and often non-monotonic with forecast horizon, with power-law exponents ranging from +0.46 on ETTm2 to -0.19 on Exchange and Traffic. Additionally, normalizing over a learned trailing fraction of the context is generally preferred, and the optimal degree of cross-series hyperparameter sharing varies widely. These optimized linear models surpassed previous linear forecasters and outperformed Transformer, MLP, and CNN baselines on six of eight benchmarks, also providing diagnostic insights into data structures.

Key takeaway

For Machine Learning Engineers developing time-series forecasting solutions, you should critically re-evaluate the necessity of complex deep learning models. Instead, focus on rigorously optimizing preprocessing techniques for simpler linear models like Ridge regression. This approach can yield superior accuracy, outperforming Transformer and MLP baselines on many datasets, while significantly reducing computational costs and offering better interpretability. Prioritize tuning context length, normalization strategies, and regularization before scaling model capacity.

Key insights

Tuning preprocessing for linear models can achieve competitive time-series forecasting accuracy against larger deep learning architectures.

Principles

Optimal lookback varies by series, not always increasing with forecast horizon.
Normalizing over a trailing context fraction is often preferred.
Hyperparameter sharing across series should be empirically determined.

Method

Ridge regression was used to search context length, local normalization, regularization, and augmentation on eight benchmarks.

In practice

Prioritize preprocessing optimization for linear time-series models.
Test local normalization using a learned trailing context fraction.
Determine optimal hyperparameter sharing per-series or across datasets.

Topics

Time-Series Forecasting
Linear Models
Ridge Regression
Preprocessing Optimization
Hyperparameter Tuning
Model Benchmarking

Best for: AI Engineer, AI Scientist, Machine Learning Engineer, Research Scientist

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.