Forecast collapse of transformer-based models under squared loss in financial time series

2026-04-02 · Source: stat.ML updates on arXiv.org · Field: Finance & Economics — FinTech & Digital Financial Services, Capital Markets & Investment Management · Depth: Expert, quick

Summary

A study by Pierre Andreoletti analyzes forecast collapse in Transformer-based models applied to financial time series under squared loss. In regimes where the conditional expectation of future trajectories is effectively degenerate, as in standard financial settings, the Bayes-optimal predictor becomes trivial (flat for prices, zero for returns). In these regimes, increasing model expressivity, as with Transformers, does not improve predictive accuracy. Instead, it introduces spurious fluctuations around the optimal predictor through noise reuse, which raises prediction variance without reducing bias. Numerical experiments on high-frequency EUR/USD exchange rate data support the theory: Transformer models produce larger forecasting errors than a simple linear benchmark across most forecasting windows, consistent with a variance-driven degradation mechanism.
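The claim that extra expressivity can only hurt in this regime follows from the standard decomposition of squared-loss risk. The sketch below uses notation that is assumed here and not taken from the paper: r_{t+1} is the next return, F_t the information available at time t, m_t the conditional mean, and f any predictor built from F_t.

```latex
% Squared-loss risk splits into irreducible noise plus distance to the conditional mean
\mathbb{E}\big[(r_{t+1} - f)^2\big]
  = \mathbb{E}\big[(r_{t+1} - m_t)^2\big] + \mathbb{E}\big[(m_t - f)^2\big],
\qquad m_t = \mathbb{E}[\, r_{t+1} \mid \mathcal{F}_t \,].

% Degenerate case for returns: m_t = 0, so every non-zero output adds risk
m_t = 0 \;\Longrightarrow\;
\mathbb{E}\big[(r_{t+1} - f)^2\big] = \mathbb{E}\big[r_{t+1}^2\big] + \mathbb{E}\big[f^2\big]
  \;\ge\; \mathbb{E}\big[r_{t+1}^2\big].
```

Under this assumption the zero forecast attains the lower bound, and any fluctuation a fitted model produces around zero shows up directly as additional error.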

Key takeaway

AI engineers building financial forecasting models should reconsider Transformer-based architectures for time series with weak conditional structure, especially under squared loss. Such models can show higher prediction variance and larger errors than simple linear benchmarks, because high expressivity amplifies noise rather than capturing signal. Favor parsimonious models and careful error analysis to avoid this degradation.

Key insights

Highly expressive models like Transformers degrade on financial time series due to increased variance from noise reuse.

Principles

Increased model expressivity does not improve accuracy in degenerate conditional expectation regimes.
Noise reuse introduces spurious fluctuations and increased prediction variance.
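These two principles can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's experiment: returns are synthetic i.i.d. Gaussian noise (so the conditional mean is exactly zero), and "expressivity" is stood in for by the number of lagged returns fed to an ordinary least-squares fit, not by a Transformer. The function names and parameters are hypothetical.

```python
import numpy as np

def lagged_design(returns, n_lags):
    """Use the previous n_lags returns as features for each target return."""
    X = np.stack([returns[i:i + n_lags] for i in range(len(returns) - n_lags)])
    y = returns[n_lags:]
    return X, y

def out_of_sample_mse(n_lags, n_train=600, n_test=5000, seed=0):
    """Fit OLS on pure-noise returns; return (model MSE, zero-forecast MSE) on test data."""
    rng = np.random.default_rng(seed)
    train = rng.normal(size=n_train)  # assumed: i.i.d. zero-mean returns
    test = rng.normal(size=n_test)
    X_tr, y_tr = lagged_design(train, n_lags)
    X_te, y_te = lagged_design(test, n_lags)
    # Least-squares coefficients fitted to noise; their true values are all zero.
    coef, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    mse_model = float(np.mean((y_te - X_te @ coef) ** 2))
    mse_zero = float(np.mean(y_te ** 2))  # Bayes-optimal forecast for these returns
    return mse_model, mse_zero

if __name__ == "__main__":
    for n_lags in (2, 20, 200):
        mse_model, mse_zero = out_of_sample_mse(n_lags)
        print(f"lags={n_lags:3d}  model MSE={mse_model:.3f}  zero MSE={mse_zero:.3f}")
```

In this toy setting the fitted coefficients carry estimation noise only, so the gap between the model error and the zero-forecast error is a pure variance effect that grows with the number of features.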

Method

The study combines classical characterization of squared-loss risk minimization with numerical experiments on high-frequency EUR/USD exchange rate data to analyze trajectory-level forecasting errors.

In practice

Avoid highly expressive models such as Transformers for financial time series with weak conditional structure.
Prioritize simple linear models, and benchmark any candidate against them, when forecasting under squared loss.
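One way to put this into practice is to score every candidate model against the trivial forecast before trusting it. The helper below is a hypothetical sketch, not something from the study: it assumes the target is a return series, so the trivial benchmark is the zero forecast, and it reports one minus the ratio of the model's mean squared error to the benchmark's.

```python
import numpy as np

def skill_vs_zero(actual_returns, predicted_returns):
    """Return 1 - MSE(model) / MSE(zero forecast); values <= 0 mean no gain over zero."""
    actual = np.asarray(actual_returns, dtype=float)
    predicted = np.asarray(predicted_returns, dtype=float)
    if actual.shape != predicted.shape:
        raise ValueError("actual and predicted returns must have the same shape")
    mse_model = np.mean((actual - predicted) ** 2)
    mse_zero = np.mean(actual ** 2)  # error of always forecasting a zero return
    if mse_zero == 0.0:
        raise ValueError("actual returns are all zero; the score is undefined")
    return float(1.0 - mse_model / mse_zero)
```

A score at or below zero on held-out data suggests the model is adding variance rather than signal, which is the situation the study describes.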

Topics

Transformer Models
Financial Time Series
Squared Loss
Trajectory Forecasting
Prediction Variance

Best for: AI Engineer, Machine Learning Engineer, AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.