MSMixer: Learned Multi-Scale Temporal Mixing with Complementary Linear Shortcut for Long-Term Time Series Forecasting

· Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

MSMixer is a channel-independent, multi-scale MLP architecture for long-term time series forecasting. It targets a core difficulty of the task: capturing temporal patterns that range from rapid oscillations to macro-trends within a fixed look-back window. The model runs three parallel scale branches at down-sampling factors of 1x, 4x, and 16x, each with its own MLP blocks. A learnable softmax gate dynamically weights the branch outputs, and a complementary DLinear shortcut supplies trend and seasonality context from the full window. MSMixer has only 112K parameters at forecast horizon H=96 and keeps O(T) complexity. On the four ETT datasets, with standard chronological splits and three random seeds, MSMixer reaches an average MSE of 0.357, which is 7.4% lower than DLinear (0.386) and 2.1% lower than NLinear (0.365), and it wins 12 of 16 configurations. Against five Transformer-based baselines, it achieves the best or second-best MSE in 9 of 16 configurations while using 5x fewer parameters than PatchTST.
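
The DLinear shortcut builds on a known design: DLinear splits the look-back window into a moving-average trend and a seasonal remainder, maps each to the horizon with its own linear layer, and sums the two forecasts. The sketch below is a minimal, dependency-free illustration of that idea for a single channel. It assumes a kernel size of 25 and edge padding by repeating the end values. How MSMixer wires or parameterizes its shortcut is not described here, and the helper names (`moving_average`, `linear`, `dlinear_shortcut`) are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def moving_average(x: Sequence[float], kernel: int) -> List[float]:
    """Trend component: centered moving average, edges padded by repeating end values."""
    front = (kernel - 1) // 2
    back = kernel - 1 - front
    padded = [x[0]] * front + list(x) + [x[-1]] * back
    return [sum(padded[i:i + kernel]) / kernel for i in range(len(x))]


def linear(x: Sequence[float], weight: Matrix, bias: Sequence[float]) -> List[float]:
    """Map a length-L input to H outputs; `weight` has H rows of length L."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def dlinear_shortcut(x: Sequence[float], w_trend: Matrix, b_trend: Sequence[float],
                     w_season: Matrix, b_season: Sequence[float],
                     kernel: int = 25) -> List[float]:
    """Hypothetical DLinear-style shortcut for one channel (look-back L -> horizon H)."""
    trend = moving_average(x, kernel)
    season = [v - t for v, t in zip(x, trend)]  # remainder = input - trend
    y_trend = linear(trend, w_trend, b_trend)
    y_season = linear(season, w_season, b_season)
    return [a + b for a, b in zip(y_trend, y_season)]
```

The trend and remainder sum back to the input. Giving both heads weights that copy the last time step therefore reproduces a naive last-value forecast, which makes a convenient sanity check. In a channel-independent setup, the same function would be applied to each channel separately.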

Key takeaway

For AI engineers and research scientists building long-term time series forecasting models, MSMixer offers a parameter-efficient alternative to existing MLP- and Transformer-based models. On ETT, its multi-scale branches and DLinear shortcut lower average MSE by 7.4% relative to DLinear while keeping O(T) complexity. Consider adding similar multi-scale and linear-shortcut components to your own forecasting architectures to improve accuracy and efficiency.

Key insights

Multi-scale MLP architectures with dynamic weighting and linear shortcuts improve long-term time series forecasting.

Principles

Method

MSMixer uses three parallel MLP branches at 1x, 4x, and 16x down-sampling, a learnable softmax gate to weight their outputs, and a DLinear shortcut for trend and seasonality context.
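
What follows is a minimal single-channel sketch of this forward pass, written in dependency-free Python. Several details are assumptions rather than facts from the source:

- each branch down-samples by non-overlapping average pooling;
- each branch uses a two-layer ReLU MLP that maps its pooled window straight to the full horizon H;
- the gate is a vector of learned logits that does not depend on the input;
- the DLinear shortcut output is added to the gated mixture.

Class and parameter names (`MultiScaleMixer`, `Branch`, `hidden`, `gate_logits`) are hypothetical, and training is omitted.

```python
import math
import random
from typing import List, Sequence


def avg_pool(x: Sequence[float], factor: int) -> List[float]:
    """Down-sample by non-overlapping average pooling; factor 1 is the identity."""
    return [sum(x[i:i + factor]) / factor
            for i in range(0, len(x) - factor + 1, factor)]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class Linear:
    def __init__(self, n_in: int, n_out: int, rng: random.Random):
        bound = 1.0 / math.sqrt(n_in)
        self.w = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.0] * n_out

    def __call__(self, x: Sequence[float]) -> List[float]:
        return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(self.w, self.b)]


class Branch:
    """One scale: pool by `factor`, then an MLP (L // factor) -> hidden -> H."""

    def __init__(self, lookback: int, horizon: int, factor: int, hidden: int,
                 rng: random.Random):
        self.factor = factor
        self.fc1 = Linear(lookback // factor, hidden, rng)
        self.fc2 = Linear(hidden, horizon, rng)

    def __call__(self, x: Sequence[float]) -> List[float]:
        h = [max(0.0, v) for v in self.fc1(avg_pool(x, self.factor))]  # ReLU
        return self.fc2(h)


class MultiScaleMixer:
    def __init__(self, lookback: int, horizon: int, factors=(1, 4, 16),
                 hidden: int = 32, seed: int = 0):
        rng = random.Random(seed)
        self.branches = [Branch(lookback, horizon, f, hidden, rng) for f in factors]
        self.gate_logits = [0.0] * len(factors)  # would be learned; uniform at init

    def __call__(self, x: Sequence[float], shortcut: Sequence[float]) -> List[float]:
        weights = softmax(self.gate_logits)
        outs = [branch(x) for branch in self.branches]
        # For each horizon step t: sum over k of weights[k] * outs[k][t]
        mixed = [sum(w * o for w, o in zip(weights, step)) for step in zip(*outs)]
        return [m + s for m, s in zip(mixed, shortcut)]  # add the shortcut forecast


model = MultiScaleMixer(lookback=96, horizon=24)
x = [math.sin(i / 5.0) for i in range(96)]
forecast = model(x, shortcut=[0.0] * 24)  # zero shortcut as a smoke test
```

Each branch's first layer reads only L/f pooled inputs, so the total cost across the 1x, 4x, and 16x branches grows linearly with the look-back length. This is consistent with the O(T) claim. Keeping the gate as a softmax means the branch weights stay positive and sum to one, so the mixture is a convex combination of the scale-specific forecasts.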

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.