MSMixer: Learned Multi-Scale Temporal Mixing with Complementary Linear Shortcut for Long-Term Time Series Forecasting
Summary
MSMixer is a new channel-independent multi-scale MLP architecture designed for long-term time series forecasting, addressing the challenge of capturing diverse temporal patterns from rapid oscillations to macro-trends within a fixed look-back window. It integrates three parallel scale branches operating at down-sample factors of 1x, 4x, and 16x, each with independent MLP blocks. A learnable softmax gate dynamically weighs the outputs from these branches, and a DLinear complementary shortcut provides full-window trend and seasonality context. MSMixer features only 112K parameters at H=96 and maintains O(T) complexity. Benchmarked on four ETT datasets using standard chronological splits and three random seeds, MSMixer achieved an average MSE of 0.357, outperforming DLinear (0.386) by 7.4% and NLinear (0.365) by 2.1%, winning 12 out of 16 configurations. It also secured best or second-best MSE in 9 of 16 configurations against five Transformer-based baselines, using 5x fewer parameters than PatchTST.
Key takeaway
For AI Engineers and Research Scientists developing long-term time series forecasting models, MSMixer offers a parameter-efficient and high-performing alternative to existing MLP and Transformer-based solutions. Its multi-scale approach and DLinear shortcut significantly improve accuracy while maintaining low computational complexity. Consider integrating similar multi-scale and linear shortcut components into your next-generation forecasting architectures to enhance performance and efficiency.
Key insights
Multi-scale MLP architectures with dynamic weighting and linear shortcuts improve long-term time series forecasting.
Principles
- Combine multiple temporal resolutions.
- Dynamically weigh multi-scale outputs.
- Integrate linear trend/seasonality context.
Method
MSMixer uses three parallel MLP branches at 1x, 4x, 16x down-sampling, a learnable softmax gate for output weighting, and a DLinear shortcut for trend and seasonality context.
In practice
- Implement parallel down-sampled branches.
- Use a learnable gate for branch fusion.
- Add a DLinear shortcut for context.
Topics
- Long-Term Time Series Forecasting
- MSMixer Architecture
- Multi-Scale Temporal Mixing
- DLinear Shortcut
- MLP Models
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.