TTM: Tiny Foundation Models for Multivariate Time-Series Forecasting
Summary
Tiny Time Mixers (TTMs) are a family of compact pre-trained models for zero-shot and few-shot multivariate time-series forecasting. Built on the lightweight TSMixer architecture, TTMs add adaptive patching, diverse resolution sampling, and resolution prefix tuning so that a single backbone can be pre-trained on heterogeneous, multi-resolution data. Training is multi-level: the backbone is pre-trained channel-independently, and a decoder is then fine-tuned to capture cross-channel correlations and exogenous effects. Variants start at around 1M parameters (TTM-Base, written TTM_B) and still transfer well. In zero-shot forecasting, TTM_A (5M parameters) outperforms Moirai by 4–10% and TimesFM by 19%, while TTM_B (1M parameters) outperforms Chronos by 17–32% and Lag-Llama by 40%. Pre-training on 1B samples takes 24–30 hours on 6 A100 GPUs. At inference, TTM_B needs only 0.01 seconds per batch on CPU, which makes it suitable for resource-constrained environments.
Key takeaway
For MLOps engineers deploying time-series forecasting, TTMs offer a compelling alternative to large foundation models: strong zero-shot or few-shot accuracy at a much lower computational cost, with fast inference on CPU-only or otherwise resource-constrained systems. TTM-Base, at 1M parameters and 0.01 seconds per batch on CPU, is a reasonable starting point. The pre-trained backbone can be adapted to a specific multivariate dataset, including one with exogenous variables, by fine-tuning only the decoder rather than the full model.
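To put the quoted figures in deployment terms, the short calculation below converts parameter counts into an approximate weight footprint and the per-batch latency into throughput. It assumes 32-bit floats (4 bytes per parameter) and ignores activations and runtime overhead; the function names are illustrative and not part of any TTM library.

```python
def weight_footprint_mb(num_params, bytes_per_param=4):
    """Approximate size of the model weights in megabytes.

    Assumes every parameter is stored as a float32 (4 bytes) unless
    bytes_per_param says otherwise. Activations are not counted.
    """
    return num_params * bytes_per_param / 1e6


def batches_per_second(seconds_per_batch):
    """Convert a per-batch latency into a throughput figure."""
    return 1.0 / seconds_per_batch


# Parameter counts and latency as quoted in the summary above.
ttm_base_mb = weight_footprint_mb(1_000_000)   # 1M-parameter variant
ttm_5m_mb = weight_footprint_mb(5_000_000)     # 5M-parameter variant
throughput = batches_per_second(0.01)          # roughly 100 batches per second

print(f"1M params: ~{ttm_base_mb:.0f} MB, 5M params: ~{ttm_5m_mb:.0f} MB")
print(f"CPU throughput: ~{throughput:.0f} batches/s")
```

Under these assumptions the weights of the smallest variant occupy a few megabytes, which is why CPU-only serving is plausible.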
Key insights
Tiny Time Mixers (TTMs) enable efficient, accurate time-series forecasting via compact pre-trained models and multi-level adaptation.
Principles
- Channel-independent pre-training generalizes temporal dynamics.
- Resolution diversity is critical for time-series foundation models.
- Lightweight decoders adapt to target-specific multivariate structures.
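The first and third principles can be illustrated with a toy NumPy sketch. The "backbone" here is a single shared linear map, a deliberate stand-in for the TSMixer backbone rather than the actual TTM architecture, and the function names are hypothetical. What the sketch shows is the mechanism: folding channels into the batch axis means the backbone only ever sees univariate series, and a small separate step mixes information across channels afterwards.

```python
import numpy as np


def channel_independent_forecast(x, weights):
    """Apply one shared linear forecaster to every channel separately.

    x:       (batch, context_len, channels) input windows
    weights: (context_len, horizon) toy backbone shared by all channels
    returns: (batch, horizon, channels) forecasts
    """
    batch, context_len, channels = x.shape
    # Fold channels into the batch axis: the backbone sees univariate series.
    flat = x.transpose(0, 2, 1).reshape(batch * channels, context_len)
    out = flat @ weights  # (batch * channels, horizon)
    horizon = weights.shape[1]
    # Unfold back to (batch, horizon, channels).
    return out.reshape(batch, channels, horizon).transpose(0, 2, 1)


def channel_mixing_decoder(y, mix):
    """Toy decoder: mix forecasts across channels with a (channels, channels) matrix."""
    return y @ mix


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 32, 3))        # 4 windows, 32 time steps, 3 channels
w = rng.normal(size=(32, 8)) / 32      # toy backbone weights, horizon of 8
y = channel_independent_forecast(x, w)

# With an identity mixing matrix the decoder leaves the forecasts unchanged;
# fine-tuning would learn off-diagonal entries for cross-channel effects.
y_mixed = channel_mixing_decoder(y, np.eye(3))
print(y.shape, y_mixed.shape)
```

Because the backbone never depends on the number of channels, the same pre-trained weights can be reused on datasets with different channel counts; only the small mixing step is dataset-specific.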
Method
TTMs pre-train a TSMixer-based backbone on diverse, multi-resolution data using adaptive patching, diverse resolution sampling, and resolution prefix tuning, then fine-tune a small decoder for target-specific multivariate adaptation.
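The three pre-training ingredients can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the authors' implementation: block averaging stands in for resolution sampling, the patch projection and prefix table are random placeholders, and all function names are hypothetical.

```python
import numpy as np


def downsample(series, factor):
    """Derive a lower-resolution series by block averaging (illustrative choice)."""
    n = len(series) // factor
    return np.asarray(series[: n * factor]).reshape(n, factor).mean(axis=1)


def make_patches(series, patch_len):
    """Split a 1-D context window into non-overlapping patches."""
    n = len(series) // patch_len
    return np.asarray(series[: n * patch_len]).reshape(n, patch_len)


def add_resolution_prefix(patch_embeddings, resolution_id, prefix_table):
    """Prepend one embedding row that tells the model the sampling resolution."""
    prefix = prefix_table[resolution_id][None, :]
    return np.concatenate([prefix, patch_embeddings], axis=0)


rng = np.random.default_rng(0)
context = np.arange(64, dtype=float)

# Resolution sampling: one high-resolution series yields coarser training series.
coarse = downsample(context, factor=4)            # 16 points

# Adaptive patching: the same window viewed at several patch lengths.
shapes = {k: make_patches(context, k).shape for k in (8, 4, 2)}

# Resolution prefix: embed the patches, then prepend a learned resolution token.
patches = make_patches(context, patch_len=8)      # (8, 8)
proj = rng.normal(size=(8, 16))                   # placeholder patch embedding
prefix_table = rng.normal(size=(3, 16))           # 3 hypothetical resolutions
tokens = add_resolution_prefix(patches @ proj, 1, prefix_table)
print(coarse.shape, shapes, tokens.shape)
```

The prefix adds a single extra token per window, so the cost of making the backbone resolution-aware is small relative to the patch sequence itself.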
In practice
- Use TTMs for zero-shot or few-shot time-series forecasting.
- Employ the exogenous mixer for known future covariates.
- Adapt pre-trained models to new forecast lengths via pruning.
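The last two items can be sketched as simple array operations. Two assumptions are made here and should be checked against the original article: that the exogenous mixer starts from forecasts in which exogenous channels are overwritten with their known future values, and that adapting to a new forecast length means keeping only the first steps of a longer pre-trained horizon. The helper names are hypothetical.

```python
import numpy as np


def inject_known_covariates(forecast, known_future, exog_idx):
    """Overwrite forecasts of exogenous channels with their known future values.

    forecast:     (batch, horizon, channels) model output
    known_future: (batch, horizon, len(exog_idx)) known covariate values
    exog_idx:     list of channel indices that are exogenous
    """
    out = forecast.copy()
    out[:, :, exog_idx] = known_future
    return out


def truncate_horizon(forecast, new_len):
    """Keep only the first new_len forecast steps (assumed meaning of pruning)."""
    if new_len > forecast.shape[1]:
        raise ValueError("new_len exceeds the pre-trained forecast length")
    return forecast[:, :new_len, :]


rng = np.random.default_rng(0)
forecast = rng.normal(size=(2, 6, 3))    # 2 windows, horizon 6, 3 channels
known = np.ones((2, 6, 1))               # channel 2 is a known covariate

with_exog = inject_known_covariates(forecast, known, exog_idx=[2])
short = truncate_horizon(with_exog, new_len=4)
print(with_exog.shape, short.shape)
```

In a real pipeline a learned mixing layer would follow the injection step so that target channels can react to the covariates; that layer is omitted here.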
Topics
- Multivariate Time Series
- Foundation Models
- Zero-shot Forecasting
- Few-shot Learning
- TSMixer Architecture
- Model Efficiency
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Deep Learning on Medium.