Impermanent: A Live Benchmark for Temporal Generalization in Time Series Forecasting
Summary
Impermanent is a new live benchmark designed to evaluate time-series forecasting models, particularly pre-trained foundation-style models, under open-world temporal change. Unlike traditional static train-test splits that risk data contamination and inflated performance, Impermanent scores forecasts sequentially on continuously updated data streams. This approach enables the study of temporal robustness, distributional shift, and performance stability over time. The benchmark is instantiated using GitHub open-source activity, focusing on the top 400 repositories by star count. It constructs time series from metrics like issues opened, pull requests opened, push events, and new stargazers, evaluated daily over a rolling window. Standardized protocols and leaderboards facilitate reproducible, ongoing comparisons, aiming to provide meaningful assessment of foundation-level generalization in time-series forecasting.
Key takeaway
If you develop or deploy time-series forecasting models, consider adding a live benchmark such as Impermanent to your evaluation protocol. Relying solely on static train-test splits risks overestimating how well a model generalizes and how robust it is to real-world data shifts. Continuous evaluation gives a more accurate picture of a model's sustained performance and of how well it copes with evolving temporal dynamics.
Key insights
Live benchmarks are crucial for evaluating time-series models under continuous temporal change and avoiding data contamination.
Principles
- Evaluate models sequentially over time.
- Use continuously updated data streams.
- Focus on performance stability.
Method
Impermanent evaluates forecasting models by scoring them sequentially on continuously updated data streams, using a rolling window with daily updates, to assess temporal robustness and performance stability.
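The sequential scoring described above can be illustrated with a minimal rolling-origin sketch. This is not Impermanent's actual protocol: the function names, the seasonal-naive baseline, the 7-day horizon, the 14-day minimum history, and the use of mean absolute error are all assumptions chosen for illustration.

```python
from statistics import mean


def seasonal_naive(history, horizon, season=7):
    """Hypothetical baseline: repeat the last observed week."""
    last = history[-season:]
    return [last[i % season] for i in range(horizon)]


def rolling_evaluation(series, forecaster, min_history=14, horizon=7):
    """Score a forecaster at successive daily cutoffs.

    At each cutoff the model sees only series[:cutoff] and is scored on
    series[cutoff:cutoff + horizon], so no future value leaks into the
    forecast. Returns a list of (cutoff, mean absolute error) pairs.
    """
    scores = []
    for cutoff in range(min_history, len(series) - horizon + 1):
        history = series[:cutoff]
        actual = series[cutoff:cutoff + horizon]
        forecast = forecaster(history, horizon)
        mae = mean(abs(f - a) for f, a in zip(forecast, actual))
        scores.append((cutoff, mae))
    return scores


if __name__ == "__main__":
    # Synthetic daily counts with a weekly pattern and a level shift at day 28.
    series = [(i % 7) + (10 if i >= 28 else 0) for i in range(56)]
    for cutoff, mae in rolling_evaluation(series, seasonal_naive):
        print(cutoff, round(mae, 2))
```

Keeping one score per cutoff, rather than a single aggregate, is what makes it possible to see whether error stays stable or drifts as the data changes.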
In practice
- Utilize GitHub activity for live data.
- Track issues, PRs, pushes, stargazers.
- Implement daily rolling window evaluation.
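As a rough sketch of how the tracked metrics could be turned into daily series, the snippet below counts events per repository, metric, and day. The flat record layout (`repo`, `type`, `action`, `created_at`) and the metric names are hypothetical, and the mapping from GitHub event types to metrics is an assumption rather than the benchmark's documented pipeline.

```python
from collections import Counter
from datetime import datetime

# Assumed mapping from (event type, action) to the four tracked metrics.
# Push events carry no action in this hypothetical record layout.
METRICS = {
    ("IssuesEvent", "opened"): "issues_opened",
    ("PullRequestEvent", "opened"): "prs_opened",
    ("PushEvent", None): "pushes",
    ("WatchEvent", "started"): "stars",
}


def daily_counts(events):
    """Count events per (repo, metric, calendar day).

    Each event is a dict with an ISO 8601 `created_at` timestamp; the day
    is taken from the timestamp as given (assumed to be in UTC).
    """
    counts = Counter()
    for event in events:
        metric = METRICS.get((event["type"], event.get("action")))
        if metric is None:
            continue  # skip event kinds that are not tracked
        day = datetime.fromisoformat(event["created_at"]).date()
        counts[(event["repo"], metric, day)] += 1
    return counts
```

Each `(repo, metric)` pair then yields one daily time series, which can be fed to a rolling evaluation loop as new days arrive.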
Topics
- Time Series Forecasting
- Foundation Models
- Temporal Generalization
- Live Benchmarking
- Distributional Shift
Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.