Impermanent: A Live Benchmark for Temporal Generalization in Time Series Forecasting

2026-03-09 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, quick

Summary

Impermanent is a new live benchmark designed to evaluate time-series forecasting models, particularly pre-trained foundation-style models, under open-world temporal change. Unlike traditional static train-test splits that risk data contamination and inflated performance, Impermanent scores forecasts sequentially on continuously updated data streams. This approach enables the study of temporal robustness, distributional shift, and performance stability over time. The benchmark is instantiated using GitHub open-source activity, focusing on the top 400 repositories by star count. It constructs time series from metrics like issues opened, pull requests opened, push events, and new stargazers, evaluated daily over a rolling window. Standardized protocols and leaderboards facilitate reproducible, ongoing comparisons, aiming to provide meaningful assessment of foundation-level generalization in time-series forecasting.

Key takeaway

Research scientists developing or deploying time-series forecasting models should consider integrating live benchmarks like Impermanent into their evaluation protocols. Relying solely on static train-test splits risks overestimating a model's generalization and its robustness to real-world data shifts. Continuous evaluation can give a more accurate picture of a model's sustained performance and of how well it adapts to evolving temporal dynamics.

Key insights

Live benchmarks are crucial for evaluating time-series models under continuous temporal change and avoiding data contamination.

Principles

Evaluate models sequentially over time.
Use continuously updated data streams.
Focus on performance stability.

Method

Impermanent evaluates forecasting models by scoring them sequentially on continuously updated data streams, using a rolling window with daily updates, to assess temporal robustness and performance stability.
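The sequential scoring idea can be illustrated with a minimal rolling-origin sketch. This is not the benchmark's actual protocol: the seasonal-naive forecaster, the 7-day horizon, the mean absolute error metric, and the synthetic series with a level shift on day 21 are all assumptions chosen for illustration. The point it shows is that each forecast sees only data before its cutoff, and that the score is tracked per cutoff rather than collapsed into a single number.

```python
from statistics import mean

def seasonal_naive(history, horizon, season=7):
    """Hypothetical baseline: repeat the last `season` observations forward."""
    last = history[-season:]
    return [last[i % len(last)] for i in range(horizon)]

def rolling_origin_scores(series, forecaster, horizon=7, min_history=14):
    """Return one (cutoff, MAE) pair per day, using only data before each cutoff."""
    scores = []
    for cutoff in range(min_history, len(series) - horizon + 1):
        history = series[:cutoff]                 # what the model may see
        actual = series[cutoff:cutoff + horizon]  # what it is scored against
        forecast = forecaster(history, horizon)
        mae = mean(abs(f - a) for f, a in zip(forecast, actual))
        scores.append((cutoff, mae))
    return scores

# Synthetic daily counts (assumption): weekly pattern with a +10 level shift at day 21.
daily_counts = [(d % 7) + 1 + (10 if d >= 21 else 0) for d in range(35)]

scores = rolling_origin_scores(daily_counts, seasonal_naive)
for cutoff, mae in scores:
    print(cutoff, round(mae, 2))
```

In this toy series the error is zero before the shift, peaks at the cutoff where the whole forecast window lies after the shift, and returns to zero once the history has caught up. A single static test split would hide that trajectory.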

In practice

Utilize GitHub activity for live data.
Track issues, PRs, pushes, stargazers.
Implement daily rolling window evaluation.
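As a rough sketch of the first two steps, the snippet below turns a list of activity records into a daily count series for one repository and one metric. The record layout (the repo, metric, and timestamp keys), the metric labels, and the repository name are hypothetical; the benchmark's real ingestion pipeline is not described here and may differ.

```python
from collections import Counter
from datetime import date, datetime, timedelta

def daily_series(events, repo, metric, start, end):
    """Count matching events per calendar day, filling days without events with 0."""
    counts = Counter(
        datetime.fromisoformat(e["timestamp"]).date()
        for e in events
        if e["repo"] == repo and e["metric"] == metric
    )
    n_days = (end - start).days + 1
    return [counts.get(start + timedelta(days=i), 0) for i in range(n_days)]

# Hypothetical activity records; field names and values are illustrative only.
events = [
    {"repo": "example/repo", "metric": "pushes", "timestamp": "2026-03-01T09:15:00"},
    {"repo": "example/repo", "metric": "pushes", "timestamp": "2026-03-01T17:40:00"},
    {"repo": "example/repo", "metric": "new_stars", "timestamp": "2026-03-02T08:00:00"},
    {"repo": "example/repo", "metric": "pushes", "timestamp": "2026-03-03T12:30:00"},
]

pushes = daily_series(events, "example/repo", "pushes", date(2026, 3, 1), date(2026, 3, 4))
print(pushes)  # [2, 0, 1, 0]
```

Filling empty days with explicit zeros keeps the series evenly spaced, which most forecasting models expect when they are scored day by day.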

Topics

Time Series Forecasting
Foundation Models
Temporal Generalization
Live Benchmarking
Distributional Shift

Code references

TimeCopilot/impermanent

Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.