Benchmarking Physics-Informed Time-Series Models for Operational Global Station Weather Forecasting

2014-01-01 · Source: stat.ML updates on arXiv.org · Field: Science & Research — Environmental Science & Earth Systems, Mathematics & Computational Sciences · Depth: Expert, long

Summary

Researchers have introduced WEATHER-5K, a large-scale dataset for Global Station Weather Forecasting (GSWF) and general time-series benchmarking. It addresses limitations of existing public meteorological datasets, which are often small, cover short time spans, and include few variables. WEATHER-5K contains data from 5,672 weather stations worldwide, covering the 10 years from 2014 to 2023 at one-hour intervals, and includes multiple weather elements. The data was collected from the National Centers for Environmental Information (NCEI) Integrated Surface Database (ISD) and post-processed, with gaps filled by linear interpolation and ERA5 reanalysis. The dataset and benchmark implementation are publicly available, giving a common basis for evaluating time-series forecasting models.

Key takeaway

If you are a Machine Learning Engineer developing global weather or general time-series forecasting models, consider integrating WEATHER-5K into your evaluation pipeline. Its 10 years of hourly data from 5,672 stations worldwide give you a benchmark for testing model generalization and the ability to capture complex patterns. Evaluating on it can help you move past the limitations of smaller, older datasets and support more reliable predictions for operational services.

Key insights

The WEATHER-5K dataset provides a comprehensive, large-scale benchmark for global station weather and general time-series forecasting, addressing prior data limitations.

Principles

Comprehensive datasets improve model generalization.
Wind speed and direction are non-stationary and hard to predict.
Sea-level pressure shows a stable, predictable distribution.

Method

WEATHER-5K was built by selecting 5,672 operational stations that report hourly from the NCEI ISD for 2014-2023. Missing hourly values were first estimated from the nearest 30-minute observations; the remaining gaps were filled by linear interpolation and then by ERA5 reanalysis.
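The three-stage gap filling described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function name `fill_hourly_gaps`, the averaging of the two adjacent half-hour readings, and the `max_interp_gap` threshold are all assumptions made for the example, and `reanalysis` is a plain list standing in for ERA5 values already matched to the station.

```python
from typing import Dict, List, Optional


def fill_hourly_gaps(
    hourly: List[Optional[float]],
    half_hourly: Dict[float, float],
    reanalysis: List[float],
    max_interp_gap: int = 3,  # assumed threshold, not taken from the paper
) -> List[float]:
    """Fill None entries in an hourly series in three stages."""
    filled = list(hourly)
    n = len(filled)

    # Stage 1: use observations 30 minutes before/after the missing hour.
    # Averaging the available neighbours is an assumption of this sketch.
    for i in range(n):
        if filled[i] is None:
            neighbours = [
                half_hourly[t] for t in (i - 0.5, i + 0.5) if t in half_hourly
            ]
            if neighbours:
                filled[i] = sum(neighbours) / len(neighbours)

    # Stage 2: linear interpolation across short gaps that have a known
    # value on both sides.
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # index of the first known value after the gap (or n)
        gap = end - start
        if start > 0 and end < n and gap <= max_interp_gap:
            left, right = filled[start - 1], filled[end]
            for k in range(start, end):
                frac = (k - start + 1) / (gap + 1)
                filled[k] = left + frac * (right - left)

    # Stage 3: fall back to the reanalysis value for anything still missing.
    return [reanalysis[k] if v is None else v for k, v in enumerate(filled)]
```

Ordering the stages from most local (neighbouring observations) to least local (reanalysis) keeps as much of the station's own signal as possible; the reanalysis fallback is only used where no nearby measurement exists.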

In practice

Benchmark various time-series forecasting models.
Develop models for long-term weather patterns.
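As a rough illustration of what benchmarking a forecasting model on one station's series might look like, the sketch below scores a model over sliding input/forecast windows with MAE and MSE. The metric choice, the window lengths, and the persistence baseline are assumptions for the example; none of the names come from the WEATHER-5K codebase.

```python
from typing import Callable, List, Sequence, Tuple

# A "model" here is any callable mapping (history, horizon) -> forecast.
Forecaster = Callable[[Sequence[float], int], List[float]]


def persistence_forecast(history: Sequence[float], horizon: int) -> List[float]:
    """Naive baseline: repeat the last observed value for every future step."""
    return [history[-1]] * horizon


def evaluate_sliding(
    series: Sequence[float],
    input_len: int,
    horizon: int,
    model: Forecaster,
) -> Tuple[float, float]:
    """Return (MAE, MSE) averaged over all sliding windows and forecast steps."""
    abs_sum = 0.0
    sq_sum = 0.0
    count = 0
    last_start = len(series) - input_len - horizon
    for start in range(last_start + 1):
        history = series[start:start + input_len]
        target = series[start + input_len:start + input_len + horizon]
        forecast = model(history, horizon)
        for truth, pred in zip(target, forecast):
            err = truth - pred
            abs_sum += abs(err)
            sq_sum += err * err
            count += 1
    if count == 0:
        raise ValueError("series is too short for the given window sizes")
    return abs_sum / count, sq_sum / count
```

Any candidate model that follows the same `(history, horizon)` signature can be passed in place of `persistence_forecast`, which makes the naive baseline a convenient floor to compare against.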

Topics

Global Station Weather Forecasting
Time-Series Forecasting
WEATHER-5K Dataset
Meteorological Data
Weather Benchmarking
Deep Learning Models

Code references

taohan10200/WEATHER-5K

Best for: AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.