Understanding how time series foundation models are evaluated and compared
Explore the datasets commonly used for training and evaluating time series models
Comprehensive collection of time series datasets across various domains including energy, economics, traffic, and more.
The standard repository for time series classification problems, widely used in academic research.
Dataset containing 2 years of data from electricity transformers, used for long-term forecasting benchmarks.
Meteorological data including temperature, humidity, wind speed, and other weather parameters.
Data is split into train, validation, and test sets in chronological order, so validation and test data always come after the training data. Typical ratios are 70%/10%/20% or similar.
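A chronological split can be sketched in a few lines. This is a minimal illustration, not a benchmark's official loader: the function name `chronological_split` and the toy series are assumptions, and the 70%/10%/20% defaults are just the typical ratios mentioned above.

```python
def chronological_split(series, train_frac=0.7, val_frac=0.1):
    """Split a sequence into train/val/test without shuffling.

    Hypothetical helper: earlier points go to train, later ones to
    validation, and the most recent ones to test.
    """
    n = len(series)
    # round() avoids off-by-one errors from float products like 0.8 * 100.
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    return series[:train_end], series[train_end:val_end], series[val_end:]


# Toy series: 100 time steps, already ordered by time.
series = list(range(100))
train, val, test = chronological_split(series)
print(len(train), len(val), len(test))
```

Because the slices are taken in order, every test point is later in time than every training point, which is what prevents look-ahead leakage.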
Data is normalized using statistics computed on the training set only, which keeps validation and test information out of preprocessing and ensures fair comparison across models.
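As a rough sketch of what this looks like in practice, the example below fits a scaler on the training split and reuses it for the test split. It assumes z-score normalization (mean and standard deviation); the text does not say which statistics are used, and the helper names `fit_scaler` and `transform` are hypothetical.

```python
import statistics


def fit_scaler(train):
    """Compute mean and standard deviation from the training split only."""
    mean = statistics.fmean(train)
    std = statistics.pstdev(train)
    # Guard against constant series, where the deviation is zero.
    return mean, (std if std > 0 else 1.0)


def transform(values, mean, std):
    """Apply the training statistics to any split."""
    return [(v - mean) / std for v in values]


# Toy data: the test values lie outside the training range.
train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [6.0, 7.0]

mean, std = fit_scaler(train)
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)  # reuses train statistics
```

The scaled training data has zero mean and unit variance, while the scaled test data generally does not. That is expected: recomputing statistics on the test split would leak information about it into preprocessing.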
Models are evaluated with multiple random seeds to account for variability and ensure result stability.
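One way to organize a multi-seed evaluation is shown below. The function `train_and_evaluate` is a hypothetical stand-in for a full training run; here it only simulates a seed-dependent test error, and the numbers it produces are not real results.

```python
import random
import statistics


def train_and_evaluate(seed):
    """Hypothetical stand-in for training a model and scoring it.

    Returns a simulated test error that depends only on the seed.
    """
    rng = random.Random(seed)
    return 0.40 + rng.gauss(0.0, 0.01)


seeds = [0, 1, 2, 3, 4]
scores = [train_and_evaluate(s) for s in seeds]

# Report the mean and the sample standard deviation across seeds.
mean_score = statistics.fmean(scores)
std_score = statistics.stdev(scores)
print(f"error over {len(seeds)} seeds: {mean_score:.4f} +/- {std_score:.4f}")
```

Reporting the spread alongside the mean makes it easier to tell whether a gap between two models is larger than the run-to-run variability.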
Statistical significance tests (e.g., paired t-tests) are conducted to validate performance differences.
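The paired t-test mentioned above compares two models on the same datasets or seeds by testing whether the mean of their per-pair differences is zero. The sketch below computes only the t statistic and degrees of freedom with the standard library; the error values are made up for illustration, and `paired_t_statistic` is a hypothetical helper. If SciPy is available, `scipy.stats.ttest_rel` also returns the p-value.

```python
import math
import statistics


def paired_t_statistic(errors_a, errors_b):
    """Return (t statistic, degrees of freedom) for paired samples."""
    if len(errors_a) != len(errors_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Illustrative (made-up) errors for two models on the same five runs.
model_a = [0.41, 0.39, 0.40, 0.42, 0.38]
model_b = [0.44, 0.43, 0.42, 0.45, 0.42]

t_stat, dof = paired_t_statistic(model_a, model_b)
print(f"t = {t_stat:.3f}, dof = {dof}")
```

With 4 degrees of freedom, the two-sided critical value at the 5% level is about 2.776, so a t statistic with a larger absolute value would count as a significant difference under this test.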
State-of-the-art performance, significantly outperforms baselines
Competitive performance, on par with current best methods
Below-average performance, may have limitations