Benchmark Datasets & Metrics

Understanding how time series foundation models are evaluated and compared

Time Series Datasets

Explore the datasets commonly used for training and evaluating time series models

Standard Datasets

Monash Time Series Forecasting Archive

20+ datasets, multiple domains, standard benchmark

A comprehensive collection of time series forecasting datasets spanning domains such as energy, economics, traffic, and weather.

Energy, Economics, Traffic, Weather

UEA/UCR Time Series Classification

100+ datasets, classification, since 2002

The standard repository for time series classification problems, widely used in academic research.

ECG, Motion, Spectrograms, Sensor

ETT (Electricity Transformer Temperature)

Electricity data, hourly and 15-minute recordings, temperature

Two years of power-load and oil-temperature recordings from electricity transformers, widely used as a long-term forecasting benchmark.

Variants: ETTh1, ETTh2 (hourly); ETTm1, ETTm2 (15-minute)

Weather Dataset

Weather data, multiple locations, 21 features

Meteorological data including temperature, humidity, wind speed, and other weather parameters.

Features: Temperature, Humidity, Wind Speed, Pressure, etc.

Evaluation Protocol

1. Data Splitting

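Data is split into train, validation, and test sets in chronological order, without shuffling, so that models are always evaluated on data later than what they were trained on. Typical ratios are 70%/10%/20%; the ETT datasets are commonly split 60%/20%/20%.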
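A minimal sketch of a chronological split, assuming the series is already sorted by time. The function name `chronological_split` and its default fractions are illustrative, not taken from any particular benchmark codebase.

```python
def chronological_split(series, train_frac=0.7, val_frac=0.1):
    """Split a time-ordered sequence into train/val/test without shuffling."""
    if not (0 < train_frac < 1 and 0 <= val_frac < 1 and train_frac + val_frac < 1):
        raise ValueError("fractions must be positive and sum to less than 1")
    n = len(series)
    # round() avoids off-by-one errors from float products such as 100 * 0.8
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    train = list(series[:train_end])
    val = list(series[train_end:val_end])
    test = list(series[val_end:])  # the most recent observations
    return train, val, test


# Toy series: the integers stand in for time-ordered observations.
series = list(range(100))
train, val, test = chronological_split(series)
print(len(train), len(val), len(test))
```

Because the slices are contiguous, every test observation is later than every training observation, which is the property the protocol relies on.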

2. Normalization

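Each series is standardized (zero mean, unit variance) using statistics computed on the training set only, so that no information from the validation or test periods leaks into preprocessing and all models are compared on equal footing.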
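The idea can be sketched as follows, assuming z-score normalization of a single univariate series. The helper names `fit_scaler` and `transform` are hypothetical, and the numbers are made up for illustration.

```python
import statistics


def fit_scaler(train_values):
    """Compute mean and standard deviation from the training split only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values)
    if std == 0:
        std = 1.0  # constant series: avoid division by zero
    return mean, std


def transform(values, mean, std):
    """Apply the training-set statistics to any split."""
    return [(v - mean) / std for v in values]


# Made-up values for illustration.
train = [2.0, 4.0, 6.0, 8.0]
test = [10.0, 12.0]

mean, std = fit_scaler(train)              # statistics come from train only
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)   # test reuses the train statistics
print(mean, std, test_scaled)
```

Note that the scaled test values need not have zero mean or unit variance; that is expected, since the test split never contributes to the statistics.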

3. Multiple Runs

Each model is trained and evaluated with several random seeds, and results are reported as the mean and standard deviation across runs to account for run-to-run variability.
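A sketch of how multi-seed results might be aggregated. Here `evaluate_once` is a hypothetical stand-in for a full train-and-evaluate run; it only simulates a noisy error score and does not train anything.

```python
import random
import statistics


def evaluate_once(seed):
    """Hypothetical placeholder for one training + evaluation run.

    Returns a simulated error score whose noise depends on the seed.
    """
    rng = random.Random(seed)
    return 0.40 + rng.gauss(0.0, 0.01)


def summarize_runs(seeds):
    """Run one evaluation per seed and report mean and sample std."""
    scores = [evaluate_once(seed) for seed in seeds]
    return statistics.fmean(scores), statistics.stdev(scores), scores


mean_score, std_score, scores = summarize_runs([0, 1, 2, 3, 4])
print(f"{mean_score:.4f} +/- {std_score:.4f} over {len(scores)} seeds")
```

Fixing the list of seeds up front keeps the comparison reproducible: rerunning with the same seeds yields the same scores.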

4. Statistical Testing

Statistical significance tests (e.g., paired t-tests over matched scores from the same datasets or seeds) are conducted to check that observed performance differences are unlikely to be due to chance.
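As an illustration, the paired t-statistic can be computed with the standard library alone. The error values below are made up, and the function name `paired_t_statistic` is hypothetical. This sketch stops at the statistic and degrees of freedom; in practice a library routine such as `scipy.stats.ttest_rel` would also return the p-value.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired samples a and b."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    if sd_diff == 0:
        raise ValueError("differences have zero variance")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Made-up error scores for two models on the same five seeds.
model_a = [0.40, 0.42, 0.39, 0.41, 0.40]
model_b = [0.45, 0.46, 0.43, 0.47, 0.44]

t_stat, dof = paired_t_statistic(model_a, model_b)
# For 4 degrees of freedom, the two-sided 5% critical value is about 2.776.
print(f"t = {t_stat:.2f}, dof = {dof}, significant = {abs(t_stat) > 2.776}")
```

The test is paired because both models are scored under the same conditions, so each difference cancels out the variation that the two runs share.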

Interpreting Performance Scores

0.9 - 1.0 Excellent

State-of-the-art performance, significantly outperforms baselines

0.8 - 0.89 Good

Competitive performance, on par with current best methods

0.0 - 0.79 Needs Improvement

Below-average performance; the model may have notable limitations on this benchmark
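The bands above can be expressed as a small lookup, sketched here under two assumptions: the score is a higher-is-better value in [0, 1], and scores falling between the listed bands (for example 0.895) are assigned to the lower band. The function name `performance_band` is illustrative.

```python
def performance_band(score):
    """Map a higher-is-better score in [0, 1] to its interpretation band."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score >= 0.9:
        return "Excellent"
    if score >= 0.8:
        # Assumption: values between 0.89 and 0.9 are treated as "Good"
        return "Good"
    return "Needs Improvement"


for s in (0.95, 0.85, 0.5):
    print(s, performance_band(s))
```

Error metrics where lower is better do not fit these bands directly and would need to be converted to a normalized score first.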