Monday, March 20, 2017

GEFCom2014 Load Forecasting Data

The load forecasting track of GEFCom2014 focused on probabilistic load forecasting. We asked the contestants to provide one-month-ahead hourly probabilistic forecasts on a rolling basis for 15 rounds. In the first round, we provided 69 months of hourly load data and 117 months of hourly temperature data. Incremental load and temperature data were provided in each subsequent round.

Where to download the data?

The complete data was published as the appendix of our GEFCom2014 paper. If you don't have access to ScienceDirect, you can download it from my Dropbox link HERE. Regardless of where you get the data, you should cite this paper to acknowledge the source:

  • Tao Hong, Pierre Pinson, Shu Fan, Hamidreza Zareipour, Alberto Troccoli and Rob J. Hyndman, "Probabilistic energy forecasting: Global Energy Forecasting Competition 2014 and beyond", International Journal of Forecasting, vol. 32, no. 3, pp. 896-913, July-September 2016.


What's in the package?

After unzipping the file, you will see the folder "GEFCom2014 Data", which includes five zip files. The data for the probabilistic load forecasting track of GEFCom2014 is in the file "GEFCom2014-L_V2.zip". Unzipping it yields the folder "load", which contains an "Instructions.txt" file and 15 subfolders. Each subfolder, named "Task n", contains two files: Ln-train.csv and Ln-benchmark.csv. The train file, together with the train files released in previous rounds, can be used to generate the forecasts for that round. The benchmark file contains the forecasts generated by the benchmark method.
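
Because each round only adds incremental data, reproducing a round means reading the train files of that round and all earlier rounds. Below is a minimal sketch using only the Python standard library, assuming the folder layout described in this section ("load/Task n/Ln-train.csv"). The helper name load_training_rows is hypothetical, and the code assumes nothing about the CSV columns beyond a header row.

```python
import csv
from pathlib import Path


def load_training_rows(load_dir, task):
    """Hypothetical helper: gather every training row available in round `task`.

    Assumes the layout load_dir / "Task n" / "Ln-train.csv" for n = 1..15 and
    concatenates the train files of rounds 1..task in round order.
    """
    if not 1 <= task <= 15:
        raise ValueError("GEFCom2014-L has 15 rounds (tasks 1-15)")
    rows = []
    for n in range(1, task + 1):
        path = Path(load_dir) / f"Task {n}" / f"L{n}-train.csv"
        with path.open(newline="") as f:
            # DictReader keys each row by the file's own header line,
            # so no column names are hard-coded here.
            rows.extend(csv.DictReader(f))
    return rows
```

For example, load_training_rows("load", 3) would return the rows of L1-train.csv, L2-train.csv and L3-train.csv, that is, the data available when forecasting for Task 3. Since the first round provides 117 months of temperature but only 69 months of load, some early rows may lack a load value and should be filtered out before fitting a load model.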

How to use the data?

The most straightforward way to use this dataset is to replicate the competition setup and compare results directly with the top entries. Because the data published through GEFCom2014 is fairly long (7 years of matching hourly load and temperature data in total), it can also be used to test methods and models for short-term load forecasting.
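
Comparing against the top entries requires scoring forecasts the way the competition did. GEFCom2014 evaluated the submitted quantile forecasts (the 1st through 99th percentiles) with the pinball loss, which this post does not spell out. The plain-Python sketch below assumes the score is averaged over every forecast hour and every quantile level; the function names and the input layout (one list of quantile forecasts per hour) are illustrative choices, not the official scoring code.

```python
def pinball_loss(actual, forecast, tau):
    """Pinball (quantile) loss of one quantile forecast at level tau, 0 < tau < 1."""
    if actual < forecast:
        # Over-forecast: the excess is weighted by (1 - tau)
        return (1 - tau) * (forecast - actual)
    # Under-forecast or exact hit: the shortfall is weighted by tau
    return tau * (actual - forecast)


def average_pinball_loss(actuals, quantile_forecasts, taus=None):
    """Mean pinball loss over every hour and every quantile level.

    actuals: observed hourly loads, one value per hour.
    quantile_forecasts: one list per hour; quantile_forecasts[h][i] is the
        forecast for quantile level taus[i] at hour h.
    taus: quantile levels; defaults to 0.01, 0.02, ..., 0.99 (99 levels).
    """
    if taus is None:
        taus = [i / 100 for i in range(1, 100)]
    if not actuals or len(actuals) != len(quantile_forecasts):
        raise ValueError("need one set of quantile forecasts per observed hour")
    total = 0.0
    count = 0
    for actual, quantiles in zip(actuals, quantile_forecasts):
        if len(quantiles) != len(taus):
            raise ValueError("need one forecast per quantile level")
        for forecast, tau in zip(quantiles, taus):
            total += pinball_loss(actual, forecast, tau)
            count += 1
    return total / count
```

The asymmetric weights are what make this a quantile score: if the observed load is 100, forecasting 110 for the 0.9 quantile costs 0.1 × 10 = 1.0, while forecasting 90 costs 0.9 × 10 = 9.0. The same function can score the forecasts in the Ln-benchmark.csv files, once they are arranged in this layout, to give a baseline for each round.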

GEFCom2014-E data

After GEFCom2014, I organized an in-class probabilistic load forecasting competition in Fall 2015 that was also open to external participants. Its setup was very similar to that of GEFCom2014, so I named the data for this in-class competition GEFCom2014-E, where E stands for "extended". In total, this dataset covers 11 years of hourly temperature and 9 years of hourly load. Florian Ziel, one of the top entrants, was invited to contribute a paper to IJF (see HERE). Readers may replicate the same competition setup and compare their results with Ziel's.

Caution

Note that the data I used for GEFCom2014-E was created from ISO New England data. If you want to validate a method on two independent sources, do not pair GEFCom2014-E with ISO New England data, because the two are not independent.


1 comment:

  1. I have a question about the GEFCom2012 dataset:

    I can see that in the dataset with solution data you have added a file called "windpowermeasurements.csv", which combines the train and test sets into one series. However, the windforecast data has not been updated (i.e., there are still many missing values in the dataset). Is this a mistake, or is there something I am missing?

    Thanks!

    /Sebastian
