Showing posts with label GEFCom2012. Show all posts

Thursday, July 7, 2016

GEFCom2012 Load Forecasting Data

The load forecasting track of GEFCom2012 was about hierarchical load forecasting. We asked the contestants to forecast and backcast (check out THIS POST for the definitions of forecasting and backcasting) the electricity demand for 21 zones, of which Zone 21 was the sum of the other 20 zones.
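
To make the hierarchy concrete, here is a minimal Python sketch of how a top-level series like Zone 21 can be rebuilt by summing the lower-level zones hour by hour. The function name `aggregate_zones` and the toy numbers are assumptions for illustration only; they are not taken from the competition files.

```python
def aggregate_zones(zone_loads):
    """Sum hourly loads across zones.

    zone_loads: dict mapping a zone id to a list of hourly loads,
    all lists having the same length (hypothetical in-memory layout).
    """
    n_hours = len(next(iter(zone_loads.values())))
    return [sum(series[h] for series in zone_loads.values())
            for h in range(n_hours)]


# Toy example with three zones and three hours (made-up values).
zone_loads = {
    1: [10.0, 12.0, 11.0],
    2: [20.0, 19.0, 21.0],
    3: [5.0, 6.0, 7.0],
}
zone_total = aggregate_zones(zone_loads)
print(zone_total)  # [35.0, 37.0, 39.0]
```

With the real data, the same check can be run on the 20 regular zones and compared against Zone 21.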

Where to download the data?

You can download an incomplete dataset, which does not have the solution data, from Kaggle. The complete data was published as the appendix of our GEFCom2012 paper. If you don't have access to Science Direct, you can download it from my Dropbox link HERE. Regardless of where you get the data, you should cite this paper to acknowledge the source:
  • Tao Hong, Pierre Pinson and Shu Fan, "Global energy forecasting competition 2012", International Journal of Forecasting, vol.30, no.2, pp 357-363, April-June, 2014. 

What's in the package?

Unzip the file and navigate to the "GEFCOM2012_Data\Load\" folder. You will see 6 files:
  • load_history
  • temperature_history
  • holiday_list
  • load_benchmark
  • load_solution
  • temperature_solution
Our GEFCom2012 paper has introduced the first five datasets but not the last one. The "temperature_solution" dataset includes the temperature data from 2008/6/30 7:00 to 2008/7/7 24:00, while the "load_solution" dataset does not include the load data from 2008/6/30 7:00 to 2008/6/30 24:00.

What's not working?

Before using the data, please understand that there is no way to restore the exact Kaggle setup for you to make a direct comparison on the error score.
The main reason is that Kaggle picked a random subset of the solution data to calculate the scores for the public leaderboard, and used the rest for the private leaderboard. We do not know which data was used for which leaderboard.

Nevertheless, it was never our intention to let you make comparisons in a Kaggle way, because GEFCom2012 was set up more like a data mining competition than a forecasting competition. The contestants could submit their forecasts many times, while Kaggle picked the best score. This is not a realistic forecasting process.

How to use the data?

Instead, we encourage you to use these 4.5 years of hourly data without considering the Kaggle setup. You can even keep 4 full calendar years and drop the last half year in your case studies. With four years of data, you can perform one-year ahead ex post forecasting (see my weather station selection paper). You can also perform short term ex post forecasting on a rolling basis (see my recency effect paper).
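
As a rough illustration of forecasting on a rolling basis, the Python sketch below generates the train/test index ranges for a rolling-origin evaluation with an expanding training window. The function name `rolling_origins` and its parameters are hypothetical, and the expanding window is only one possible design (a fixed-length sliding window is another); this is not necessarily the setup used in the papers mentioned above.

```python
def rolling_origins(n_obs, initial_train, horizon, step):
    """Yield (train_range, test_range) pairs for rolling-origin evaluation.

    n_obs:         total number of observations (e.g. hours)
    initial_train: size of the first training window
    horizon:       number of observations forecast at each origin
    step:          how far the forecast origin moves each round
    """
    origin = initial_train
    while origin + horizon <= n_obs:
        # Train on everything before the origin, test on the next `horizon` points.
        yield range(0, origin), range(origin, origin + horizon)
        origin += step


# Toy example: 10 observations, first 6 for training, 2-step horizon.
splits = [(len(train), list(test))
          for train, test in rolling_origins(10, 6, 2, 2)]
print(splits)  # [(6, [6, 7]), (8, [8, 9])]
```

For hourly data, a day-ahead exercise moved forward one day at a time would correspond to `horizon=24` and `step=24`. In an ex post setting, the actual temperatures of the test range are fed to the model in place of temperature forecasts.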

Then the question is whether the accuracy is "good enough". According to Table 3 of our GEFCom2012 paper, the winning teams improved on the benchmark by about 30% - see the "test" column, which is the private leaderboard of Kaggle. In other words, if your model achieves about a 30% error reduction compared to the Vanilla benchmark on this dataset, it is a decent model.
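
The error reduction here is a relative one. A minimal Python sketch of the arithmetic follows; the function name `error_reduction` and the numbers are made up for illustration, and the same error measure is assumed to be used for both the benchmark and your model.

```python
def error_reduction(benchmark_error, model_error):
    """Relative error reduction of a model over a benchmark.

    Both errors must be computed with the same measure on the same
    test period; returns a fraction (0.30 means a 30% reduction).
    """
    if benchmark_error <= 0:
        raise ValueError("benchmark_error must be positive")
    return (benchmark_error - model_error) / benchmark_error


# Made-up numbers: the benchmark scores 100.0 and your model scores 70.0.
reduction = error_reduction(100.0, 70.0)
print(f"{reduction:.0%}")  # 30%
```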

Please also understand that this 30% is gained from a forecasting system with many bells and whistles, such as detailed modeling of temperature and special treatment of holidays. If your research is focused on one component, the error reduction may be much smaller than 30%. You can find more detailed arguments in my response to the second review comment in THIS POST.

It's been over two years since we published the GEFCom2012 data. Many researchers have already used it to test their models. You can also replicate the experiment setup in the recently published papers that used the GEFCom2012 data, and compare your results with the results in those papers.

Back to Datasets for Energy Forecasting.

Saturday, July 2, 2016

Datasets for Energy Forecasting

Reproducible research is a key to advancing knowledge. In energy forecasting, it is necessary and crucial that researchers compare their models and methods using the same datasets. Five years ago when we founded the IEEE Working Group on Energy Forecasting, "lack of benchmark data pool" was one of the issues we identified. Fortunately, things have been moving in the right direction over the past few years. More and more datasets are being made available to and recognized by the energy forecasting community.

This post will serve as the starting point of a blog series on datasets. In each post, I will feature a dataset and discuss how to use it. I will also host the datasets on Dropbox and provide the links in these posts. Meanwhile, I would like to take a crowd-sourcing approach to making a comprehensive and widely accessible data pool:
  • If you can host the datasets through other channels, please contact me. 
  • If you know of some public datasets that are not on my list, please contact me. 
  • If you have some private datasets that can be made available to the energy forecasting community, please contact me. 
Here is a list of 9 posts with the publicly available data that I have used in my papers. I will update the list with links and additional data sources, so check this page from time to time to see if there is something you need.

Electric load forecasting
  1. GEFCom2012
  2. GEFCom2014
  3. ISO New England
  4. RWE npower forecasting challenge 2015
Gas load forecasting
  1. RWE npower forecasting challenge 2015
Electricity price forecasting
  1. GEFCom2014
Wind power forecasting
  1. GEFCom2012
  2. GEFCom2014
Solar power forecasting
  1. GEFCom2014
Stay tuned...

Friday, February 27, 2015

Electric Load Forecasting with Recency Effect: a Big Data Approach

When I first wrote the CFP for the special issue on Analytics for Energy Forecasting with Applications to Smart Grid in 2012, I used the term big data in quotation marks. Nowadays, big data is no longer new to the utility industry. In fact, the utilities have been working with big data since it was called just "data" - we have witnessed the growth of data to big data in this smart grid era. To collect the most recent progress and advancements in big data analytics, we just issued another CFP for the special issue on Big Data Analytics for Grid Modernization.
What is big data analytics, deep learning, high-performance computing and petabyte size? 
I have three simple criteria:
  1. The data size is larger than what typical data analysis tools can handle. If you are using MS Excel to do some data analysis, then a data file with 1.1 million rows is big data. 
  2. The computing time is longer than the analysis time. Let's say it takes you a few days to think of a design of an algorithm. If testing the algorithm takes a few weeks, then it is big data. 
  3. The problem requires analysis at a higher level of granularity than usual. If your typical load forecasting process relies on monthly data, moving to daily or hourly data may bring you the big data challenge. 
Although these three criteria do not have to be met at the same time to qualify as big data analytics, they are indeed connected to each other. Analyzing high resolution data often requires advanced data analysis tools and significant computing time. 

This paper has big data in its title, because it covers the latter two criteria. The regression models we developed in this paper contain up to thousands of variables, which require a significant amount of time for parameter estimation, much longer than our thought process. Moreover, we customized the models based on each zone of a geographic hierarchy and each node (hour) of the temporal hierarchy. Of course the forecasting errors are reduced with our proposed approach, which also tells us the importance of powerful computers in load forecasting. 

The case study is based on the GEFCom2012 data published in my 2014 IJF paper Global Energy Forecasting Competition 2012. We compared the results with those in my 2015 IJF paper Weather Station Selection for Electric Load Forecasting.

Citation
Pu Wang, Bidong Liu and Tao Hong, "Electric load forecasting with recency effect: a big data approach", International Journal of Forecasting, vol.32, no.3, pp 585-597, July-September, 2016. Working paper available online http://www.drhongtao.com/articles


Electric Load Forecasting with Recency Effect: a Big Data Approach

Pu Wang, Bidong Liu and Tao Hong

Abstract

Temperature plays a key role in driving electricity demand. We adopt "recency effect", a term originated from psychology, to illustrate the fact that electricity demand is affected by the temperatures of preceding hours. In the load forecasting literature, the temperature variables are often constructed in the form of lagged hourly temperatures and moving average temperatures. Over the past decades, computing power has been limiting the amount of temperature variables that can be used in a load forecasting model. In this paper, we present a comprehensive study on modeling recency effect through a big data approach. We take advantage of the modern computing power to answer a fundamental question: how many lagged hourly temperatures and/or moving average temperatures are needed in a regression model to fully capture recency effect without compromising the forecasting accuracy? Using the case study based on data from the load forecasting track of the Global Energy Forecasting Competition 2012, we first demonstrate that a model with recency effect outperforms its counterpart in forecasting individual load series at aggregated level by 18% to 20%. We then apply recency effect modeling to customize load forecasting models at low level of a geographic hierarchy, again showing the superiority over a benchmark model by 13% to 15% on average. Finally, we discuss four different implementations of the recency effect modeling by hour of a day. 
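
To illustrate the two kinds of temperature variables mentioned in the abstract, here is a minimal Python sketch that builds a lagged hourly temperature series and a trailing moving average. The function names, the handling of missing history with `None`, and the exact window definition are assumptions for illustration; the paper's own variable definitions may differ.

```python
def lagged(temps, lag):
    """Temperature observed `lag` hours earlier; None where history is missing."""
    return [temps[t - lag] if t >= lag else None for t in range(len(temps))]


def moving_average(temps, window):
    """Mean temperature over the `window` hours immediately before hour t.

    The current hour t is excluded; None where history is missing.
    """
    out = []
    for t in range(len(temps)):
        if t < window:
            out.append(None)
        else:
            out.append(sum(temps[t - window:t]) / window)
    return out


# Toy hourly temperatures (made-up values).
temps = [50.0, 52.0, 54.0, 56.0, 58.0]
lag1 = lagged(temps, 1)
avg2 = moving_average(temps, 2)
print(lag1)  # [None, 50.0, 52.0, 54.0, 56.0]
print(avg2)  # [None, None, 51.0, 53.0, 55.0]
```

In a regression model, each such series becomes one or more additional explanatory variables, so adding many lags and averages quickly increases the number of parameters to estimate.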

Sunday, December 28, 2014

Forecasting and Data Mining

The main difference between forecasting and data mining lies in the goal of the task. The goal of forecasting is to make statements about the future, while the goal of data mining is to extract patterns from large datasets. (The term "data mining" was a buzzword 15 years ago, used broadly to refer to any work on data, which is a misuse.) Many techniques can be applied to both forecasting and data mining, such as artificial neural networks, regression analysis, and clustering analysis.

Thursday, March 6, 2014

GEFCom2012 Papers

Thanks to International Journal of Forecasting and IEEE Transactions on Smart Grid, our two publication sponsors of GEFCom2012, all of the invited GEFCom2012 papers have been published as of today. These papers, in many aspects, represent the state-of-the-art in load and wind forecasting. The citations in IEEE format and links to these papers are provided below.

Wednesday, January 8, 2014

Guest Editorial: Special Section on Analytics for Energy Forecasting with Applications to Smart Grid

I'm glad to announce that the special section we edited last year has been published by IEEE Transactions on Smart Grid. I had a great pleasure working with and learning from my guest editors during this editorial process. The guest editorial is available on IEEE Xplore with open access.

Citation
Tao Hong, Shu Fan, Wei-Jen Lee, Wenyuan Li, Anil Pahwa, Pierre Pinson, Jianhui Wang and Hamidreza Zareipour, "Guest Editorial: special section on analytics for energy forecasting with applications to smart grid", IEEE Transactions on Smart Grid, vol.5, no.1, pp. 399-401, Jan 2014

Guest Editorial: Special Section on Analytics for Energy Forecasting with Applications to Smart Grid 

Tao Hong, Shu Fan, Wei-Jen Lee, Wenyuan Li, Anil Pahwa, Pierre Pinson, Jianhui Wang and Hamidreza Zareipour

Thanks to the help from the reviewers and editors, a total of 15 high-quality forecasting papers are included in this special section, which can be categorized into 5 groups:
  1. Global Energy Forecasting Competition;
  2. Load forecasting and analysis with high granular data;
  3. Probabilistic energy forecasting;
  4. Forecasting and analysis of emerging subjects;
  5. Novel methods for wind power forecasting.
The links to the papers with the IEEE citation format are provided below with indicators of the groups they fall into:

Enjoy reading!

Monday, September 30, 2013

Global Energy Forecasting Competition 2012: Lessons Learned and Beyond

In 2012, the IEEE Working Group on Energy Forecasting organized the Global Energy Forecasting Competition (GEFCom2012). The competition, which included two tracks, Hierarchical Load Forecasting and Wind Power Forecasting, attracted thousands of data scientists from over 30 countries (see the map below). The 8 winning teams were formed by people from 8 countries. The competition brought together many novel ideas that are meaningful to the improvement of the forecasting practices in the utility industry. In this webinar, I will present the lessons learned from GEFCom2012, which include several key findings and insights based on the approaches taken by the winning entries of both tracks. In addition, I will offer a preview of the future upgrades in GEFCom2014.
If you are interested, please go to the webinar page to register. The presentation slides are available through the webinar page as well.
Please note that this webinar is co-sponsored by the SAS Utilities User Group. Therefore, it will be the only webinar in my webinar series so far with the audio recording available.

Saturday, August 10, 2013

Winners Announced for Global Energy Forecasting Competition 2012

Two weeks ago at the 2013 PES General Meeting held in Vancouver, BC, we announced the 8 winning teams of the Global Energy Forecasting Competition (GEFCom2012). The picture below was taken at the end of the award reception.
I'm delighted to post the winners here:

Sunday, June 30, 2013

What's New in Energy Forecasting - Jun 2013

There are three exciting events going on in the energy forecasting community:
1. Global Energy Forecasting Competition
GEFCom2012 has been a great success. A valuable benchmark data pool is being established through this competition. Many novel ideas are brought to the field by the data scientists worldwide. Several papers from the top entries will be published in two prestigious scholarly journals, International Journal of Forecasting and IEEE Transactions on Smart Grid. An overview paper can be accessed from this post.
Meanwhile, we have started planning GEFCom2014. If you are interested, you are welcome to join the interest list to get the most recent update.

Saturday, June 15, 2013

Global Energy Forecasting Competition 2012

The paper is available on Science Direct.

Citation
Tao Hong, Pierre Pinson and Shu Fan, "Global Energy Forecasting Competition 2012", International Journal of Forecasting, vol.30, no.2, pp 357-363, April-June, 2014.


Global Energy Forecasting Competition 2012

Tao Hong, Pierre Pinson and Shu Fan

Abstract

The Global Energy Forecasting Competition (GEFCom2012) attracted hundreds of participants worldwide, who contributed many novel ideas to the energy forecasting field. This paper introduces both tracks of GEFCom2012, hierarchical load forecasting and wind power forecasting, with details on the aspects of the problem, the data, and a summary of the methods used by selected top entries. We also discuss the lessons learned from this competition from the organizers’ perspective. The complete data set, including the solution data, is published along with this paper, in an effort to establish a benchmark data pool for the community.

Sunday, December 9, 2012

Post-GEFCom2012 Activities

There are several activities following up GEFCom2012:
1. Nine finalists will present their work at IEEE PES General Meeting 2013. Four of them are from the hierarchical load forecasting track: Colin Singleton from CountingLab Ltd, James Robert Lloyd from University of Cambridge, Raphael Nédellec from EDF R&D, and Souhaib Ben Taieb from Université Libre de Bruxelles. Five of them are from the wind power forecasting track: Lucas Eustaquio from DTI Sistemas, Ekaterina Mangalova from Siberian State Aerospace University, Matt Wytock from Carnegie Mellon University, Gabor I. Nagy from Budapest University of Technology and Economics, and Duehee Lee from University of Texas at Austin.
2. International Journal of Forecasting (IJF) will publish a few short papers from GEFCom2012, including one introduction paper and several other papers describing the methodologies used by the top teams. The competition data including the results will be published with the introduction paper.
3. IEEE Transactions on Smart Grid will publish a few full papers from GEFCom2012 in the Special Issue on Analytics for Energy Forecasting with Applications to Smart Grid.

Wednesday, October 31, 2012

Congratulations, GEFCom2012 Contestants!

Thanks to Kaggle's data scientists, the Advisory Committee and the officers of GEFCom2012, and of course the hundreds of contestants who made this competition a fantastic and exciting event! While we are still accepting reports from the contestants for final evaluation, I can't wait to share some summary statistics with you all.
There are two tracks in GEFCom2012, hierarchical load forecasting and wind power forecasting. During the planning stage, we received over 100 sign-ups across more than 30 countries in total. As of today...

Tuesday, August 28, 2012

More about GEFCom2012

To date, 100+ people from the following 31 countries have filled in the Call For Participants survey:
Australia, Belgium, Brazil, Canada, China, Croatia, Cyprus, Denmark, France, Germany, Greece, Hungary, India, Iran, Italy, Japan, Mexico, Netherlands, New Zealand, Norway, Pakistan, Poland, Russia, Singapore, South Africa, Spain, Sweden, Switzerland, Turkey, UK, US
While Kaggle is setting up the online competition environment for us, here are some more details about the competition:

Sunday, August 12, 2012

IEEE PES Financially Sponsors GEFCom2012 with $18,000

Due to the large-scale deployment of smart meters, increasing penetration of renewable energy resources and an aging workforce, the utility industry is facing challenges in the field of energy forecasting. To help the industry improve forecasting practices, the Power Systems Planning and Implementation Committee (PSPI) and the Power and Energy Education Committee (PEEC) are co-sponsoring the Global Energy Forecasting Competition 2012 (GEFCom2012), which is being organized by the IEEE Working Group on Energy Forecasting, a working group under the PSPI Committee. The competition will bring together the state-of-the-art techniques for energy forecasting, serve as the bridge to connect academic research and industry practice, promote analytics in power engineering education, and prepare the industry to overcome forecasting challenges in the smart grid world.
The competition includes two tracks: load forecasting and wind forecasting.

Sunday, June 17, 2012

What's New in Energy Forecasting - Jun 2012

There are three exciting events going on in the energy forecasting community:

1. Global Energy Forecasting Competition 2012
Over 50 teams across 25 countries have signed up for the Global Energy Forecasting Competition 2012. Please visit the competition webpage for more information, such as Call For Participants, Important Dates, etc.

2. Activities in IEEE PES General Meeting 2012, San Diego, CA:
IEEE Working Group on Energy Forecasting is hosting three sessions in PESGM2012:
Panel Session – “Load forecasting methodologies and applications in operations and planning”: Room Emma AB, Monday, July 23, 2012, 13:00 – 17:00;
Energy Forecasting Working Group Meeting: Room Windsor B, Tuesday, July 24, 2012, 14:00 – 16:00;
Panel Session – “Demand response: analytics, practice, and challenges in smart grid environment”, Room Emma A, Thursday, July 26, 2012, 13:00 – 17:00.
More information about the topics to be presented in the panel sessions can be found here.

3. Special Issue on Analytics for Energy Forecasting with Applications to Smart Grid
This special issue of IEEE Transactions on Smart Grid covers both challenges and opportunities for the energy forecasting communities, and will bring together the state-of-the-art analytics, technologies and best practices in the smart grid industry. The extended abstract is due on Sep 30th, 2012. For more information, please refer to the Call For Papers.