Friday, October 14, 2016

GEFCom2017: Hierarchical Probabilistic Load Forecasting

IEEE Working Group on Energy Forecasting invites you to join the Global Energy Forecasting Competition 2017 (GEFCom2017): Hierarchical Probabilistic Load Forecasting.


Emerging technologies, such as microgrids, electric vehicles, rooftop solar panels and intelligent batteries, are challenging the traditional operational practices of the power industry. As uncertainties on the demand side push operational excellence toward the edge of the grid, probabilistic load forecasting at various levels of the power system hierarchy is becoming increasingly important.

GEFCom2017 will bring together state-of-the-art techniques and methodologies for hierarchical probabilistic energy forecasting. The competition features a bi-level setup: a three-month qualifying match that includes two tracks, and a one-month final match on a large-scale problem.

Qualifying match

The qualifying match is meant to attract and educate a large number of contestants with diverse backgrounds, and to prepare them for the final match. The qualifying match includes two tracks, both on forecasting the zonal and total loads of ISO New England (the "DEMAND" column) for the next month, in real time on a rolling basis.

The defined-data track (GEFCom2017-D) restricts the data the contestants may use. The data cannot go beyond the calendar data, load (the "DEMAND" column), and temperature data (the "DryBulb" and "DewPnt" columns) provided by ISO New England via the zonal information page of the energy, load and demand reports, plus the US Federal Holidays as published by the US Office of Personnel Management. The contestants may infer day of week and Federal Holidays from the aforementioned data.
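The defined-data rules let contestants infer day of week and Federal Holidays. As a rough illustration, the Python sketch below derives both from a date, assuming the ten standard federal holiday rules in effect in 2016 and the usual weekend observance shifts (Saturday to Friday, Sunday to Monday); Inauguration Day is omitted. The function names are hypothetical, and the output should be cross-checked against the OPM list, which remains the source named in the rules.

```python
from datetime import date, timedelta

DAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
MON, THU = 0, 3  # date.weekday() codes


def nth_weekday(year: int, month: int, weekday: int, n: int) -> date:
    """n-th occurrence of a weekday (Mon=0 .. Sun=6) in a month."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))


def last_weekday(year: int, month: int, weekday: int) -> date:
    """Last occurrence of a weekday in a month."""
    next_month = date(year + month // 12, month % 12 + 1, 1)
    last = next_month - timedelta(days=1)
    return last - timedelta(days=(last.weekday() - weekday) % 7)


def observed(d: date) -> date:
    """Shift a fixed-date holiday off the weekend (Sat -> Fri, Sun -> Mon)."""
    if d.weekday() == 5:
        return d - timedelta(days=1)
    if d.weekday() == 6:
        return d + timedelta(days=1)
    return d


def federal_holidays(year: int) -> set:
    """Observed dates of the ten standard federal holidays for one year."""
    fixed = [date(year, 1, 1), date(year, 7, 4), date(year, 11, 11), date(year, 12, 25)]
    floating = [
        nth_weekday(year, 1, MON, 3),   # Birthday of Martin Luther King, Jr.
        nth_weekday(year, 2, MON, 3),   # Washington's Birthday
        last_weekday(year, 5, MON),     # Memorial Day
        nth_weekday(year, 9, MON, 1),   # Labor Day
        nth_weekday(year, 10, MON, 2),  # Columbus Day
        nth_weekday(year, 11, THU, 4),  # Thanksgiving Day
    ]
    return {observed(d) for d in fixed} | set(floating)


def calendar_features(d: date) -> dict:
    """Day of week and federal-holiday flag for one date."""
    # An observed New Year's Day can land on Dec 31 of the prior year,
    # so the next year's holidays are checked as well.
    holidays = federal_holidays(d.year) | federal_holidays(d.year + 1)
    return {"day_of_week": DAY_NAMES[d.weekday()], "is_holiday": d in holidays}
```

For instance, Christmas 2016 fell on a Sunday, so the sketch flags Monday, December 26, 2016 as the holiday.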

The open-data track (GEFCom2017-O) encourages the contestants to explore various public and private data sources and bring the necessary data into the load forecasting process. The data may include, but is not limited to, the data published by ISO New England, weather forecast data from any weather service provider, local economic information, and the penetration of solar PV published by US government websites.

Final match

The final match (GEFCom2017-F) will be open to the top entries from the qualifying match, tackling a more challenging, larger-scale problem than the qualifying match problems. The final match includes only one track: forecasting the load of a few hundred delivery points of a U.S. utility. The data comes from the real world, so the contestants should expect many data issues, such as load transfers and anomalies. Details of the final match will be released on March 15, 2017.

Submission method

To save on competition platform costs and to implement more sophisticated evaluation methods, submissions will be made via email. Within two weeks of registration, the contestants will receive an email with instructions on how to submit the forecasts.

Evaluation

The "DEMAND" column published by ISO New England will be used to evaluate the skill of the probabilistic forecasts. Note that the "DEMAND" data may be revised during the settlement process; the version available at the time of evaluation will be used to score the forecasts.

The evaluation metric is the quantile score. For each forecasted period, the quantile score of a submitted forecast will be compared with the quantile score of the benchmark. The relative improvement over the benchmark will be used to rate and rank the teams.
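A minimal sketch of the quantile score (the pinball loss averaged over periods and quantile levels) and of the relative improvement over a benchmark. The quantile levels are a parameter here: the levels actually required come with the emailed submission instructions, and how per-period improvements are aggregated into a final ranking is not specified above, so these hypothetical functions only illustrate a single score and a single comparison.

```python
from typing import Sequence


def pinball_loss(actual: float, forecast: float, q: float) -> float:
    """Pinball loss of one forecast of the q-th quantile, 0 < q < 1."""
    if actual < forecast:
        return (1 - q) * (forecast - actual)
    return q * (actual - forecast)


def quantile_score(actuals: Sequence[float],
                   forecasts: Sequence[Sequence[float]],
                   levels: Sequence[float]) -> float:
    """Average pinball loss over all periods and quantile levels.

    forecasts[t][i] is the forecast of quantile levels[i] for period t.
    """
    total, count = 0.0, 0
    for y, row in zip(actuals, forecasts):
        for f, q in zip(row, levels):
            total += pinball_loss(y, f, q)
            count += 1
    return total / count


def relative_improvement(team_score: float, benchmark_score: float) -> float:
    """Fractional reduction of the quantile score versus the benchmark."""
    return (benchmark_score - team_score) / benchmark_score
```

For example, with levels 0.1, 0.5 and 0.9 and an actual load of 100, forecasts of 90/100/110 score 2/3, while a benchmark of 80/100/120 scores 4/3, a relative improvement of 50%.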

World Energy Forecaster Rankings (WEFR)

Many contestants who joined GEFCom2012 also participated in GEFCom2014. To encourage continued investment in energy forecasting and to recognize those who excel in these competitions, we will start building the World Energy Forecaster Rankings.

The contestants of GEFCom2017 will be eligible to participate in WEFR. We hope the rankings can help reward the participants with career opportunities and tickets to future competitions. In addition, editors of relevant journals can leverage WEFR to enhance the peer review process.

Prizes

IEEE Power and Energy Society budgeted $20,000 for this competition. The prize pool is $18,000, to be shared among the winning teams and institutions from the qualifying match and the final match.

Publication

Winning teams will be invited to submit papers to a special issue of the International Journal of Forecasting. 

Registration

The maximum team size is three. The team leader should register on behalf of the team. The registration period is from Oct 14, 2016 to Jan 14, 2017. Please register via THIS LINK if you want to join the competition.

Competition timeline
  • Competition Problems Release  --  Oct 14, 2016
  • Qualifying Match Starts  --  Dec 1, 2016
  • Qualifying Match Ends  --  Feb 28, 2017
  • Final Match Data Release  --  Mar 15, 2017
  • Final Match Submission Due  --  May 15, 2017

Additional rules

For any questions or comments, please put them in the comment field below. Please link your name to your LinkedIn profile. 

Thursday, October 13, 2016

Congratulations, Dr. Jingrui Xie!

Today (October 13, 2016), Jingrui (Rain) Xie defended her doctoral dissertation on probabilistic electric load forecasting, which made her the first BigDEAL PhD.

When I came back to academia three years ago, my mission was to produce the next generation of the finest analysts for the industry. As the first PhD from BigDEAL, Rain sets the standard for BigDEAL products and shows what the finest analyst looks like.

Rain joined UNC Charlotte in August 2013 as my first master's student. She received her M.S. degree in Engineering Management in May 2015 and continued with her PhD in Infrastructure and Environmental Systems.

In just three years, she published 7 journal papers:
  • Temperature scenario generation for probabilistic load forecasting (TSG, in press)
  • Relative humidity for load forecasting models (TSG, in press)
  • On normality assumption in residual simulation for probabilistic load forecasting (TSG, 2016)
  • GEFCom2014 probabilistic electric load forecasting: an integrated solution with forecast combination and residual simulation (IJF, 2016)
  • Improving gas load forecasts with big data (GAS, 2016)
  • Long term retail energy forecasting with consideration of residential customer attrition (TSG, 2015)
  • Long term probabilistic load forecasting and normalization with hourly information (TSG, 2014)
and 3 conference papers:
  • Comparing two model selection frameworks for probabilistic load forecasting (PMAPS, 2016)
  • From high-resolution data to high-resolution probabilistic load forecasts (T&D, 2016)
  • Combining load forecasts from independent experts: experience at NPower forecasting challenge 2015 (NAPS, 2015)
She was among the top contestants in all of the forecasting competitions she participated in:
  • Top 1 in BigDEAL Forecasting Competition 2016
  • Top 3 in NPower Gas Demand Forecasting Challenge 2015
  • Top 3 in NPower Electricity Demand Forecasting Challenge 2015
  • Top 3 in Load Forecasting Track of Global Energy Forecasting Competition 2014
She has also received several prestigious awards:
  • IEEE PES Technical Committee Prize Paper Award 2016
  • International Symposium on Forecasting 2016 Travel Award
  • International Symposium on Forecasting 2015 Travel Award
  • UNCC College of Engineering Outstanding Graduate Research Assistant Award 2015
  • International Institute of Forecasters Student Forecasting Award 2015
Rain has been working full-time at SAS for the past three years. In addition to her academic excellence, Rain received a promotion earlier this year for her outstanding performance at work.

It took her 21 months to get the PhD: she enrolled in the PhD program in January 2015 and defended the dissertation today. In other words, she just proved the reproducibility of my 20-month PhD!

Lastly, but most importantly, she became a mother two years ago; her daughter is now two years old.

Again, congratulations, Dr. Jingrui Xie!

Wednesday, October 5, 2016

NPower Forecasting Challenge 2016

RWE npower is running its forecasting challenge again this year. The purpose is to recruit summer interns from UK schools. Nevertheless, the competition is open to people outside the UK as well.

In 2015, BigDEAL participated in both competitions, one on electric load forecasting, and the other on gas load forecasting. We summarized our methods into two papers (electricity; gas), which may give you some idea about the previous competitions.

Registration is now open until November 1, 2016. Have fun!

Thursday, September 22, 2016

A Five-minute Introduction to Electric Load Forecasting

I was recently interviewed by Prof. Galit Shmueli for her newly launched free online course Business Analytics Using Forecasting. In this interview, I gave a five-minute introduction to electric load forecasting, discussing its special characteristics and what is needed for successful solutions.

Monday, September 19, 2016

Announcing GEFCom2017: Join the Interest List

I'm sure readers of this blog are eager for the next Global Energy Forecasting Competition. Today I'm pleased to announce GEFCom2017, an upgraded version of GEFCom2012 and GEFCom2014.

To bring together contestants with diverse backgrounds and to dive deep into challenging problems, GEFCom2017 will feature a bi-level setup: a three-month qualifying match and a one-month final match. The qualifying match is meant to attract and educate a large number of contestants with diverse backgrounds. The final match will be open to the top entries from the qualifying match, tackling a more challenging, larger-scale problem.

Many contestants who joined GEFCom2012 also participated in GEFCom2014. To encourage continued investment in energy forecasting and to recognize those who excel in these competitions, we will start building the World Energy Forecaster Rankings.

We will release the competition problems and the formal registration on 10/14/2016. Please join the interest list HERE to get timely updates about GEFCom2017.

Stay tuned!

Sunday, September 11, 2016

Call For Sponsors: 2017 International Symposium on Energy Analytics (ISEA2017)

The first International Symposium on Energy Analytics (ISEA2017) will be held in Cairns, Australia, June 22-23, 2017. Cairns is the only place in the world where two World Heritage listed areas are side by side: the Great Barrier Reef and the Daintree Rainforest. For more information about Cairns, please visit the Cairns visitors information guide.

ISEA2017 features the theme "Predictive Energy Analytics in the Big Data World". The topics of interest can be found HERE. We expect about 50 attendees, 1/3 from academia and 2/3 from the industry. ISEA2017 is right before the 37th International Symposium on Forecasting (ISF2017), the flagship conference of the International Institute of Forecasters (IIF). Attendees of ISEA2017 will also get a discounted registration to ISF2017.

IIF is a major sponsor of ISEA2017. We are also looking for additional sponsors to keep the cost down for attendees. Sponsor information will be highly visible at ISEA2017 and on its website, as well as through email and social media campaigns. This is a great opportunity to support the energy forecasting community, promote your organization and show off your products and services. For your energy analysts, this symposium would be a great venue to learn from and network with peers from other organizations.

The sponsorship can be on any of the four levels as listed below. If you are interested in sponsoring the event, please contact me via email: hongtao01 AT gmail DOT com.

Wednesday, August 24, 2016

Guest Editorial: Big Data Analytics for Grid Modernization

IEEE Transactions on Smart Grid just published our special section on Big Data Analytics for Grid Modernization. The guest editorial is on IEEE Xplore with open access. The original Call for Papers is HERE.

While "big data" is quickly becoming a buzzword (see THIS POST), in this guest editorial we interpreted it from four aspects:
  1. The data involved in the analysis is big in at least one of its three defining dimensions: volume, variety or velocity. The big data used in the utility industry includes but is not limited to smart meter data, phasor measurement unit data, weather data, and social media data. 
  2. The problem under investigation is a preparatory step for analyzing big data, such as data compression and data security.
  3. The methodology requires customized modeling of individual components of a system, or leads to in-depth understanding of the individual components. For instance, estimating the invisible solar generation belongs to this category. 
  4. The technology can be used to reach the answer faster, or to answer questions that would otherwise be difficult to answer. For example, a distributed platform can be used to speed up analytic tasks.
Thanks to the diligent work from our guest editors, reviewers and the authors, we are able to present a high-quality collection of papers to the community. Below is the list of 17 special section papers:
  1. D. Zhou, J. Guo, Y. Zhang, J. Chai, H. Liu, Y. Liu, C. Huang, X. Gui and Y. Liu, "Distributed Data Analytics Platform for Wide-Area Synchrophasor Measurement Systems," IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2397-2405, Sept. 2016
  2. P. H. Gadde, M. Biswal, S. Brahma and H. Cao, "Efficient Compression of PMU Data in WAMS," IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2406-2413, Sept. 2016
  3. X. Tong, C. Kang and Q. Xia, "Smart Metering Load Data Compression Based on Load Feature Identification," IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2414-2422, Sept. 2016
  4. J. Hu and A. V. Vasilakos, "Energy Big Data Analytics and Security: Challenges and Opportunities," IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2423-2436, Sept. 2016
  5. Y. Wang, Q. Chen, C. Kang and Q. Xia, "Clustering of Electricity Consumption Behavior Dynamics Toward Big Data Applications," IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2437-2447, Sept. 2016
  6. S. Ben Taieb, R. Huser, R. J. Hyndman and M. G. Genton, "Forecasting Uncertainty in Electricity Smart Meter Data by Boosting Additive Quantile Regression," IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2448-2455, Sept. 2016
  7. H. Shaker, H. Zareipour and D. Wood, "Estimating Power Generation of Invisible Solar Sites Using Publicly Available Data," IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2456-2465, Sept. 2016
  8. H. Shaker, H. Zareipour and D. Wood, "A Data-Driven Approach for Estimating the Power Generation of Invisible Solar Sites," IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2466-2476, Sept. 2016
  9. X. Zhang and S. Grijalva, "A Data-Driven Approach for Detection and Estimation of Residential PV Installations," IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2477-2485, Sept. 2016
  10. H. Wang and J. Huang, "Cooperative Planning of Renewable Generations for Interconnected Microgrids," IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2486-2496, Sept. 2016
  11. J. Peppanen, M. J. Reno, R. J. Broderick and S. Grijalva, "Distribution System Model Calibration With Big Data From AMI and PV Inverters," IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2497-2506, Sept. 2016
  12. Y. C. Chen, J. Wang, A. D. Domínguez-García and P. W. Sauer, "Measurement-Based Estimation of the Power Flow Jacobian Matrix," IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2507-2515, Sept. 2016
  13. H. Sun, Z. Wang, J. Wang, Z. Huang, N. Carrington and J. Liao, "Data-Driven Power Outage Detection by Social Sensors," IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2516-2524, Sept. 2016
  14. H. Jiang, X. Dai, D. W. Gao, J. J. Zhang, Y. Zhang and E. Muljadi, "Spatial-Temporal Synchrophasor Data Characterization and Analytics in Smart Grid Fault Detection, Identification, and Impact Causal Analysis," IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2525-2536, Sept. 2016
  15. M. Rafferty, X. Liu, D. M. Laverty and S. McLoone, "Real-Time Multiple Event Detection and Classification Using Moving Window PCA," IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2537-2548, Sept. 2016
  16. T. Jiang, Y. Mu, H. Jia, N. Lu, H. Yuan, J. Yan and W. Li, "A Novel Dominant Mode Estimation Method for Analyzing Inter-Area Oscillation in China Southern Power Grid," IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2549-2560, Sept. 2016
  17. B. Wang, B. Fang, Y. Wang, H. Liu and Y. Liu, "Power System Transient Stability Assessment Based on Big Data and the Core Vector Machine," IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2561-2570, Sept. 2016


Tao Hong, Chen Chen, Jianwei Huang, Ning Lu, Le Xie and Hamidreza Zareipour, "Guest Editorial: Big Data Analytics for Grid Modernization," IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2395-2396, Sept. 2016

Guest Editorial: Big Data Analytics for Grid Modernization

Tao Hong, Chen Chen, Jianwei Huang, Ning Lu, Le Xie and Hamidreza Zareipour

Saturday, July 30, 2016

Southwest Forecasting and Customer Analytics Forum 2016

Southwest Forecasting and Customer Analytics Forum 

Hosted By Tucson Electric Power, September 15-16, 2016

Tucson Electric Power is pleased to host the Southwest Forecasting and Customer Analytics Conference for the utility industry at its downtown Tucson headquarters.  Topics and events covered in the program:
  • Perspective on Using Forecasts by David G. Hutchens, President and CEO, TEP and its parent company, UNS Energy Corporation
  • The Evolving Regulated Utility Industry – Prof. Stanley Reynolds, University of Arizona
  • Energy Forecasting: Past, Present, and Future – Prof. Tao Hong, UNC at Charlotte
  • How Will Battery Storage Deployment Affect Load Forecasting? – Jason Burwen, ESA
  • Impact and Value of Plug-in Electric Vehicle Load & Managed Charging – R. Graham, U.S. Energy Department and Dr. Hongyan Sheng, Southern California Edison
  • Location-Specific Probabilistic Forecasting and Planning Methods – Josh Bode, Nexant
  • Home Appliances: Historical Declines in Energy Use, Future Potential Savings, and an Update on Efficiency Standards – Joanna Mauer, Appliance Standards Awareness Project
  • Likelihood of Customer Participation in Utility Programs – Dr. Erin Boyd, Pacific Gas & Electric
  • Networking Dinner
The only costs of attending this program are your own travel to Tucson, Arizona, and the cost of dinner. For details, see the program HERE.

Tuesday, July 26, 2016

Temperature Scenario Generation for Probabilistic Load Forecasting

When using weather scenarios to generate probabilistic load forecasts, a frequently asked question is:
How many years of weather history do we need?
This paper gives an answer based on an empirical study.

Most of my papers were accepted after two or more revisions. This time it took only one revision to have this paper accepted. In the first round of review, we received 40 comments from 6 reviewers. Our first revision was accepted after 4 of the reviewers recommended acceptance. In this blog post, I'm attaching the submitted version of the revision, including our response letter. Some of our responses were rebuttals to one of the reviewers, who made a personal attack on me.


Jingrui Xie and Tao Hong, "Temperature scenario generation for probabilistic load forecasting", IEEE Transactions on Smart Grid, accepted.

The working paper is available HERE.

Temperature Scenario Generation for Probabilistic Load Forecasting

Jingrui Xie and Tao Hong


In today’s dynamic and competitive business environment, probabilistic load forecasting (PLF) is becoming increasingly important to utilities for quantifying future uncertainties. Among the various approaches to generating probabilistic load forecasts, feeding simulated weather scenarios to a point load forecasting model is commonly accepted by the industry for its simplicity and interpretability. There are three practical and widely used methods for temperature scenario generation, namely the fixed-date, shifted-date, and bootstrap methods. Nevertheless, these methods have been used mainly on an ad hoc basis without being formally compared or quantitatively evaluated. For instance, it has never been clear to the industry how many years of weather history are sufficient to adopt these methods. This is the first study to quantitatively evaluate these three temperature scenario generation methods based on the quantile score, a comprehensive error measure for probabilistic forecasts. Through a series of empirical studies on both linear and nonlinear models with three different levels of predictive power, we find that 1) the quantile score of each method shows diminishing improvement as the length of available temperature history increases; 2) while shifting dates can compensate for a short weather history, the quantile score improvement gained from the shifted-date method diminishes and eventually becomes negative as the number of shifted days increases; and 3) compared with the fixed-date method, the bootstrap method offers the capability of generating more comprehensive scenarios but does not improve the quantile score. Finally, an empirical formula for selecting and applying the temperature scenario generation methods is proposed together with a practical guideline.
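The fixed-date and shifted-date methods named in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: `history` is a hypothetical (years × 365) array of daily temperatures with leap days dropped, shifting by s days simply moves the calendar window by s days within each year (no wrap-around across year boundaries), and `point_model` is any callable that maps a temperature window to a load forecast. The bootstrap method is not sketched.

```python
import numpy as np


def fixed_date_scenarios(history: np.ndarray, start: int, length: int) -> np.ndarray:
    """One scenario per historical year, taken on the same calendar days.

    history: shape (n_years, 365), daily temperatures, leap days dropped.
    start, length: day-of-year index and length of the target window.
    """
    if start < 0 or start + length > history.shape[1]:
        raise ValueError("target window falls outside the year")
    return history[:, start:start + length].copy()


def shifted_date_scenarios(history: np.ndarray, start: int, length: int,
                           max_shift: int) -> np.ndarray:
    """Fixed-date scenarios for every shift in -max_shift..+max_shift days.

    Produces n_years * (2 * max_shift + 1) scenarios.
    """
    blocks = [fixed_date_scenarios(history, start + s, length)
              for s in range(-max_shift, max_shift + 1)]
    return np.vstack(blocks)


def scenario_quantiles(point_model, scenarios: np.ndarray, levels) -> np.ndarray:
    """Feed each scenario to a point model and take empirical load quantiles."""
    loads = np.array([point_model(s) for s in scenarios])
    return np.quantile(loads, levels, axis=0)
```

With n years of history, the fixed-date method yields n scenarios, and a shift of up to k days multiplies that by 2k + 1, which is how shifting dates can compensate for a short history.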

Thursday, July 7, 2016

GEFCom2012 Load Forecasting Data

The load forecasting track of GEFCom2012 was about hierarchical load forecasting. We asked the contestants to forecast and backcast (check out THIS POST for the definitions of forecasting and backcasting) the electricity demand for 21 zones, of which Zone 21 was the sum of the other 20 zones.
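Because Zone 21 is defined as the sum of Zones 1-20, a set of forecasts is coherent only if the Zone 21 forecast equals the sum of the zonal forecasts. A minimal bottom-up sketch, assuming the zonal forecasts sit in a hypothetical (20, n_hours) NumPy array; this layout is an illustration, not the file format of the data package.

```python
import numpy as np


def bottom_up_total(zone_forecasts: np.ndarray) -> np.ndarray:
    """Total (Zone 21) forecast as the hour-by-hour sum of the zonal rows."""
    return zone_forecasts.sum(axis=0)


def is_coherent(zone_forecasts: np.ndarray, total_forecast: np.ndarray,
                rtol: float = 1e-9) -> bool:
    """True if a separately produced total matches the sum of the zones."""
    return bool(np.allclose(bottom_up_total(zone_forecasts), total_forecast, rtol=rtol))
```

Bottom-up summation is only one way to obtain a coherent total; forecasting Zone 21 directly and reconciling it with the zones is another.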

Where to download the data?

You can download an incomplete dataset, without the solution data, from Kaggle. The complete data was published as the appendix of our GEFCom2012 paper. If you don't have access to ScienceDirect, you can download it from my Dropbox link HERE. Regardless of where you get the data, you should cite this paper to acknowledge the source:
  • Tao Hong, Pierre Pinson and Shu Fan, "Global energy forecasting competition 2012", International Journal of Forecasting, vol.30, no.2, pp 357-363, April-June, 2014. 

What's in the package?

Unzip the file and navigate to the "GEFCOM2012_Data\Load\" folder; you will see 6 files:
  • load_history
  • temperature_history
  • holiday_list
  • load_benchmark
  • load_solution
  • temperature_solution
Our GEFCom2012 paper has introduced the first five datasets but not the last one. The "temperature_solution" dataset includes the temperature data from 2008/6/30 7:00 to 2008/7/7 24:00, while the "load_solution" dataset does not include the load data from 2008/6/30 7:00 to 2008/6/30 24:00.
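The ranges above end at "24:00", which suggests hour-ending stamps numbered 1 to 24. Assuming that convention (verify it against the files before relying on it), a small helper can turn a (date, hour) pair into the timestamp at the end of the interval, so hour 24 rolls over to midnight of the next day. The function names are hypothetical.

```python
from datetime import date, datetime, timedelta


def hour_ending_to_timestamp(day: date, hour: int) -> datetime:
    """Map (date, hour-ending 1..24) to the end time of that hourly interval."""
    if not 1 <= hour <= 24:
        raise ValueError("hour must be in 1..24")
    return datetime(day.year, day.month, day.day) + timedelta(hours=hour)


def count_hours(start_day: date, start_hour: int, end_day: date, end_hour: int) -> int:
    """Number of hourly intervals between two hour-ending stamps, inclusive."""
    start = hour_ending_to_timestamp(start_day, start_hour)
    end = hour_ending_to_timestamp(end_day, end_hour)
    return (end - start) // timedelta(hours=1) + 1
```

Converting to interval-end timestamps early avoids off-by-one errors when joining the load and temperature files or splitting them at a forecast origin.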

What's not working?

Before using the data, please understand that
there is no way to restore the exact Kaggle setup for you to make a direct comparison of error scores.
The main reason is that Kaggle picked a random subset of the solution data to calculate the scores for the public leaderboard and used the rest for the private leaderboard. We do not know which data was used for which leaderboard.

Nevertheless, it was never our intention to let you make comparisons in a Kaggle way. This is because GEFCom2012 was set up more like a data mining competition than a forecasting competition: the contestants could submit their forecasts many times, and Kaggle picked the best score. This is not a realistic forecasting process.

How to use the data?

Instead, we encourage you to use these 4.5 years of hourly data without considering the Kaggle setup. You can even keep 4 full calendar years and discard the last half year in your case studies. With four years of data, you can perform one-year-ahead ex post forecasting (see my weather station selection paper). You can also perform short-term ex post forecasting on a rolling basis (see my recency effect paper).
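Both setups can be expressed as forecast origins over an hourly index. A minimal sketch, assuming an expanding training window and ignoring leap days (8,760 hours per year); in an ex post study the inputs for each test window would be the actual observed temperatures. The function name and parameters are illustrative.

```python
from typing import Iterator, Tuple


def rolling_origin_splits(n_hours: int, first_origin: int, horizon: int,
                          step: int) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges; the origin advances by `step` hours.

    Training always covers everything before the origin (expanding window),
    and the test window is the next `horizon` hours.
    """
    origin = first_origin
    while origin + horizon <= n_hours:
        yield range(0, origin), range(origin, origin + horizon)
        origin += step
```

One-year-ahead forecasting with four years of data is the special case first_origin = 3 years and horizon = step = 1 year, which yields a single split; rolling day-ahead forecasting uses horizon = step = 24.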

Then the question is whether the accuracy is "good enough". According to Table 3 of our GEFCom2012 paper, the winning teams improved on the benchmark by about 30% (see the "test" column, which corresponds to Kaggle's private leaderboard). In other words, if your model achieves about a 30% error reduction compared with the Vanilla benchmark on this dataset, it is a decent model.
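To compare against the Vanilla benchmark yourself, you need to fit it on the same data. Below is a sketch of its design matrix, assuming the commonly cited form of that multiple linear regression: a trend, month, day-of-week × hour, and a cubic temperature polynomial interacted with month and with hour. The 0-based integer encodings and the function names are assumptions; check the GEFCom2012 paper for the exact specification and use the same error measure for your model and the benchmark.

```python
import numpy as np


def dummies(codes: np.ndarray, n_levels: int) -> np.ndarray:
    """Reference-coded indicator columns; level 0 is the baseline (no column)."""
    out = np.zeros((codes.size, n_levels - 1))
    rows = np.nonzero(codes > 0)[0]
    out[rows, codes[rows] - 1] = 1.0
    return out


def vanilla_design(trend, month, weekday, hour, temp) -> np.ndarray:
    """Design matrix: trend + month + weekday*hour + T-polynomial interactions.

    month in 0..11, weekday in 0..6, hour in 0..23 (integer arrays);
    trend and temp are float arrays of the same length.
    """
    m = dummies(month, 12)
    wh = dummies(weekday * 24 + hour, 7 * 24)  # one level per weekday-hour pair
    h = dummies(hour, 24)
    t_poly = np.column_stack([temp, temp ** 2, temp ** 3])
    t_by_m = np.hstack([t_poly[:, [p]] * m for p in range(3)])  # T^p x month
    t_by_h = np.hstack([t_poly[:, [p]] * h for p in range(3)])  # T^p x hour
    intercept = np.ones((temp.size, 1))
    return np.hstack([intercept, trend[:, None], m, wh, t_poly, t_by_m, t_by_h])


def fit_vanilla(X: np.ndarray, load: np.ndarray) -> np.ndarray:
    """Ordinary least squares coefficients (minimum-norm if X is rank deficient)."""
    beta, *_ = np.linalg.lstsq(X, load, rcond=None)
    return beta


def error_reduction(model_error: float, benchmark_error: float) -> float:
    """Fractional error reduction relative to the benchmark (0.3 means 30%)."""
    return (benchmark_error - model_error) / benchmark_error
```

Reference coding drops one level per class variable, so the matrix has 285 columns; the weekday × hour term already carries the hour main effect.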

Please also understand that this 30% was gained from a forecasting system with many bells and whistles, such as detailed modeling of temperature and special treatment of holidays. If your research focuses on one component, the error reduction may be much smaller than 30%. You can find more detailed arguments in my response to the second review comment in THIS POST.

It's been over two years since we published the GEFCom2012 data, and many researchers have already used it to test their models. You can also replicate the experimental setup of recently published papers that used the GEFCom2012 data, and compare your results with the results in those papers.