Friday, August 18, 2017

IEEE PES Announces Winning Teams for Global Energy Forecasting Competition 2017

More than 300 students and professionals from more than 30 countries formed 177 teams to compete on hierarchical probabilistic load forecasting, exploring opportunities from the big data world and tackling the analytical challenges.

PISCATAWAY, N.J., USA, August 18, 2017 – IEEE, the world's largest professional organization advancing technology for humanity, today announced the results of the Global Energy Forecasting Competition 2017 (GEFCom2017), which was organized and supported by the IEEE Power & Energy Society (IEEE PES) and the IEEE Working Group on Energy Forecasting (WGEF).

“I congratulate the eight winning teams and all the contestants of GEFCom2017. They are pushing the boundaries of electric load forecasting,” said Dr. Tao Hong, Chair of the IEEE Working Group on Energy Forecasting and General Chair of the Global Energy Forecasting Competition. “GEFCom2017 is the longest and most challenging one among the series of Global Energy Forecasting Competitions. To encourage the contestants to explore opportunities in the big data world, GEFCom2017 released more data than the previous competitions combined.”

The theme is hierarchical probabilistic load forecasting, merging the challenges of both GEFCom2012 and GEFCom2014. The 6-month-long GEFCom2017 includes two phases. The qualifying match was to provide medium-term probabilistic forecasts for the ISO New England region in real time. It was meant to attract and educate a large number of contestants with diverse backgrounds, and to prepare them for the final match. The final match asked the contestants to provide probabilistic forecasts for 161 delivery points. All of the competition data will be released to the public for future research and benchmarking purposes.

"The Global Energy Forecasting Competitions have been extraordinarily successful in stimulating and promoting technology advancement. To continue the momentum, IEEE PES decided to fund GEFCom2017 with $20,000 for the cash prizes to the winning teams," said Patrick Ryan, Executive Director of IEEE PES. "We are so delighted to witness another fantastic competition. We look forward to seeing its positive impact on the industry in the coming years."

GEFCom2017 includes a two-track qualifying match and a single-track final match. Each track recognizes three winning teams.

Qualifying Match Defined Data Track Winners:

  • Andrew J. Landgraf (Battelle, USA)
  • Slawek Smyl (Uber Technologies, USA) and Grace Hua (Microsoft, USA)
  • Gábor Nagy and Gergő Barta (Budapest University of Technology and Economics, Hungary), Gábor Simon (dmlab, Hungary)

Qualifying Match Open Track Winners:

  • Geert Scholma (The Netherlands)
  • Florian Ziel (Universität Duisburg-Essen, Germany)
  • Jingrui Xie (SAS Institute, Inc., USA)

Final Match Winners:

  • Isao Kanda and Juan Quintana (Japan Meteorological Corporation, Japan)
  • Ján Dolinský, Mária Starovská and Robert Toth (Tangent Works, Slovakia)
  • Gábor Nagy and Gergő Barta (Budapest University of Technology and Economics, Hungary), Gábor Simon (dmlab, Hungary)


Wednesday, August 9, 2017

Benchmarking Robustness of Load Forecasting Models under Data Integrity Attacks

GIGO - garbage in, garbage out. In forecasting, GIGO means that if the model is fed with garbage (bad) data, the forecast would be bad too. In the power industry, bad load forecasts often result in waste of energy resources, financial losses, brownouts or even blackouts.

Anomaly detection and data cleansing procedures may help alleviate some of the bad data from the input side. However, what if the bad data was created by hackers? Can the existing models "survive" or stay accurate under data attacks? This paper offers some benchmark results.

This paper sets a few "firsts":
  1. This is the first paper formally addressing cybersecurity issues in the load forecasting literature. I believe that data attacks should be of great concern to the forecasting community. This paper lays solid groundwork for future research.
  2. This is my first journal paper co-authored with a professor on my doctoral committee, Dr. Shu-Cherng Fang. Many years ago, I picked up the topic of my dissertation from one of my consulting projects. I then invited a team of world-class professors from different areas to form the committee. None of them were really into load forecasting, but that gave me the opportunity to learn from different perspectives.
  3. This is my first journal paper that went through a year-long peer review cycle, the longest I've experienced. It was definitely worth the effort. The IJF editors and reviewers clearly spent a significant amount of time reading the paper and offered many constructive comments. I wish I knew their names so that I could thank them in the acknowledgement section.

Jian Luo, Tao Hong and Shu-Cherng Fang, "Benchmarking robustness of load forecasting models under data integrity attacks", International Journal of Forecasting, accepted. (working paper)

Benchmarking robustness of load forecasting models under data integrity attacks

Jian Luo, Tao Hong and Shu-Cherng Fang


As the internet continues to expand its footprint, cybersecurity has become a major concern for the governments and private sectors. One of the cybersecurity issues is on data integrity attacks. In this paper, we focus on the power industry, where the forecasting processes heavily rely on the quality of data. The data integrity attacks are expected to harm the performance of forecasting systems, which greatly impact the financial bottom line of power companies and the resilience of power grids. Here we reveal how data integrity attacks can affect the accuracy of four representative load forecasting models (i.e., multiple linear regression, support vector regression, artificial neural networks, and fuzzy interaction regression). We first simulate some data integrity attacks by randomly injecting some multipliers that follow a normal or uniform distribution to the load series. Then the aforementioned four load forecasting models are applied to generate one-year ahead ex post point forecasts for comparisons of their forecast errors. The results show that the support vector regression model, trailed closely by the multiple linear regression model, is most robust, while the fuzzy interaction regression model is least robust among the four. Nevertheless, all of the four models fail to provide satisfying forecasts when the scale of data integrity attacks becomes large. This presents a serious challenge to the load forecasters and the broader forecasting community: How to generate accurate forecasts under data integrity attacks? We use the publicly-available data from Global Energy Forecasting Competition 2012 to construct the case study. At the end, we also offer an outlook of potential research topics for future studies.
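The attack simulation described in the abstract (randomly injecting multipliers drawn from a normal or uniform distribution into the load series) can be sketched in a few lines. This is only a minimal illustration, not the paper's code: the function name `inject_attack`, the share of attacked points, and the distribution parameters are all assumptions made for this example.

```python
import numpy as np

def inject_attack(load, fraction, dist="normal", loc=1.0, scale=0.2,
                  low=0.8, high=1.2, seed=None):
    """Return (attacked_load, attacked_indices).

    A random `fraction` of the points in `load` is multiplied by random
    factors. All parameter names and defaults here are hypothetical.
    """
    rng = np.random.default_rng(seed)
    load = np.asarray(load, dtype=float)
    attacked = load.copy()  # leave the original series untouched

    # Pick which observations get corrupted, without repetition.
    n_attacked = int(round(fraction * load.size))
    idx = rng.choice(load.size, size=n_attacked, replace=False)

    # Draw one multiplier per attacked observation.
    if dist == "normal":
        multipliers = rng.normal(loc, scale, size=n_attacked)
    elif dist == "uniform":
        multipliers = rng.uniform(low, high, size=n_attacked)
    else:
        raise ValueError("dist must be 'normal' or 'uniform'")

    attacked[idx] *= multipliers
    return attacked, idx

# Example: corrupt 30% of a flat 100 MW series with uniform multipliers.
clean = np.full(1000, 100.0)
dirty, hit = inject_attack(clean, fraction=0.3, dist="uniform", seed=1)
```

A robustness benchmark would then train each model on the corrupted series and compare its forecast errors against the same model trained on the clean series.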

Monday, August 7, 2017

Breakthrough or Too Good To Be True: Several Smoke Tests

When sharing my Four Steps to Review an Energy Forecasting Paper, I spent about a third of the blog post elaborating on what "contribution" means. This post was triggered by several review comments on my recent TSG paper, Variable Selection Methods for Probabilistic Load Forecasting. Here I would like to elaborate on what "contribution" means from a different angle.

A little background first. 

In that TSG paper, we compared two variable selection schemes, HeM (Heuristic Method) that sharpens the underlying model to minimize the point forecast error, and HoM (Holistic Method) that uses the quantile score to select the underlying model. The key finding is as follows:
HoM costs much more computational power but only produces slightly better quantile scores than HeM.
Then some of the reviewers raised the red flag:
If the new method is not much better than the existing one, why should we accept the paper?
I believe that the question is genuine. Most likely the reviewers, as well as many other load forecasters, have read many papers in the literature that have presented super powerful models or methods that led to super accurate forecasts. After being flooded with those breakthroughs, they would be hesitant to give favorable ratings to a paper that presents a somewhat disappointing conclusion. 

Now let's take one step back:
What if those breakthroughs were just illusions? 
Given that most of those papers proposed complicated algorithms tested on proprietary datasets, it is very difficult to reproduce the work. In other words, we can hardly verify those stories. The reviewers and editors may be rejecting valuable papers that are not bluffing. This time I was lucky - most reviewers were on my side.

When my premature models were beating all the other competitors many years ago, I was truly astonished by the real-world performance of those "state-of-the-art" models. If those breakthroughs in the literature were really tangible, my experience tells me that the industry would be pouring money into those authors to ask for the insights. It has been many years since those papers were published. How many of them have been recognized by the industry? (In my IJF review, I did mention a few exemplary papers though.)

We have run the Global Energy Forecasting Competitions three times. How often do you see those authors or their students on the leaderboard? If their methods are truly effective but not recognized by the industry, why not test them through these public competitions? 

Okay, now you know some of those "peer-reviewed" papers may be bluffing. How can you tell if they really are? Before giving you my answer, let's see how those papers are produced:
  1. To make sure that the contribution is novel, the authors must propose something new. To ensure it looks challenging, the proposal must be complicated. The easiest way to create such techniques is to mix the existing ones, such as ANN+PSO+ARIMA, etc.
  2. To make sure that nobody can reproduce the results, the data used in the case study must be proprietary, since all it takes to have the paper accepted is to get it past the reviewers and editor(s). An unpopular dataset is fine too, because the reviewers won't bother to spend the time reproducing the work.
  3. To make sure that the results can justify the breakthrough, the forecasts must be close to perfection. The proposed models must beat the existing ones to death. How to accomplish that? Since the authors have complete knowledge of the future dataset, just fine-tune the model so that it outperforms the others in the forecast period. This is called "peeking into the future".
In reality, it is very hard to build models or methods that can dominate the state of the art. Such dominance certainly doesn't come from a "hybrid" of the existing ones. Instead, the breakthroughs (or major improvements) come from using new variables that people had not completely understood before, borrowing knowledge from other domains, leveraging new computing power, and so forth.

In the world of predictive modeling, there is a well-known theorem called "no free lunch", which states that no one model works the best in all situations. In other words, if one beats the others in all cases across all measures, it is "too good to be true". We need empirical studies that report what's NOT working well as much as the ones promoting the champions.

It's time for my list of smoke tests. The more check marks a paper gets, the more I consider it too good to be true.
  1. The paper is proposing a mix (or hybrid) of many techniques.
  2. The paper is merely catching new buzzwords.
  3. The data is proprietary.
  4. The paper is not co-authored with industry people (or not sponsored by the industry). 
  5. The proposed method does not utilize new variables.
  6. The proposed method does not take knowledge of other domains.
  7. The proposed method does not leverage new computing resources.
  8. The proposed method is dominating its counterparts (another credible method) in all aspects.
I spend a minimal amount of time reading those papers, because they are the emperor's new clothes to me. Hopefully this list can help the readers save some time too. On the other hand, I don't mean to imply that the authors were intentionally faking their papers. Many of them are genuine people who make these mistakes without knowing it. Hopefully this blog post can help point those authors in the right direction as well.

Tuesday, August 1, 2017

Variable Selection Methods for Probabilistic Load Forecasting: Empirical Evidence from Seven states of the United States

I am an evidence-based man. This mentality has saved me a tremendous amount of time in recent years. I have been minimizing the time I spend following bluffs in the literature. On the other hand, I have been developing empirical case studies and encouraging the community to contribute to the empirical research.

In my GEFCom2014 paper, I raised the following question to the forecasting community:
Can a better point forecasting model lead to a better probabilistic forecast?
To answer this question, we have to first understand the definition of "better", a.k.a., forecast evaluation measures and methods. In this paper, we compared two variable selection methods based on point and probabilistic error measures respectively. The case study covers seven states of the US. The results from this paper can hopefully be leveraged by future empirical studies for comparison purposes.

(This paper is an upgrade to our PMAPS2016 paper.)


Jingrui Xie and Tao Hong, "Variable Selection Methods for Probabilistic Load Forecasting: Empirical Evidence from Seven states of the United States", IEEE Transactions on Smart Grid, in press.

Variable Selection Methods for Probabilistic Load Forecasting: Empirical Evidence from Seven states of the United States

Jingrui Xie and Tao Hong


Variable selection is the process of selecting a subset of relevant variables for use in model construction. It is a critical step in forecasting but has not yet played a major role in the load forecasting literature. In probabilistic load forecasting, many methodologies to date rely on the variable selection mechanisms inherited from the point load forecasting literature. Consequently, the variables of an underlying model for probabilistic load forecasting are selected by minimizing a point error measure. On the other hand, a holistic and seemingly more accurate method would be to select variables using probabilistic error measures. Nevertheless, this holistic approach by nature requires more computational efforts than its counterpart. As the computing technologies are being greatly enhanced over time, a fundamental research question arises: can we significantly improve the forecast skill by taking the holistic yet computationally intensive variable selection method? This paper tackles the variable selection problem in probabilistic load forecasting by proposing a holistic method (HoM) and comparing it with a heuristic method (HeM). HoM uses a probabilistic error measure to select the variables to construct the underlying model for probabilistic forecasting, which is consistent with the error measure used for the final probabilistic forecast evaluation. HeM takes a shortcut by relying on a point error measure for variable selection. The evidence from the empirical study covering seven states of the United States suggests that 1) the two methods indeed return different variable sets for the underlying models, and 2) HoM slightly outperforms but does not dominate HeM w.r.t. the skill of probabilistic load forecasts. Nevertheless, the conclusion might vary on other datasets. Other empirical studies of the same nature would be encouraged as part of the future work.
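To make the HeM/HoM distinction concrete, here is a minimal sketch of the two kinds of error measure a selection procedure could rank candidate models by. The quantile score is written as the pinball loss averaged over a set of quantiles. Using MAPE as the point error measure, and the function names themselves, are assumptions for illustration; the abstract only says "a point error measure" and "a probabilistic error measure".

```python
import numpy as np

def pinball_loss(y, q_forecast, tau):
    """Average pinball loss of a forecast of the tau-quantile (0 < tau < 1)."""
    y = np.asarray(y, dtype=float)
    q = np.asarray(q_forecast, dtype=float)
    diff = y - q
    # Under-forecasts are weighted by tau, over-forecasts by (1 - tau).
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))

def quantile_score(y, quantile_forecasts):
    """Mean pinball loss over all quantiles.

    `quantile_forecasts` maps each tau to an array of forecasts, e.g. a
    grid such as 0.01, 0.02, ..., 0.99 (the grid is an assumption).
    """
    losses = [pinball_loss(y, q, tau) for tau, q in quantile_forecasts.items()]
    return float(np.mean(losses))

def mape(y, point_forecast):
    """Mean absolute percentage error, in percent (assumed point measure)."""
    y = np.asarray(y, dtype=float)
    f = np.asarray(point_forecast, dtype=float)
    return float(np.mean(np.abs((y - f) / y)) * 100.0)

# A heuristic-style selection would rank candidate models by mape(...);
# a holistic-style selection would rank them by quantile_score(...).
actual = np.array([100.0, 110.0, 95.0])
candidate = {0.1: actual - 8.0, 0.5: actual + 1.0, 0.9: actual + 9.0}
score = quantile_score(actual, candidate)
```

The extra cost of the holistic route comes from having to produce a full set of quantile forecasts for every candidate variable set before it can be scored, whereas the heuristic route only needs one point forecast per candidate.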

Monday, July 31, 2017

Who's Who in Energy Forecasting: Rafał Weron

Prof. Rafał Weron is the Head of the Economic Modeling Group in the Department of Operations Research at Wroclaw University of Science and Technology. He has so many accomplishments in energy forecasting, such as his renowned book on load and price forecasting, many widely cited papers, and of course those talented students. Recently he won the IIF-Hong Award for his IJF review paper on electricity price forecasting.

What brought you to the energy field, particularly load and price forecasting?

I would like to say that it was a well thought-out decision ... but realistically ... it was more luck and coincidence. I was the right person at the right time.

Towards the end of my PhD studies, back in 1998, I started looking for a new area of research. I studied mathematics, and my PhD was in something fashionable in the 1990s – math finance. But it was too theoretical for me, too far from real applications. Finance itself was better. Yet, doing top-level finance research and publishing in top-tier finance journals was (nearly) impossible for someone working in a former Eastern Bloc country back then. On the other hand, I liked the freedom of the Academia and didn’t want to work as a quant.

Then in the late 1990s, power market deregulation started spreading throughout Europe and the US. The Polish Power Exchange opened in mid-2000, but there were very few people who knew what power trading was about. Mathematicians and economists were set back by the technical aspects of power market operations, engineers had no economic training. I saw this as an opportunity. With a sound training in statistics/time series analysis, half a year spent as a trainee in an investment bank and basic knowledge about power system economics gained during a project run by the Hugo Steinhaus Center for the Polish Power Grid in 1995-1996, I had a head start compared to fellow colleagues. And I was eager to work hard.

But my first “energy” papers were not on forecasting. Rather, they were on data analytics, risk management and derivatives pricing, focusing more on the mid-term horizon. I became interested in short-term forecasting a bit later, around 2003-2004, when it became apparent that the day-ahead market was “the marketplace” for electricity trading, not the derivatives markets.

Tell us more about your price forecasting review paper. Why and how did you write that 52-page article?

The invitation from Rob Hyndman to write the review paper came out of the blue, around mid-2013. Ask him why he approached me in the first place. Prior to that I had only one paper in IJF, in a special issue on Energy Forecasting published in 2008.

But it was a welcome invitation. A few years had passed since I published my 2006 Wiley book on modeling and forecasting electricity loads and prices, and I had been gathering material for a revised version. So I agreed happily and said that I would submit a draft by the end of the year. This turned out to be impossible … well, I started working on it too late, sometime in early December 2013, after Rob had sent me a reminder email ;-)

I wanted my review to be self-contained and rather complete. I hate those review/survey papers which just cite dozens of articles without actually analyzing and thoroughly comparing the results using the same measures. So I wanted to include my own empirical examples. This meant a lot of coding and data analysis. No wonder it took me two full months to complete. But the outcome surprised even me – it was like a small book – the draft had 88 pages in the standard elsarticle LaTeX page layout. I was sure Rob would tell me to cut it in half …

What's your proudest accomplishment in forecasting?

A famous Polish mathematician, Hugo Steinhaus, used to say that his greatest discovery was … Stefan Banach. Yes, the same Banach known in mathematics for “Banach spaces”, one of the founders of modern functional analysis. I also like to think that my greatest accomplishments are my brilliant students. One of them – Jakub Nowotarski – graduated this June, with distinction. Jakub’s research output was outstanding, not only for a PhD student. The Quantile Regression Averaging method, which turned out to be a top performer in the GEFCom2014 competition, was 90% his idea. He would have easily received a habilitation (~tenure) within a year or two, if only he decided to stay in the Academia. But I am working now with two gifted BSc students – Grzegorz Marcjasz and Bartosz Uniejewski. Someday I may be able to say the same about them.

Do you work with companies to improve their forecasting practice?

I have had different episodes in my life, some more academic, some less. Over the last two decades I have been periodically engaged as a consultant to financial, energy and software engineering companies. And yes, I have worked with utilities and power generators on improving their load and price forecasting techniques and risk management systems. The hype on energy forecasting in Poland was between 2000 and 2006. Then a series of mergers changed the landscape – the four large companies that remained were not that interested in developing in-house solutions anymore. So my recent developments in forecasting are more academic in nature.

Is there a key initiative or exciting project you are working on?

Together with Florian Ziel we are working on a book for CRC Press that will supersede my 2006 Wiley book. The tentative title is “Forecasting Electricity Prices: A Guide to Robust Modeling”. It is scheduled to be out in 2018.

The book will start with a chapter on the art of forecasting, introducing the basic notions of (energy) forecasting. We will continue with a chapter on the markets for electricity and discuss the products traded there. Then the three main chapters will follow: “Forecasting for Beginners” – which will introduce a few simple models and show how point, probabilistic and path forecasts can be computed for them, “Evaluating Models and Forecasts” – which is a very important, but still underdeveloped area in energy forecasting, and “Forecasting for Experts” – which will discuss a number of more advanced concepts, like regime-switching, shrinkage, feature selection, non-linear and fundamental models.

What's your forecast for the next 10 years of the energy forecasting field?

This is a tough question. When writing my 2014 IJF review I came up with five directions in which electricity price forecasting would or should evolve over the next decade: (1) a better treatment of seasonality and use of fundamentals, (2) going beyond point forecasts, (3) more extensive use of forecast combinations, (4) development of multivariate models, and (5) more thorough forecast evaluation. Out of these, I think that the least has been done since then in the context of multivariate models. This is not a surprise, multivariate models are much more demanding, not only conceptually but also computationally. But I do believe that they have a lot to offer. Also Bayesian methods may see more extensive use, especially in probabilistic forecasting.

Another direction that I think may become important in the near future is “path forecasting”. Currently, in a vast majority of load or price forecasting papers only marginal (i.e., at one point in time) distributions are considered, either in a point or probabilistic context. But the forecast for hour 9 should not be independent from the one for hour 8. If we predict a price drop below a “normal” level for hour 8 tomorrow, then it is quite likely that the price for hour 9 will also be lower than under “normal” circumstances. Our forecasting models should take this into account.

What else do you do in the academic world other than energy forecasting?

Currently it’s 75% energy forecasting and 25% agent-based modeling, but also related to energy markets – diffusion of innovations, like dynamic tariffs or pro-ecological behavior. In the not so distant past I have worked on long-range dependence, risk management and derivatives pricing, also outside the Academia.

What's fun about your job?

Everything. The sleepless nights spent on writing papers, the discussions with the reviewers and editors that I am right and they are wrong, and – most importantly – dozens of emails and skype calls I exchange each day with my students when working on a new research idea.  

How do you spend your free time?

Working. No, this would be an exaggeration … but only a small one. A researcher is never on a holiday – best ideas don’t come during my office hours, they tend to pop up unexpectedly. But I try my best not to be a 100% workaholic. I like mountains – both hiking and skiing. From sports – playing volleyball, badminton and squash – not that I’m a good player, but I like it ;-) 

Wednesday, May 31, 2017

Tao Hong - Energy Education Leader of the Year

Earlier this month, I was honored to be named as the Education Leader of the Year by Charlotte Business Journal (CBJ) at the Energy Inc. Summit.

My friend Alyssa Farrell took a video of the award acceptance speech, where I gave a "forecast" about the energy industry:
The energy companies will be moving more Gigabytes of data than GWh of electricity. 
Here is the 1-minute speech:

When I was first informed about this award, I didn't realize its prestige. Then I started getting congratulations from friends, colleagues, and even the dean, so I guessed that it must be something big. After the award ceremony, CBJ put my profile in print and online (Energy Leadership Awards: Putting big data to work for energy). UNC Charlotte also featured the story in its campus newsletter.

Since the CBJ article is behind a paywall, I'm sharing the interview with the audience here.

What drew you to a career in education? How long have you been in that field?

Before coming back to academia, I was working at SAS, one of the very best employers in the world. Part of that job was to teach classes internationally. My primary audience was industry professionals. Through that experience, I found a big gap between what the industry needed in terms of analytics and data science and what universities were offering through various academic programs. I thought I could be the person to help bridge the gap, so I took on a mission of producing the finest data scientists for the energy industry and joined UNC Charlotte. I've been in this academic job for almost 4 years.

What’s the most important part of what you do?

I would say students are the most important part of what I do. I consider students as my products. I want to make the finest products for the industry, so everything I do is centered around the students: I try to pick the best raw materials, perfect them as much as possible, and then put them in the best place in the market. As a professor, my job can be divided into three parts: research, teaching and service. These three are closely tied together. The industry partners bring me their problems to work on; I help them solve these problems and then bring the research findings to the class; then they keep sending me new problems and hiring my students.

How do you see energy education evolving?

I think the evolution of the energy education is two-fold. First, it has to be interdisciplinary. It's no longer the job of one department, such as electrical engineering or mechanical engineering. We have to involve many academic departments to educate the workforce for the energy industry. Some of them should even go beyond the college of engineering, such as policy, economics, statistics and meteorological science.

Talk a little about the BigDEAL Lab. What does that mean for students?

It is the best place to be if you want to be the elite data scientists in the energy industry. BigDEAL students have the opportunity to solve the most challenging analytic problems in the industry; they have access to the state-of-the-art software donated by our industry partners; they can leverage many data sources that no other universities have access to. As a result, BigDEAL students have been taking top places in many international competitions and been chased by many renowned employers in the industry.

What role does UNCC play in the energy industry - both locally and nationally?

UNCC has been training many energy professionals in the Carolinas and delivering many fresh graduates to the local energy industry. Nationally, UNCC sets a great example of industry-university collaboration.

What makes UNCC’s research so valuable?

We are fortunate to be located in a large city and surrounded by many enthusiastic industry partners. The research problems we work on come from the real world rather than the ivory tower. They tend to be very practical and meaningful to the industry.

Is there a key initiative you’re working on? 

Over the past few years, I've been experimenting with a crowdsourcing approach to energy analytics research. I started the Global Energy Forecasting Competition in 2012. These competitions have attracted hundreds of contestants from more than 60 countries. Many of them are from outside the power and energy field. In each competition, we try to tackle a challenging and emerging problem. Right now we are in the middle of the third one, GEFCom2017. The theme is energy forecasting in the big data world. We have also organized the first International Symposium on Energy Analytics this June in Cairns, Australia, to host the researchers and practitioners interested in this subject.

What are the advantages of working with industrial partners?

They bring in meaningful research problems, fund projects, hire graduates, and help broadcast our research findings through their network. Isn't it a sweet deal?

Are educational institutions able to educate enough workers, or does the industry face a shortage?

In my domain, which is energy analytics, there is definitely higher demand (jobs) than supply (workers). I get calls all the time asking for my students, but I don't have enough students to fill all those job openings.

What’s fun about your job?

Teaching students to solve the most challenging problems for the industry. I very much enjoy both the analytical challenge and the success of the students. 

Wednesday, May 17, 2017

Wind Speed for Load Forecasting Models

One way to categorize the load forecasting papers is based on the variables used in those forecasting models. Because many people who wrote load forecasting papers only had access to the load data with time stamps, they had to propose the models based on the load series only. The representative techniques include exponential smoothing and the ARIMA family. Sometimes people also include the calendar information to come up with some regression models with classification variables. Although these are good and powerful techniques, their real-world applications in load forecasting are very limited. I have criticized those "load-only" models in some of my papers, such as the IJF2016 paper on recency effect:
Both seasonal naïve models perform very poorly compared with the other four models. Seasonal naïve models are used commonly for benchmarking purposes in other industries, such as the retail and manufacturing industries. In load forecasting, the two applications in which seasonal naïve models are most useful are: (1) benchmarking the forecast accuracy for very unpredictable loads, such as household level loads; and (2) comparisons with univariate models. In most other applications, however, the seasonal naïve models and other similar naïve models are not very meaningful, due to the lack of accuracy. 
Weather is a must-have in most real-world load forecasting models. The most frequently used weather variable in the load forecasting literature is temperature. Some system operators, such as ISO New England, publish temperature data along with the load information. The recent load forecasting competitions, such as GEFCom2012 and GEFCom2014, have also released several years of hourly load and temperature data for benchmarking purposes.
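For readers unfamiliar with the seasonal naïve benchmark mentioned in the quote above, here is a minimal sketch of the idea: the forecast simply repeats the most recent full season of observations. The function name and the default season length of 168 hours (one week of hourly data) are assumptions for illustration, not the exact setup of the IJF2016 paper.

```python
import numpy as np

def seasonal_naive(history, horizon, season_length=168):
    """Forecast `horizon` steps by repeating the last observed season.

    `season_length=168` assumes hourly data with a weekly cycle; this
    default is a hypothetical choice for this example.
    """
    history = np.asarray(history, dtype=float)
    if history.size < season_length:
        raise ValueError("need at least one full season of history")

    last_season = history[-season_length:]
    # Repeat the last season enough times to cover the horizon, then trim.
    repeats = -(-horizon // season_length)  # ceiling division
    return np.tile(last_season, repeats)[:horizon]

# Toy example with a season of 3 observations.
forecast = seasonal_naive([1, 2, 3, 4, 5, 6], horizon=5, season_length=3)
```

Note that the forecast uses nothing but the load series itself, which is exactly why such "load-only" models struggle whenever the weather in the forecast period differs from that of the previous season.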

Although non-temperature weather variables have some presence in the load forecasting literature, they are rarely studied in the context of variable selection. Recently we published a TSG paper, "Relative Humidity for Load Forecasting Models", discussing how to use humidity information to improve load forecasting accuracy. As a sister to that humidity paper, this paper discusses how to include wind speed information in load forecasting models.

Another comment I want to make is on open access publication. I personally had no interest in publishing my papers with open access publishers. This was my first try, and it turned out to be a pleasant surprise. The reviews were returned to me rather quickly, within 10 days. There were no nonsense comments, so I didn't need to deal with the personal attacks I normally have to. Before the final publication, the copy editor helped clean up some typos we had in the submission. From our first submission to the final paginated version, the whole process took two weeks!

Anyway, hope that you enjoy reading this open access paper!


Jingrui Xie and Tao Hong, "Wind speed for load forecasting models", Sustainability, vol 9, no 5, pp 795, May, 2017 (open access).

Wind Speed for Load Forecasting Models

Jingrui Xie and Tao Hong


Temperature and its variants, such as polynomials and lags, have been the most frequently-used weather variables in load forecasting models. Some of the well-known secondary driving factors of electricity demand include wind speed and cloud cover. Due to the increasing penetration of distributed energy resources, the net load is more and more affected by these non-temperature weather factors. This paper fills a gap and need in the load forecasting literature by presenting a formal study on the role of wind variables in load forecasting models. We propose a systematic approach to include wind variables in a regression analysis framework. In addition to the Wind Chill Index (WCI), which is a predefined function of wind speed and temperature, we also investigate other combinations of wind speed and temperature variables. The case study is conducted for the eight load zones and the total load of ISO New England. The proposed models with the recommended wind speed variables outperform Tao’s Vanilla Benchmark model and three recency effect models on four forecast horizons, namely, day-ahead, week-ahead, month-ahead, and year-ahead. They also outperform two WCI-based models for most cases.
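
To make "a predefined function of wind speed and temperature" concrete, here is a small sketch of the wind chill formula published by the U.S. National Weather Service in 2001 (temperature in °F, wind speed in mph). Whether the paper uses exactly this definition is an assumption on my part, and the fallback outside the formula's range and the `wind_features` helper are hypothetical illustrations, not the variables recommended in the paper.

```python
from typing import Dict


def wind_chill_index(temp_f: float, wind_mph: float) -> float:
    """NWS (2001) wind chill in deg F, defined for temp_f <= 50 and wind_mph > 3."""
    if temp_f > 50.0 or wind_mph <= 3.0:
        # Assumption: outside the formula's range, fall back to the air temperature.
        return temp_f
    v = wind_mph ** 0.16
    return 35.74 + 0.6215 * temp_f - 35.75 * v + 0.4275 * temp_f * v


def wind_features(temp_f: float, wind_mph: float) -> Dict[str, float]:
    """Hypothetical candidate regressors combining wind speed and temperature."""
    return {
        "wci": wind_chill_index(temp_f, wind_mph),
        "wind": wind_mph,
        "wind_x_temp": wind_mph * temp_f,  # a simple interaction term
    }


print(round(wind_chill_index(0.0, 15.0), 1))
```

Features like these would then enter a regression model alongside the usual temperature and calendar terms.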

Thursday, May 11, 2017

RTE Day-ahead Load Forecasting Competition 2017

For many years, the Transmission System Operator RTE has been building electricity demand forecasts, ensuring the ability to match supply and demand at all times and, consequently, guaranteeing power system reliability.

The emergence of new factors related to the energy transition is impacting electricity demand and making forecasting a more challenging task: self-consumption, growth of new uses (electric vehicles, heat pumps…), regulation of building insulation, new supply offers, possibility for consumers to monitor and control their consumption…

In this context of increasing flexibility and market rule harmonisation at the European level, RTE wants to conduct a review of current forecasting methods and assess the performance of new dynamic and adaptive approaches brought by Data Science.

A first challenge will focus on the deterministic short-term forecast of national and 12 regional electricity demands; a second one will focus on a forecast with associated uncertainty.

RTE will launch its first international public challenge in Data Science mid-May, running till mid-July. The second challenge will take place during winter 2017-2018.

Registration will be open from the opening date to the 24th of May on the platform :

All the information related to the challenge will be available on the platform mid-May. A discussion forum will allow participants to ask any questions they may have.

Challenge rules

Participants will be given access to meteorological data provided by Météo France for RTE’s operational forecasting activities, and they will be able to retrieve national and regional demand data on RTE’s Eco2mix platform. Participants are allowed to use any other data, provided that source and nature are specified.

The models will be assessed on their ability to forecast demand for ten days (including bank holidays) between the 25th of May and the 14th of July 2017.

These days will be announced at the challenge’s opening and will be used for the final ranking. Forecasts for day d must be submitted at 9pm on day d-1 at the latest.

In order to practice, participants will have the opportunity to submit forecasts on three consecutive days, from the 18th to the 20th of May.

The same rules regarding time of submission will apply.

Following the final ranking, the top three participants will have to submit a one page methodology document before being awarded their prize.

This document will describe the main principles of the method and the data used.

1st prize: €10,000
2nd prize: €5,000
3rd prize: €3,000

Thursday, May 4, 2017

7 Reasons to Attend ISEA2017

The International Symposium on Energy Analytics (ISEA2017) is coming in 7 weeks. If you are still wondering whether you should join the event or not, here are 7 reasons for you to attend ISEA2017:

1. Grow your international network

ISEA2017 is truly international. The early registrations came from 16 countries. As a conference attendee, you will hear 20+ presentations describing methodologies and insights gained from various places in the world. You will also share your experience and expertise with this diverse audience and get their critiques and compliments.

2. Check out the winning methods of GEFCom2017

Selected GEFCom2017 teams will be presenting their methodologies at ISEA2017. You will witness the recognition of the GEFCom2017 winners and have face-to-face discussions with them. Rather than reading thousands of energy forecasting papers published every year and wondering which ones work well, you can grasp the secret sauce of the most effective methods during ISEA2017.

3. Experience a novel peer review process 

Whether we like today's peer review system or not, we have to live with it, at least for the next few years before a better one is in place. We have tied ISEA2017 to an IJF special section on energy forecasting, where we try to implement a new peer review process. The ISEA2017 attendees will have the opportunity to experience this new process and help improve it.

4. Peek at and shape the future of energy analytics

If you are struggling with the topic for your next paper, ISEA2017 is a must-attend conference for you. We will discuss the emerging topics as well as the research agenda for the future. Rather than guessing where the future goes, you can contribute to the plan!

5. Attend International Symposium on Forecasting

The 37th International Symposium on Forecasting (ISF) will be held two days after ISEA2017, right at the same location. ISF is the only major scientific forecasting conference I know of. I find it very rewarding to attend ISF, where I hear forecasting topics from various industries, as well as the methodological breakthroughs in general. Many of them could be applied to the energy forecasting problems. Extending the trip to include ISF in your travel plan would be a wise choice.

6. Two world heritage sites in one place

The World Heritage Centre has a list of about 1000 world heritage sites around the globe. Two of them (Great Barrier Reef and Daintree Rainforest) are in Cairns, Australia, making Cairns the only place on this planet with two world heritage sites side by side. The ISF organizers have planned a social program that includes numerous social events and tour opportunities for delegates and their friends and family.

7. Low registration fees

Our sponsors, the International Institute of Forecasters, Tangent Works and Journal of Modern Power Systems and Clean Energy, have generously contributed to the organization of ISEA2017, helping significantly subsidize the registration fees. If you attend both ISEA and ISF, there is an additional discount. To register for both ISEA and ISF, click HERE. To register for ISEA only, click HERE.

ISEA2017 will be held in Cairns, Australia, June 22-23, 2017. Look forward to seeing you there!

Monday, March 20, 2017

GEFCom2014 Load Forecasting Data

The load forecasting track of GEFCom2014 was about probabilistic load forecasting. We asked the contestants to provide one-month ahead hourly probabilistic forecasts on a rolling basis for 15 rounds. In the first round, we provided 69 months of hourly load data and 117 months of hourly temperature data. Incremental load and temperature data was provided in each of the future rounds.
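
For readers new to probabilistic forecasting, here is a minimal sketch of how a set of quantile forecasts for one hour can be scored with the pinball loss. It assumes the forecasts are expressed as quantiles and scored this way; the competition paper cited below is the authority on the exact setup. The function names and the numbers are made up for illustration.

```python
from typing import Dict


def pinball_loss(actual: float, quantile_forecast: float, tau: float) -> float:
    """Pinball loss of one quantile forecast at probability level tau (0 < tau < 1)."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must be strictly between 0 and 1")
    diff = actual - quantile_forecast
    # Under-forecasts are weighted by tau, over-forecasts by (1 - tau).
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def mean_pinball_loss(actual: float, forecasts: Dict[float, float]) -> float:
    """Average pinball loss over a {tau: quantile forecast} mapping for one hour."""
    return sum(pinball_loss(actual, q, tau) for tau, q in forecasts.items()) / len(forecasts)


# Made-up numbers: an observed hourly load of 100 and three forecast quantiles.
score = mean_pinball_loss(100.0, {0.1: 90.0, 0.5: 102.0, 0.9: 115.0})
print(score)
```

A lower score is better. Averaging the same quantity over all hours of the forecast month gives one number per round.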

Where to download the data?

The complete data was published as the appendix of our GEFCom2014 paper. If you don't have access to Science Direct, you can download it from my Dropbox link HERE. Regardless of where you get the data, you should cite this paper to acknowledge the source:

  • Tao Hong, Pierre Pinson, Shu Fan, Hamidreza Zareipour, Alberto Troccoli and Rob J. Hyndman, "Probabilistic energy forecasting: Global Energy Forecasting Competition 2014 and beyond", International Journal of Forecasting, vol.32, no.3, pp 896-913, July-September, 2016.

What's in the package?

Unzip the file and you will see the folder "GEFCom2014 Data", which includes five zip files. The data for the probabilistic load forecasting track of GEFCom2014 is in the file "". Unzip it and you will see the folder "load", which includes an "Instructions.txt" file and 15 subfolders. In each subfolder, named "Task n", there are two files, Ln-train.csv and Ln-benchmark.csv. The train file, together with the train files released in previous rounds, can be used to generate forecasts. The benchmark file includes the forecast generated from the benchmark method.
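
As a rough sketch of how the train files accumulate across rounds, the snippet below collects the rows of L1-train.csv through Ln-train.csv from the "load" folder. The function names are hypothetical, and it assumes each train file is a CSV with a header row; I make no assumption here about the column names, so check "Instructions.txt" for the actual layout.

```python
import csv
from pathlib import Path
from typing import Dict, List


def train_files(load_dir: Path, round_n: int) -> List[Path]:
    """Paths of the train files released in rounds 1..round_n."""
    return [load_dir / f"Task {k}" / f"L{k}-train.csv" for k in range(1, round_n + 1)]


def read_training_rows(load_dir: Path, round_n: int) -> List[Dict[str, str]]:
    """Concatenate the rows of all train files available at round round_n."""
    rows: List[Dict[str, str]] = []
    for path in train_files(load_dir, round_n):
        # Assumption: every train file starts with a header row.
        with path.open(newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows
```

For example, `read_training_rows(Path("load"), 3)` would return everything available for building the round 3 forecast, provided `Path("load")` points to the unzipped "load" folder.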

How to use the data?

The most straightforward way of using this dataset is to replicate the competition setup and compare results directly with the top entries. Because the data published through GEFCom2014 covers a fairly long period (7 years of matching load and temperature data in total), we can also use this dataset to test methods and models for short term load forecasting.

GEFCom2014-E data

After GEFCom2014, I organized an in-class probabilistic load forecasting competition in Fall 2015 that was open to external participants. My in-class competition setup was very similar to that of GEFCom2014, so I denoted the data for this in-class load forecasting competition as GEFCom2014-E, where E is the abbreviation of "extended". In total, this dataset covers 11 years of hourly temperature and 9 years of hourly load. One of the top teams, Florian Ziel, was invited to contribute a paper to IJF (see HERE). Readers may replicate the same competition setup and compare results with Ziel's.


Note that the data I used for GEFCom2014-E was created using ISO New England data. If you want to validate a method using two independent sources, you should not use GEFCom2014-E together with ISO New England data.

Back to Datasets for Energy Forecasting.