Thursday, September 14, 2017

Who's Who in Energy Forecasting: Geert Scholma

I got to know Geert Scholma from the NPower Forecasting Challenge 2015, where he outperformed my BigDEAL students on the leaderboard. Since then, he has been topping the NPower leaderboard every time. Recently, as a winner of the qualifying match of GEFCom2017, he presented his methodology at ISEA2017.

Geert lives in Rotterdam, The Netherlands. He has a strong focus on data science and the energy transition, with a master's degree in physics and 5 years of experience as an Energy Forecaster for Energy Retail Company and E.On spin-off Uniper Benelux.

Since 2015 he has participated in several online energy forecasting competitions, with the following track record:

What brought you to the energy forecasting profession?

Since an early stage of my physics education at University I have been inspired by developments in the Energy Transition and have directed my career path towards it. This began with research and internships in the field of solar electricity production and energy service companies. My first job was at a consultancy firm, where we managed energy labels and energy policy for social housing firms. I then decided to look for a position at a large energy company in The Netherlands, but it was a coincidence that I ended up as an energy forecaster. I had never heard of the term before, but the field has proven to be very interesting to me.

What do you do at your current job? And what's fun about it?

5 years ago I started my job as an energy forecaster for Uniper Benelux. My main focus has been the development of new day-ahead forecasting models for all our customers. As our portfolio consists of electricity, gas and district heating for small, medium and large clients, this is quite a diverse challenge. The main task of our team is to manage the balance responsibility and minimize our clients' imbalance volumes and costs. Besides having to forecast consumption and production volumes, this also means taking into account the effects of hierarchy / portfolio and pricing the profiles of potential new clients. The fun part for me is squeezing the most information out of these big data. And I guess in general, working with numbers just makes me a happy person :)

What was your first (forecasting or data mining) competition about? And how did you do?

My first competition was the first UK Npower competition. The data were a single aggregated daily electricity consumption time series and thus relatively easy to manage, as I was used to working with multiple time series with an hourly resolution. I won the competition. As forecasting much more than 1 day into the future was new to me, I learned not to extrapolate time trends too enthusiastically into the future. All the competitions I have participated in so far have taught me similar lessons that I wouldn't have learned as fast within my daily job.

Can you share with us the most exciting competition you've participated in?

The most exciting competition so far was the recent RTE Power Consumption Forecast Challenge in 2017. The task was to forecast the day-ahead 15 minute electricity consumption for all 12 French Regions. The aspect that made it more interesting than the other competitions was the fact that the data was real and the solution applicable. Also, the competition was much tougher. The event was concluded with a seminar in Paris where I learned that almost all of my competitors used machine learning, whereas my solution was mainly based on a single linear regression model.

Is there a key initiative or exciting project you are working on these days?

I am working on an update for the second part of the French RTE Competition this winter. I am focusing on an update of my base model, but also machine learning and ensemble forecasting. I am curious how the battle between simple linear regression and complicated black box machine learning methods will end next time when I include some new variables I already have in mind. Together with someone from IBM we are also working on a new approach to (energy) forecasting benchmarks, but this will still take some more time to become concrete.

What's your forecast for the next 10 years of energy forecasting field?

I expect real-time pricing and demand-side management to become a significant new factor in energy forecasting. One of the current challenges is often still to predict a yearly growing volume of "behind the meter" renewable energy (mostly solar) production. As renewable production will become more and more difficult to manage, market prices for more clients will become flexible and more client groups will be encouraged to either store their own production or shift their demand towards off-peak hours. I expect this to open a completely new and very interesting chapter in energy forecasting.

How do you spend your free time?

I am a real outdoor sportsman and enjoy cycling and tennis. My partner is from Italy and we often visit her family in Puglia where we enjoy the food, family and beautiful coast and countryside.

Sunday, September 10, 2017

TAO: The Analytics Officer

Today is the 7-year mark after my PhD defense. It happens to be the "Teachers' Day" in China, a holiday dedicated to the teachers. I just created a new label "9/10" to collect the posts published on this same date.

I'd like to announce a new blog on this special day, TAO: The Analytics Officer, a blog of data science for the current and future Chief Analytics Officers.

As you, my audience of this blog, are following me, I am following you too. I'm very happy to see that many of you have been promoted one or more times during the past five years. I'm sure that some of you are trying to get that promotion or climb up the career ladder. I started TAO to give you a hand by sharing some of my successful experience in helping others.

Compared with this blog, Energy Forecasting, TAO is different in the following three aspects:
  • The target audience of TAO is "the current and future Chief Analytics Officers". It will not be an academic or research-oriented blog. Instead, it will be crafted for industry professionals and the students who are going to the industry.
  • The content of TAO is not specific to the energy industry. I'm mostly known for my energy forecasting work, because that's where my papers were published. Nevertheless, I have done a lot of analytical projects in other industries, such as retail, healthcare, sports, and financial services. TAO will attract people from various industries too. 
  • TAO will go beyond forecasting. Although I'm mostly known for forecasting, I did earn a PhD in Operations Research, where I had 10 times more coursework in optimization than statistics and forecasting. In TAO, I will include a significant amount of optimization (prescriptive analytics) in the blog posts.  

Here is the "About" of TAO:
From $1,350/month to $800/hour, I did it in 12 years.
I was torturing the data before people called it "big data". I was specialized in operations research and working on forecasting and optimization projects before people started to define and promote "analytics". I was building multi-layer fully-connected recurrent neural networks before "deep learning" became a buzzword.
I don't recall when my profession got its new name "data science". All of a sudden, the folks in my circle are either working on "big data analytics", or are trying to become "data scientists".
As a professor, I find my passion in mentoring students and helping them land the dream jobs. As a consultant, I enjoy making my clients look better and get promoted. To replicate the success to a broader audience, I created this blog, with the hope that more people can join this profession, unlock the power of data for their organizations, and rise to the top of the org chart.
Best of luck with your data science journey!
The URL for TAO is

Thursday, August 31, 2017

Factors Affecting Load Forecast Accuracy

In some of my papers, I tried to present fairly comprehensive case studies that cover various load zones. I often use a primary case study to illustrate the flow or components of a proposed methodology. After that I apply the same methodology to a secondary case study to show that the same methodology works well on other zones. A by-product of this publication process is a series of benchmark results on various load zones. You may have realized that the same methodology or model typically results in different forecast errors on different load zones.

A most relevant example was in my IJF paper on weather station selection, where I applied the same methodology to two datasets, one from NCEMC that includes 3 power supply areas and 44 building blocks, the other from GEFCom2012 that includes 20 load zones and the sum of them. The MAPE values across these zones are quite different, with very high MAPEs (double or triple digits) on the industrial load zones. 

In this post, I will take a deeper dive into the factors affecting load forecast accuracy. Here we are concerned with short term (point) load forecasting, because there is much more than accuracy to worry about in long term forecasting.

That said, I'm going to answer the following questions:
What are the factors affecting short term load forecast accuracy?
Here is my list:
  1. Data quality. Garbage in, garbage out. If the input data is bad, the forecasts tend to be bad too. In some rare situations, the bad input data may offset some of the model deficiency. 
  2. Goodness of the model. If the model is able to capture most of the salient features, and ignore the noise, the forecast tends to be good.
  3. Load composition. Errors of residential load forecasts are typically lower than those of industrial load forecasts, assuming that the sizes of the loads are similar. Keep in mind that there are easy-to-forecast industrial loads, and hard-to-forecast residential loads.
  4. Size of load. The forecast errors (in MAPE) on big loads are typically smaller than those on small loads.
  5. Time of day. The forecast errors during sleeping hours are typically smaller than the errors during daytime.
  6. Season of year. The forecast errors during summer and winter are typically bigger than the errors during spring and fall.
  7. Special days. The forecast errors during special days, such as holidays and large local event days, are typically higher than the errors during regular days.
  8. Weather condition. The forecast errors in areas with stable weather conditions are typically smaller than those in areas with fast-changing weather conditions.
  9. Weather forecasts. A good weather forecast often leads to a good load forecast. 
  10. Locations of weather station(s). When the weather stations can properly represent the weather of the territory, the forecasts are typically good. 
  11. Size of territory. With the same load level, a large territory typically has bigger errors than a small territory.
  12. Hierarchical load. When the load can be further split down the hierarchy, the forecast at top level can be improved. 
  13. Error calculation. The errors (in MAPE) of hourly loads are typically higher than those of daily energy, which are in turn higher than those of monthly and annual energy.
  14. Demand side management. Demand response and energy efficiency programs often lead to large errors in load forecasts.
  15. Distributed resources. Increased penetration of behind-the-meter solar typically increases the load forecast errors.
  16. Emerging technologies. EV loads add more uncertainties to the conventional loads and tend to increase the load forecast errors. 
Apparently the answer is not trivial. In most if not all of the above bullet points, the answer is not definitive. This is why we need many benchmarking studies to better understand the forecast errors. It doesn't make sense to criticize the MAPE values prior to working on the data. Ironically, this is what many vendors do as part of the sales bluff.
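
To make item 13 (error calculation) concrete, here is a minimal sketch of how the same forecast can show a lower MAPE once hourly values are summed into daily energy. The load series, the noise level, and the assumption that hourly errors are independent are all hypothetical choices for illustration; real forecast errors are usually correlated, so the reduction in practice is smaller.

```python
import random


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


def daily_totals(hourly):
    """Sum consecutive blocks of 24 hourly values into daily energy."""
    return [sum(hourly[i:i + 24]) for i in range(0, len(hourly), 24)]


rng = random.Random(0)

# Hypothetical data: 30 days of hourly load between 100 and 120 MW.
actual = [100.0 + 20.0 * rng.random() for _ in range(24 * 30)]

# Hypothetical forecast: the actual load with independent noise of up to +/-5%.
forecast = [a * (1.0 + rng.uniform(-0.05, 0.05)) for a in actual]

hourly_mape = mape(actual, forecast)
daily_mape = mape(daily_totals(actual), daily_totals(forecast))

print(f"hourly MAPE: {hourly_mape:.2f}%")
print(f"daily MAPE:  {daily_mape:.2f}%")
```

Over- and under-forecasts within a day partly cancel in the daily sum, which is why a MAPE should always be quoted together with the resolution at which it was calculated.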

Back to Error Analysis in Load Forecasting.

Friday, August 18, 2017

IEEE PES Announces Winning Teams for Global Energy Forecasting Competition 2017

More than 300 students and professionals from more than 30 countries formed 177 teams to compete on hierarchical probabilistic load forecasting, exploring opportunities from the big data world and tackling the analytical challenges.

PISCATAWAY, N.J., USA, August 18, 2017 – IEEE, the world's largest professional organization advancing technology for humanity, today announced the results of the Global Energy Forecasting Competition 2017 (GEFCom2017), which was organized and supported by the IEEE Power & Energy Society (IEEE PES) and the IEEE Working Group on Energy Forecasting (WGEF).

“I congratulate the eight winning teams and all the contestants of GEFCom2017. They are pushing the boundaries of electric load forecasting,” said Dr. Tao Hong, Chair of IEEE Working Group on Energy Forecasting and General Chair of Global Energy Forecasting Competition, “GEFCom2017 is the longest and most challenging one among the series of Global Energy Forecasting Competitions. To encourage the contestants to explore opportunities in the big data world, GEFCom2017 released more data than the previous competitions combined.”

The theme is hierarchical probabilistic load forecasting, merging the challenges of both GEFCom2012 and GEFCom2014. The 6-month-long GEFCom2017 includes two phases. The qualifying match was to provide medium term probabilistic forecasts for the ISO New England region in real time. It was meant to attract and educate a large number of contestants with diverse backgrounds, and to prepare them for the final match. The final match asked the contestants to provide probabilistic forecasts for 161 delivery points. All of the competition data will be further released to the public for future research and benchmarking purposes.

"The Global Energy Forecasting Competitions have been extraordinarily successful in stimulating and promoting technology advancement. To continue the momentum, IEEE PES decided to fund GEFCom2017 with $20,000 for the cash prizes to the winning teams," said Patrick Ryan, Executive Director of IEEE PES, "We are so delighted to witness another fantastic competition. Look forward to seeing its positive impact to the industry for the coming years."

GEFCom2017 includes a two-track qualifying match and a single-track final match. Each track recognizes three winning teams.

Qualifying Match Defined Data Track Winners:

  • Andrew J. Landgraf (Battelle, USA)
  • Slawek Smyl (Uber Technologies, USA) and Grace Hua (Microsoft, USA)
  • Gábor Nagy and Gergő Barta (Budapest University of Technology and Economics, Hungary), Gábor Simon (dmlab, Hungary)

Qualifying Match Open Track Winners:

  • Geert Scholma (The Netherlands)
  • Florian Ziel (Universität Duisburg-Essen, Germany)
  • Jingrui Xie (SAS Institute, Inc., USA)

Final Match Winners:

  • Isao Kanda and Juan Quintana (Japan Meteorological Corporation, Japan)
  • Ján Dolinský, Mária Starovská and Robert Toth (Tangent Works, Slovakia)
  • Gábor Nagy and Gergő Barta (Budapest University of Technology and Economics, Hungary), Gábor Simon (dmlab, Hungary)

For more information about the GEFCom2017, please visit

Original announcement: 

Wednesday, August 9, 2017

Benchmarking Robustness of Load Forecasting Models under Data Integrity Attacks

GIGO - garbage in, garbage out. In forecasting, GIGO means that if the model is fed with garbage (bad) data, the forecast would be bad too. In the power industry, bad load forecasts often result in waste of energy resources, financial losses, brownouts or even blackouts.

Anomaly detection and data cleansing procedures may help alleviate some of the bad data from the input side. However, what if the bad data was created by hackers? Can the existing models "survive" or stay accurate under data attacks? This paper offers some benchmark results.

This paper sets a few "firsts":
  1. This is the first paper formally addressing the cybersecurity issues in the load forecasting literature. I believe that the data attacks should be of a great concern to the forecasting community. This paper sets a solid ground for future research. 
  2. This is my first journal paper co-authored with a professor in my doctoral committee, Dr. Shu-Cherng Fang. Many years ago, I picked up the topic of my dissertation from one of my consulting projects. I then invited a team of world class professors from different areas to form the committee. None of them were really into load forecasting, though I had the opportunity to learn from different perspectives.
  3. This is my first journal paper that went through one year of peer review cycle, the longest peer review I've experienced. It's definitely worth the effort. The IJF editors and reviewers certainly spent a significant amount of time reading the paper and offered so many constructive comments. I wish I could know their names and identify them in the acknowledgement section.

Jian Luo, Tao Hong and Shu-Cherng Fang, "Benchmarking robustness of load forecasting models under data integrity attacks", International Journal of Forecasting, accepted. (working paper)

Benchmarking robustness of load forecasting models under data integrity attacks

Jian Luo, Tao Hong and Shu-Cherng Fang


As the internet continues to expand its footprint, cybersecurity has become a major concern for the governments and private sectors. One of the cybersecurity issues is on data integrity attacks. In this paper, we focus on the power industry, where the forecasting processes heavily rely on the quality of data. The data integrity attacks are expected to harm the performance of forecasting systems, which greatly impact the financial bottom line of power companies and the resilience of power grids. Here we reveal how data integrity attacks can affect the accuracy of four representative load forecasting models (i.e., multiple linear regression, support vector regression, artificial neural networks, and fuzzy interaction regression). We first simulate some data integrity attacks by randomly injecting some multipliers that follow a normal or uniform distribution to the load series. Then the aforementioned four load forecasting models are applied to generate one-year ahead ex post point forecasts for comparisons of their forecast errors. The results show that the support vector regression model, trailed closely by the multiple linear regression model, is most robust, while the fuzzy interaction regression model is least robust among the four. Nevertheless, all of the four models fail to provide satisfying forecasts when the scale of data integrity attacks becomes large. This presents a serious challenge to the load forecasters and the broader forecasting community: How to generate accurate forecasts under data integrity attacks? We use the publicly-available data from Global Energy Forecasting Competition 2012 to construct the case study. At the end, we also offer an outlook of potential research topics for future studies.
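
As an illustration of the simulation step described in the abstract, here is a minimal sketch of injecting random multipliers into a load series. The attacked fraction, the distribution parameters, and the helper name `inject_attack` are hypothetical choices made for this example; they are not the settings used in the paper.

```python
import random


def inject_attack(load, fraction, draw_multiplier, seed=0):
    """Return a copy of `load` with a random fraction of points multiplied.

    `draw_multiplier` takes a random.Random instance and returns one multiplier.
    """
    rng = random.Random(seed)
    attacked = list(load)
    n_attacked = int(round(fraction * len(load)))
    # Pick distinct positions to corrupt, then scale each of them.
    for i in rng.sample(range(len(load)), n_attacked):
        attacked[i] = load[i] * draw_multiplier(rng)
    return attacked


# Hypothetical hourly load for one week, in MW.
load = [100.0 + (h % 24) for h in range(24 * 7)]

# Hypothetical attack settings: 30% of the points are scaled.
normal_attack = inject_attack(load, 0.3, lambda rng: rng.gauss(1.5, 0.2))
uniform_attack = inject_attack(load, 0.3, lambda rng: rng.uniform(0.5, 1.5))

changed = sum(1 for a, b in zip(load, normal_attack) if a != b)
print(f"{changed} of {len(load)} points were altered")
```

A robustness benchmark would then train each forecasting model on the attacked series and compare its forecast errors against the clean actual load.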

Monday, August 7, 2017

Breakthrough or Too Good To Be True: Several Smoke Tests

When sharing my Four Steps to Review an Energy Forecasting Paper, I spent about a third of the blog post elaborating on what "contribution" means. This post is triggered by several review comments on my recent TSG paper on variable selection methods for probabilistic load forecasting. Here I would like to elaborate on what "contribution" means from a different angle.

A little background first. 

In that TSG paper, we compared two variable selection schemes, HeM (Heuristic Method) that sharpens the underlying model to minimize the point forecast error, and HoM (Holistic Method) that uses the quantile score to select the underlying model. The key finding is as follows:
HoM costs much more computational power but only produces slightly better quantile scores than HeM.
Then some of the reviewers raised the red flag:
If the new method is not much better than the existing one, why shall we accept the paper?
I believe that the question is genuine. Most likely the reviewers, as well as many other load forecasters, have read many papers in the literature that have presented super powerful models or methods that led to super accurate forecasts. After being flooded with those breakthroughs, they would be hesitant to give favorable ratings to a paper that presents a somewhat disappointing conclusion. 

Now let's take one step back:
What if those breakthroughs were just illusions? 
Given the fact that most of those papers were proposing complicated algorithms tested on proprietary datasets, it is very difficult to reproduce the work. In other words, we can hardly verify those stories. The reviewers and editors may be rejecting valuable papers that are not bluffing. This time I was lucky - most reviewers were on my side.

When my premature models were beating all the other competitors many years ago, I was truly astonished about the real-world performance of those "state-of-the-art" models. If those breakthroughs in the literature were really tangible, my experience tells me that the industry would be pouring money into those authors to ask for the insights. Many years have passed since those papers were published. How many of them have been recognized by the industry? (In my IJF review, I did mention a few exemplary papers though.)

We have run the Global Energy Forecasting Competitions three times. How often do you see those authors or their students on the leaderboard? If their methods are truly effective but not recognized by the industry, why not test them through these public competitions? 

Okay, now you know some of those "peer-reviewed" papers may be bluffing. How to tell if they are really bluffing? Before telling you my answer, let's see how those papers are produced:
  1. To make sure that the contribution is novel, the authors must propose something new. To ensure it looks challenging, the proposal must be complicated. The easiest way to create such techniques is to mix the existing ones, such as ANN+PSO+ARIMA, etc.
  2. To make sure that nobody can reproduce the results, the data used in the case study must be proprietary. After all, all it takes to have the paper accepted is to get it past the reviewers and editor(s). An unpopular dataset is fine too, because the reviewers don't bother to spend the time reproducing the work.
  3. To make sure that the results can justify the breakthrough, the forecasts must be close to perfection. The proposed models must beat the existing ones to death. How to accomplish that? Since the authors have the complete knowledge of the future dataset, just fine-tune the model so that it outperforms the others in the forecast period. This is called "peeking into the future".
In reality, it is very hard to build the models or methods that can dominate the state of the art. Certainly it doesn't come from a "hybrid" of the existing ones. Instead, the breakthroughs (or major improvement) come from using new variables that people have not yet completely understood in the past, borrowing the knowledge from other domains, leveraging new computing power, and so forth.
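
The peeking described in step 3 can be made concrete with a small sketch. It tunes one parameter of a simple exponential smoothing forecaster in two ways: honestly, on a validation window that precedes the forecast period, and by peeking, directly on the forecast period. The synthetic series, the window boundaries, and the choice of exponential smoothing are all hypothetical and only serve the illustration.

```python
import random


def ses_forecasts(series, alpha):
    """One-step-ahead simple exponential smoothing forecasts for series[1:]."""
    level = series[0]
    forecasts = []
    for y in series[1:]:
        forecasts.append(level)  # forecast made before y is observed
        level = alpha * y + (1.0 - alpha) * level
    return forecasts


def mae_on(series, alpha, start, end):
    """MAE of the one-step forecasts for the targets series[start:end]."""
    forecasts = ses_forecasts(series, alpha)  # forecasts[t - 1] targets series[t]
    errors = [abs(series[t] - forecasts[t - 1]) for t in range(start, end)]
    return sum(errors) / len(errors)


rng = random.Random(1)
series = [100.0 + rng.gauss(0.0, 5.0) for _ in range(300)]  # synthetic load
alphas = [0.1 * k for k in range(1, 10)]

# Honest: tune on a validation window, then report on the unseen test window.
honest_alpha = min(alphas, key=lambda a: mae_on(series, a, 100, 200))
honest_test_mae = mae_on(series, honest_alpha, 200, 300)

# Peeking: tune directly on the test window, then report on that same window.
peeking_alpha = min(alphas, key=lambda a: mae_on(series, a, 200, 300))
peeking_test_mae = mae_on(series, peeking_alpha, 200, 300)

print(f"honest test MAE:  {honest_test_mae:.3f}")
print(f"peeking test MAE: {peeking_test_mae:.3f}")
```

By construction, the peeking error can never be higher than the honest one on the test window, which is exactly why a reported error obtained this way says little about how the model would do on data it has not seen.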

In the world of predictive modeling, there is a well-known theorem called "no free lunch", which states that no one model works the best in all situations. In other words, if one beats the others in all cases across all measures, it is "too good to be true". We need the empirical studies that report what's NOT working well as much as the ones promoting the champions.

It's time for my list of smoke tests. The more check marks a paper gets, the more I consider it too good to be true.
  1. The paper is proposing a mix (or hybrid) of many techniques.
  2. The paper is merely catching new buzzwords.
  3. The data is proprietary.
  4. The paper is not co-authored with industry people (or not sponsored by the industry). 
  5. The proposed method does not utilize new variables.
  6. The proposed method does not take knowledge of other domains.
  7. The proposed method does not leverage new computing resources.
  8. The proposed method is dominating its counterparts (another credible method) in all aspects.
I spend a minimal amount of time reading those papers, because they are the emperor's new clothes to me. Hopefully this list can help the readers save some time too. On the other hand, I didn't mean to imply that the authors were intentionally faking the paper. Many of them are genuine people who are making these mistakes without realizing it. Hopefully this blog post can help point the authors in the right direction as well.

Tuesday, August 1, 2017

Variable Selection Methods for Probabilistic Load Forecasting: Empirical Evidence from Seven states of the United States

I am an evidence-based man. This mentality has saved me a tremendous amount of time in recent years. I have been minimizing my time in following bluffs in the literature. On the other hand, I have been developing empirical case studies and encouraging the community to contribute to the empirical research.

In my GEFCom2014 paper, I raised the following question to the forecasting community:
Can a better point forecasting model lead to a better probabilistic forecast?
To answer this question, we have to first understand the definition of "better", a.k.a., forecast evaluation measures and methods. In this paper, we compared two variable selection methods based on point and probabilistic error measures respectively. The case study covers seven states of the US. The results from this paper can hopefully be leveraged by future empirical studies for comparison purposes.

(This paper is an upgrade to our PMAPS2016 paper.)


Jingrui Xie and Tao Hong, "Variable Selection Methods for Probabilistic Load Forecasting: Empirical Evidence from Seven states of the United States", IEEE Transactions on Smart Grid, in press.

Variable Selection Methods for Probabilistic Load Forecasting: Empirical Evidence from Seven states of the United States

Jingrui Xie and Tao Hong


Variable selection is the process of selecting a subset of relevant variables for use in model construction. It is a critical step in forecasting but has not yet played a major role in the load forecasting literature. In probabilistic load forecasting, many methodologies to date rely on the variable selection mechanisms inherited from the point load forecasting literature. Consequently, the variables of an underlying model for probabilistic load forecasting are selected by minimizing a point error measure. On the other hand, a holistic and seemingly more accurate method would be to select variables using probabilistic error measures. Nevertheless, this holistic approach by nature requires more computational efforts than its counterpart. As the computing technologies are being greatly enhanced over time, a fundamental research question arises: can we significantly improve the forecast skill by taking the holistic yet computationally intensive variable selection method? This paper tackles the variable selection problem in probabilistic load forecasting by proposing a holistic method (HoM) and comparing it with a heuristic method (HeM). HoM uses a probabilistic error measure to select the variables to construct the underlying model for probabilistic forecasting, which is consistent with the error measure used for the final probabilistic forecast evaluation. HeM takes a shortcut by relying on a point error measure for variable selection. The evidence from the empirical study covering seven states of the United States suggests that 1) the two methods indeed return different variable sets for the underlying models, and 2) HoM slightly outperforms but does not dominate HeM w.r.t. the skill of probabilistic load forecasts. Nevertheless, the conclusion might vary on other datasets. Other empirical studies of the same nature would be encouraged as part of the future work.
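
To illustrate the difference between selecting by a point error measure and selecting by a probabilistic error measure, here is a minimal sketch. It assumes the quantile score is the pinball loss averaged over quantiles and time steps, and it uses the median forecast as the point forecast, which is a simplification made for this example. The two candidates "A" and "B" and all numbers are hypothetical.

```python
def pinball_loss(actual, quantile_forecast, q):
    """Pinball loss of one quantile forecast at probability level q."""
    if actual >= quantile_forecast:
        return q * (actual - quantile_forecast)
    return (1.0 - q) * (quantile_forecast - actual)


def quantile_score(actuals, quantile_forecasts, quantiles):
    """Average pinball loss over all time steps and all quantiles."""
    total = 0.0
    for y, row in zip(actuals, quantile_forecasts):
        for q, f in zip(quantiles, row):
            total += pinball_loss(y, f, q)
    return total / (len(actuals) * len(quantiles))


def median_mae(actuals, quantile_forecasts, quantiles):
    """Point error: MAE of the 0.5-quantile forecast (a simplification)."""
    k = quantiles.index(0.5)
    errors = [abs(y - row[k]) for y, row in zip(actuals, quantile_forecasts)]
    return sum(errors) / len(errors)


quantiles = [0.1, 0.5, 0.9]
actuals = [100.0, 110.0]

# Hypothetical candidates: A has a perfect median but wide intervals,
# B has a slightly biased median but much sharper intervals.
candidates = {
    "A": [[90.0, 100.0, 110.0], [100.0, 110.0, 120.0]],
    "B": [[98.0, 101.0, 104.0], [108.0, 111.0, 114.0]],
}

# Heuristic-style choice: minimize the point error measure.
hem_choice = min(candidates, key=lambda m: median_mae(actuals, candidates[m], quantiles))
# Holistic-style choice: minimize the probabilistic error measure.
hom_choice = min(candidates, key=lambda m: quantile_score(actuals, candidates[m], quantiles))

print("selected by point error:   ", hem_choice)
print("selected by quantile score:", hom_choice)
```

In this toy case the two criteria pick different candidates, which mirrors the first finding of the abstract: the two selection methods can return different underlying models.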

Monday, July 31, 2017

Who's Who in Energy Forecasting: Rafał Weron

Prof. Rafał Weron is the Head of the Economic Modeling Group in the Department of Operations Research at Wroclaw University of Science and Technology. He has so many accomplishments in energy forecasting, such as his renowned book on load and price forecasting, many widely cited papers, and of course those talented students. Recently he won the IIF-Hong Award for his IJF review paper on electricity price forecasting.

What brought you to the energy field, particularly load and price forecasting?

I would like to say that a well thought-out decision ... but realistically ... it was more luck and coincidence. I was the right person at the right time.

Towards the end of my PhD studies, back in 1998, I started looking for a new area of research. I studied mathematics, and my PhD was in something fashionable in the 1990s – math finance. But it was too theoretical for me, too far from real applications. Finance itself was better. Yet, doing top level finance research and publishing in top-tier finance journals was (nearly) impossible for someone working in a former Eastern Bloc country back then. On the other hand, I liked the freedom of the Academia and didn’t want to work as a quant.

Then in the late 1990s, power market deregulation started spreading throughout Europe and the US. The Polish Power Exchange opened in mid-2000, but there were very few people who knew what power trading was about. Mathematicians and economists were set back by the technical aspects of power market operations, engineers had no economic training. I saw this as an opportunity. With a sound training in statistics/time series analysis, half a year spent as a trainee in an investment bank and basic knowledge about power system economics gained during a project run by the Hugo Steinhaus Center for the Polish Power Grid in 1995-1996, I had a head start compared to fellow colleagues. And I was eager to work hard.

But my first “energy” papers were not on forecasting. Rather, they were on data analytics, risk management and derivatives pricing, focusing more on the mid-term horizon. I became interested in short-term forecasting a bit later, around 2003-2004, when it became apparent that the day-ahead market was “the marketplace” for electricity trading, not the derivatives markets.

Tell us more about your price forecasting review paper. Why and how did you write that 52-page article?

The invitation from Rob Hyndman to write the review paper came out of the blue, around mid-2013. Ask him why he approached me in the first place. Prior to that I had only one paper in IJF, in a special issue on Energy Forecasting published in 2008.

But it was a welcome invitation. A few years had passed since I published my 2006 Wiley book on modeling and forecasting electricity loads and prices. And I had been gathering material for a revised version. So I agreed happily and said that I would submit a draft by the end of the year. This turned out to be impossible … well, I started working on it too late, sometime in early December 2013, after Rob had sent me a reminder email ;-)

I wanted my review to be self-contained and rather complete. I hate these review/survey papers which just cite dozens of articles without actually analyzing and thoroughly comparing the results using the same measures. So I wanted to include my own empirical examples. This meant a lot of coding and data analysis. No wonder it took me two full months to complete. But the outcome surprised even me – it was like a small book – the draft had 88 pages in the standard elsarticle LaTeX page layout. I was sure Rob would tell me to cut it in half …

What's your proudest accomplishment in forecasting?

A famous Polish mathematician, Hugo Steinhaus, used to say that his greatest discovery was … Stefan Banach. Yes, the same Banach known in mathematics for “Banach spaces”, one of the founders of modern functional analysis. I also like to think that my greatest accomplishments are my brilliant students. One of them – Jakub Nowotarski – graduated this June, with distinction. Jakub’s research output was outstanding, not only for a PhD student. The Quantile Regression Averaging method, which turned out to be a top performer in the GEFCom2014 competition, was 90% his idea. He would have easily received a habilitation (~tenure) within a year or two, if only he decided to stay in the Academia. But I am working now with two gifted BSc students – Grzegorz Marcjasz and Bartosz Uniejewski. Someday I may be able to say the same about them.

Do you work with companies to improve their forecasting practice?

I have had different episodes in my life, some more academic, some less. Over the last two decades I have been periodically engaged as a consultant to financial, energy and software engineering companies. And yes, I have worked with utilities and power generators on improving their load and price forecasting techniques and risk management systems. The hype on energy forecasting in Poland was between 2000 and 2006. Then a series of mergers changed the landscape – the four large companies that remained were not that interested in developing in-house solutions anymore. So my recent developments in forecasting are more academic in nature.

Is there a key initiative or exciting project you are working on?

Together with Florian Ziel we are working on a book for CRC Press that will supersede my 2006 Wiley book. The tentative title is “Forecasting Electricity Prices: A Guide to Robust Modeling”. It is scheduled to be out in 2018.

The book will start with a chapter on the art of forecasting, introducing the basic notions of (energy) forecasting. We will continue with a chapter on the markets for electricity and discuss the products traded there. Then the three main chapters will follow: “Forecasting for Beginners” – which will introduce a few simple models and show how point, probabilistic and path forecasts can be computed for them, “Evaluating Models and Forecasts” – which is a very important, but still underdeveloped area in energy forecasting, and “Forecasting for Experts” – which will discuss a number of more advanced concepts, like regime-switching, shrinkage, feature selection, non-linear and fundamental models.

What's your forecast for the next 10 years of the energy forecasting field?

This is a tough question. When writing my 2014 IJF review I came up with five directions in which electricity price forecasting would or should evolve over the next decade: (1) a better treatment of seasonality and use of fundamentals, (2) going beyond point forecasts, (3) more extensive use of forecast combinations, (4) development of multivariate models, and (5) more thorough forecast evaluation. Out of these, I think that the least has been done since then in the context of multivariate models. This is not a surprise: multivariate models are much more demanding, not only conceptually but also computationally. But I do believe that they have a lot to offer. Bayesian methods may also see more extensive use, especially in probabilistic forecasting.

Another direction that I think may become important in the near future is “path forecasting”. Currently, in a vast majority of load or price forecasting papers only marginal (i.e., at one point in time) distributions are considered, either in a point or probabilistic context. But the forecast for hour 9 should not be independent from the one for hour 8. If we predict a price drop below a “normal” level for hour 8 tomorrow, then it is quite likely that the price for hour 9 will also be lower than under “normal” circumstances. Our forecasting models should take this into account.
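To make the hour 8 / hour 9 argument concrete, here is a minimal sketch of what a path forecast could look like. It is only an illustration under stated assumptions, not the method discussed in the interview: the forecast errors are assumed to be Gaussian with a common standard deviation `sigma` and to follow an AR(1) process with correlation `rho` between consecutive hours. The function name `simulate_paths` and all parameter names are hypothetical.

```python
import math
import random


def simulate_paths(point_forecast, sigma, rho, n_paths, seed=0):
    """Draw scenario paths around hourly point forecasts.

    Assumption (illustrative only): errors are Gaussian AR(1), so the error
    for hour h depends on the error for hour h-1 through `rho`.
    rho = 0 gives independent marginals; rho close to 1 gives paths whose
    deviations from the point forecast persist from hour to hour.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    rng = random.Random(seed)
    # Scaling the innovation keeps every hour's marginal std equal to sigma.
    innovation_scale = math.sqrt(1.0 - rho * rho)
    paths = []
    for _ in range(n_paths):
        err = rng.gauss(0.0, sigma)  # error for the first hour
        path = []
        for h, mean in enumerate(point_forecast):
            if h > 0:
                err = rho * err + innovation_scale * rng.gauss(0.0, sigma)
            path.append(mean + err)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Hypothetical point forecasts for hours 8 and 9 (price units arbitrary).
    for path in simulate_paths([40.0, 45.0], sigma=5.0, rho=0.9, n_paths=3):
        print([round(x, 2) for x in path])
```

With `rho=0` the two hours are sampled independently, which is what a set of marginal forecasts implicitly assumes. With a high `rho`, a path that is below the point forecast at hour 8 tends to stay below it at hour 9.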

What else do you do in the academic world other than energy forecasting?

Currently it’s 75% energy forecasting and 25% agent-based modeling, but also related to energy markets – diffusion of innovations, like dynamic tariffs or pro-ecological behavior. In the not so distant past I have worked on long-range dependence, risk management and derivatives pricing, also outside academia.

What's fun about your job?

Everything. The sleepless nights spent on writing papers, the discussions with the reviewers and editors that I am right and they are wrong, and – most importantly – dozens of emails and skype calls I exchange each day with my students when working on a new research idea.  

How do you spend your free time?

Working. No, this would be an exaggeration … but only a small one. A researcher is never on a holiday – best ideas don’t come during my office hours, they tend to pop up unexpectedly. But I try my best not to be a 100% workaholic. I like mountains – both hiking and skiing. From sports – playing volleyball, badminton and squash – not that I’m a good player, but I like it ;-) 

Wednesday, May 31, 2017

Tao Hong - Energy Education Leader of the Year

Earlier this month, I was honored to be named as the Education Leader of the Year by Charlotte Business Journal (CBJ) at the Energy Inc. Summit.

My friend Alyssa Farrell took a video of the award reception speech, where I gave a "forecast" about the energy industry:
The energy companies will be moving more Gigabytes of data than GWh of electricity. 
Here is the 1-minute speech:

When I was first informed about this award, I didn't realize its prestige. Then I started getting congrats from friends, colleagues, and even the dean. I guessed that it must be something big. After the award ceremony, CBJ put my profile in print and online (Energy Leadership Awards: Putting big data to work for energy). UNC Charlotte also featured the story in its campus newsletter.

Since the CBJ article is behind a paywall, I'm sharing the interview with the audience here.

What drew you to a career in education? How long have you been in that field?

Before coming back to academia, I was working at SAS, one of the very best employers in the world. Part of that job was to teach classes internationally. My primary audience was industry professionals. Through that experience, I found a big gap between what the industry needed in terms of analytics and data science and what universities were offering through various academic programs. I thought I could be the person to help bridge the gap, so I took on a mission of producing the finest data scientists for the energy industry and joined UNC Charlotte. I've been in this academic job for almost 4 years.

What’s the most important part of what you do?

I would say students are the most important part of what I do. I consider students as my products. I want to make the finest products for the industry, so everything I do is centered around the students: I try to pick the best raw materials, perfect them as much as possible, and then put them in the best place in the market. As a professor, my job can be mainly categorized into three pieces: research, teaching and service. These three tie closely together. The industry partners bring me their problems to work on; I help them solve these problems and then bring the research findings to the class; then they keep sending me new problems and hiring my students.

How do you see energy education evolving?

I think the evolution of the energy education is two-fold. First, it has to be interdisciplinary. It's no longer the job of one department, such as electrical engineering or mechanical engineering. We have to involve many academic departments to educate the workforce for the energy industry. Some of them should even go beyond the college of engineering, such as policy, economics, statistics and meteorological science.

Talk a little about the BigDEAL Lab. What does that mean for students?

It is the best place to be if you want to be an elite data scientist in the energy industry. BigDEAL students have the opportunity to solve the most challenging analytic problems in the industry; they have access to the state-of-the-art software donated by our industry partners; they can leverage many data sources that no other universities have access to. As a result, BigDEAL students have been taking top places in many international competitions and are sought after by many renowned employers in the industry.

What role does UNCC play in the energy industry - both locally and nationally?

UNCC has been training many energy professionals in the Carolinas and delivering many fresh graduates to the local energy industry. Nationally, UNCC sets a great example of industry-university collaboration.

What makes UNCC’s research so valuable?

We are fortunate to be located in a large city and surrounded by many enthusiastic industry partners. The research problems we work on are from the real world rather than the ivory tower. They tend to be very practical and meaningful to the industry.

Is there a key initiative you’re working on? 

Over the past few years, I've been experimenting with a crowdsourcing approach to energy analytics research. I started the Global Energy Forecasting Competition in 2012. These competitions have attracted hundreds of contestants from more than 60 countries. Many of them are outside the power and energy field. In each competition, we try to tackle a challenging and emerging problem. Right now we are in the middle of the third one, GEFCom2017. The theme is energy forecasting in the big data world. We have also organized the first International Symposium on Energy Analytics this June in Cairns, Australia, to host the researchers and practitioners interested in this subject.

What are the advantages of working with industrial partners?

They bring in meaningful research problems, fund projects, hire graduates, and help broadcast our research findings through their network. Isn't it a sweet deal?

Are educational institutions able to educate enough workers, or does the industry face a shortage?

In my domain, which is energy analytics, there is definitely higher demand (jobs) than supply (workers). I get calls all the time asking for my students, but I don't have enough students to fill all those job openings.

What’s fun about your job?

Teaching students to solve the most challenging problems for the industry. I very much enjoy both the analytical challenge and the success of the students. 

Wednesday, May 17, 2017

Wind Speed for Load Forecasting Models

One way to categorize the load forecasting papers is based on the variables used in those forecasting models. Because many people who wrote load forecasting papers only had access to the load data with time stamps, they had to propose the models based on the load series only. The representative techniques include exponential smoothing and the ARIMA family. Sometimes people also include the calendar information to come up with some regression models with classification variables. Although these are good and powerful techniques, their real-world applications in load forecasting are very limited. I have criticized those "load-only" models in some of my papers, such as the IJF2016 paper on recency effect:
Both seasonal naïve models perform very poorly compared with the other four models. Seasonal naïve models are used commonly for benchmarking purposes in other industries, such as the retail and manufacturing industries. In load forecasting, the two applications in which seasonal naïve models are most useful are: (1) benchmarking the forecast accuracy for very unpredictable loads, such as household level loads; and (2) comparisons with univariate models. In most other applications, however, the seasonal naïve models and other similar naïve models are not very meaningful, due to the lack of accuracy. 
Weather is a must-have in most real-world load forecasting models. The most frequently used weather variable in the load forecasting literature is temperature. Some system operators, such as ISO New England, publish temperature data along with the load information. The recent load forecasting competitions, such as GEFCom2012 and GEFCom2014, have also released several years of hourly load and temperature data for benchmarking purposes.
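For readers who have not met the seasonal naïve benchmark mentioned in the quoted passage, here is a minimal sketch. It simply repeats the most recent full seasonal cycle of the load series. The function name `seasonal_naive` is hypothetical, and the season lengths in the example (24 for daily, 168 for weekly) are assumptions that only make sense for hourly data.

```python
def seasonal_naive(history, season, horizon):
    """Forecast by repeating the last observed seasonal cycle.

    history: past load values, oldest first
    season:  cycle length in time steps (e.g. 24 or 168 for hourly data)
    horizon: number of future steps to forecast
    """
    if season <= 0:
        raise ValueError("season must be positive")
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    last_cycle = history[-season:]
    # Step h ahead reuses the value from the same position in the last cycle.
    return [last_cycle[h % season] for h in range(horizon)]


if __name__ == "__main__":
    # Toy series with a period of 3, standing in for a daily or weekly cycle.
    toy_load = [10, 20, 30, 11, 21, 31]
    print(seasonal_naive(toy_load, season=3, horizon=5))
```

Because it uses nothing but the load series itself, this kind of model has no way to react to tomorrow's weather, which is the limitation the paragraph above points to.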

Although non-temperature weather variables have some presence in the load forecasting literature, they are rarely studied in the context of variable selection. Recently we published a TSG paper Relative Humidity for Load Forecasting Models, discussing how to use humidity information to improve load forecasting accuracy. As a sister of that humidity paper, this paper discusses how to include wind speed information in load forecasting models.

Another comment I want to make is on open access publication. I personally had no interest in publishing my papers with open access publishers. This is my first try, and it turned out to be a pleasant surprise. The reviews were returned to me rather quickly, within 10 days. There were no nonsense comments, so I didn't need to deal with the personal attacks as I normally had to do. Before the final publication, the copy editor helped clean up some typos we had in the submission. From our first submission to the final paginated version, the whole process took two weeks!

Anyway, hope that you enjoy reading this open access paper!


Jingrui Xie and Tao Hong, "Wind speed for load forecasting models", Sustainability, vol 9, no 5, pp 795, May, 2017 (open access).

Wind Speed for Load Forecasting Models

Jingrui Xie and Tao Hong


Temperature and its variants, such as polynomials and lags, have been the most frequently-used weather variables in load forecasting models. Some of the well-known secondary driving factors of electricity demand include wind speed and cloud cover. Due to the increasing penetration of distributed energy resources, the net load is more and more affected by these non-temperature weather factors. This paper fills a gap and need in the load forecasting literature by presenting a formal study on the role of wind variables in load forecasting models. We propose a systematic approach to include wind variables in a regression analysis framework. In addition to the Wind Chill Index (WCI), which is a predefined function of wind speed and temperature, we also investigate other combinations of wind speed and temperature variables. The case study is conducted for the eight load zones and the total load of ISO New England. The proposed models with the recommended wind speed variables outperform Tao’s Vanilla Benchmark model and three recency effect models on four forecast horizons, namely, day-ahead, week-ahead, month-ahead, and year-ahead. They also outperform two WCI-based models for most cases.
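As a rough illustration of what "a predefined function of wind speed and temperature" means, here is a minimal sketch of a wind chill calculation. The abstract does not state which WCI definition the paper uses or which wind variables it recommends, so this assumes the US National Weather Service 2001 wind chill formula (temperature in °F, wind speed in mph). The function names and the interaction term `ws_x_temp` are hypothetical, not the paper's variables.

```python
def wind_chill_f(temp_f, wind_mph):
    """Wind chill in deg F using the NWS 2001 formula (an assumption here).

    The formula is only defined for temp <= 50 deg F and wind > 3 mph;
    outside that range this sketch simply returns the air temperature.
    """
    if temp_f > 50.0 or wind_mph <= 3.0:
        return temp_f
    v = wind_mph ** 0.16
    return 35.74 + 0.6215 * temp_f - 35.75 * v + 0.4275 * temp_f * v


def wind_features(temp_f, wind_mph):
    """Hypothetical candidate regressors combining wind speed and temperature."""
    return {
        "wci": wind_chill_f(temp_f, wind_mph),   # predefined combination
        "ws": wind_mph,                          # wind speed on its own
        "ws_x_temp": wind_mph * temp_f,          # simple interaction term
    }


if __name__ == "__main__":
    print(wind_features(30.0, 10.0))
```

The contrast between the first entry and the other two is the point: a predefined index fixes how wind and temperature interact, while separate wind terms let the regression estimate that relationship from the data.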