Friday, December 8, 2017

Energy Analytics (Fall 2017)

This semester is the fourth time I'm teaching Energy Analytics at UNC Charlotte. I have been offering this course every Fall since 2014. Previously I blogged about the offering in Fall 2015 (see THIS POST).

Student Profile

The class started with 11 master's students and 1 PhD student. The master's students were from three programs: engineering management (8), applied energy (2), and economics (1). The PhD student was from the PhD program in infrastructure and environmental systems.

After the first mid-term exam, 4 master's students from engineering management withdrew from the class. The other 8 students completed the course. Here is the group picture including the 8 students, graduate teaching assistant Masoud Sobhani, and myself.

Energy Analytics group picture (Fall 2017)


The following 12 topics were covered throughout the semester. Each of the first 10 topics was covered in one week. Due to the Thanksgiving holiday, the last two topics were covered in one week.

  1. Greetings; Introduction to analytics
  2. Forecasting principles and practices
  3. Electric load forecasting
  4. Weather station selection
  5. Retail energy forecasting
  6. Model selection
  7. Probabilistic load forecasting 
  8. Robustness of load forecasting models
  9. Electricity price forecasting 
  10. Wind and solar power forecasting
  11. Demand response analytics
  12. Outage analytics

Compared with the offering in Fall 2015, I merged short and long term load forecasting, and added two topics based on my recent papers: model selection (see Wang, Liu and Hong, IJF 2016) and robustness of load forecasting models (see Luo, Hong and Fang, IJF 2018).

Homework, Group Presentations, Project and Exams

The course grading is based on 4 homework assignments (5' x 4), 2 group presentations (5' x 2), 1 project (30') and 2 exams (20' x 2). The students were ranked for their forecast accuracy in the homework assignments and exams. The rankings were tied to the credits they were receiving.

In 2015, I opened the in-class competitions to external participants. This year, I didn't do that. Instead, I sent the students to the npower forecasting challenge, where a top 5 position would count for some bonus credits in the class. In the end, the No. 1 team of the npower forecasting challenge was from this class. (See the announcement HERE.)

A student who passed the SAS programmer certification(s) would also receive some bonus credits.

Training Objectives

The training objectives are the same as those in 2015 (See THIS POST). Most importantly, I kept reminding them about my promise: "the more time you spend, the more you learn."

To-do List

Again, the objectives have been met. Nevertheless, I have not yet completed the to-do list from the 2015 offering (See THIS POST).

The withdrawal rate is 4 out of 12, which is higher than in the previous three offerings. I noticed that all 4 students who withdrew from the class were distance learning students. Only one distance learning student was able to complete the class. In the 2015 offering, all of the survivors were on-campus students. I guess this is due to the heavy workload of the course. Most, if not all, of the distance learning students are working professionals, who may not have the flexibility to spend 20-30 hours on a course.

I don't think I can complete my book by the next offering of this course. I have to hurry up on that!

Wednesday, December 6, 2017

UNC Charlotte Students Winning All Top 3 Spots of NPower Forecasting Challenge 2017

Every year, RWE npower, a large electricity generator and supplier of gas and electricity based in the United Kingdom, hosts a forecasting competition to recruit summer interns. While the internships are only open to UK students, the competition is open to the world. Hundreds of students and working professionals have participated in these npower forecasting challenges in the past few years. Every time, I sent a few students to the competition. Every time, they took a few of the top spots (see 2015 electricity, 2015 gas, and 2016).

This year, 19 UK teams and 26 international teams joined the competition. Npower created a separate leaderboard for the UK students. The top 1 UK team would rank #9 among all teams. The screen shot below shows the top 8 teams. The official site is HERE.

For the first time, my students took all top 3 spots. They came from two of my classes: Technological Forecasting and Decision Making (Spring 2017) and Energy Analytics (Fall 2017). Most of them are currently enrolled in the master's capstone projects under my supervision. My courses are among the most challenging ones in the college. The students had to spend a tremendous amount of time to earn the credits. I'm glad that they have acquired some useful skills from the class and showed off their analytical capabilities through the competition. I asked the top teams to summarize their methodology in the comment field below.

Congratulations, 49ers!

Monday, November 27, 2017

Masoud Sobhani - From Petroleum Engineer to Load Forecaster

Today (November 27, 2017), Masoud Sobhani just defended his MS thesis on data cleansing, the first BigDEAL thesis authored by a non-Chinese student.

Masoud was a petroleum engineer in Iran. He migrated to the U.S. several years ago. He first came to my office in 2015 with inquiries about our MS Engineering Management program, when I was the program director. At that time he could barely speak English. Nevertheless, I admitted him to the program mainly because of his solid academic background and industry experience in the energy sector.

He started the program in Spring 2016 to pursue a non-thesis master's degree, planning to graduate in Summer 2017. Due to the challenging nature of my courses (see some student comments HERE), most of the non-thesis master's students in our MSEM program try their best to avoid them. Masoud is certainly an exception. He managed to take all my courses during his tenure in the program. At the Npower forecasting challenge 2016, Masoud took a top 3 place.

In Spring 2017, he came to me to discuss the possibilities of pursuing a PhD under my supervision. Recognizing him as the top student in the program, I agreed to take him as my doctoral student on the condition that he complete an MS thesis by the end of the year. He took the challenge. From May to November, he passed the SAS Advanced Programmer certification exam, identified his thesis topic, designed and implemented a novel data cleansing algorithm, and finished his 10,000-word thesis. The defense was very well done.

Congratulations, Masoud, and best of luck with your PhD journey!

Tuesday, October 31, 2017

NPower Forecasting Challenge 2017

It's time for npower forecasting challenge 2017! The registration will close on Nov 2nd, 2017. You don't have to be a UK student or citizen to join the game, but the prizes and internship opportunities are for the UK people only. The organizer also told me that you may register as a single-person team if you like. Since the registration form asks for multiple names, you may put your own name and contact information twice.

This is the list of blog posts about the previous npower forecasting challenges, where you can find our winning methodologies. 

Look forward to seeing you in the competition!

Thursday, September 14, 2017

Who's Who in Energy Forecasting: Geert Scholma

I got to know Geert Scholma from NPower Forecasting Challenge 2015, where he outperformed my BigDEAL students on the leaderboard. Since then, he has been topping the NPower leaderboard every time. Recently, as a winner of the qualifying match of GEFCom2017, he presented his methodology at ISEA2017.

Geert lives in Rotterdam, The Netherlands. He has a strong focus on data science and the energy transition, with a master's degree in physics and 5 years of experience as an Energy Forecaster for the energy retail company and E.On spin-off Uniper Benelux.

Since 2015 he has participated in several online energy forecasting competitions, with the following track record:

What brought you to the energy forecasting profession?

Since an early stage of my physics education at University I have been inspired by developments in the Energy Transition and have directed my career path towards it. This began with research and internships in the field of solar electricity production and energy service companies. My first job was at a consultancy firm, where we managed energy labels and energy policy for social housing firms. I then decided to look for a position at a large energy company in The Netherlands, but it was a coincidence that I ended up as an energy forecaster. I had never heard of the term before, but the field has proven to be very interesting to me.

What do you do at your current job? And what's fun about it?

5 years ago I started my job as an energy forecaster for Uniper Benelux. My main focus has been the development of new day-ahead forecasting models for all our customers. As our portfolio consists of electricity, gas and district heating for small, medium and large clients, this is quite a diverse challenge. The main task of our team is to manage the balance responsibility and minimize our clients' imbalance volumes and costs. Besides having to forecast consumption and production volumes, this also means taking into account the effects of hierarchy / portfolio and pricing the profiles of potential new clients. The fun part for me is squeezing the most information out of these big data. And I guess in general, working with numbers just makes me a happy person :)

What was your first (forecasting or data mining) competition about? And how did you do?

My first competition was the first UK Npower competition. Data were a single aggregated daily electricity consumption time series and thus relatively easy to manage, as I was used to working with multiple time series with an hourly resolution. I won the competition. As forecasting much more than 1 day into the future was new to me, I learned to not extrapolate time trends too enthusiastically into the future. All competitions I have participated in so far have taught me similar lessons that I wouldn't have learned as fast within my daily job.

Can you share with us the most exciting competition you've participated in?

The most exciting competition so far was the recent RTE Power Consumption Forecast Challenge in 2017. The task was to forecast the day-ahead 15 minute electricity consumption for all 12 French Regions. The aspect that made it more interesting than the other competitions was the fact that the data was real and the solution applicable. Also, the competition was much tougher. The event was concluded with a seminar in Paris where I learned that almost all of my competitors used machine learning, whereas my solution was mainly based on a single linear regression model.

Is there a key initiative or exciting project you are working on these days?

I am working on an update for the second part of the French RTE Competition this winter. I am focusing on an update of my base model, but also machine learning and ensemble forecasting. I am curious how the battle between simple linear regression and complicated black box machine learning methods will end next time when I include some new variables I already have in mind. Together with someone from IBM we are also working on a new approach to (energy) forecasting benchmarks, but this will still take some more time to become concrete.

What's your forecast for the next 10 years of energy forecasting field?

I expect real-time pricing and demand-side management to become a significant new factor in energy forecasting. One of the current challenges is often still to predict a yearly growing volume of "behind the meter" renewable energy (mostly solar) production. As renewable production will become more and more difficult to manage, market prices for more clients will become flexible and more client groups will be encouraged to either store their own production or shift their demand towards off-peak hours. I expect this to open a completely new and very interesting chapter in energy forecasting.

How do you spend your free time?

I am a real outdoor sportsman and enjoy cycling and tennis. My partner is from Italy and we often visit her family in Puglia where we enjoy the food, family and beautiful coast and countryside.

Sunday, September 10, 2017

TAO: The Analytics Officer

Today is the 7-year mark after my PhD defense. It happens to be the "Teachers' Day" in China, a holiday dedicated to the teachers. I just created a new label "9/10" to collect the posts published on this same date.

I'd like to announce a new blog on this special day, TAO: The Analytics Officer, a blog of data science for the current and future Chief Analytics Officers.

As you, my audience of this blog, are following me, I am following you too. I'm very happy to see that many of you have been promoted one or more times during the past five years. I'm sure that some of you are trying to get that promotion or climb up the career ladder. I started TAO to give you a hand by sharing some of my successful experience in helping others.

Compared with this blog, Energy Forecasting, TAO is different in the following three aspects:
  • The target audience of TAO is "the current and future Chief Analytics Officers". It will not be an academic or research-oriented blog. Instead, it will be crafted for industry professionals and the students who are going to the industry. 
  • The content of TAO is not specific to the energy industry. I'm mostly known for my energy forecasting work, because that's where my papers were published. Nevertheless, I have done a lot of analytical projects in other industries, such as retail, healthcare, sports, and financial services. TAO will attract people from various industries too. 
  • TAO will go beyond forecasting. Although I'm mostly known for forecasting, I did earn a PhD in Operations Research, where I had 10 times more coursework in optimization than statistics and forecasting. In TAO, I will include a significant amount of optimization (prescriptive analytics) in the blog posts.  

Here is the "About" of TAO:
From $1,350/month to $800/hour, I did it in 12 years.
I was torturing the data before people called it "big data". I was specialized in operations research and working on forecasting and optimization projects before people started to define and promote "analytics". I was building multi-layer fully-connected recurrent neural networks before "deep learning" became a buzzword.
I don't recall when my profession got its new name "data science". All of a sudden, the folks in my circle are either working on "big data analytics", or are trying to become "data scientists".
As a professor, I find my passion in mentoring students and helping them land the dream jobs. As a consultant, I enjoy making my clients look better and get promoted. To replicate the success to a broader audience, I created this blog, with the hope that more people can join this profession, unlock the power of data for their organizations, and rise to the top of the org chart.
Best of luck with your data science journey!
The URL for TAO is

Thursday, August 31, 2017

Factors Affecting Load Forecast Accuracy

In some of my papers, I tried to present fairly comprehensive case studies that cover various load zones. I often use a primary case study to illustrate the flow or components of a proposed methodology. After that I apply the same methodology to a secondary case study to show that it works well on other zones. A byproduct of this publication process is a series of benchmark results on various load zones. You may have realized that the same methodology or model typically results in different forecast errors on different load zones.

A most relevant example was in my IJF paper on weather station selection, where I applied the same methodology to two datasets, one from NCEMC that includes 3 power supply areas and 44 building blocks, the other from GEFCom2012 that includes 20 load zones and the sum of them. The MAPE values across these zones are quite different, with very high MAPEs (double or triple digits) on the industrial load zones. 

In this post, I will take a deeper dive into the factors affecting load forecast accuracy. Here we are concerned with short term (point) load forecasting, because there is much more than accuracy to worry about in long term forecasting. 

That said, I'm going to answer the following questions:
What are the factors affecting short term load forecast accuracy?
Here is my list:
  1. Data quality. Garbage in, garbage out. If the input data is bad, the forecasts tend to be bad too. In some rare situations, the bad input data may offset some of the model deficiency. 
  2. Goodness of the model. If the model is able to capture most of the salient features, and ignore the noise, the forecast should be good. 
  3. Load composition. Errors of residential load forecasts are typically lower than those of industrial load forecasts, assuming that the sizes of the loads are similar. Keep in mind that there are easy-to-forecast industrial loads, and hard-to-forecast residential loads.
  4. Size of load. The forecast errors (in MAPE) on big loads are typically smaller than those on small loads.
  5. Time of day. The forecast errors during sleeping hours are typically smaller than the errors during daytime.
  6. Season of year. The forecast errors during summer and winter are typically bigger than the errors during spring and fall.
  7. Special days. The forecast errors during special days, such as holidays and large local event days, are typically higher than the errors during regular days. 
  8. Weather condition. The forecast errors in areas with stable weather conditions are typically smaller than those in areas with fast-changing weather conditions.
  9. Weather forecasts. A good weather forecast often leads to a good load forecast. 
  10. Locations of weather station(s). When the weather stations can properly represent the weather of the territory, the forecasts are typically good. 
  11. Size of territory. With the same load level, a large territory typically has bigger errors than a small territory.
  12. Hierarchical load. When the load can be further split down the hierarchy, the forecast at top level can be improved. 
  13. Error calculation. The errors (in MAPE) of hourly loads are typically higher than those of daily energy, which is higher than those of monthly and annual energy. 
  14. Demand side management. Demand response and energy efficiency programs often lead to large errors in load forecasts.
  15. Distributed resources. Increased penetration of behind-the-meter solar typically increases the load forecast errors. 
  16. Emerging technologies. EV loads add more uncertainties to the conventional loads and tend to increase the load forecast errors. 
Apparently the answer is not trivial. In most if not all of the above bullet points, the answer is not definitive. This is why we need many benchmarking studies to better understand the forecast errors. It doesn't make sense to criticize the MAPE values prior to working on the data. Ironically, this is what many vendors do as part of the sales bluff.
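
To make the "error calculation" factor concrete, here is a minimal sketch in Python. The load and forecast numbers are made up for illustration, and each "day" has only 4 periods to keep the example short. It shows how hourly errors of opposite signs can partly cancel when the series is summed to daily energy, which is one reason the MAPE at a coarser resolution is often lower. This is a tendency, not a guarantee.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent."""
    total = sum(abs(a - f) / abs(a) for a, f in zip(actuals, forecasts))
    return 100.0 * total / len(actuals)


def aggregate(values, block):
    """Sum consecutive blocks of `block` values (e.g. hourly load -> daily energy)."""
    return [sum(values[i:i + block]) for i in range(0, len(values), block)]


# Hypothetical data: two "days" with 4 periods each (not from any real load zone).
actual_hourly = [100, 200, 200, 100, 120, 220, 210, 110]
forecast_hourly = [110, 180, 210, 95, 114, 231, 200, 115]

hourly_mape = mape(actual_hourly, forecast_hourly)
daily_mape = mape(aggregate(actual_hourly, 4), aggregate(forecast_hourly, 4))

print(f"MAPE at hourly resolution: {hourly_mape:.2f}%")
print(f"MAPE at daily resolution:  {daily_mape:.2f}%")
```

The same forecast therefore earns two quite different MAPE values depending on the resolution at which the error is calculated, which is why a MAPE should always be quoted together with its resolution.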

Back to Error Analysis in Load Forecasting.

Friday, August 18, 2017

IEEE PES Announces Winning Teams for Global Energy Forecasting Competition 2017

More than 300 students and professionals from more than 30 countries formed 177 teams to compete on hierarchical probabilistic load forecasting, exploring opportunities from the big data world and tackling the analytical challenges.

PISCATAWAY, N.J., USA, August 18, 2017 – IEEE, the world's largest professional organization advancing technology for humanity, today announced the results of the Global Energy Forecasting Competition 2017 (GEFCom2017), which was organized and supported by the IEEE Power & Energy Society (IEEE PES) and the IEEE Working Group on Energy Forecasting (WGEF).

“I congratulate the eight winning teams and all the contestants of GEFCom2017. They are pushing the boundaries of electric load forecasting,” said Dr. Tao Hong, Chair of IEEE Working Group on Energy Forecasting and General Chair of Global Energy Forecasting Competition, “GEFCom2017 is the longest and most challenging one among the series of Global Energy Forecasting Competitions. To encourage the contestants to explore opportunities in the big data world, GEFCom2017 released more data than the previous competitions combined.”

The theme is hierarchical probabilistic load forecasting, merging the challenges of both GEFCom2012 and GEFCom2014. The 6-month-long GEFCom2017 includes two phases. The qualifying match was to provide medium term probabilistic forecasts for ISO New England region in real time. It was meant to attract and educate a large number of contestants with diverse backgrounds, and to prepare them for the final match. The final match asked the contestants to provide probabilistic forecasts for 161 delivery points. All of the competition data will be further released to the public for future research and benchmarking purposes.

"The Global Energy Forecasting Competitions have been extraordinarily successful in stimulating and promoting technology advancement. To continue the momentum, IEEE PES decided to fund GEFCom2017 with $20,000 for the cash prizes to the winning teams," said Patrick Ryan, Executive Director of IEEE PES, "We are so delighted to witness another fantastic competition. Look forward to seeing its positive impact to the industry for the coming years."

GEFCom2017 includes a two-track qualifying match and a single-track final match. Each track recognizes three winning teams.

Qualifying Match Defined Data Track Winners:
  • Ján Dolinský, Mária Starovská and Robert Toth (Tangent Works, Slovakia)*
  • Andrew J. Landgraf (Battelle, USA)
  • Slawek Smyl (Uber Technologies, USA) and Grace Hua (Microsoft, USA)
  • Gábor Nagy and Gergő Barta (Budapest University of Technology and Economics, Hungary), Gábor Simon (dmlab, Hungary)

Qualifying Match Open Track Winners:
  • Geert Scholma (The Netherlands)
  • Florian Ziel (Universität Duisburg-Essen, Germany)
  • Jingrui Xie (SAS Institute, Inc., USA)

Final Match Winners:
  • Isao Kanda and Juan Quintana (Japan Meteorological Corporation, Japan)
  • Ján Dolinský, Mária Starovská and Robert Toth (Tangent Works, Slovakia)
  • Gábor Nagy and Gergő Barta (Budapest University of Technology and Economics, Hungary), Gábor Simon (dmlab, Hungary)

*We found an error in the original score calculation for the qualifying match. After fixing the error, the Tangent Works team is among the winning teams of the qualifying match.

For more information about the GEFCom2017, please visit

Original announcement: 

Wednesday, August 9, 2017

Benchmarking Robustness of Load Forecasting Models under Data Integrity Attacks

GIGO - garbage in, garbage out. In forecasting, GIGO means that if the model is fed with garbage (bad) data, the forecast would be bad too. In the power industry, bad load forecasts often result in waste of energy resources, financial losses, brownouts or even blackouts.

Anomaly detection and data cleansing procedures may help alleviate some of the bad data from the input side. However, what if the bad data was created by hackers? Can the existing models "survive" or stay accurate under data attacks? This paper offers some benchmark results.

This paper sets a few "firsts":
  1. This is the first paper formally addressing the cybersecurity issues in the load forecasting literature. I believe that the data attacks should be of great concern to the forecasting community. This paper sets a solid ground for future research. 
  2. This is my first journal paper co-authored with a professor in my doctoral committee, Dr. Shu-Cherng Fang. Many years ago, I picked up the topic of my dissertation from one of my consulting projects. I then invited a team of world class professors from different areas to form the committee. None of them were really into load forecasting, though I had the opportunity to learn from different perspectives. 
  3. This is my first journal paper that went through a one-year peer review cycle, the longest peer review I've experienced. It's definitely worth the effort. The IJF editors and reviewers certainly spent a significant amount of time reading the paper and offered many constructive comments. I wish I could know their names and identify them in the acknowledgement section.

Jian Luo, Tao Hong and Shu-Cherng Fang, "Benchmarking robustness of load forecasting models under data integrity attacks", International Journal of Forecasting, accepted. (working paper)

Benchmarking robustness of load forecasting models under data integrity attacks

Jian Luo, Tao Hong and Shu-Cherng Fang


As the internet continues to expand its footprint, cybersecurity has become a major concern for the governments and private sectors. One of the cybersecurity issues is on data integrity attacks. In this paper, we focus on the power industry, where the forecasting processes heavily rely on the quality of data. The data integrity attacks are expected to harm the performance of forecasting systems, which greatly impact the financial bottom line of power companies and the resilience of power grids. Here we reveal how data integrity attacks can affect the accuracy of four representative load forecasting models (i.e., multiple linear regression, support vector regression, artificial neural networks, and fuzzy interaction regression). We first simulate some data integrity attacks by randomly injecting some multipliers that follow a normal or uniform distribution to the load series. Then the aforementioned four load forecasting models are applied to generate one-year ahead ex post point forecasts for comparisons of their forecast errors. The results show that the support vector regression model, trailed closely by the multiple linear regression model, is most robust, while the fuzzy interaction regression model is least robust among the four. Nevertheless, all of the four models fail to provide satisfying forecasts when the scale of data integrity attacks becomes large. This presents a serious challenge to the load forecasters and the broader forecasting community: How to generate accurate forecasts under data integrity attacks? We use the publicly-available data from Global Energy Forecasting Competition 2012 to construct the case study. At the end, we also offer an outlook of potential research topics for future studies.
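
The attack simulation described in the abstract (randomly injecting multipliers that follow a normal or uniform distribution into the load series) can be sketched roughly as follows. This is only an illustration, not the paper's code: the function name `inject_attack`, the fraction of attacked points, and the distribution parameters are all assumptions chosen for the example.

```python
import random


def inject_attack(load, fraction, dist="normal", mu=1.0, sigma=0.2,
                  low=0.5, high=1.5, seed=0):
    """Return a copy of `load` where a random `fraction` of the points is
    multiplied by a random factor, plus the sorted indices that were altered.

    All parameter defaults are hypothetical, not taken from the paper.
    """
    rng = random.Random(seed)
    n_attacked = round(fraction * len(load))
    attacked_idx = sorted(rng.sample(range(len(load)), n_attacked))
    attacked = list(load)
    for i in attacked_idx:
        if dist == "normal":
            multiplier = rng.gauss(mu, sigma)
        elif dist == "uniform":
            multiplier = rng.uniform(low, high)
        else:
            raise ValueError("dist must be 'normal' or 'uniform'")
        attacked[i] = load[i] * multiplier
    return attacked, attacked_idx


# Hypothetical flat load series of 100 hourly observations.
clean = [50.0] * 100
dirty, idx = inject_attack(clean, fraction=0.25, dist="uniform", seed=1)
print(f"{len(idx)} of {len(clean)} observations were altered")
```

A benchmark along the lines of the paper would then fit each forecasting model on the altered series and compare the forecast errors against the same model fitted on the clean series, while increasing the fraction or the spread of the multipliers.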

Monday, August 7, 2017

Breakthrough or Too Good To Be True: Several Smoke Tests

When sharing my Four Steps to Review an Energy Forecasting Paper, I spent about a third of the blog post elaborating what "contribution" means. This post is triggered by several review comments on my recent TSG paper on variable selection methods for probabilistic load forecasting. Here I would like to elaborate what "contribution" means from a different angle.

A little background first. 

In that TSG paper, we compared two variable selection schemes, HeM (Heuristic Method) that sharpens the underlying model to minimize the point forecast error, and HoM (Holistic Method) that uses the quantile score to select the underlying model. The key finding is as follows:
HoM costs much more computational power but only produces slightly better quantile scores than HeM.
Then some of the reviewers raised the red flag:
If the new method is not much better than the existing one, why shall we accept the paper?
I believe that the question is genuine. Most likely the reviewers, as well as many other load forecasters, have read many papers in the literature that have presented super powerful models or methods that led to super accurate forecasts. After being flooded with those breakthroughs, they would be hesitant to give favorable ratings to a paper that presents a somewhat disappointing conclusion. 
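
For readers unfamiliar with the quantile score used by HoM, here is a minimal sketch. It assumes the common definition in probabilistic load forecasting: the pinball loss averaged over all forecast periods and all quantiles. The function names and the toy numbers are illustrative and not taken from the TSG paper.

```python
def pinball_loss(actual, quantile_forecast, q):
    """Pinball loss of a single forecast of the q-th quantile (0 < q < 1)."""
    if actual >= quantile_forecast:
        return q * (actual - quantile_forecast)
    return (1.0 - q) * (quantile_forecast - actual)


def quantile_score(actuals, forecasts, quantiles):
    """Average pinball loss over all periods and quantiles.

    forecasts[t][k] is the forecast of quantile quantiles[k] for period t.
    """
    total = 0.0
    count = 0
    for actual, row in zip(actuals, forecasts):
        for q, forecast in zip(quantiles, row):
            total += pinball_loss(actual, forecast, q)
            count += 1
    return total / count


# Hypothetical example: two periods, three quantiles.
quantiles = [0.1, 0.5, 0.9]
actuals = [100.0, 120.0]
forecasts = [[90.0, 100.0, 110.0],
             [100.0, 115.0, 130.0]]
print(f"Quantile score: {quantile_score(actuals, forecasts, quantiles):.3f}")
```

Selecting a model by this score requires producing the full set of quantile forecasts for every candidate, which gives a feel for why the holistic route is more expensive than tuning on a point forecast error.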

Now let's take one step back:
What if those breakthroughs were just illusions? 
Given the fact that most of those papers were proposing complicated algorithms tested on some proprietary datasets, it is very difficult to reproduce the work. In other words, we can hardly verify those stories. The reviewers and editors may be rejecting valuable papers that are not bluffing. This time I was lucky - most reviewers were on my side.

When my premature models were beating all the other competitors many years ago, I was truly astonished about the real-world performance of those "state-of-the-art" models. If those breakthroughs in the literature were really tangible, my experience tells me that the industry would be pouring money to those authors to ask for the insights. It's been many years since those papers were published. How many of them have been recognized by the industry? (In my IJF review, I did mention a few exemplary papers though.)

We have run the Global Energy Forecasting Competitions three times. How often do you see those authors or their students on the leaderboard? If their methods are truly effective but not recognized by the industry, why not test them through these public competitions? 

Okay, now you know some of those "peer-reviewed" papers may be bluffing. How to tell if they are really bluffing? Before telling you my answer, let's see how those papers are produced:
  1. To make sure that the contribution is novel, the authors must propose something new. To ensure it looks challenging, the proposal must be complicated. The easiest way to create such techniques is to mix the existing ones, such as ANN+PSO+ARIMA, etc.
  2. To make sure that nobody can reproduce the results, the data used in the case study must be proprietary. After all, all it takes to have the paper accepted is to get it past the reviewers and editor(s). An unpopular dataset is fine too, because the reviewers don't bother to spend the time reproducing the work.
  3. To make sure that the results can justify the breakthrough, the forecasts must be close to perfection. The proposed models must beat the existing ones to death. How to accomplish that? Since the authors have complete knowledge of the future dataset, just fine tune the model so that it outperforms the others in the forecast period. This is called "peeking the future".
In reality, it is very hard to build the models or methods that can dominate the state of the art. Certainly it doesn't come from a "hybrid" of the existing ones. Instead, the breakthroughs (or major improvement) come from using new variables that people have not yet completely understood in the past, borrowing the knowledge from other domains, leveraging new computing power, and so forth.
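
The "peeking the future" trick in step 3 above can be illustrated with a toy sketch. Everything here is an assumption made for the example: the series is pure synthetic noise around a constant level, and the "model" is simple exponential smoothing with one tuning parameter. The point is only that a parameter tuned on the forecast period itself can never look worse on that period than one tuned honestly on an earlier validation period.

```python
import random


def ses_forecasts(series, alpha):
    """One-step-ahead simple exponential smoothing; element i forecasts series[i + 1]."""
    level = series[0]
    forecasts = []
    for y in series[1:]:
        forecasts.append(level)
        level = alpha * y + (1.0 - alpha) * level
    return forecasts


def period_mae(series, alpha, start, end):
    """Mean absolute error of the one-step forecasts for series[start:end] (start >= 1)."""
    forecasts = ses_forecasts(series, alpha)[start - 1:end - 1]
    actuals = series[start:end]
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)


rng = random.Random(42)
series = [100.0 + rng.gauss(0.0, 5.0) for _ in range(300)]  # synthetic load
alphas = [k / 20 for k in range(1, 20)]                     # candidate settings

# Honest: tune on a validation period (100-199), report on the test period (200-299).
honest_alpha = min(alphas, key=lambda a: period_mae(series, a, 100, 200))
honest_test_mae = period_mae(series, honest_alpha, 200, 300)

# Peeking: tune directly on the test period, then report on that same period.
peeked_alpha = min(alphas, key=lambda a: period_mae(series, a, 200, 300))
peeked_test_mae = period_mae(series, peeked_alpha, 200, 300)

print(f"Honest test MAE: {honest_test_mae:.3f} (alpha={honest_alpha})")
print(f"Peeked test MAE: {peeked_test_mae:.3f} (alpha={peeked_alpha})")
```

With one parameter and 19 candidates the gap is usually small. With a complicated hybrid that has many knobs to turn, the room for flattering the forecast period grows accordingly.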

In the world of predictive modeling, there is the well-known "no free lunch" theorem, which states that no one model works the best in all situations. In other words, if one beats the others in all cases across all measures, it is "too good to be true". We need the empirical studies that report what's NOT working well as much as the ones promoting the champions. 

It's time for my list of smoke tests. The more check marks a paper gets, the more I consider it too good to be true.
  1. The paper is proposing a mix (or hybrid) of many techniques.
  2. The paper is merely catching new buzzwords.
  3. The data is proprietary.
  4. The paper is not co-authored with industry people (or not sponsored by the industry). 
  5. The proposed method does not utilize new variables.
  6. The proposed method does not take knowledge of other domains.
  7. The proposed method does not leverage new computing resources.
  8. The proposed method is dominating its counterparts (another credible method) in all aspects.
I spend a minimal amount of time reading those papers, because they are the emperor's new clothes to me. Hopefully this list can help the readers save some time too. On the other hand, I didn't mean to imply that the authors were intentionally faking the paper. Many of them are genuine people who are making these mistakes without knowing it. Hopefully this blog post can help point the authors in the right direction as well.