Friday, April 22, 2016

BigDEAL Students Receiving Promotions

As a professor, I find nothing better than hearing the success stories of my students. Currently I have two PhD students, Jingrui (Rain) Xie and Jon Black, both of whom also work full time in industry. This is the season of promotion announcements at many companies. Rain was promoted from Sr. Associate Research Statistician Developer to Research Statistician Developer, while Jon was promoted from Lead Engineer to Manager. I'm very pleased to feature their short biographies with their new titles here. For more details about their profiles, please check out the BigDEAL current students page.

Congratulations, Rain and Jon, for the well-deserved promotions!

Jingrui Xie
Jingrui (Rain) Xie, Research Statistician Developer, Forecasting R&D, SAS Institute Inc.
Jingrui (Rain) is pursuing her Ph.D. degree at UNC Charlotte, where her research focuses on probabilistic load forecasting. Meanwhile, she also works full-time as a Research Statistician Developer at SAS Forecasting R&D. At SAS, she works on the development of SAS forecasting components and solutions, and leads the energy forecasting research. Prior to joining SAS Forecasting R&D, Rain was an analytical consultant at SAS with expertise in statistical analysis and forecasting, especially energy forecasting. She was the lead statistician developer for the SAS Energy Forecasting solution and delivered consulting services to several utilities on load forecasting for their system operations, planning and energy trading.
Rain has extensive experience in energy forecasting including exploratory data analysis, selection of weather stations, outlier detection and data cleansing, hierarchical load forecasting, model evaluation and selection, forecast combination, weather normalization and probabilistic load forecasting. She also has extensive knowledge and working experience with a broad set of SAS products.

Jonathan D. Black
Jonathan D. Black, Manager of Load Forecasting, System Planning, ISO New England Inc.
Jon is currently Manager of Load Forecasting at ISO New England, where he provides technical direction for energy analytics and both short-term and long-term forecasting of load, distributed photovoltaic (PV) resources, and energy efficiency. For the past three years he has led ISO-NE’s long-term PV forecasting for the six New England states based on a variety of state policy support mechanisms, and provided technical guidance for the modeling of PV in system planning studies. Jon is directing ISO-NE’s efforts to develop enhanced short-term load forecast tools that incorporate the effects of behind-the-meter distributed PV, and has developed methods of estimating distributed PV fleet production profiles using limited historical data, as well as simulating high penetration PV scenarios to identify future net load characteristics. Jon participates in industry-leading research on forecasting and integrating large-scale renewable energy resources, and has served as a Technical Review Committee member on several multi-year Department of Energy studies. Upon joining ISO-NE in 2010, Jon assisted with the New England Wind Integration Study and the design of wind plant data requirements for centralized wind power forecasting.
Mr. Black is currently a PhD student researching advanced forecasting techniques within the Infrastructure and Environmental Systems program at the University of North Carolina at Charlotte. He received his MS degree in Mechanical Engineering from the University of Massachusetts at Amherst, where his research at the UMass Wind Energy Center explored the effects of varying weather on regional electricity demand and renewable resource availability. He is an active member of both the Institute of Electrical and Electronics Engineers (IEEE) and the Utility Variable Generation Integration Group (UVIG).

Tuesday, April 19, 2016

Improving Gas Load Forecasts with Big Data

This is my first gas load forecasting paper. We introduce the methodology, models and lessons learned from the 2015 RWE npower gas load forecasting competition, in which the BigDEAL team ranked in the top 3. The core idea is to leverage comprehensive weather information to improve gas load forecasting accuracy.

Jingrui Xie and Tao Hong, "Improving gas load forecasts with big data". Natural Gas & Electricity, vol. 32, no. 10, pp. 25–30, 2016. doi:10.1002/gas.21905 (working paper available HERE)

Improving Gas Load Forecasts with Big Data

Jingrui Xie and Tao Hong


The recent advancement in computing, networking, and sensor technologies has brought a massive amount of data to the business world. Many industries are taking advantage of big data along with modern information technologies to make informed decisions, such as managing smart cities, predicting criminal activity, optimizing medicine based on genetic defects, detecting financial fraud, and personalizing marketing campaigns. According to Google Trends, public interest in big data is now 10 times higher than it was five years ago (Exhibit 1). In this article, we discuss gas load forecasting in the big data world. The 2015 RWE npower gas load forecasting challenge is used as a case study to introduce how to leverage comprehensive weather information for daily gas load forecasting. We also extend the discussion by articulating several other big data approaches to improving forecast accuracy. Finally, we discuss a crowdsourcing, competition-based approach to generating new ideas and methodologies for gas load forecasting.

Thursday, April 14, 2016

IJF Special Section on Probabilistic Energy Forecasting: GEFCom2014 Papers and More

As of this week, 21 of the 22 papers for the IJF Special Section on Probabilistic Energy Forecasting are on ScienceDirect (link to the CFP). Many thanks to the GEFCom2014 organizers, participants, and the expert reviewers, whose time and effort ensured an exceptionally high quality collection of energy forecasting papers. Although these papers are not yet paginated, I can't wait to compile and post this list.

Editorial and GEFCom2014 Introduction Article

Review Article

Research Articles (Non-GEFCom2014)

Research Articles (GEFCom2014)
Enjoy reading and stay tuned for the next GEFCom!

Wednesday, April 13, 2016

Announcing BFCom2016s Winners

The Spring 2016 BigDEAL Forecasting Competition (BFCom2016s) ended last week. I received 49 registrations from 15 countries, of which 18 teams from 6 countries completed all four rounds of the competition. I want to give my special appreciation to Prof. Chongqing Kang and his teaching assistant Mr. Yi Wang. They organized 8 teams of students from Tsinghua University, an institute prize winner of GEFCom2014. Two of the Tsinghua teams finished in the top 6.

The topic of BFCom2016s is ex ante short term load forecasting. I provided 4 years of historical load and temperature data, asking the contestants to forecast the next three months given historical day-ahead temperature forecasts. Three months of incremental data was released in each round.

The benchmark is the Vanilla model, the same as the one used in GEFCom2012. This time, among the top 6 teams, five beat the benchmark on average ranking, while four beat it on average MAPE. The detailed rankings and MAPEs of all teams are listed HERE.
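For readers new to these two scores, here is a minimal sketch of how average MAPE and average ranking could be computed. It assumes that teams are ranked by MAPE within each round (1 = best) and that ties are broken by column order; the official scoring script is not part of this post, so the function names and the tie rule are illustrative only.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

def average_rank(mape_table):
    """mape_table: rows = rounds, columns = teams.
    Rank teams within each round (1 = lowest MAPE), then average over rounds.
    Ties are broken by column order (an assumption, not the official rule)."""
    m = np.asarray(mape_table, dtype=float)
    ranks = m.argsort(axis=1, kind="stable").argsort(axis=1) + 1
    return ranks.mean(axis=0)

# Hypothetical MAPEs (%) for three teams over two rounds
table = [[5.0, 6.0, 7.0],
         [6.5, 6.0, 7.0]]
avg_rank = average_rank(table)        # ranks 1.5, 1.5, 3.0
avg_mape = np.mean(table, axis=0)     # MAPEs 5.75, 6.0, 7.0
```

As the example shows, two teams can tie on average ranking while differing on average MAPE, which is why the two criteria can disagree on who beat the benchmark.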

I invited each of the top 6 teams to send me a short guest post describing their methodology. Their contributions (with my minor editorial changes) are listed below, together with the Vanilla Benchmark, which ranked No. 7.

No.1: Jingrui Xie (avg. ranking: 1.25; avg. MAPE: 5.38%)
Team member: Jingrui Xie
Affiliation: University of North Carolina at Charlotte, USA
The same model selection process was used in all four rounds, and it was implemented in SAS. It follows the point forecasting model selection process implemented in Xie and Hong, IJF-2016. In this competition, the forecasting problem was dissected into three sub-problems, each with a slightly different set of candidate models being evaluated.
The first sub-problem was a very short term load forecasting problem: forecasting the first day of the forecast period. The model selection process started with the "Vanilla model plus the lagged load of the previous 24th hour". It then considered the recency effect, the weekend effect, the holiday effect, the two-stage model, and the combination of forecasts as introduced in Hong, 2010 and Xie and Hong, IJF-2016.
The second sub-problem was a short term load forecasting problem: forecasting the second to the seventh day of the month. The model selection process was the same as that for the very short term load forecasting problem, except that the starting benchmark model was the Vanilla model.
The third sub-problem can be categorized as a medium term load forecasting problem, in which the rest of the forecast period was forecast. The model selection process also started with the Vanilla model, but it only considered the recency effect, the weekend effect, and the holiday effect.

No.2: SMHC (avg. ranking: 3.75; avg. MAPE: 5.90%)
Team members: Zejing Wang; Qi Zeng; Weiqian Cai
Affiliation: Tsinghua University, China
We tried support vector machine (SVM) and artificial neural network (ANN) models in the model selection stage, and found that the ANN model performed better than SVM. To capture the cumulative effect, we introduced the aggregated temperatures of several hours as augmented variables, and the number of hours was also determined in the model selection process.
In the first round, we used all the provided data for training but didn't consider the influence of holidays. In the next three rounds, we divided the provided data into two seasons, “summer” and “winter”, and separately forecasted the load of normal days and special holidays. These so-called seasons are not the traditional ones; they were roughly defined from the plot of the average load of the given four years. We then used the data from each season for training to forecast the corresponding season in 2014. This ultimately achieved higher accuracy. All the aforementioned results and algorithms were implemented in MATLAB and C.

No. 3: eps (avg. ranking: 5.25; avg. MAPE: 6.08%)
Team member: Ilias Dimoulkas
Affiliation: KTH Royal Institute of Technology, Sweden
I used MATLAB’s Neural Network Toolbox for the modeling. The evolution of my model during the four rounds was as follows.
1st round: I used the “Fitting app”, which is suitable for function approximation. The training vector was IN = [Hour Temperature] and the target vector OUT = [Load].
2nd round: I used the “Time series app”, which is suitable for time series and dynamical systems. I used the Nonlinear Input-Output model instead of the Nonlinear Autoregressive with External Input model because it performs better for long term forecasting. The training vector was still IN = [Hour Temperature] and the target vector OUT = [Load]. The number of delays that I found worked best was 5 (= 5 hourly lags).
3rd round: I used the same model but changed the training vector to IN = [Month Weekday Hour Temperature AverageDailyTemperature MaxDailyTemperature], where AverageDailyTemperature is the average temperature and MaxDailyTemperature is the maximum temperature of the day that the specific hour belongs to.
4th round: I used two similar models with different training vectors. The final output was the average of the two models. The training vectors were IN1 = [Month Weekday Hour Temperature MovingAverageTemperature24 MovingMaxTemperature24] and IN2 = [Month Weekday Hour Temperature AverageTemperaturePreAfter4Hours MovingAverageTemperature24 MovingAverageTemperature5 MovingMaxTemperature24], where MovingAverageTemperature24 is the average temperature of the last 24 hours, MovingAverageTemperature5 is the average temperature of the last 5 hours, MovingMaxTemperature24 is the maximum temperature of the last 24 hours, and AverageTemperaturePreAfter4Hours is the average temperature of the hours ranging from 4 hours before to 4 hours after the specific hour.
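The moving-window temperature features in the 4th round map naturally onto pandas rolling windows. This is a minimal sketch, not the team's MATLAB code: it assumes an hourly temperature Series in time order, that "the last 24 hours" includes the current hour, and that windows which are not yet full are left as NaN. The centered window uses temperatures after the target hour, which in an ex ante setting would come from the temperature forecasts.

```python
import numpy as np
import pandas as pd

def moving_temperature_features(temp: pd.Series) -> pd.DataFrame:
    """Hourly moving-window temperature features (current hour included)."""
    return pd.DataFrame({
        "Temperature": temp,
        # mean and maximum over the last 24 hours
        "MovingAverageTemperature24": temp.rolling(24).mean(),
        "MovingMaxTemperature24": temp.rolling(24).max(),
        # mean over the last 5 hours
        "MovingAverageTemperature5": temp.rolling(5).mean(),
        # 9-hour centered window: 4 hours before through 4 hours after
        "AverageTemperaturePreAfter4Hours": temp.rolling(9, center=True).mean(),
    })

# Hypothetical two days of hourly temperatures
temps = pd.Series(np.arange(48, dtype=float))
features = moving_temperature_features(temps).dropna()  # drop incomplete windows
```

Dropping the incomplete windows costs the first 23 and last 4 hours of the history, which is usually negligible with several years of hourly data.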

No. 4: Fortune Teller (avg. ranking: 6.25; avg. MAPE: 6.45%)
Team members: Guangzheng Xing; Zetian Zheng; Liangzhou Wang
Affiliation: Tsinghua University, China
Round 1: Variables: Hour, Weekday, T_act, TH (the highest temperature in a day), TM (the mean temperature), TL (the lowest temperature). First, we used MLR, fitting the mean load with TM, TM^2 and TM^3. This method didn’t work well; the MAPE reached about 14%. Then we used a neural network with the six variables above as inputs and Load_MW as the target. The result was better, but because of improper parameters the model was somewhat overfitted, and we didn’t do cross-validation, so the result was still not so good.
Round 2: We changed the parameters and used the max/min/mean values of the previous 24 hours rather than those of the day. The result was much better.
Round 3: We tried to use SVM to classify the two kinds of daily load curves, and then used nnet separately for each. But this method did not seem to be effective. Then we used SVM for regression, with the same data set as nnet. On the test set, the results of SVM and nnet were similar, so we submitted the mean of both methods’ results.
Round 4: The MAPE of both methods reached over 7% during model selection. The result of SVM was worse, so we only submitted the result of nnet.

No. 5: Keith Bishop (avg. ranking: 6.50; avg. MAPE: 6.47%)
Team member: Keith Bishop
Affiliation: University of North Carolina-Charlotte, USA; Hepta Control Systems, USA
For my forecast, I utilized SkyFoundry’s SkySpark analytics software. SkySpark is designed for modeling complex building systems and working with time-series data at a wide range of levels. To support my model, I extended the inherent functionality of this software to support polynomial regression. My model itself went through several iterations. The first of these was fairly similar to Dr. Hong’s Vanilla Model, except that instead of clustering by month, I clustered based on whether the date was a heating or cooling date. The heating or cooling determination was made by fitting a third-degree polynomial curve to the load-temperature scatter plot of each hourly cluster, solving for the minimum of each curve, and then calculating the change-over point by averaging these hourly values. If the average temperature for a day was above this point, it was a cooling day; otherwise, it was a heating day. As my model progressed, I incorporated monthly clustering and the recency effect discussed in Electric load forecasting with recency effect: A big data approach. With the recency effect, I optimized the number of lag hours for each monthly cluster by creating models for each of the past 24 hours and selecting the one with the lowest error. In the end, I was able to reduce the MAPE of the forecast against the known data from 8.51% down to 5.01%.
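The change-over point described above can be sketched with NumPy: fit a cubic to each hourly load-temperature scatter, locate the local minimum of each fitted curve, and average those minima across hours. This is a minimal sketch, not the SkySpark implementation; the function names are hypothetical, and it assumes each hourly cluster yields a local minimum inside the observed temperature range, silently skipping hours that do not.

```python
import numpy as np

def cubic_minimum(temps, loads):
    """Temperature at the local minimum of a cubic load-temperature fit,
    or None if that minimum falls outside the observed temperature range."""
    coeffs = np.polyfit(temps, loads, 3)          # highest power first
    d1, d2 = np.polyder(coeffs), np.polyder(coeffs, 2)
    roots = np.roots(d1)                          # stationary points
    real_roots = roots[np.isclose(roots.imag, 0.0)].real
    lo, hi = np.min(temps), np.max(temps)
    # a positive second derivative marks the local minimum
    minima = [r for r in real_roots if np.polyval(d2, r) > 0 and lo <= r <= hi]
    return float(minima[0]) if minima else None

def changeover_point(hourly_clusters):
    """hourly_clusters: iterable of (temps, loads) pairs, one per hour of day."""
    minima = [cubic_minimum(t, l) for t, l in hourly_clusters]
    return float(np.mean([m for m in minima if m is not None]))

def is_cooling_day(daily_avg_temp, changeover):
    """Above the change-over point -> cooling day; otherwise heating day."""
    return daily_avg_temp > changeover
```

A cubic has at most one local minimum, so the list comprehension keeps at most one candidate per hour; averaging over hours then smooths out hours whose fitted curve is poorly determined.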

No. 6: DUFEGO (avg. ranking: 7.25; avg. MAPE: 6.39%)
Team members: Lei Yang; Lanjiao Gong; Yating Su
Affiliation: Dongbei University of Finance and Economics, China
During the 4-round competition, we selected MATLAB as our tool. We used multiple linear regression (MLR) models, each of which has 291 variables, including trend, polynomial terms, interaction terms and the recency effect. We used all past historical data without cleansing it. Since the forecasting task is to improve predictive accuracy rather than goodness of fit, we separated the data into a training set and a validation set. We used cross-validation and out-of-sample testing to select variables, giving our model better generalization ability.
In Round 1, we trained one MLR model using the entire historical data. In Round 2, we roughly grouped the historical data by season (such as January–March and April–June) and trained four MLR models, which improved the results significantly. We also found distinct relationships between temperature and load in different temporal dimensions. We did some work on selecting the best MLR model in different temporal dimensions and found that separating by season worked better. We made a mistake in Round 3 that resulted in a very high MAPE.

No. 7: Vanilla Benchmark (avg. ranking: 7.25; avg. MAPE: 6.42%)
The model is the same as the one used in GEFCom2012. See Hong, Pinson and Fan, IJF2014 for more details. All available historical data in each round was used to estimate the model.
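For reference, the Vanilla model is commonly written as the multiple linear regression below, where Trend is a linearly increasing time index; M, W and H are class variables for month (12 levels), day of the week (7 levels) and hour of the day (24 levels); and T is temperature. A product with a class variable means a separate coefficient for each level, e.g. one cubic temperature polynomial per month. This transcription is offered as a reading aid; the cited paper remains the authoritative specification.

```latex
\begin{aligned}
\hat{L}_t = {} & \beta_0 + \beta_1\,\mathrm{Trend}_t + \beta_2 M_t + \beta_3 W_t + \beta_4 H_t + \beta_5 W_t H_t \\
& + \beta_6 T_t + \beta_7 T_t^2 + \beta_8 T_t^3
  + \beta_9 T_t M_t + \beta_{10} T_t^2 M_t + \beta_{11} T_t^3 M_t \\
& + \beta_{12} T_t H_t + \beta_{13} T_t^2 H_t + \beta_{14} T_t^3 H_t
\end{aligned}
```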

Finally, congratulations to these top 6 teams of BFCom2016s, and many thanks to all of you who participated and are interested in BFCom2016s!

Friday, April 8, 2016

Statistical Learning and Machine Learning

Machine learning is a field that studies how to make computers (machines) learn from data to make predictions or help with data-driven decisions. The term "machine learning" started to become popular in the 1990s, and was then somewhat surpassed, if not replaced, by "analytics". In recent years, following the buzzword wave of big data and deep learning, the field of machine learning has been gaining momentum again. Since the 1980s, machine learning techniques have appeared in most papers in the load forecasting literature.

Statistical learning, on the other hand, is not a familiar term to many people. I first got to know this term during my graduate school days, when I was reading the book The Elements of Statistical Learning. Since many techniques and methodologies introduced in this book can be applied to forecasting, I'm using this book as a reference book for my forecasting course this semester.

As of today, I still don't quite understand the difference between the two terms. It seems to me that statisticians like to use the term "statistical learning", while computer scientists and engineers are used to the term "machine learning". Maybe statistical learning has more statistical rigor, while machine learning places more emphasis on the algorithmic aspect. Another discussion about the difference between the two can be found on Wikipedia.

Early this year, I registered for two online courses from Stanford University:
I thought it would be a good investment of my time for three reasons:
  • As a researcher in energy forecasting, I want to refresh my memory on machine/statistical learning;
  • As an instructor, I want to leverage the materials from other relevant courses when preparing for my forecasting course;
  • As a Graduate Program Director for a top-ranked online MS program, I want to see how Stanford University sets up its online courses.
I started both courses almost at the same time, though I was quickly addicted to the statistical learning one and gave up the other course. While both courses are taught by world-class professors and equipped with state-of-the-art online teaching technologies, the one by Trevor and Rob fits my taste much better:
  • The instruction style is very much grounded. These two statistics professors are very good at explaining the theory intuitively without involving much math.
  • The subtitles are awesome.
  • Most of the quiz questions test the understanding of concepts, so I don't have to pull out my computer to run R code.
  • The guest speakers bring valuable perspectives and insights about the subject.
  • The design of the progress bar is very simple and effective. Due to my busy schedule with other commitments, I almost quit the course several times. That progress bar helped me stay focused and on track.
There are other nice features this course offers that I never had time to try, such as a free textbook and an online discussion forum. I'm sure they are useful to those who can devote more time than I can. In January and early February, I was watching videos and answering the quiz questions late at night before bed every day. After reaching about 35% overall progress, I was distracted by other commitments. Then in early April, I went through a few more lectures to pass the 50% bar.

I will most likely take this course again, or at least watch the 15-hour video lectures. I would like to recommend this course to the energy forecasting community as well :)

Back to Load Forecasting Terminology.

Monday, March 28, 2016

Relative Humidity for Load Forecasting Models

The ultimate driver of using big data for predictive modeling and forecasting, in my opinion, is customization. Obviously, such customization can be reflected by providing special treatments to individual regions in a territory and individual hours of a day, as discussed in my recent IJF paper Electric Load Forecasting with Recency Effect: A Big Data Approach. In this paper, we take another big data approach to load forecasting by breaking down a composite variable, the Heat Index. We show that the NWS formula for Heat Index is not really designed for load forecasting.
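For context, the composite variable in question is usually computed with the Rothfusz regression that the NWS uses for Heat Index, with temperature in °F and relative humidity in percent. The sketch below shows only that core regression; the NWS procedure also uses a simpler formula when the heat index is below about 80°F and applies small adjustments in certain humidity and temperature ranges, both omitted here. The RH variables recommended in the paper are not reproduced. A load model that uses HI alone inherits these fixed coefficients, which were fitted for apparent temperature rather than for electricity demand.

```python
def rothfusz_heat_index(temp_f: float, rh_pct: float) -> float:
    """Core Rothfusz regression for Heat Index (deg F).

    temp_f: air temperature in deg F; rh_pct: relative humidity in percent.
    Assumes inputs lie where the regression applies (roughly HI >= 80 F);
    the NWS low-HI formula and humidity adjustments are not included.
    """
    T, R = temp_f, rh_pct
    return (-42.379 + 2.04901523 * T + 10.14333127 * R
            - 0.22475541 * T * R - 0.00683783 * T * T
            - 0.05481717 * R * R + 0.00122874 * T * T * R
            + 0.00085282 * T * R * R - 0.00000199 * T * T * R * R)

# Same temperature, different humidity: the composite hides which one moved
hi_humid = rothfusz_heat_index(90, 70)   # about 105.9
hi_dry = rothfusz_heat_index(90, 40)     # noticeably lower
```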

The paper went through three rounds of reviews with 5 reviewers. Most reviewers were very good at providing helpful comments. Only one reviewer raised a few interesting but naive comments. I didn't bother to please him/her by revising my paper. Nevertheless, I would love to list them here so that other authors can use my argument to respond to similar comments.

1. "Twenty-five papers and previous works are cited in this paper, with exactly 54 references along the paper. On these 54 times, the authors are citing themselves at least 31 times" [...]"This behavior tends to give the reader a strange feeling and do not give a lot of credit to your work: it is hard to value it since you mostly compare yourself to.... yourself. You need to justify it properly before claiming these kind of affirmations."

Dear reviewer,
Unfortunately you are having a "strange" feeling. I believe this is mostly because you are not very familiar with the recent academic literature and field practice in load forecasting. Maybe you should google "load forecasting", "electric load forecasting" or "energy forecasting", and see how my work shows up among the top three entries on Google's first page. THIS POST discusses my way of citing references. The papers on my reference list are carefully picked based on relevance and quality. The quality is mainly determined by whether the work is being used by the industry or not. So far, my readers have been very pleased with the useful references I've listed in my papers. Oh, you might be pissed off by my not citing (enough of) your papers. In order for me to cite more of your papers, you should write more high quality papers and show how your work is being valued by the industry.

In my first submission, there were 25 references, of which 9 were my own papers. We carefully considered the reviewer's comments, and increased the number of references to 34 in the final submission, of which 12 papers were mine.

2. "All the models in the paper are derivatives of the Vanilla's Tao Benchmark. It has been shown that this model can be outperformed by a significant margin by state of the art models (see GEFCOM2012 results). Is it useful to use such models (and to finally improve it by max 9%) while GEFCOM2012 results show that some models can improve its forecast by almost 40%."

Dear reviewer,
Take another look at Table I please. That 5.21% is from the Vanilla model. The MAPE of model B4 without humidity is already down to 3.79%. Our proposed model is at 3.62%. This is more than a 30% improvement over the Vanilla model, and we did not even add the holiday effect to the models. On the other hand, please take a look at this paper, Weather Station Selection for Electric Load Forecasting. Are you wondering why the entire paper is based on the Vanilla model? Why didn't I even add the recency effect? It is because the proposed methodology can also be applied to more complicated models. To avoid verbose presentation and distraction from the main theme of a paper, we can show the results on a benchmark model.
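The relative improvements in this reply work out as follows, using the MAPEs from Table I (Vanilla 5.21%, B4 without humidity 3.79%, proposed model 3.62%):

```latex
\frac{5.21\% - 3.62\%}{5.21\%} = \frac{1.59}{5.21} \approx 30.5\%,
\qquad
\frac{3.79\% - 3.62\%}{3.79\%} = \frac{0.17}{3.79} \approx 4.5\%
```

The second figure, the gain from adding the humidity variables to B4, falls within the 4% to 9% range reported in the abstract.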

3. "It is well known that multicolinearity is very bad in MLR models and can lead to instability of parameters and false results. It would be nice to have an idea of estimated parameters for the different variables and models, and to exhibit significance results, tests, etc."

Dear reviewer,
Please read some papers in the load forecasting literature, and see how often people are using lagged variables (both load and temperature). Maybe my recent IJF paper on the recency effect can totally piss you off. Are you wondering why those papers are using these highly correlated variables? It is because we have so many observations in load forecasting. BTW, please do not show those significance tests, such as p-values, in your papers. They are useless in load forecasting. Again, we have so many observations that the residuals are rarely normally distributed. Moreover, those p-values come from the in-sample fit, which tells nothing about the predictive power of your models. Furthermore, there are hundreds of parameters being estimated in a load forecasting model; how do you plan to show the significance tests of all those variables? You may want to read Scott Armstrong's paper on Illusions in Regression Analysis to re-examine your understanding of regression analysis.


Jingrui Xie, Ying Chen, Tao Hong and Thomas D. Laing, "Relative humidity for load forecasting models", IEEE Transactions on Smart Grid, in press.

The working paper is available HERE.

Relative Humidity for Load Forecasting Models

Jingrui Xie, Ying Chen, Tao Hong and Thomas D. Laing


Weather is a key driving factor of electricity demand. During the past five decades, temperature has been the most commonly used weather variable in load forecasting models. Although humidity has been discussed in the load forecasting literature, it has not been studied as formally as temperature. Humidity is usually embedded in the form of Heat Index (HI) or Temperature-Humidity Index (THI). In this paper, we investigate how Relative Humidity (RH) affects electricity demand. From a real-world case study at a utility in North Carolina, we find that RH plays a vital role in driving electricity demand during the warm months (June to September). We then propose a systematic approach to including RH variables in a regression analysis framework, resulting in the recommendation of a group of RH variables. The proposed models with the recommended addition of RH variables improve the forecast accuracy of Tao’s Vanilla Benchmark Model and its three derivatives in one-day (24-hour) ahead, one-week ahead, one-month ahead and one-year ahead ex post forecasting settings, with the relative reduction in Mean Absolute Percentage Error (MAPE) ranging from 4% to 9% in this case study. They also outperform two HI-based models under the same settings. Moreover, an extended test case demonstrates the effectiveness of these RH variables in improving Artificial Neural Network models.

Tuesday, March 8, 2016

How to Write Conference Papers

Every year, the IEEE Power and Energy Society (PES) organizes many conferences all over the world. Here are some conferences that interest me:
  1. IEEE PES General Meeting (PESGM)
  2. North American Power Symposium (NAPS)
  3. IEEE PES Transmission & Distribution Conference and Exposition (T&D)
  4. International Conference on Probabilistic Methods Applied to Power Systems (PMAPS)
All of these conferences nowadays accept 5-page papers. While PES posts general guidelines for preparing conference papers, I'm putting together some tips for my students and others who are writing forecasting papers.

You may upgrade a PES conference paper to a journal paper with at least 40% new content. In other words, if you want to publish your work in a journal eventually, do not send the complete version to a conference. That said, how do you identify topics for conference papers? Here are some thoughts.
  1. Apply the methodology from a recently published journal paper to a new data set, to support the methodology with another case study. 
  2. A journal paper may not have elaborated on everything comprehensively. You can identify the gap and offer a supplemental view in a conference paper. My T&D2016 paper falls in this category.
  3. There is no perfect forecasting methodology. If you find a way to marginally improve the methodology from a recently published journal paper, you may publish your improved method in a conference paper. Note that if the improvement is significant, you may consider sending the work to a journal. 
  4. If you have some preliminary results from your research, you can put them in a conference paper to get some feedback from the reviewers and conference attendees.
  5. If you have participated in a forecasting competition, you may publish your methodology in a conference paper. My NAPS2015 paper falls in this category. Note that if the competition is as notable as GEFCom2012 or GEFCom2014, and your ranking is high, you may consider sending the paper to a journal. 
  6. You may put together a short review on a very specific topic in a conference paper. My REPC2014 paper falls in this category.

Here are some additional tips for publishing the work as a conference paper:
  1. You only have 5 pages to tell a self-contained story, so do not stretch the topic too broadly or too deeply.
  2. The reviewers may not look into the details of your paper beyond the title and abstract, so put a lot of effort into the title and abstract.
  3. The reviewers may recommend a rejection right away if your paper has too many typos, grammatical errors and formatting issues. Please proofread your paper carefully before submission.
  4. Pay attention to the track or committee during submission. You'd better send the paper to the committees that value the topic.
  5. You may have up to one round of revision. Make sure that you address the review comments as much as possible in your revision.
Best of luck with preparing your conference papers!

Monday, March 7, 2016

BigDEAL Forecasting Competition - Spring 2016

[Update]: Announcing BFCom2016s winners (link to competition results).

I organized two in-class competitions last semester for my Energy Analytics course, one on short term load forecasting, and the other on probabilistic load forecasting. The competitions were very well received by my students and the external participants. The probabilistic forecasting competition generated a nice article for the International Journal of Forecasting.

I'm teaching another forecasting class, Technological Forecasting and Decision Making, this semester at UNC Charlotte. The course is at the same level as the Energy Analytics one. While the Energy Analytics course covers various forecasting problems in the energy industry, this technological forecasting course covers various forecasting techniques without a specific focus on any industry. The course outline is available HERE.

I would like to open one of the exams to the external participants. I also plan to do so for my other forecasting-related courses going forward. Since I will leverage the help from BigDEAL members to run the show, I'm branding these activities as the BigDEAL Forecasting Competitions.

Here are the rules for this one:
  • The competition will start on 3/24/2016, and end on 4/6/2016.
  • The exam is individual effort. Each student will form a single-person team. No offline collaboration is allowed.
  • External participants may form multi-person teams with the team members identified during the registration process.
  • The competition topic will be on point forecasting. (At this stage, I haven't decided the exact problem to release yet.)
  • Incremental data will be released during the competition.
  • A report documenting how the models have been evolving is required to be eligible on the final leaderboard.
Interested? Register HERE by 3/22/2016. (If you don't have access to Google Form, you can email me directly to register.) 

Sunday, March 6, 2016

From High-resolution Data to High-resolution Probabilistic Load Forecasts

One of the contributions of my TSG2014 paper is showing that hourly data helps generate more accurate long-term forecasts than daily or monthly data does. While we showed the improvement in point load forecast accuracy, we did not formally compare probabilistic forecast accuracy. This conference paper fills in that missing comparison. We will present the paper at the IEEE PES T&D conference this May. The working paper is available HERE.

Jingrui Xie, Tao Hong and Chongqing Kang, "From high-resolution data to high-resolution probabilistic load forecasts," 2016 IEEE PES Transmission and Distribution Conference and Exposition, Dallas, TX, May 2-5, 2016.

From High-resolution Data to High-resolution Probabilistic Load Forecasts

Jingrui Xie, Tao Hong and Chongqing Kang


Long term load forecasting plays a vital role in power systems planning and utility financial planning. Traditional methods in long term load forecasting rely on monthly data, which offers too few observations to support comprehensive models with sufficient explanatory variables to capture the salient features in the electricity demand series. The grid modernization efforts undertaken by many utilities over the past decade have made high-resolution data available for many analytical tasks, including load forecasting. In this paper, we investigate the effectiveness of using high-resolution data in long term probabilistic load forecasting. The primary error measure we use for forecast evaluation is the pinball loss function. Through a case study based on data from a U.S. utility, we show that high-resolution data is beneficial to the improvement of probabilistic load forecasts.
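The abstract names the pinball loss function as the evaluation measure. For readers who haven't met it before, here is a minimal Python sketch of the standard pinball (quantile) loss: under-forecasting is penalized with weight q and over-forecasting with weight 1 - q, so each quantile forecast is judged against the level it claims to represent. The function names and the example numbers are hypothetical illustrations, not code or data from the paper, and averaging over quantile levels is shown as a common convention rather than the paper's exact setup.

```python
from typing import Sequence


def pinball_loss(actual: float, forecast: float, q: float) -> float:
    """Pinball (quantile) loss of one quantile forecast at level q.

    Under-forecasting (actual above forecast) is weighted by q;
    over-forecasting (actual below forecast) is weighted by 1 - q.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("quantile level q must be in (0, 1)")
    if actual >= forecast:
        return q * (actual - forecast)
    return (1.0 - q) * (forecast - actual)


def average_pinball_loss(actual: float,
                         quantile_forecasts: Sequence[float],
                         levels: Sequence[float]) -> float:
    """Average pinball loss over several quantile forecasts of one observation."""
    if len(quantile_forecasts) != len(levels):
        raise ValueError("need exactly one forecast per quantile level")
    losses = [pinball_loss(actual, f, q)
              for f, q in zip(quantile_forecasts, levels)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical hourly load (MW) and quantile forecasts, for illustration only.
    levels = [0.1, 0.5, 0.9]
    forecasts = [950.0, 1000.0, 1080.0]
    actual = 1020.0
    print(average_pinball_loss(actual, forecasts, levels))
```

In practice the averaged loss would be computed for every hour in the evaluation period and then averaged again; a lower score means sharper, better-calibrated quantiles.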

Monday, February 22, 2016

Hong Analytics New Course: Fundamentals of Utility Analytics

Five or six years ago, I had the idea of developing an analytics course for professionals in the utility industry. I ran the idea by Jim Burke. His first response was
What's analytics?
After I explained the meaning of analytics, Jim recommended that I prepare something more conventional, such as statistics and/or operations research, for utility professionals. Since then, the outline for that utility analytics course has been quietly sitting on my hard drive.

Nowadays, although analytics is well known in the utility industry (see my recent post on Analytics, Smart Grid and Big Data: Are They Like Teenage Sex?), there is still a gap between analytics education and business needs. In other words, it is really difficult to find instructors with a knowledge base in both analytics and power systems. Although there are utility analytics conferences and meetings where speakers present case studies or visions at a conceptual level, few people offer courses in utility analytics that teach how to make the techniques work in real-world applications.

In 2013, I went back to the university to try to fill this gap in the academic environment (see Why I Left a Great Place to Work - from Industry to Academia). Understanding that many industry professionals do not have the bandwidth to go through our graduate program, I decided to develop and offer a series of short courses through Hong Analytics.

So here comes another new Hong Analytics course, to be offered for the first time through EUCI in Boston, MA, March 14-15, 2016.

Fundamentals of Utility Analytics: Techniques, Applications and Case Studies

Analytics, the scientific process of transforming data into insight for making better decisions, is now a must-have skill for almost all utility professionals: from planning, to operations, to trading, to mid-level management, to the C-suite, and everywhere in between. Over the past decade this knowledge requirement has emerged and accelerated in the utility industry. The deployment of various sensors and meters has brought a large amount of data to the industry. Increased computing power has made quantitative analytics possible, timely and economically feasible. Meanwhile, advances in information technology have enabled utilities to make real-time operational decisions that are fact-based and data-driven. The challenge, though, has been to compile a coherent approach that lets professionals with different skill sets within the utility leverage the many applications of analytics to achieve better key performance indicators.

This course provides an introduction to analytics in the context of electric power systems and the power industry. The course is designed for engineers, planners, analysts and managers who are either new to the utility industry or looking to develop a better understanding of how to use analytics across the entire organization. Through a number of diverse case studies and hands-on exercises, attendees will gain a fundamental understanding of how to apply analytics within the utility industry, what the classical and emerging problems are, and how to tackle those problems using quantitative techniques in the enterprise environment.

For more information, such as the course outline and registration link, please visit the course page HERE.