Monday, April 22, 2019

Combining Weather Stations for Electric Load Forecasting

Ten years ago, I started looking into how weather data quality issues affect load forecast accuracy. Later, I found that using data from multiple weather stations can help improve load forecasts (see this SAS white paper). I also invented a methodology to automatically select weather stations for a given load zone. After joining UNC Charlotte, I wrote an IJF paper with two collaborators to introduce that methodology. Nowadays many utilities are using it to select their weather stations. Because that IJF paper is reproducible, I often use it as an entrance exam for prospective students interested in joining BigDEAL.

During the past few years, I have been using that IJF paper as a homework problem in my Energy Analytics class, challenging the students to improve the weather station selection methodology. Although the method is hard to beat, every year some students manage to turn in something better. Last year, I decided to work with the students in the class to write two papers, one on selecting weather stations and the other on combining weather stations. Right after I made that decision, Antonio Bracale and Pasquale De Falco invited me to write a paper related to ensemble forecasting for a special issue they were editing. Weather station combination fit the scope very well. Although I believed the research deserved publication in a higher-tier journal, I accepted the invitation to make this paper open access, with the hope that those who are using the old methodology can upgrade to this new one with minimal effort.

The peer review process was fairly enjoyable. The paper was submitted on March 18, 2019. The first decision, a major revision, was sent back to us on April 1, with comments from three reviewers. Most of the review comments were constructive. None of them were as nonsensical as some of those I have encountered at IEEE transactions. We submitted the revision on April 8. The paper was accepted on April 12. The editorial office sent me the edited version for proofreading on April 16. I was pleasantly surprised that their copy editor did some wordsmithing for us. I submitted the proofread version on April 20. The final version was published on April 21.

Citation

Masoud Sobhani, Allison Campbell, Saurabh Sangamwar, Changlin Li, and Tao Hong, "Combining weather stations for electric load forecasting," Energies, vol. 12, no. 8, article 1510, April 2019. (open access)

Combining Weather Stations for Electric Load Forecasting

Masoud Sobhani, Allison Campbell, Saurabh Sangamwar, Changlin Li, and Tao Hong

Abstract

Weather is a key factor affecting electricity demand. Many load forecasting models rely on weather variables. Weather stations provide point measurements of weather conditions in a service area. Since the load is spread geographically, a single weather station may not sufficiently explain the variations of the load over a vast area. Therefore, a proper combination of multiple weather stations plays a vital role in load forecasting. This paper answers the question: given a number of weather stations, how should they be combined for load forecasting? Simple averaging has been a commonly used and effective method in the literature. In this paper, we compared the performance of seven alternative methods with simple averaging as the benchmark using the data of the Global Energy Forecasting Competition 2012. The results demonstrate that some of the methods outperform the benchmark in combining weather stations. In addition, averaging the forecasts from these methods outperforms most individual methods.

Monday, April 8, 2019

Global Energy Forecasting Competition 2017: Hierarchical Probabilistic Load Forecasting

Check out the winning methodologies and data used in GEFCom2017! If you don't have access to ScienceDirect, you can use the Dropbox link below to access the data.

Citation

Tao Hong, Jingrui Xie, and Jonathan Black, "Global Energy Forecasting Competition 2017: Hierarchical Probabilistic Load Forecasting," International Journal of Forecasting, in press. (ScienceDirect; Data)


Global Energy Forecasting Competition 2017: Hierarchical Probabilistic Load Forecasting

Tao Hong, Jingrui Xie, and Jonathan Black

Abstract

The Global Energy Forecasting Competition 2017 (GEFCom2017) attracted more than 300 students and professionals from over 30 countries for solving hierarchical probabilistic load forecasting problems. Of the series of global energy forecasting competitions that have been held, GEFCom2017 is the most challenging one to date: the first one to have a qualifying match, the first one to use hierarchical data with more than two levels, the first one to allow the usage of external data sources, the first one to ask for real-time ex-ante forecasts, and the longest one. This paper introduces the qualifying and final matches of GEFCom2017, summarizes the top-ranked methods, publishes the data used in the competition, and presents several reflections on the competition series and a vision for future energy forecasting competitions.

Thursday, April 4, 2019

Lagged Load Variables in Load Forecasting

This post was triggered by the email below:
I am a regular reader of your blog and website which is an inspiration to me as a forecasting analyst. I just have a very simple question for you, which I don’t understand as a practitioner. I have looked at 10-20 papers and almost every one has a lag variable in it for forecasting electricity demand. But in practice, if you are forecasting for a portfolio or a region and not the whole grid of a country, lag demand is simply not available until weeks or months later. Is this because academia is focused on the theoretical and not the practical, or is it because it focuses on the big picture, total demand and not by region/portfolio? And is there any way round this? You can always feed forecasts for D+1 as a lag into D+2 going forward, but this doesn’t give you a lag for D+0 and D+1.
This is an excellent and frequently asked question, but I don't have a simple answer. 

In practice, if you have lagged load as a variable in the model but don't have its observation for the forecasting period, you have to use the predicted value. 

Take day-ahead load forecasting for example: when forecasting hour ending 10am for tomorrow, we don't have the observation for hour ending 9am. If the model includes the lagged load of the preceding hour, we have to predict the load of hour ending 9am first. To make that prediction, we need the load of hour ending 8am, which has to be predicted as well.

Let's say you are building a multiple linear regression model. Regression models with lagged dependent variables are called dynamic regression models.

To use a dynamic regression model to forecast a period for which the observations of the lags are not available, you have to execute an iterative process that forecasts those lags first, as sketched below.
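Here is a minimal sketch, in Python, of that iterative process for a toy dynamic regression model. The data and feature set are made up for illustration; a real model would also carry temperature and calendar variables.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
hours = np.arange(24 * 60)  # 60 days of hourly history (synthetic)
load = 100 + 20 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 2, hours.size)

# Features: hour-of-day encoded as sine/cosine, plus the load of the preceding hour.
def features(hour, lag_load):
    return [np.sin(2 * np.pi * hour / 24), np.cos(2 * np.pi * hour / 24), lag_load]

X = np.array([features(hours[i] % 24, load[i - 1]) for i in range(1, hours.size)])
y = load[1:]
model = LinearRegression().fit(X, y)

# Recursive multi-step forecast: each predicted hour feeds the next hour's lag.
last_obs = load[-1]
forecasts = []
for step in range(1, 25):  # the next 24 hours
    h = (hours[-1] + step) % 24
    yhat = model.predict([features(h, last_obs)])[0]
    forecasts.append(yhat)
    last_obs = yhat  # the predicted value becomes the lag for the next step
print(np.round(forecasts[:6], 1))
```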

Now you may want to ask:
Are these dynamic regression models more accurate than the ones without lagged load?
Practically, it depends upon how far ahead you are forecasting and how far back the lagged variables go.

If you are using the load of the preceding hour in your model, you should expect some improvement for the next few hours compared with models without lagged load variables. The improvement diminishes as the forecast horizon stretches. Beyond 10 hours or so, you may not see any improvement.

One way to get around this iterative process is to avoid using the load of the preceding one or two hours and instead use the load of the same hour of yesterday. By doing so, you can expect some improvement for the next day or two compared with models without lagged load variables. Again, the improvement diminishes as the forecast horizon stretches. For very short horizons, i.e., one or two hours ahead, models with the load of the same hour of yesterday typically do not outperform models with the load of the preceding hour.

For long term load forecasting, adding lagged load variables doesn't help much and creates issues.

One is the interpretability of the model. Because the lagged load variables are highly correlated with the load series itself, most of the load variation is "explained" by the lagged load variables rather than by the other explanatory variables, such as weather and calendar variables. In other words, we can hardly answer "what if next year is a hot year" if lagged load variables were in the model.

Another issue is the inflation of forecast accuracy. Many people plug in actual values of the lagged load when analyzing long term load forecasting performance, which results in a very low error. Be careful: this is not ex post forecasting! You should not assume perfect knowledge of the dependent variable in ex post forecasting.

To keep the answer short, this is what I have been doing: I use lagged load (see this MPCE paper) when the forecast horizon is less than two or three days, and sometimes I include residual forecasting (see the point forecasting portion of this IJF paper). I don't use lagged load for long term load forecasting.

Hope this helps!

Tuesday, April 2, 2019

Zehan Xu - Pursuing Perfection

Yesterday (April 1, 2019), Zehan Xu defended his MS thesis Customer Attrition Modeling and Forecasting.

Zehan Xu's MS thesis defense
From left to right: Dr. Linquan Bai, Zehan Xu, Dr. Tao Hong, and Dr. Shaoyu Li


Zehan received his B.S. degree in Industrial and Systems Engineering from Virginia Tech in 2016. He joined our MSEM program in Fall 2017.

During his first semester, I gave a seminar talk about the research opportunities at BigDEAL. He approached me after that, passed the tests I gave him, and officially joined my research group in February 2018.

Knowing his solid math background, I asked him to work on forecasting customer count using survival analysis. The topic was an extension of Jingrui Xie's MS thesis and TSG paper. Since Zehan did not have much background in statistics, he had to teach himself survival analysis. He quickly figured out that the tools that work well on textbook examples are not optimal for the real-world datasets I gave him. During the past year, he kept refining his work and finally came up with an effective methodology.

Our original plan was to have him graduate at the end of 2018, by which time I considered the quality of his work to exceed the MS thesis level. Nevertheless, he was never satisfied until very recently.

I stopped by my office last Sunday and saw one of my students, Saurabh Sangamwar, in the conference room presenting something. Since Saurabh had already defended his thesis a month earlier, I was a little curious. I went in and found him and another BigDEAL student, Yike Li, working with Zehan on Zehan's defense rehearsal.

I thought Zehan's defense preparation was done, but apparently he was still pursuing perfection.

His defense was very well done. I was impressed!

While advising him on his thesis research, I found Zehan to be a great candidate for doctoral research. He also realized the need for and the value of advanced education, so he decided to continue pursuing his doctoral degree here at BigDEAL.

Congratulations, Zehan!

Wednesday, February 13, 2019

Saurabh Sangamwar - Nothing is Impossible

Yesterday (February 12, 2019),  Saurabh Sangamwar defended his MS thesis Grouping Calendar Variables for Electric Load Forecasting.

Saurabh Sangamwar's MS thesis defense
From left to right: Dr. Linquan Bai, Saurabh Sangamwar, Dr. Pu Wang, Dr. Tao Hong

Saurabh received his B.Eng. in Mechanical Engineering from K J Somaiya College of Engineering, Mumbai, in 2015. After working in India for two years, he joined our MSEM program in Fall 2017. 

I still remember the scene of our first conversation a year ago, when he expressed his interest in joining BigDEAL.

"learn SAS and get the SAS Base Programmer Certification." I told him the same as what I said to the other students.

"I did." Saurabh said. 

"Then go ahead and get the SAS Advance Programmer Certification." I responded. 

"I've done that too." He said. 

Apparently, he had come to me well prepared, and he was the first student I had met who was this well prepared.

I admitted him without blinking.

The topic I gave him was grouping calendar variables. It took him a while to get the preliminary results. Then I asked him to change a few parameters in his algorithms and refresh the results. It took him another long while to get the second batch done. I saw him working hard every day, so I wondered why it took so long to get the results. During our conversation, I realized that his code was not fully automated. In other words, he had to do a lot of manual work to get the results. I also learned that he did not have any programming background until the previous semester, when he was preparing for the SAS certification exams.

I'm a professor who likes to pull students out of their comfort zone. Knowing his weakness, I increased the programming requirements in his master's thesis research so that he could sharpen his programming skills.

Saurabh did not disappoint me. Over the following few months, he automated his code, picked up parallel computing techniques, and even learned additional languages such as Python and R. Moreover, he is one of the few students who took two tough courses from me and got a 4.0 GPA.

To Saurabh, nothing is impossible. 

Congratulations!

Monday, February 11, 2019

Short-term Industrial Reactive Power Forecasting

Two years ago, I started collaborating with a team of Italian researchers. We had our first joint paper on short-term industrial load forecasting published at the 2017 ISGT-Europe. The complete story is HERE.

Since then, we've continued our collaboration. In this paper, we used the data from the same Italian factory. Now we focus on reactive power forecasting, a rarely touched topic in the load forecasting literature. 

Citation

Antonio Bracale, Guido Carpinelli, Pasquale De Falco, and Tao Hong, "Short-Term Industrial Reactive Power Forecasting," International Journal of Electrical Power & Energy Systems, vol. 107, pp. 177-185, May 2019. (ScienceDirect)

Short-term Industrial Reactive Power Forecasting

Antonio Bracale, Guido Carpinelli, Pasquale De Falco, and Tao Hong

Abstract

Reactive power forecasting is essential for managing energy systems of factories and industrial plants. However, the scientific community has devoted scant attention to industrial load forecasting, and even less to reactive power forecasting. Many challenges in developing a short-term reactive power forecasting system for factories have rarely been studied. Industrial loads may depend on many factors, such as scheduled processes and work shifts, which are uncommon or unnecessary in classical load forecasting models. Moreover, the features of reactive power are significantly different from those of active power, so some commonly used variables in classical load forecasting models may become meaningless for forecasting reactive power. In this paper, we develop several models to forecast industrial reactive power. These models are constructed based on two forecasting techniques (i.e., multiple linear regression and support vector regression) and two variable selection methods (i.e., cross validation and the least absolute shrinkage and selection operator). In the numerical applications based on real data collected from an Italian factory at both aggregate and individual load levels, the proposed models outperform four benchmark models in short forecast horizons.

Tuesday, December 4, 2018

Leaderboard for BFCom2018 Final Match!!!

The final match of the BigDEAL Forecasting Competition 2018 was on probabilistic daily peak hour forecasting, a very important problem in today's electricity market but new to the academic literature. Even without any monetary prize, all 16 finalists from 5 countries submitted their forecasts. (See the qualifying match leaderboard HERE.)

The figure below shows the leaderboard for the BFCom2018 Final Match. The green highlighted ones are in-class students. I also created a naive forecast, which is highlighted in red.

BigDEAL Forecasting Competition 2018 Final Match Leaderboard

One of my students, Zehan Xu, who was auditing the class but got disqualified in the qualifying match, also worked on the final problem and submitted his forecast on time. I included his score on the leaderboard but marked his ranking as "BR-6", which means bragging rights for ranking #6. His ranking does not affect the rankings of the other teams.

Congratulations to all the BFCom2018 finalists for completing this competition! 

To get updates about the follow-up events, please follow me on Twitter and/or connect with me on LinkedIn.

Sunday, December 2, 2018

Temperature-based Models vs. Time Series Models

Last week, Spyros Makridakis asked me a question:
I have been reading your energy competition and I cannot find any clear statements about the superiority of explanatory/exogenous variables. Am I wrong? Is there a place where you state the difference in forecasting accuracy between time series and explanatory multivariate forecasting as it relates to the short as well as beyond the first two or three days (not to mention the long term) that accurate temperature forecasting exist?
Today, Rob Hyndman asked me a similar question, which was routed originally from Spyros.

In fact, this has been quite a debated topic in load forecasting. The answer is not straightforward. This subject could make a good master's thesis or even a doctoral dissertation. I was going to write a paper about it, but always had something more important or urgent to work on. Recently my research team has done some preliminary work in this direction. While the paper is still under preparation, let me start the discussion with this blog post, as part of the blog series on error analysis in load forecasting.

The literature is not silent in this area, but various empirical studies have suggested different things.

Some earlier attempts were made by James Taylor. James has written many load forecasting papers. His best known work is on exponential smoothing models.

James' TPWRS2012 paper claimed that
Although weather-based modeling is common, univariate models can be useful when the lead time of interest is less than one day.
Regarding Fig. 9, which depicts the MAPE values by lead time, the paper stated that
The exponential smoothing methods outperform the weather-based method up to about 5 hours ahead, but beyond this the weather-based method was better. 
Based on this paper, can we conclude that exponential smoothing models are more accurate than the weather-based methods for very short term ex ante load forecasting?

No.

This is my interpretation of the paper:
A world-class expert in exponential smoothing carefully developed several exponential smoothing models. These models generated more accurate forecasts than a U.K. power company's forecasts. 
The "weather-based method" used in that paper was devised by the transmission company in Great Britain using regression models. The paper briefly mentioned how the "weather-based method" worked, but the information was not enough for me to judge how accurate these weather-based models are. I don't know if this U.K. transmission company is using state-of-the-art models.

Some evidence came from recent load forecasting competitions, such as Global Energy Forecasting Competitions, npower forecasting challenges, and BigDEAL Forecasting Competition 2018.

In short, time series models, such as exponential smoothing and ARIMA models, never showed up as a major component of a winning entry in these competitions. On the other hand, regression models with temperature variables are always among the winning models.

In fact, ARIMA showed up in a winning method in GEFCom2014, where my former student Jingrui Xie used four techniques (UCM, ESM, ANN, and ARIMA) to model the residuals of a regression model (see our IJF paper).

Based on these competition results, can we conclude that time series models are not as accurate as regression models?

No.

In GEFCom2012, we let the contestants predict a few missing periods in the history, without restricting them to using only the data prior to each missing period. In my GEFCom2012 paper, I briefly mentioned that
This setup may mean that regression or some other data mining techniques have an advantage over some time series forecasting techniques such as ARIMA, which may be part of the reason why we did not receive any reports using the Box–Jenkins approach in the hierarchical load forecasting track.
In GEFCom2012, npower forecasting challenges, and the qualifying match of BFCom2018, actual temperature values were provided for the forecast period. In other words, these competitions were on ex post forecasting. Again, the temperature-based models have an advantage since perfect information of temperature is given for the forecast period.

GEFCom2014 and GEFCom2017 were on ex ante probabilistic forecasting. The temperature-based models dominated the leaderboards. This would be fair evidence favoring temperature-based models.

For benchmarking purposes, I included two seasonal naive models in my recency effect paper at the request of an anonymous reviewer. Both performed very poorly compared with the temperature-based models. I commented in the paper:
Seasonal naïve models are used commonly for benchmarking purposes in other industries, such as the retail and manufacturing industries. In load forecasting, the two applications in which seasonal naïve models are most useful are: (1) benchmarking the forecast accuracy for very unpredictable loads, such as household level loads; and (2) comparisons with univariate models. In most other applications, however, the seasonal naïve models and other similar naïve models are not very meaningful, due to the lack of accuracy. 
Here is a quick summary based on the evidence so far:

  • For ex post point load forecasting, evidence favors temperature-based models.
  • For ex ante point load forecasting, there is no solid evidence favoring either approach.
  • For ex ante probabilistic load forecasting, evidence favors temperature-based models.

I'm not a fan of comparing techniques. In my opinion, it's very difficult to make fair comparisons among techniques. If I were good at ANN but bad at regression, I could build ANN models far more accurate than my regression models. Using exactly the same technique, two forecasters may build different models with distinct accuracy levels. My fuzzy regression paper offers such an example. In other words, the goodness of a model largely depends upon the competency of the forecaster. The best way to compare techniques is through forecasting competitions.

In practice, weather variables are a must-have in most load forecasting situations. I'll elaborate on this in another blog post.

Tuesday, November 27, 2018

Winning Methods from BFCom2018 Qualifying Match

I invited the BFCom2018 finalists to share their methods used at the qualifying match. Here are the ones I've received so far.

#1. Geert Scholma

Team member: Geert Scholma

Software: Excel, R (dplyr, lubridate, ggplot2, plotly, tidyr, dygraphs, xts, nnls)

Core technique: Multiple Linear Regression.

The model includes the usual variables with some special recipe: 5 weekdays; federal holidays; strong bridge days (the Monday before / the Friday after); weak bridge days (the others); 4th-degree polynomials of exponentially weighted moving average temperatures on 3 timescales (roughly 1 day, 1 week, 1 month) with optimized decay factors; a 4th-degree polynomial time trend for long-term gradual changes, held at a constant value after the last training date; and an 8th-degree polynomial of the day of the year for the yearly shape, with a weekend interaction.
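For illustration, here is a minimal sketch (mine, not Geert's actual code) of how such exponentially weighted moving-average temperature features could be built; the half-life values are placeholders standing in for his optimized decay factors.

```python
import numpy as np
import pandas as pd

temp = pd.Series(np.random.default_rng(1).normal(15, 8, 24 * 365))  # hourly temperatures

features = {}
# Three timescales (roughly a day, a week, a month), each as an EWMA of temperature.
for label, halflife in {"day": 24, "week": 24 * 7, "month": 24 * 30}.items():
    ewma = temp.ewm(halflife=halflife).mean()
    for degree in range(1, 5):  # expand into 4th-degree polynomial terms
        features[f"T_{label}_{degree}"] = ewma ** degree
X = pd.DataFrame(features)  # regressors to feed the linear regression
print(X.columns.tolist())
```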

Core methodology: No data cleaning. One weighted weather station, based on the non-negative linear regression coefficients of a second-step model that combined the predictions of all the single-weather-station models from the first step.
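A hedged sketch of that two-step station weighting as I read it, with synthetic data throughout: fit one model per station, then solve a non-negative least squares problem on the models' predictions.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(2)
actual = rng.normal(1000, 100, 500)  # observed load over some validation window
# Hypothetical predictions from three single-station models (synthetic here).
preds = np.column_stack([actual + rng.normal(0, s, 500) for s in (20, 35, 50)])

weights, _ = nnls(preds, actual)  # non-negative combining weights, one per station
weights /= weights.sum()          # normalize into a single weighted station
print(np.round(weights, 3))
```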

Key reference: (Hong, Wang, & White, 2015).


#2. Redwood Coast Energy Authority

Team member: Allison Campbell, Redwood Coast Energy Authority and UNCC

Software: Python (SKLearn package LinearRegression, and the genetic algorithm package DEAP)

Core technique: Multiple Linear Regression.

I adapted the DEAP One Max Problem to optimize the selection of weather stations. The bulk of my model is built from Tao's vanilla benchmark, with the inclusion of lagged temperature, a weighted moving average of the previous day's temperature, transformation of holidays into weekend days, and exponentially weighted least squares. Before the regression, I log-transformed the load. I also created 18 "sister" forecasts by redefining the number of months in a year to be 6 to 24. This model was informed by Tao's doctoral thesis; Hong, Wang & White 2015 (weather station selection); Wang, Liu & Hong 2016 (recency, big data); Nowotarski, Liu, Weron & Hong 2016 (combining sisters); Xie & Hong 2018 (24 solar terms); and Arlot & Celisse 2010 (CV for model selection).
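Here is a minimal sketch of the sister-forecast idea as described, not Allison's actual pipeline: refit the same toy model while splitting the year into k pseudo-months for k = 6 to 24, then average the resulting forecasts.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
dates = pd.date_range("2006-01-01", "2007-12-31", freq="D")
load = 1000 + 200 * np.sin(2 * np.pi * dates.dayofyear / 365) + rng.normal(0, 30, dates.size)

train_doy, test_doy = dates.dayofyear[:365], dates.dayofyear[365:]
sisters = []
for k in range(6, 25):  # redefine the year as k pseudo-months, k = 6..24
    def month_dummies(doy):
        bins = np.minimum((doy - 1) * k // 366, k - 1)  # map day-of-year into k bins
        return np.eye(k)[bins]
    model = LinearRegression().fit(month_dummies(train_doy), load[:365])
    sisters.append(model.predict(month_dummies(test_doy)))
avg_forecast = np.mean(sisters, axis=0)  # combine the sister forecasts
print(avg_forecast[:3].round(1))
```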


#5. Masoud_BigDEAL

Team member: Masoud Sobhani, UNCC

Software: SAS (proc GLM)

Core technique: Multiple Linear Regression

I work with Dr. Hong in the BigDEAL lab and I am the TA of the “Energy Analytics” course this semester. For the first few assignments of this class, we gave the students the same dataset so that they could improve the accuracy of their forecasts as they learned different forecasting skills. As with previous classes, Dr. Hong asked me to prepare a benchmark forecast. I built a model during the first lecture and we kept it as the benchmark for all assignments. Later, Dr. Hong decided to run a competition on the same dataset as the qualifying match. My initial benchmark model was still on the leaderboard and fortunately qualified for the next round.

In this model, I did not do any data cleansing; I used the raw data for forecasting. The core technique was based on the vanilla benchmark model with the recency effect (Wang, Liu, & Hong, 2016) and holiday effects (Hong, 2010). The model uses third-order polynomials of temperature, calendar variables, and interactions between them. I removed the trend variable and used 14 lagged temperatures. For weather station selection, I employed the exact method proposed in (Hong, Wang, & White, 2015). A sketch of the model form is below.
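This is a hedged sketch of a vanilla-benchmark-style regression with the recency effect, written in Python/statsmodels (Masoud used SAS proc GLM and 14 lags; the data here are synthetic and only one lag is shown for brevity).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
idx = pd.date_range("2006-01-01", periods=24 * 400, freq="h")
df = pd.DataFrame({"T": 15 + 10 * np.sin(2 * np.pi * idx.dayofyear / 365)
                        + rng.normal(0, 3, idx.size)}, index=idx)
df["load"] = 800 + 5 * (df["T"] - 18) ** 2 + rng.normal(0, 20, len(df))
df["month"], df["hour"], df["wday"] = idx.month, idx.hour, idx.dayofweek
df["T_lag1"] = df["T"].shift(1)  # one lagged temperature (Masoud used 14)

# Calendar classes plus third-order temperature polynomials with cross effects;
# the recency effect repeats the temperature block for each lagged temperature.
formula = ("load ~ C(wday)*C(hour) + C(month)"
           " + (T + I(T**2) + I(T**3)) * (C(month) + C(hour))"
           " + (T_lag1 + I(T_lag1**2) + I(T_lag1**3)) * (C(month) + C(hour))")
fit = smf.ols(formula, data=df.dropna()).fit()
print(f"in-sample MAPE: {np.mean(np.abs(fit.resid / df.dropna()['load'])):.3%}")
```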


#7. SaurabhSangamwar_BigDEAL

Team Member: Saurabh Sangamwar, UNCC

Software: SAS (proc GLM)

Core technique: Multiple Linear Regression

Methodology:
  • Weather station selection using the approach proposed in (Hong, Wang, & White, 2015)
  • Used 24 solar terms to classify the data, as proposed in (Xie & Hong, 2018)
  • Added the recency effect to Tao’s Vanilla Benchmark model, as proposed in (Wang, Liu, & Hong, 2016)
  • Used holiday effects (treating a holiday as a Sunday and the day after a holiday as a Monday), weekend effects, a trend variable (an increasing serial number), and the daily maximum and minimum temperatures with their interactions with month, solar term, and hour. When forecasting with solar terms, solar months 4 and 5 were grouped together.
  • Used two years (2006 and 2007) as the training period and forecasted the 2008 load.
  • Used 3-fold cross validation and stepwise variable selection to choose the parameters, including the number of lagged effects.
  • Because the best lagged effect differed by year, and solar terms sometimes beat Gregorian calendar months as the class variable (and sometimes vice versa), point forecasts were generated with 11, 12, 13, and 14 lagged effects for both the solar-term and Gregorian calendars. The average of the resulting 8 point forecasts was submitted.

#10. YikeLi_BigDEAL

Team member: Yike Li, Accenture and UNCC 

Software: SAS (proc GLM)

Core techniques: Multiple Linear Regression

Core methodology:
  • Weather station selection: a modified version of (Hong, Wang, & White, 2015) that evaluates all possible combinations of the top selected weather stations, selecting the virtual station based on three-fold cross validation.
  • Recency effect: performed a two-dimensional forward stepwise analysis, sketched below. The assumption is that the MAPE values of the d-h combinations on the validation period (d = 0~6, h = 0~24) form a convex surface. Starting from d = 0, gradually add h terms to Tao’s vanilla model until adding more temperature lags no longer yields a better MAPE; then keep the selected h value and gradually add d terms until adding more past daily averages no longer yields a better MAPE.
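Here is a sketch of that two-dimensional forward stepwise search; `validation_mape(d, h)` is a hypothetical stand-in for fitting the model with d daily-average lags and h hourly lags and scoring it on the validation period.

```python
def forward_stepwise(validation_mape, max_h=24, max_d=6):
    # Grow the hourly lags h with d fixed at 0, stopping at the first
    # non-improvement (the convexity assumption), then grow the daily lags d.
    best_h, best = 0, validation_mape(0, 0)
    for h in range(1, max_h + 1):
        mape = validation_mape(0, h)
        if mape >= best:
            break  # no further improvement: stop adding temperature lags
        best_h, best = h, mape
    best_d = 0
    for d in range(1, max_d + 1):
        mape = validation_mape(d, best_h)
        if mape >= best:
            break  # no further improvement: stop adding daily averages
        best_d, best = d, mape
    return best_d, best_h

# A toy MAPE surface whose minimum sits at (d=2, h=3); prints (2, 3).
print(forward_stepwise(lambda d, h: (d - 2) ** 2 + (h - 3) ** 2 + 1))
```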

#13. 4C

Team members:
  • Ilias Dimoulkas, KTH Royal Institute of Technology, Stockholm, Sweden
  • Peyman Mazidi, Loyola Andalucia University, Seville, Spain
  • Lars Herre, KTH Royal Institute of Technology, Stockholm, Sweden
  • Nicholas-Gregory Baltas, Loyola Andalucia University, Seville, Spain
Software: Matlab / Matlab Neural Network Toolbox

Technique: Feed-forward Neural Networks

Methodology:
  • Data cleansing. Missing values at the spring daylight saving hours were filled with the average of the previous and the following hours. Double values at the fall daylight saving hours were replaced by their average value. No other data cleansing or outlier detection was done.
  • Weather station selection. The technique described in (Hong, Wang, & White, 2015) was used with the difference that neural networks were used to make the forecasts instead of multiple linear regression. 
  • Feature selection. Forward sequential feature selection was used. The initial pool of variables consisted of time variables (year, month, hour, etc.), temperature related variables (temperature, power, lags, simple moving average) and cross effects between the temperature and the time variables. The pool contained 172 variables in total. The evaluation was also based on neural networks forecasts. The final feature set consisted of 31 variables.
  • Forecast. 10 neural networks were trained on the whole data set (years 2005-2007). The forecast for year 2008 was the mean forecast of the 10 neural networks (sketched below).
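A minimal sketch of that ensemble step, with synthetic data and sklearn's MLPRegressor standing in for the Matlab toolbox: train 10 networks that differ only in their random initialization and average their forecasts.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(5)
X = rng.normal(size=(2000, 31))            # 31 selected features (synthetic)
y = X[:, :3].sum(axis=1) + rng.normal(0, 0.1, 2000)
X_2008 = rng.normal(size=(24, 31))         # feature rows for the forecast period

forecasts = []
for seed in range(10):  # 10 networks differing only in initialization
    net = MLPRegressor(hidden_layer_sizes=(20,), max_iter=500, random_state=seed)
    forecasts.append(net.fit(X, y).predict(X_2008))
ensemble = np.mean(forecasts, axis=0)      # the submitted forecast is the mean
print(ensemble[:3].round(2))
```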

#13. AdG

Team member: Andrés M. Alonso, Universidad Carlos III de Madrid, Spain.

Software: Matlab (Statistics and Machine Learning toolbox)

Technique: support vector regression

In this project, I used SVM regressions to predict hourly loads with explanatory variables such as temperatures, day of the week, month, federal holidays, and a linear trend. As in Hong et al. (2015), I selected meteorological stations using the 2007 loads as a trial period, keeping the five stations with the best MAPE results. In the final model, the five temperature measurements were considered separately instead of being aggregated into a single measure. The local, or focused, approach consists of selecting the days in the training sample whose temperature behavior is similar to that of the day to be predicted, so that the regression is estimated/trained using only similar days. That is, for 2007 (2008), I performed 365 (366) SVM regressions, each trained on a different sample. For 2007, the focused approach improved on the overall approach that uses all the data in the training set.
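Here is a hedged sketch of the focused approach as I understand it, on synthetic data: for each target day, pick the k most similar training days by temperature profile and train an SVR on those days only.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(6)
train_temps = rng.normal(15, 8, (700, 24))   # daily 24-hour temperature profiles
train_loads = 1000 + 4 * (train_temps - 18) ** 2 + rng.normal(0, 30, (700, 24))
target_temp = rng.normal(15, 8, 24)          # profile of the day to be predicted

# Select the k most similar training days by distance between temperature profiles.
k = 50
dist = np.linalg.norm(train_temps - target_temp, axis=1)
similar = np.argsort(dist)[:k]

# One SVR per hour, trained on the similar days only.
day_ahead = [
    SVR().fit(train_temps[similar, h:h + 1], train_loads[similar, h])
         .predict(target_temp[h:h + 1].reshape(1, 1))[0]
    for h in range(24)
]
print(np.round(day_ahead[:4], 1))
```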


References used by the finalists:
  • Hong, T. (2010). "Short Term Electric Load Forecasting." Ph.D. dissertation, Graduate Program of Operations Research and Department of Electrical and Computer Engineering, North Carolina State University.
  • Wang, P., Liu, B., & Hong, T. (2016). "Electric load forecasting with recency effect: a big data approach." International Journal of Forecasting, vol. 32, no. 3, pp. 585-597.
  • Hong, T., Wang, P., & White, L. (2015). "Weather station selection for electric load forecasting." International Journal of Forecasting, vol. 31, no. 2, pp. 286-295.
  • Tashman, L. J. (2000). "Out-of-sample tests of forecasting accuracy: an analysis and review." International Journal of Forecasting, vol. 16, no. 4, pp. 437-450.
  • Arlot, S., & Celisse, A. (2010). "A survey of cross-validation procedures for model selection." Statistics Surveys, vol. 4, pp. 40-79.
  • Xie, J., & Hong, T. (2018). "Load forecasting using 24 solar terms." Journal of Modern Power Systems and Clean Energy, vol. 6, no. 2, pp. 208-214.
  • Nowotarski, J., Liu, B., Weron, R., & Hong, T. (2016). "Improving short term load forecast accuracy via combining sister forecasts." Energy, vol. 98, pp. 40-49.

BTW, I also created a new label "winning methods" so that readers of this blog can easily find the winning methods from previous competitions.

Tuesday, November 6, 2018

Leaderboard for BFCom2018 Qualifying Match!!!

The forecast submission due date for the Qualifying Match of the BigDEAL Forecasting Competition 2018 was Nov 4, 2018. Of the 81 teams that registered for the competition, 39 successfully submitted their forecasts by the due date. 10 teams will advance to the final match, together with my Energy Analytics class of 2018, which includes 5 master's and PhD students plus the teaching assistant Masoud Sobhani.

Two methods are used to calculate the MAPE of the forecasts. The first is the direct calculation of the Mean Absolute Percentage Error (MAPE) based on the raw forecast submitted by each team, as originally announced. The other is based on a bias-adjusted forecast, which is calculated by dividing the hourly load forecast by the coincident monthly energy of the forecast, and then multiplying it by the actual monthly energy of that month. For each measure, the MAPE of the last-ranked in-class student is used as the qualifying bar. A team outperforming either bar advances to the final match.
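For clarity, here is a small sketch of the two measures on synthetic numbers; the bias adjustment rescales each month of the forecast so that its monthly energy matches the actual monthly energy.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
idx = pd.date_range("2008-01-01", periods=24 * 366, freq="h")
actual = pd.Series(1000 + rng.normal(0, 50, idx.size), index=idx)
forecast = actual * 1.05 + rng.normal(0, 30, idx.size)  # a biased forecast

raw_mape = (forecast - actual).abs().div(actual).mean()

# Bias adjustment: divide each hourly forecast by its month's forecast energy,
# then multiply by that month's actual energy.
fcst_month_energy = forecast.groupby(idx.month).transform("sum")
act_month_energy = actual.groupby(idx.month).transform("sum")
adjusted = forecast / fcst_month_energy * act_month_energy
adj_mape = (adjusted - actual).abs().div(actual).mean()
print(f"raw MAPE {raw_mape:.2%}, bias-adjusted MAPE {adj_mape:.2%}")
```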

The figure below shows the leaderboard for the BFCom2018 Qualifying Match. The green highlighted ones are in-class students, while the qualifying bar for each measure is shown in bold. The teams above the red line are the finalists. The "Ranking (BOTH)" column lists the rankings based on the sum of the two rankings from the two measures.

BigDEAL Forecasting Competition 2018 - Qualifying Match Leaderboard

Congratulations to the BFCom2018 finalists! A tougher problem is waiting for them in the final match :)

p.s., I will organize a series of follow-up events for the winners to present their methodologies. For more information about this qualifying match, please keep an eye on the FAQ page.