Tuesday, December 20, 2016

Winning Methods from npower Forecasting Challenge 2016

RWE npower released the final leaderboard for its forecasting challenge 2016. I took a screenshot of the top teams. Interestingly, the international teams (colored in red) took all of the top 6 places. Unfortunately, some of the top-notch UK load forecasters did not join the competition. I'm hoping that they can show up at the game to defend the country's legacy :)

RWE npower Forecasting Challenge 2016 Final Leaderboard (top 12 places)

In each of the previous two npower competitions, I asked my BigDEAL students to join the competition as a team. In both competitions, they ranked at the top, beating all UK teams (see the blog posts HERE and HERE). We also published our winning methods for electricity demand forecasting and gas demand forecasting.

This year, instead of forming a BigDEAL team, I sent the students in my Energy Analytics class to the competition. The outcome is again very pleasing. The UNCC students took two of the top three places, and four of the top six places. What makes me, a professor, very happy is the fact that the research findings have been fully integrated into the teaching materials and smoothly transferred to the students in the class. (See my research-consulting-teaching circle HERE.)

OK, enough bragging...

I asked the top teams to share their methodologies with the audience of my blog, as we did for BFCom2016s. Here they are:

1st Place: Geert Scholma

My forecast this time consisted of the following elements:
- linear regression models separated per 30 minute period with 78 variables each
- fourth degree yearly shapes per weekday as a base shape
- an intercept, 6 weekdays and 22 holiday, bridge day and school holiday variables
- daylight savings and a linear time trend, each separated for weekdays and weekends
- a shift at September 2014 and a night variable
- conversion of temperature to windchill
- third degree windchill polynomials for cooling and heating with different impacts
- three moving averages with different periods for temperature effects occurring at different timescales
- different radiation variables depending on time of day with up to 6 hourly and moving average radiation variables interacted with a second degree polynomial of the day of year for peak hours
- 1 hourly and 1 moving average rainfall variable
- manual exclusion of outliers and filling of any weather gaps
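
To make a few of the weather elements above more concrete, here is a minimal Python sketch of converting temperature to wind chill, taking a third degree polynomial of it, and adding moving averages over several periods. Geert did not say which wind chill formula or which averaging periods he used, so the JAG/TI wind chill index, the 3/24/72 step windows and all function names below are assumptions for illustration only, not his actual implementation.

```python
from collections import deque


def wind_chill_c(temp_c, wind_kmh):
    """JAG/TI wind chill index in deg C (assumed formula, not confirmed by the author)."""
    # The index is only defined for cold, breezy conditions;
    # outside that range we simply return the air temperature.
    if temp_c > 10.0 or wind_kmh < 4.8:
        return temp_c
    v = wind_kmh ** 0.16
    return 13.12 + 0.6215 * temp_c - 11.37 * v + 0.3965 * temp_c * v


def trailing_mean(values, window):
    """Trailing moving average; early points average over the values seen so far."""
    out, buf, total = [], deque(), 0.0
    for x in values:
        buf.append(x)
        total += x
        if len(buf) > window:
            total -= buf.popleft()
        out.append(total / len(buf))
    return out


def weather_features(temps_c, winds_kmh, windows=(3, 24, 72)):
    """One feature row per time step: cubic wind chill terms plus moving averages."""
    chill = [wind_chill_c(t, v) for t, v in zip(temps_c, winds_kmh)]
    averages = [trailing_mean(chill, w) for w in windows]  # hypothetical windows
    rows = []
    for i, c in enumerate(chill):
        rows.append([c, c ** 2, c ** 3] + [avg[i] for avg in averages])
    return rows
```

Each row could then be joined with the calendar variables and fed to a separate linear regression for each 30 minute period, as described in the list above.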

2nd Place: Devan Patel

Model: A multiple linear regression approach was used during the npower forecasting competition. The base model was Tao’s Vanilla Benchmark model. A major change was made to the dependent variable, Energy Consumption: a Box-Cox transformation of Energy Consumption was applied, chosen based on the distribution of the training data. Polynomials of Humidity and Wind Speed were added to the base model. With the help of these changes, the performance of the Vanilla benchmark model was improved. During testing, these changes improved the accuracy of the Vanilla model by around 1.5% in terms of MAPE.
Data: Two different approaches were used to train the model. During winter (Round 1 and Round 3), the model was trained using the whole year’s data. During summer (Round 2), only the summer months’ data was used for model training. Scatter plots across different months were helpful for understanding the distribution of energy consumption.
Exploratory data analysis: Missing values were replaced with the values from the same hours of the previous day. Scatter plots of temperature, humidity and wind speed were used to identify their relationships with energy consumption.
Error metric: MAPE was used as the base error metric to evaluate the accuracy of the forecast during model validation.
Software: RStudio was used as the main software for model building, validation and forecasting. MS Excel was used to prepare the data files for use in RStudio.
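
For readers less familiar with the two ingredients mentioned above, here is a small Python sketch of a Box-Cox transformation with its inverse, and of MAPE. Devan worked in R and did not report the lambda he chose, so the function names and any lambda value used with them are assumptions for illustration. The idea is to fit the regression on the transformed load, transform the forecasts back to the original scale, and only then compute MAPE.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def inverse_box_cox(values, lam):
    """Map transformed values (e.g. regression forecasts) back to the original scale."""
    if lam == 0:
        return [math.exp(z) for z in values]
    return [(lam * z + 1.0) ** (1.0 / lam) for z in values]


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)
```

In practice the lambda is usually estimated from the training data, for example by maximum likelihood, rather than fixed by hand.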

3rd Place: Masoud Sobhani

For the first round, the model was Tao's Vanilla model with recency effects (adding lagged temperature variables to the original model). The model uses multiple linear regression (MLR), and the predictors are calendar variables, temperature, lagged values of temperature and cross effects between them. The model was implemented in SAS. For the second round, I tried to improve the Vanilla model by adding predictors beyond temperature. Humidity was added to the model using the method introduced in Xie and Hong 2016. The new model had temperature and relative humidity as its weather-related predictors. Since we didn't know the location of the utility, I tried several variants of the new model and selected the one with the best results. For the third round, the model from the previous round was improved by adding some lagged values of relative humidity. In each round, model selection was done by cross validation.
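
As a rough illustration of what "Vanilla model with recency effects" means in terms of predictors, the Python sketch below builds one sparse feature row per hour: a trend, calendar dummies, a cubic polynomial of the current temperature with its month and hour interactions, and the same terms for lagged temperatures. Masoud's model was implemented in SAS; the function names, the hourly resolution and the choice of two lags here are assumptions, not his actual specification.

```python
from datetime import datetime, timedelta


def temp_terms(prefix, temp, month, hour):
    """Cubic temperature polynomial plus its interactions with month and hour."""
    terms = {}
    for power in (1, 2, 3):
        value = temp ** power
        terms[f"{prefix}^{power}"] = value
        terms[f"{prefix}^{power}*month={month}"] = value
        terms[f"{prefix}^{power}*hour={hour}"] = value
    return terms


def design_rows(start, temps, n_lags=2):
    """Sparse feature dicts for hourly data; rows without a full set of lags are skipped."""
    rows = []
    for i in range(n_lags, len(temps)):
        ts = start + timedelta(hours=i)
        # Calendar part: trend plus dummy variables stored as named keys.
        row = {
            "intercept": 1.0,
            "trend": float(i),
            f"month={ts.month}": 1.0,
            f"weekday={ts.weekday()}": 1.0,
            f"hour={ts.hour}": 1.0,
            f"weekday={ts.weekday()}*hour={ts.hour}": 1.0,
        }
        # Current temperature, then the lagged temperatures (the recency part).
        row.update(temp_terms("T", temps[i], ts.month, ts.hour))
        for lag in range(1, n_lags + 1):
            row.update(temp_terms(f"T_lag{lag}", temps[i - lag], ts.month, ts.hour))
        rows.append(row)
    return rows
```

The number of lags is the kind of choice that can be settled by cross validation, as described above for model selection.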
