Tuesday, November 11, 2014

Prediction Interval and Confidence Interval

This pair of terms is very difficult to distinguish, because statisticians and economists don't follow the same standard. Since load forecasting falls under the umbrella of forecasting, I follow the terminology developed by the forecasting community. Special thanks to Rob Hyndman, who answered many of my questions while I was preparing this post. I highly recommend his two blog posts, "The difference between prediction intervals and confidence intervals" and "Prediction intervals too narrow".

In short, a simple rule tells you when to use a confidence interval and when to use a prediction interval:
A confidence interval is associated with a parameter, while a prediction interval is associated with a prediction. 
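
To make the rule concrete, here is a minimal Python sketch on a hypothetical sample of daily peak loads (the numbers are made up for illustration). The confidence interval describes uncertainty about a parameter, the mean, so it shrinks as the sample grows. The prediction interval describes where the next single observation may fall, so it also carries the noise of that observation and stays wider. For simplicity the sketch uses the normal quantile instead of Student's t, which is a large-sample approximation.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci_and_pi(sample, level=0.90):
    """Return (CI for the mean, PI for one new observation).

    Uses the normal quantile as a large-sample stand-in for Student's t.
    """
    n = len(sample)
    m = mean(sample)
    s = stdev(sample)  # sample standard deviation
    z = NormalDist().inv_cdf(0.5 + level / 2)
    ci_half = z * s / sqrt(n)          # uncertainty of the estimated mean only
    pi_half = z * s * sqrt(1 + 1 / n)  # adds the noise of a single new value
    return (m - ci_half, m + ci_half), (m - pi_half, m + pi_half)


# Hypothetical daily peak loads (MW), for illustration only.
loads = [950, 1010, 980, 1040, 1000, 970, 1025, 995, 1015, 985]
ci, pi = mean_ci_and_pi(loads)
print("90% confidence interval for the mean:", ci)
print("90% prediction interval for the next day:", pi)
```

Both intervals are centered on the sample mean, but only the confidence interval narrows toward zero width as more days are added.
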
Below I use three examples to illustrate how to apply these two terms in load forecasting.

1. Scenario-based probabilistic load forecasting

In my 2014 TSG paper, we created 30 weather scenarios using 30 years of weather history. Each weather scenario gives one load forecast, so we got 30 load forecasts in total. The interval between the 10th and 90th percentiles of these 30 forecasts is a prediction interval.
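The percentile step can be sketched in Python. Everything here is hypothetical: `weather_scenarios` stands in for 30 years of weather history, reduced to one temperature per year for brevity, and `toy_load_model` is a made-up temperature response, not the model from the TSG paper. The point is only that each scenario yields one forecast, and the 10th and 90th percentiles of those forecasts bound the interval.

```python
import random
from statistics import quantiles


def toy_load_model(temperature_f):
    """Hypothetical load (MW) as a U-shaped function of temperature (F)."""
    return 1000 + 0.8 * (temperature_f - 65) ** 2


# Hypothetical: one temperature per historical year, i.e. 30 weather scenarios.
random.seed(2014)
weather_scenarios = [random.gauss(85, 5) for _ in range(30)]

# Feed each weather scenario through the same model: one forecast per scenario.
scenario_forecasts = [toy_load_model(t) for t in weather_scenarios]

# quantiles(..., n=10) returns the 9 deciles; the first is the 10th
# percentile and the last is the 90th percentile.
deciles = quantiles(scenario_forecasts, n=10)
p10, p90 = deciles[0], deciles[-1]
print(f"10th-90th percentile prediction interval: [{p10:.0f}, {p90:.0f}] MW")
```

Because the interval is read off the distribution of forecasts for a future period, it describes a prediction, not a parameter, which is why it is a prediction interval.
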

2. Adding error distribution(s) to point forecast(s)

Let's say we develop a regression model for point load forecasting. We take out the residuals and model them using a normal distribution. We then add this normal distribution back to the original point forecast to come up with a probabilistic forecast. An interval can be derived as the regression estimate +/- a multiple of the standard deviation of the normal distribution. Many papers in the load forecasting literature and its applications call this interval a confidence interval, which is a typical misuse. It should be called a prediction interval.

3. Showing the uncertainties around estimated parameters

In the old days, utility analysts worked on annual or monthly data with a limited length of history. As a result, there were not many variables in the regression models. For instance, a typical model to forecast monthly energy could be:
Energy = b0 + b1*CC + b2*CDD + b3*HDD + e
where CC represents customer count, CDD/HDD represent cooling/heating degree days, and e is the error term.

After estimating the parameters, analysts often reported the confidence interval and p-value of each parameter, perhaps partly to show off some statistical sophistication. In any case, it is correct to call them confidence intervals here.
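For a small model like this, the parameter confidence intervals can be sketched with numpy ordinary least squares on synthetic monthly data. Every number below is hypothetical, including the "true" coefficients used to simulate the data, and customer count is expressed in thousands to keep the regression well conditioned. Each interval is b_j plus or minus a critical value times its standard error, taken from the diagonal of sigma^2 (X'X)^-1. To stay within numpy, the sketch uses the normal quantile 1.96; with only a few years of monthly history, the Student t quantile with n - 4 degrees of freedom (for example from `scipy.stats.t.ppf`) would be more appropriate and slightly wider.

```python
import numpy as np

rng = np.random.default_rng(0)
n_months = 120  # hypothetical: 10 years of monthly history

# Hypothetical drivers: customer count (thousands), cooling/heating degree days.
cc = np.linspace(100, 120, n_months)
month = np.arange(n_months) % 12
season = np.sin((month - 3) / 12 * 2 * np.pi)
cdd = np.clip(300 * season, 0, None) + rng.uniform(0, 30, n_months)
hdd = np.clip(-500 * season, 0, None) + rng.uniform(0, 30, n_months)

# Simulated monthly energy (MWh) from made-up "true" coefficients plus noise.
energy = 2_000 + 900 * cc + 40 * cdd + 15 * hdd + rng.normal(0, 500, n_months)

# Energy = b0 + b1*CC + b2*CDD + b3*HDD + e, estimated by ordinary least squares.
X = np.column_stack([np.ones(n_months), cc, cdd, hdd])
b, *_ = np.linalg.lstsq(X, energy, rcond=None)

# Standard errors from sigma^2 (X'X)^-1, with n - p degrees of freedom.
resid = energy - X @ b
n, p = X.shape
sigma2 = resid @ resid / (n - p)
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# 95% confidence intervals (normal approximation to the t quantile).
z = 1.96
for name, est, s in zip(["b0", "b1 (CC)", "b2 (CDD)", "b3 (HDD)"], b, se):
    print(f"{name}: {est:.3f}, 95% CI [{est - z * s:.3f}, {est + z * s:.3f}]")
```

Each printed interval is attached to one parameter, which is exactly the case where "confidence interval" is the correct term.
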

In today's world, to take advantage of the information in large amounts of data, we rarely use such a simple model. In a large model like the ones in my 2014 TSG paper, it is unrealistic and unnecessary to show the confidence intervals for all the parameters.

In conclusion, if you use "prediction interval" all the time in load forecasting, you would be correct most of the time.

