Wednesday, November 5, 2014

Quantile, Quartile and Percentile

Suppose we have a set of data sorted in ascending order. By dividing the data into q equal-sized subsets, we get the q-quantiles. The quantiles are the values marking the boundaries between two adjacent subsets.

Here is an example: Using 15 years of historical weather data to forecast next year's annual peak, we can create 15 scenarios, which lead to 15 forecasted annual peaks for the next year. Sorting these 15 annual peak forecasts in ascending order, we get {810, 812, 813, 815, 815, 818, 820, 824, 827, 829, 832, 836, 839, 844, 848}. 

Let's try to figure out the 4-quantiles.
  • The rank of the first 4-quantile is 15 x (1/4) = 3.75, which rounds up to 4. The 4th smallest number is 815. 
  • The rank of the second 4-quantile is 15 x (2/4) = 7.5, which rounds up to 8. The 8th smallest number is 824. 
  • The rank of the third 4-quantile is 15 x (3/4) = 11.25, which rounds up to 12. The 12th smallest number is 836.
  • The fourth 4-quantile is the largest number, which is 848.
So the 4-quantiles are {815, 824, 836, 848}.
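
The round-up rule used in this example can be sketched in a few lines of Python. This is only an illustration of the steps above, not a general-purpose quantile routine: the function name quantile_round_up is made up for this post, and the sketch assumes the data are already sorted and that the rank n x (k/q) is not an integer for k < q, as is the case for these 15 values.

```python
def quantile_round_up(sorted_values, k, q):
    """Return the k-th q-quantile using the round-up rule from the example.

    Assumes sorted_values is in ascending order and 1 <= k <= q.
    """
    n = len(sorted_values)
    # Ceiling of n * k / q using integer arithmetic (1-based rank).
    rank = -(-n * k // q)
    # For k == q the rank equals n, so the largest value is returned.
    return sorted_values[rank - 1]


peaks_15 = [810, 812, 813, 815, 815, 818, 820, 824,
            827, 829, 832, 836, 839, 844, 848]

quartiles_15 = [quantile_round_up(peaks_15, k, 4) for k in range(1, 5)]
print(quartiles_15)  # [815, 824, 836, 848]
```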

If we have 10 years of weather history instead of 15 years, we can get 10 annual peak forecasts in ascending order {810, 813, 815, 818, 824, 827, 829, 832, 839, 844}.
  • The rank of the first 4-quantile is 10 x (1/4) = 2.5, which rounds up to 3. The 3rd smallest number is 815. 
  • The rank of the second 4-quantile is 10 x (2/4) = 5, which is an integer. Here we take the average of the 5th and 6th smallest numbers, which is (824 + 827)/2 = 825.5. 
  • The rank of the third 4-quantile is 10 x (3/4) = 7.5, which rounds up to 8. The 8th smallest number is 832. 
  • The fourth 4-quantile is the largest number, which is 844. 
So the 4-quantiles are {815, 825.5, 832, 844}.
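
The 10-value example adds one more case: when the rank is an integer, we average that value with the next one. Below is a minimal Python sketch of the full rule as described in this post. The function name quantile is an assumption made for illustration, and other textbooks and software packages use different conventions that can give slightly different numbers.

```python
def quantile(sorted_values, k, q):
    """Return the k-th q-quantile following the rule described in this post.

    Assumes sorted_values is in ascending order and 1 <= k <= q.
    """
    n = len(sorted_values)
    if k == q:
        # The last q-quantile is the largest value.
        return sorted_values[-1]
    # n * k / q split into its integer part and remainder.
    rank, remainder = divmod(n * k, q)
    if remainder == 0:
        # Integer rank: average the rank-th and (rank + 1)-th smallest values.
        return (sorted_values[rank - 1] + sorted_values[rank]) / 2
    # Non-integer rank: round up, so take the (rank + 1)-th smallest value.
    return sorted_values[rank]


peaks_10 = [810, 813, 815, 818, 824, 827, 829, 832, 839, 844]

quartiles_10 = [quantile(peaks_10, k, 4) for k in range(1, 5)]
print(quartiles_10)  # [815, 825.5, 832, 844]
```

Integer arithmetic (divmod) is used instead of floating-point division so that the test for an integer rank is exact.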

The 4-quantiles are called quartiles. The second 4-quantile (or the second quartile, Q2) is the median. The first (Q1) and third (Q3) quartiles are also called the lower and upper quartiles, respectively. The difference between the upper and lower quartiles is the interquartile range (IQR = Q3 - Q1).
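
As a quick check of the IQR definition, here is a small Python sketch applied to the Q1 and Q3 values from the two examples above. The function name interquartile_range is just an illustrative choice.

```python
def interquartile_range(q1, q3):
    """IQR = Q3 - Q1."""
    return q3 - q1


# Q1 and Q3 taken from the 15-value and 10-value examples in this post.
iqr_15 = interquartile_range(q1=815, q3=836)
iqr_10 = interquartile_range(q1=815, q3=832)
print(iqr_15, iqr_10)  # 21 17
```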

The 100-quantiles are called percentiles. The percentiles commonly used in load forecasting are the 50th, 90th, 95th and 99th. The 50th percentile is the median. In the Global Energy Forecasting Competition 2014, we asked the participants to provide probabilistic forecasts in the form of 99 percentiles, assuming the 100th percentile is infinity.

