Tuesday, October 21, 2014

Training, Validation and Test

When developing models for forecasting or data mining (I will write a post about these two terms), we usually slice the data into three pieces, training, validation and test (a minimal splitting sketch follows the list):
  • Training data is used to estimate the parameters. 
  • Validation data is used to select models. 
  • Test data is used to confirm the model performance. 
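To make the split concrete, here is a minimal Python sketch. The chronological (unshuffled) ordering and the 60/20/20 proportions are my assumptions for illustration; nothing above prescribes specific ratios.

    import numpy as np

    # A minimal sketch of a chronological three-way split. The data is
    # kept in time order, since shuffling would leak future information
    # into the training piece. The 60/20/20 proportions are illustrative.
    data = np.arange(100)  # stand-in for 100 hourly load observations

    n = len(data)
    train = data[:int(0.6 * n)]                   # estimate parameters here
    validation = data[int(0.6 * n):int(0.8 * n)]  # select models here
    test = data[int(0.8 * n):]                    # confirm performance here

    print(len(train), len(validation), len(test))  # 60 20 20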
Here, let me use two representative techniques, regression analysis and Artificial Neural Networks (ANN), to illustrate how the process works.

In regression analysis, the parameters can be estimated by applying the ordinary least squares method to the training data, which leads to a closed-form solution. The fitted regression model is then used to calculate the errors on the validation data. After trying several regression models, the one with the lowest validation error is selected as the final model. We then apply this final regression model to the test data to report the forecasting accuracy.
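As an illustrative sketch of this workflow, the snippet below fits a few candidate regression models by ordinary least squares, picks the one with the lowest validation error, and only then reports the test error. The synthetic data, the polynomial candidates, and the mean squared error metric are all my assumptions, not part of the original discussion.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic data (purely illustrative): predict y from x.
    x = rng.uniform(0, 10, size=200)
    y = 2.0 + 1.5 * x + 0.3 * x**2 + rng.normal(0, 2, size=200)

    x_tr, y_tr = x[:120], y[:120]        # training
    x_va, y_va = x[120:160], y[120:160]  # validation
    x_te, y_te = x[160:], y[160:]        # test

    def design(x, degree):
        # Polynomial design matrix [1, x, x^2, ...] up to the given degree.
        return np.vander(x, degree + 1, increasing=True)

    best = None
    for degree in (1, 2, 3):  # candidate regression models
        # Closed-form least squares estimate on the training data.
        beta, *_ = np.linalg.lstsq(design(x_tr, degree), y_tr, rcond=None)
        val_err = np.mean((design(x_va, degree) @ beta - y_va) ** 2)
        if best is None or val_err < best[0]:
            best = (val_err, degree, beta)

    val_err, degree, beta = best
    # The test data is touched exactly once, to report the final accuracy.
    test_err = np.mean((design(x_te, degree) @ beta - y_te) ** 2)
    print(f"selected degree {degree}; test MSE {test_err:.2f}")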

In ANN modeling, the parameters (weights and biases) are estimated iteratively, with the objective of improving the goodness of fit on the training data. Without any stopping criterion, such a training process may go on and on until the ANN perfectly fits the training data (assuming the ANN is large enough). Validation data is used to tell the algorithm when to stop updating the parameters: after each update, the ANN is used to predict the validation data, and the training stops when the prediction error starts increasing. The parameters corresponding to the lowest prediction error on the validation data are locked in as the final ones for the ANN structure being tried. Note that several factors may affect the structure of an ANN model, such as the input variables, the number of hidden neurons and hidden layers, the interconnections, the activation functions, and so forth. We can try multiple ANN structures using the training and validation data. Finally, we pick the one with the lowest validation error and report its performance on the test data.
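Below is a rough numpy sketch of this early-stopping idea. The one-hidden-layer network, the tanh activation, and the small "patience" window (waiting a few updates before stopping, rather than stopping at the very first increase) are my assumptions for illustration.

    import numpy as np

    rng = np.random.default_rng(1)

    # Toy data (illustrative only): one input, one output.
    x = rng.uniform(-1, 1, size=(200, 1))
    y = np.sin(3 * x) + rng.normal(0, 0.1, size=(200, 1))
    x_tr, y_tr, x_va, y_va = x[:150], y[:150], x[150:], y[150:]

    # One hidden layer with tanh activation; the sizes are assumptions.
    W1 = rng.normal(0, 0.5, size=(1, 8)); b1 = np.zeros(8)
    W2 = rng.normal(0, 0.5, size=(8, 1)); b2 = np.zeros(1)

    def forward(x):
        h = np.tanh(x @ W1 + b1)
        return h, h @ W2 + b2

    lr, patience = 0.05, 20
    best_err, best_params, wait = np.inf, None, 0
    for epoch in range(5000):
        # One gradient-descent update on the training data (backpropagation).
        h, pred = forward(x_tr)
        g = 2 * (pred - y_tr) / len(x_tr)  # gradient of MSE w.r.t. predictions
        gW2 = h.T @ g; gb2 = g.sum(0)
        gh = (g @ W2.T) * (1 - h**2)       # back through the tanh layer
        gW1 = x_tr.T @ gh; gb1 = gh.sum(0)
        W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

        # Monitor the validation error after each update.
        val_err = np.mean((forward(x_va)[1] - y_va) ** 2)
        if val_err < best_err:
            best_err, wait = val_err, 0
            best_params = (W1.copy(), b1.copy(), W2.copy(), b2.copy())
        else:
            wait += 1
            if wait >= patience:  # validation error keeps rising: stop
                break

    # Lock in the parameters with the lowest validation error.
    W1, b1, W2, b2 = best_params
    print(f"stopped at epoch {epoch}; best validation MSE {best_err:.4f}")

In practice one would use a library rather than hand-coded backpropagation, but the loop above makes the role of the validation data explicit: it never drives the gradient updates, only the decision of when to stop and which parameters to keep.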

There is, however, one way to cheat in this training-validation-test process. After looking at the performance on the test data, some authors may find that the results are not satisfying. They then alter the models again and again until the test error is reduced. This is also called "peeking at the future" in forecasting. In other words, although the test data is not used for parameter estimation, it is being used for model selection. This is a common but flawed practice, especially in applying ANNs to load forecasting. As a result, many ANN-based models in the literature report very high accuracy but fail miserably in practice. The accuracy obtained this way is neither ex post nor ex ante forecasting accuracy (see Forecasting and Backcasting), because the actual values of the dependent variable are being used in model selection.

If you are interested in more details, you may refer to Hyndsight for in-depth coverage of cross-validation in the context of forecasting.

Back to Load Forecasting Terminology.
