Wednesday, October 15, 2014

Model, Variable, Function and Parameter

My first job was as an engineer at an expert-based consulting firm. One of my first tasks was to develop models for a large investor-owned utility. I was quite excited when I was assigned to this project; I thought it was a good opportunity to show off my modeling skills. During the project kick-off meeting, I realized that I had misunderstood the scope of work. The "models" I was asked to develop were circuit models. "Modeling" simply meant drawing the lines, fuses, switches and transformers on a distribution engineering software platform based on their physical specifications, which did not involve any math or statistics at all.

The predictive models in energy forecasting are different from the physical models mentioned above. A regression-based load forecasting model, for example, describes the relationship between load and the factors that drive the load. There are three components in such a model:

Variables 

Load (or some transformation of the load) is the response variable. Hour of the day, temperatures and other driving factors are explanatory variables.

Energy forecasters may come from different disciplines, and depending on their educational background, they may use different names for the same terms. For instance, the response variable is also known as the dependent variable, output variable or regressand (for regression models). The corresponding alternatives for explanatory variable are independent variable, input variable and regressor.

Another way of naming variables is based on their contents. We often use the term temperature variables for variables derived from temperatures and their augmented forms. Similarly, we use calendar variables to refer to hour of the day, day of the week, month of the year and holidays.
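To make the naming concrete, here is a minimal sketch of turning one hourly observation into calendar and temperature variables. The holiday set is hypothetical, and using squared and cubed temperatures as the "augmented forms" is only an assumption for illustration; the load itself would be kept separately as the response variable.

```python
from datetime import date, datetime

# Hypothetical holiday calendar; a real model would use the utility's own list.
HOLIDAYS = {date(2014, 12, 25)}


def make_features(timestamp: datetime, temperature: float) -> dict[str, float]:
    """Build explanatory variables for one hourly observation (assumed layout)."""
    return {
        # Calendar variables
        "hour": timestamp.hour,              # hour of the day, 0-23
        "weekday": timestamp.weekday(),      # day of the week, 0 = Monday
        "month": timestamp.month,            # month of the year, 1-12
        "holiday": 1 if timestamp.date() in HOLIDAYS else 0,
        # Temperature variables: the raw temperature plus assumed polynomial terms
        "temp": temperature,
        "temp2": temperature ** 2,
        "temp3": temperature ** 3,
    }


# One made-up observation: the load is the response, the dict holds the explanatory variables.
load = 1250.0
features = make_features(datetime(2014, 10, 15, 17), 20.0)
print(load, features)
```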

Function

Most load forecasters know that load is driven by calendar variables and weather variables. Sometimes I hear two forecasters discussing their models. The conversation often ends once they confirm that both use calendar and weather variables. After that, both forecasters think they have a good model and are very pleased with the conversation. However, the quality of their models and forecasts may vary a lot. For instance, in GEFCom2012, most contestants used calendar and temperature variables, yet their error scores spanned a fairly wide range. This means that communicating the variables alone is far from fully specifying a model.

A function is the formula describing the relationship between the response variable and the explanatory variables. Taking load and temperature as an example, we can model their relationship using a piecewise linear function, a second-order polynomial or a third-order polynomial. The results may be quite different.
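The following sketch fits the three functional forms to the same load and temperature data and reports each one's in-sample residual sum of squares. It uses only the standard library (ordinary least squares via the normal equations), synthetic made-up data, and a hypothetical knot of 18 degrees for the piecewise linear form; none of these choices come from a real model.

```python
import math

# Hypothetical knot (in degrees) for the piecewise linear form; chosen only for illustration.
KNOT = 18.0

# Each functional form maps a temperature to a row of regressors.
# Temperatures are divided by 10 to keep the normal equations well conditioned.
FORMS = {
    "piecewise linear": lambda t: [1.0, t / 10, max(t - KNOT, 0.0) / 10],
    "2nd-order polynomial": lambda t: [1.0, t / 10, (t / 10) ** 2],
    "3rd-order polynomial": lambda t: [1.0, t / 10, (t / 10) ** 2, (t / 10) ** 3],
}


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def rss(rows, y, beta):
    """Residual sum of squares of a fitted linear model."""
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2 for row, yi in zip(rows, y))


def compare_forms(temps, loads):
    """Fit every functional form to the same data and return its in-sample RSS."""
    results = {}
    for name, basis in FORMS.items():
        rows = [basis(t) for t in temps]
        results[name] = rss(rows, loads, least_squares(rows, loads))
    return results


# Synthetic, made-up data: a U-shaped load-temperature curve plus a small wiggle.
temps = [float(t) for t in range(36)]
loads = [100 + 0.2 * (t - 18) ** 2 + 3 * math.sin(t) for t in temps]
for name, error in compare_forms(temps, loads).items():
    print(f"{name}: RSS = {error:.1f}")
```

Note that in-sample fit alone does not pick the best form: the third-order polynomial nests the second-order one, so its in-sample RSS can never be higher, even when it forecasts worse on new data.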

Function also has a meaning at the algorithmic level. For instance, we can develop a "function" (algorithm) that automatically selects variables to build a load forecasting model.

Parameters 

Parameters, also known as coefficients, quantify the relationship between the response variable and the explanatory variables. Although many people believe that the variables and their functional form are enough to specify a model, I like to go one step further and include parameter estimation. This is because different parameter estimation methods, and the assumptions that come with them, may produce different parameter estimates from the same data.

The term parameter is often used in algorithm design as well. When performing outlier detection, for instance, there needs to be a threshold to tell whether an observation is an outlier or not. That threshold is called a parameter of the outlier detection algorithm.
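Here is a minimal sketch of such a threshold parameter, assuming a median absolute deviation (MAD) rule, which is just one common choice of outlier detector. The loads are made up; the point is that changing the threshold changes which observations get flagged.

```python
from statistics import median


def detect_outliers(values: list[float], threshold: float) -> list[int]:
    """Flag indices whose distance from the median exceeds threshold * MAD.

    MAD is the median absolute deviation. `threshold` is the algorithm's parameter:
    a smaller value flags more observations, a larger value flags fewer.
    If MAD is zero, every value that differs from the median is flagged.
    """
    center = median(values)
    mad = median([abs(v - center) for v in values])
    return [i for i, v in enumerate(values) if abs(v - center) > threshold * mad]


# Made-up hourly loads with one suspicious spike.
loads = [100.0, 102.0, 98.0, 101.0, 99.0, 250.0]
print(detect_outliers(loads, threshold=3.0))  # → [5]
print(detect_outliers(loads, threshold=1.0))  # → [2, 5]
```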

This post is only about the model and its three components. There are many other things in the forecasting process that affect the quality of forecasts. That's why it usually takes a consultant (or consulting firm) a long time to audit the forecasting process of any utility.
