Friday, April 13, 2018

Which Model is the Best?

Recently, Rob Hyndman blogged about the history of forecasting competitions. I have read the post three times already, and I learned something new each time. I hope every reader of this blog will also read that post and learn something from the history. Here, I would like to highlight one paragraph:
[...] This reveals a view commonly held (even today) that there is some single model that describes the data generating process, and that the job of a forecaster is to find it. This seems patently absurd to me — real data comes from much more complicated, non-linear, non-stationary processes than any model we might dream up — and George Box famously dismissed it saying “All models are wrong but some are useful”.
I have been in the field of load forecasting for a little more than 10 years. During the past decade, I have received many "the best" questions: Which model is the best? Which technique is the best? Which software is the best? Which variable is the best? ...

There are some variants of these questions: Do you use neural networks? Do you use population in long-term load forecasting? Do you use normal weather? Do you think demand response can reduce peak demand? Do you think a MAPE value of 20% is too high? ...

While the folks who asked these questions may have expected a crisp answer naming the best model, technique, software, or variable, or a simple YES/NO, I had to disappoint them with "It depends", sometimes followed by a lengthy elaboration.
It depends on the data, the business needs, the production environment, and many other factors... 
I don't have a single model to sell to my clients as "the best model". I recommend proper methodologies to the clients after making a comprehensive evaluation of their situations.

I think the root of these "the best" questions is that commonly held view:
There is some single model that describes the data generating process. 
I disagree with this view, and I don't know where it originally came from. I wish someone would write another history article to explain its source.

In load forecasting, there is no universally best model. We need empirical studies, many empirical studies, to show some evidence that one method is superior in some aspect. That's why I have been promoting reproducible research, along with a pool of benchmarking data and benchmark models.
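
To make the point concrete, here is a minimal sketch of such an empirical comparison. Everything in it is an assumption chosen for illustration: the two synthetic series, the two naive forecasters (persistence and seasonal naive with a 24-step season), and the helper names `mape` and `evaluate`. It is not a benchmark from any real load data; it only shows that the ranking of two methods can flip from one series to another.

```python
import math

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)

def evaluate(series, lag, start):
    """Score a naive forecaster that predicts y[t] with y[t - lag].

    Only points from index `start` onward are scored, so forecasters
    with different lags are compared on the same evaluation window.
    """
    actual = series[start:]
    forecast = series[start - lag:len(series) - lag]
    return mape(actual, forecast)

n = 240      # number of observations (illustrative)
season = 24  # assumed season length, e.g. hours in a day

# Two synthetic series, invented purely for illustration.
datasets = {
    "seasonal": [100 + 20 * math.sin(2 * math.pi * t / season) for t in range(n)],
    "trending": [100 + 0.5 * t for t in range(n)],
}

# Two simple candidate methods, identified by the lag they use.
methods = {"persistence": 1, "seasonal_naive": season}

results = {
    name: {m: evaluate(series, lag, start=season) for m, lag in methods.items()}
    for name, series in datasets.items()
}

# The "best" method is decided per dataset, not globally.
best = {name: min(scores, key=scores.get) for name, scores in results.items()}

for name, scores in results.items():
    for m, score in scores.items():
        print(f"{name:9s} {m:15s} MAPE = {score:.3f}%")
    print(f"{name:9s} best: {best[name]}")
```

On the purely periodic series the seasonal naive forecast wins, while on the steadily trending series persistence wins. Neither method is "the best" without first saying which data it is being judged on.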

In computational complexity and optimization, there is actually a theorem, the no free lunch theorem.
