Wednesday, August 9, 2017

Benchmarking Robustness of Load Forecasting Models under Data Integrity Attacks

GIGO - garbage in, garbage out. In forecasting, GIGO means that if the model is fed garbage (bad) data, the forecast will be bad too. In the power industry, bad load forecasts often result in wasted energy resources, financial losses, brownouts or even blackouts.

Anomaly detection and data cleansing procedures may help filter out some of the bad data on the input side. However, what if the bad data was created by hackers? Can the existing models "survive" or stay accurate under data attacks? This paper offers some benchmark results.

This paper sets a few "firsts":
  1. This is the first paper formally addressing cybersecurity issues in the load forecasting literature. I believe that data attacks should be of great concern to the forecasting community. This paper lays solid groundwork for future research.
  2. This is my first journal paper co-authored with a professor on my doctoral committee, Dr. Shu-Cherng Fang. Many years ago, I picked the topic of my dissertation from one of my consulting projects. I then invited a team of world-class professors from different areas to form the committee. None of them were really into load forecasting, but I had the opportunity to learn from their different perspectives.
  3. This is my first journal paper that went through a year-long peer review cycle, the longest peer review I've experienced. It was definitely worth the effort. The IJF editors and reviewers certainly spent a significant amount of time reading the paper and offered many constructive comments. I wish I knew their names so I could thank them in the acknowledgement section.

Jian Luo, Tao Hong and Shu-Cherng Fang, "Benchmarking robustness of load forecasting models under data integrity attacks", International Journal of Forecasting, accepted. (working paper)

Benchmarking robustness of load forecasting models under data integrity attacks

Jian Luo, Tao Hong and Shu-Cherng Fang


As the internet continues to expand its footprint, cybersecurity has become a major concern for the governments and private sectors. One of the cybersecurity issues is on data integrity attacks. In this paper, we focus on the power industry, where the forecasting processes heavily rely on the quality of data. The data integrity attacks are expected to harm the performance of forecasting systems, which greatly impact the financial bottom line of power companies and the resilience of power grids. Here we reveal how data integrity attacks can affect the accuracy of four representative load forecasting models (i.e., multiple linear regression, support vector regression, artificial neural networks, and fuzzy interaction regression). We first simulate some data integrity attacks by randomly injecting some multipliers that follow a normal or uniform distribution to the load series. Then the aforementioned four load forecasting models are applied to generate one-year ahead ex post point forecasts for comparisons of their forecast errors. The results show that the support vector regression model, trailed closely by the multiple linear regression model, is most robust, while the fuzzy interaction regression model is least robust among the four. Nevertheless, all of the four models fail to provide satisfying forecasts when the scale of data integrity attacks becomes large. This presents a serious challenge to the load forecasters and the broader forecasting community: How to generate accurate forecasts under data integrity attacks? We use the publicly-available data from Global Energy Forecasting Competition 2012 to construct the case study. At the end, we also offer an outlook of potential research topics for future studies.
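To make the attack simulation in the abstract more concrete, here is a minimal Python sketch of injecting random multipliers into a load series. It is an illustration only, not the code used in the paper. The function name `inject_attack`, the `fraction` of attacked points, and the default distribution parameters (`mu`, `sigma`, `low`, `high`) are all assumptions made for this example; the abstract only states that the multipliers follow a normal or uniform distribution.

```python
import random


def inject_attack(load, fraction, dist="normal",
                  mu=1.0, sigma=0.1, low=0.5, high=1.5, seed=None):
    """Return (corrupted_load, attacked_indices).

    A random subset of the load values is scaled by random multipliers.
    All parameter names and defaults are hypothetical, chosen for
    illustration; they are not taken from the paper.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if dist not in ("normal", "uniform"):
        raise ValueError("dist must be 'normal' or 'uniform'")

    rng = random.Random(seed)  # seeded generator for reproducible attacks

    # Pick which observations get attacked, without replacement.
    n_attacked = round(fraction * len(load))
    attacked = sorted(rng.sample(range(len(load)), n_attacked))

    corrupted = list(load)  # copy so the clean series is left untouched
    for i in attacked:
        if dist == "normal":
            multiplier = rng.gauss(mu, sigma)
        else:
            multiplier = rng.uniform(low, high)
        corrupted[i] = load[i] * multiplier
    return corrupted, attacked


if __name__ == "__main__":
    # Toy hourly load values (made-up numbers, not GEFCom2012 data).
    clean = [100.0, 110.0, 125.0, 140.0, 135.0,
             120.0, 105.0, 95.0, 90.0, 98.0]
    corrupted, attacked = inject_attack(clean, fraction=0.3,
                                        dist="uniform", seed=42)
    print("attacked positions:", attacked)
    print("corrupted series:", corrupted)
```

In a benchmark along these lines, one would train each forecasting model on the corrupted series and compare its forecast errors against the same model trained on the clean series, repeating the exercise as `fraction` and the multiplier spread grow.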
