Tuesday, November 6, 2018

Leaderboard for BFCom2018 Qualifying Match!!!

The forecast submission due date for the Qualifying Match of BigDEAL Forecasting Competition 2018 was Nov 4, 2018. Out of the 81 teams that registered for the competition, 39 successfully submitted their forecasts by the due date. Ten teams will advance to the final match, together with my Energy Analytics class of 2018, which includes 5 master's and PhD students plus the teaching assistant, Masoud Sobhani.

Two methods are used to calculate the MAPE of the forecasts. The first is the direct calculation of Mean Absolute Percentage Error (MAPE) based on the raw forecast submitted by each team, which is the measure originally announced. The other is based on the bias-adjusted forecast, which is calculated by dividing each hourly load forecast by the forecasted energy of the coincident month, and then multiplying it by the actual energy of that month. For each measure, the MAPE of the last-ranked in-class student is used as the qualifying bar. A team outperforming either bar advances to the final match.
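For readers who want to reproduce the two measures, here is a minimal sketch in plain Python. The function names (`mape`, `bias_adjust`) and the toy numbers are made up for illustration, and the month labels are assumed to be supplied alongside the hourly values; this is not the actual grading script.

```python
from collections import defaultdict


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


def bias_adjust(actual, forecast, months):
    """Rescale hourly forecasts so each month's forecasted energy
    equals the actual energy of that month."""
    actual_energy = defaultdict(float)
    forecast_energy = defaultdict(float)
    for a, f, m in zip(actual, forecast, months):
        actual_energy[m] += a
        forecast_energy[m] += f
    return [f / forecast_energy[m] * actual_energy[m]
            for f, m in zip(forecast, months)]


# Toy data (hypothetical): two "hours" in each of two months.
months = [1, 1, 2, 2]
actual = [100.0, 200.0, 150.0, 250.0]
forecast = [110.0, 220.0, 120.0, 200.0]

raw_mape = mape(actual, forecast)
adjusted_mape = mape(actual, bias_adjust(actual, forecast, months))
print(f"raw MAPE: {raw_mape:.2f}%, bias-adjusted MAPE: {adjusted_mape:.2f}%")
```

In this toy case the forecast is off by a constant factor within each month, so the bias adjustment removes essentially all of the error. A real forecast would keep its hourly shape errors after the adjustment.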

The figure below shows the leaderboard for BFCom2018 Qualifying Match. The green highlighted ones are in-class students, while the qualifying bar for each measure is in bold. The teams above the red line are the finalists. The "Ranking (BOTH)" column lists the rankings based on the sum of two rankings from both measures.
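The "Ranking (BOTH)" column can be sketched as follows. This is only an illustration: the team names and MAPE values are invented, and letting tied teams share the better rank is an assumption, since the post does not say how ties were handled.

```python
def rank(values):
    """Rank ascending (1 = lowest value); tied values share the better rank."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def combined_ranking(raw_mapes, adjusted_mapes):
    """Rank teams by the sum of their rankings on the two measures."""
    rank_raw = rank(raw_mapes)
    rank_adjusted = rank(adjusted_mapes)
    return rank([a + b for a, b in zip(rank_raw, rank_adjusted)])


# Hypothetical teams and MAPE values (percent).
teams = ["T1", "T2", "T3", "T4"]
raw_mapes = [5.2, 4.8, 6.1, 5.0]
adjusted_mapes = [4.9, 5.1, 4.5, 5.5]

ranking_both = combined_ranking(raw_mapes, adjusted_mapes)
for team, r in zip(teams, ranking_both):
    print(team, r)
```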

BigDEAL Forecasting Competition 2018 - Qualifying Match Leaderboard

Congratulations to the BFCom2018 finalists! A tougher problem is waiting for them in the final match :)

P.S. I will organize a series of follow-up events for the winners to present their methodologies. For more information about this qualifying match, please keep an eye on the FAQ page.

Saturday, October 27, 2018

Seven Lessons Learned from Two Plagiarism Cases

Recently, I was presented with two plagiarism cases in one month, which triggered this blog post.

Case #1

Research group A used an idea proposed by research group B a few years ago to publish two papers in a flagship journal. 

In the first paper, A applied B's method to the same application as B did, but A did not cite B's original paper at all. In the second paper, A applied B's method to a different application. A cited the original paper where B proposed that method, but did not clearly specify that the method was first proposed by B. Instead, A only cited how B commented on the previous work, which was the motivation of B's original idea. In other words, by reading A's second paper, a reader would conclude that the method was originally proposed by A.

Since the two papers were published almost at the same time, it's clear that A was aware of B's work, but did not give proper credit to the original paper. The improper citation misled the editors and reviewers.

I have no intention to defend the editors and reviewers who handled those two papers in the peer review process. Their irresponsibility was part of the problem! In fact, both would have been good papers had the authors properly cited the literature. At least the second one deserved to be published by that flagship journal.

My recommendation to A was to retract both papers, and to apologize to B.

Case #2

Last year, I worked on a proposal with a few collaborators, including the Lead PI (A), PI (B), myself, and a few others. In our proposal, we used B's idea, which he published in a paper for a different application. The proposal was rejected by the funding agency. A decided to complete the research and publish a paper with us, so she sent the proposal to her student (C) to continue the work.

A few weeks ago, A sent the manuscript to us coauthors, and told me that C had completed the research, and the results looked very promising. I quickly glanced through the manuscript and found the list of references very disorganized, so I asked A to work with C to redo the literature review. In addition, I found the results a bit fishy. The proposed method dominated its counterparts with a landslide win. I thought it was too good to be true (see THIS POST about my smoke tests). I asked A to revise the paper and check the results.

After some investigation, A told me that C manipulated the computational results to make the benchmark models look bad. She has asked C to present the full picture.

Last week, I received the revised manuscript. I briefly read through the manuscript, but still didn't like how the references were cited. In addition, I felt some sentences sounded familiar. I asked A to work with C to further enhance the reference list, and to validate that this manuscript did not copy sentences from other papers.

Yesterday, A told me that she found the method proposed in this draft identical to the method in B's paper. She was pissed off, because C told her that the idea was original. Moreover, the manuscript never specified that the same method was used in a different application. Again, the improper citation misled A. With such frustration, A wanted to kill the manuscript.

My recommendation to A was to properly cite B's original paper, and submit the manuscript to a first-tier journal.

Lessons learned

  1. Plagiarism is defined as "the practice of taking someone else's work or ideas and passing them off as one's own." 
  2. Always give proper credit to the prior research by citing the papers in the right places!
  3. Every co-author should understand and be able to defend every piece of the paper. 
  4. Every reviewer and editor in the peer review process should carefully review the assigned paper.
  5. Do damage control ASAP. Don't wait!
  6. Don't over-react to plagiarism. 
  7. Take preventive actions to avoid plagiarism in the future. 

In the next blog post, I will further explain what "novelty" really means in the academic literature.

Tuesday, October 23, 2018

Shreyashi Shukla - Determined to Excel

Today (October 23, 2018), Shreyashi Shukla defended her MS thesis Daily Load Forecasting Using Hourly Temperatures.

Shreyashi Shukla's MS thesis defense
From left to right: Dr. Tao Hong, Shreyashi Shukla, Dr. Simon Hsiang, Dr. Churlzu Lim

Shreyashi received her B.Tech. with Honors in Production Engineering & Management from National Institute of Technology, Jamshedpur, in 2006. Before moving to the U.S. with her family, she had a 10-year progressive career in the energy sector in India. She joined our MSEM program in Fall 2017.

Every year I give a department seminar to share with the students about the research projects at BigDEAL. The purpose of these seminars is two-fold. On one hand, these seminars can broaden the students' view about systems engineering and engineering management. On the other hand, I would like to attract the most self-motivated and talented students from the program.

While most students were scared away after seeing how productive the BigDEAL students are, Shreyashi was one of the fearless students who contacted me after the seminar. During our first conversation in October 2017, I explained my expectations to her, and told her about the BigDEAL entrance tests. She took the challenge, passed the tests, and officially joined BigDEAL in January 2018 to conduct her MS thesis research under my supervision.

The research problems BigDEAL students work on are never easy. In addition to tackling the research challenge, Shreyashi had family duties too. Every day she spends the morning on campus working on her coursework and research, and the rest of the day with her little daughter at home. She always comes to the lab on time, leaves on time, and works very efficiently. Nine months later, her research turned into a solid MS thesis, which made her the third "mom" student to complete MS thesis research at BigDEAL (after Jingrui Xie and Ying Chen). She is also my first Indian student. Next semester, Shreyashi will continue working with me towards her PhD degree.

Congratulations, Shreyashi!

Monday, October 22, 2018

FAQ for BFCom2018 Qualifying Match

After the two-week registration period, we officially kicked off the BigDEAL Forecasting Competition 2018 with 81 teams formed by 142 data scientists across 26 countries. This morning, I sent out the data and instructions to the contestants. If you are a registered contestant but have not yet received the data and instructions, please contact me directly.

BFCom2018 attracted 142 data scientists from 26 countries.

This blog post lists the frequently asked questions for BFCom2018. I'll be updating this post as the questions come along, so please stay tuned. 

Q: Which error measure are you going to use to rank the teams?
A: MAPE, mean absolute percentage error.

Q: Why are there 23 hours on Mar 9, 2008 and 25 hours on Nov 2, 2008?
A: Daylight saving time was observed on those days. The historical years show the same pattern. See THIS BLOG POST for more information. In the original submission template, the hours on Nov 2, 2008 ran from 1 to 25. A new submission template was sent to the contestants on Oct 24, 2018, in which the 2nd hour of Nov 2, 2008 appears twice, to match the temperature data of 2008.

Q: There are 28 weather stations, but only one load series. Which weather stations shall I use?
A: That's part of the challenge. Read this weather station selection paper for more information. 

Q: I'm new to load forecasting. Where shall I get started?
A: This qualifying problem is very similar to the load forecasting track of GEFCom2012. Reading the papers from those winning teams should help.

Q: We are going to use multiple methods. Can we submit multiple forecasts?
A: No. You should only submit one forecast for grading. If you have multiple forecasts, you may consider combining them. This paper may give you some idea about forecast combination.

Q: The local economy information, which was not given in the data, may have some significant effects on the forecasting period. Would you provide the local economy information? (For details, see Geert Scholma's comment under the original BFCom2018 announcement.)
A: No. We will add an error measure that calculates MAPE on the bias-adjusted load forecast. We will adjust the hourly forecast based on the coincident monthly energy, so that your forecasted energy of each month equals the actual monthly energy. Beating the last-ranked in-class student on either measure can secure the ticket to the final match.

Q: I did not pass the qualifying match bar, but I'm very interested in learning from the winners about their methodologies. Would you summarize their methods?
A: I will organize a series of webinars for the finalists to talk about their methods, though the webinars are not recorded. I will also invite the finalists to summarize their winning methods to post on the blog.

Q: I'm a PhD student just starting my research in energy forecasting. I've learned a lot from this competition. Will you organize this again?
A: Yes. This is not the first BigDEAL Forecasting Competition. It will not be the last either. You can follow me on Twitter, subscribe to this blog, and/or connect with me on LinkedIn to get updates about events like this.

(To be continued...)

Tuesday, October 16, 2018

Robust Regression Models for Load Forecasting

One of my doctoral majors is operations research, for which I took many courses in graduate school to build my knowledge in optimization. My dissertation was on load forecasting. Only two chapters were related to optimization, one on Artificial Neural Networks, and the other on Fuzzy Regression (or Possibilistic Linear Regression).

In fact, the fuzzy regression chapter, which was published as an FODM paper three years after my graduation, was the only one that seriously required some optimization skills. To build a fuzzy regression model, I had to formulate the parameter estimation process as a linear program, and solve it in CPLEX. At that time Gurobi was not even able to provide a feasible solution for my fuzzy regression model with 200+ parameters.

After that, I continued my profession in forecasting. I knew my optimization background was helpful to forecasting, but I didn't really expect to apply many optimization skills in forecasting.

About a year ago, we performed a benchmark study to show that four representative load forecasting models would fail miserably with bad input data. That study was published as an IJF paper early this year. At the end of that IJF paper, we mentioned a future research direction of designing more robust load forecasting models.

In this paper, we propose three robust regression models for load forecasting. While all of them are more robust than the ones compared in the IJF paper, the L1 regression model outperforms the others. In fact, L1 regression is not really new to load forecasting. It has been used for forecast combination, where some people call it Least Absolute Deviation (LAD) regression. Its "general" form, quantile regression, is heavily used in probabilistic load forecasting.
What's new about the L1 regression model in this paper?
We built an L1 regression model with hundreds of parameters. In fact, it shares the same variable combination as the Vanilla model used in Global Energy Forecasting Competitions. Building such a model is nontrivial. We didn't find an off-the-shelf package to do what we needed, so we formulated it as a linear program and solved it using MATLAB's linprog.
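To give a flavor of the linear programming formulation, here is a minimal sketch in Python. It uses SciPy's `linprog` rather than MATLAB's, a two-parameter toy model rather than the Vanilla model, and a made-up data set with one corrupted observation, so it should be read as an illustration of the textbook L1 (LAD) formulation and not as the code behind the paper. The idea is to introduce one auxiliary variable u_i per observation, minimize the sum of u_i, and require -u_i <= y_i - x_i'b <= u_i.

```python
import numpy as np
from scipy.optimize import linprog


def l1_regression(X, y):
    """Least absolute deviation fit: minimize sum(u) s.t. -u <= y - X b <= u."""
    n, p = X.shape
    # Decision vector: [b (p coefficients), u (n bounds on absolute residuals)].
    c = np.concatenate([np.zeros(p), np.ones(n)])
    I = np.eye(n)
    # X b - u <= y   and   -X b - u <= -y
    A_ub = np.block([[X, -I], [-X, -I]])
    b_ub = np.concatenate([y, -y])
    # Coefficients are free; the residual bounds are nonnegative.
    bounds = [(None, None)] * p + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


# Toy data (hypothetical): y = 2 + 3x with one corrupted observation.
x = np.arange(10, dtype=float)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x
y[5] += 100.0

beta_l1 = l1_regression(X, y)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("L1 :", beta_l1)
print("OLS:", beta_ols)
```

On this toy data the L1 fit stays on the line through the nine clean points, while ordinary least squares is pulled toward the corrupted one. With hundreds of parameters and years of hourly data, the constraint matrix gets large, so a sparse representation would likely be needed in practice.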
Among hundreds of techniques that are applicable to load forecasting, how did I find L1 regression?
The idea didn't come from nowhere. When I was working on my doctoral dissertation at FANGroup (Fuzzy And Neural Group), a few other students were working on another project sponsored by the U.S. Army Research Office. They were investigating some features and applications of the l1 norm. Although I was thinking about applying the l1 norm to load forecasting, I didn't find a good use case at that time.

Well, it's better late than never. The skills I acquired 10 years ago came in handy for this paper.


Jian Luo, Tao Hong, and Shu-Cherng Fang, "Robust regression models for load forecasting," IEEE Transactions on Smart Grid, in press.

Robust Regression Models for Load Forecasting

Jian Luo, Tao Hong, and Shu-Cherng Fang


Electric load forecasting has been extensively studied during the past century. While many models and their variants have been proposed and tested in the load forecasting literature, most of the existing case studies have been conducted using the data collected under normal operating conditions. A recent case study shows that four representative load forecasting models easily fail under data integrity attacks. To address this challenge, we propose three robust load forecasting models including two variants of the iteratively re-weighted least squares regression models and an L1 regression model. Numerical experiments indicate the dominating performance of the three proposed robust regression models, especially L1 regression, compared to other representative load forecasting models. 

Monday, October 8, 2018

BigDEAL Forecasting Competition 2018

[Update Oct 22, 2018] The registration is closed. 142 data scientists from 26 countries have formed 81 teams to join BFCom2018. See the news article from UNCC College of Engineering. An FAQ page is set up to address questions for the qualifying match.

This semester I'm teaching Energy Analytics for the fifth time. The course has earned its reputation on the UNC Charlotte campus and even around the utility industry, for its toughness, high withdrawal rate, and challenging nature. Here are some comments from the students in 2015 and 2017. Nowadays, not many students even dare to register for the course. 

After the first midterm exam last week, I have five students left in the class. These five "survivors" (out of more than a dozen students at the beginning of the semester) have completed two assignments and one exam. I am impressed by their submissions every time. I must confess that this is by far the academically strongest class I've ever had for this course, even stronger than the group that won several award plaques in GEFCom2014.

Previously, I sent students of this course to the competitions, such as GEFCom2014 and NPower Forecasting Challenge, where they can solve some conventional energy forecasting problems while competing with others around the globe. 

This year, thanks to the outstanding performance of these students, I spent a lot of time trying to figure out a challenge for them. Finally, I decided to give them a new load forecasting problem to solve. 

I'll keep the problem secret for now, but I can tell you that a practical solution to this problem can save power companies a lot of money. To those who are interested in writing academic papers, a winning solution to this problem should greatly increase the likelihood of having the manuscript accepted by the top venues for energy forecasting papers, such as International Journal of Forecasting (IJF) and IEEE Transactions on Smart Grid (TSG). 

The competition is by invitation only. The ones who are interested in joining this competition should first pass the qualifying match. I will use the first homework problem of Energy Analytics for the qualifying match. A contestant has to beat the last-ranked student of my class to receive the invitation to BFCom2018. If nobody beats any of my students, I'll just run the competition with the in-class students. 

For the qualifying match, I'll provide three years of hourly load and temperature, and one year of hourly temperature for the fourth year. The contestants should submit the ex post load forecast for the fourth year. The temperature data is from 28 weather stations. To excel in the qualifying match, the contestants may want to read two of my IJF papers on weather station selection and recency effect.

Important Dates

Oct 8, 2018 - Registration opens. 
Oct 21, 2018 - Registration closes. 
Oct 22, 2018 - Qualifying match data release.
Nov 4, 2018 - Qualifying match submission due. 
Nov 5, 2018 - Leaderboard published; BFCom2018 invitation sent. 
Dec 3, 2018 - BFCom2018 winners announced. 

Note: There is no monetary prize for this competition. The leaderboard will be published on this blog. I will consider providing research assistantships to the top three contestants if they are interested in joining my lab as PhD students.

If you are interested, please register HERE. See you in the game!

Monday, July 9, 2018

From Club Convergence of Per Capita Industrial Pollutant Emissions to Industrial Transfer Effects: An Empirical Study Across 285 Cities in China

China has grown into the world's second largest economy by nominal GDP. Many factors contribute to such rapid growth, such as globalization and hard-working Chinese people. Nevertheless, we can't ignore the pollution resulting from the industrialization. Dr. Chang Liu brought the research problem to me when she visited BigDEAL last year. We spent a year investigating the relationship between industrial transfer effects and per capita industrial pollutant emissions across 285 cities in China. We identified four convergence clubs for SO2 emissions, and three convergence clubs for soot emissions. We also concluded that industrial transfer effects can lead to multiple steady-state equilibria. This presents some evidence to support region-specific environmental policies and execution strategies. 

This is the first time I sent a paper to Energy Policy. The original version was submitted on Feb 5, 2018. Within five months, the paper was published after three revisions. The entire publication process was quite pleasant.

Chang Liu, Tao Hong, Huaifeng Liu, and Lili Wang, "From club convergence of per capita industrial pollutant emissions to industrial transfer effects: an empirical study across 285 cities in China," Energy Policy, vol. 121, pp. 300-313, October 2018. (ScienceDirect)

From Club Convergence of Per Capita Industrial Pollutant Emissions to Industrial Transfer Effects: An Empirical Study Across 285 Cities in China

Chang Liu, Tao Hong, Huaifeng Liu, and Lili Wang


The process of industrialization has led to an increase in air pollutant emissions in China. At the regional level, industrial restructuring and industrial transfer from eastern China to western China have caused a significant difference in pollutant emissions among various cities. This paper analyzes per capita industrial pollutant emissions across 285 prefecture-level cities from 2003 to 2015, aiming to reveal how industrial transfer affects the formation of convergence clubs. Whether industrial pollutant emissions across heterogeneous cities converge to a unique steady-state equilibrium is first identified based on the concept of club convergence. Logit regression analysis is then applied to assess the effects of industrial transfer on the observed clubs. The log t-test highlights four convergence clubs for industrial SO2 emissions and three clubs for industrial soot emissions. The regression analysis results reveal that the effects of industrial transfer can lead to multiple steady-state equilibria, suggesting region-specific environmental policies and execution strategies. In addition, accelerating the development of clean energy technologies in emission-intense regions should be further emphasized. 

Monday, June 25, 2018

Big Data Analytics: Making Smart Grid Smarter

The May 2018 issue of the Power & Energy Magazine is on Big Data Analytics. My guest editorial is on IEEE Xplore with open access. The original articles are in English. The Spanish translation is also available. The links to these articles are listed below.


Tao Hong, "Big data analytics: making smart grid smarter," IEEE Power and Energy Magazine, vol. 16, no. 3, pp. 12-16, May-June 2018. (IEEE Xplore)

Features in This Issue

Visualizing Big Energy Data
By Rob J. Hyndman, Xueqin (Amy) Liu, and Pierre Pinson

Distribution Synchrophasors
By Hamed Mohsenian-Rad, Emma Stewart, and Ed Cortez

Big Data Analytics for Flexible Energy Sharing
By Furong Li, Ran Li, Zhipeng Zhang, Mark Dale, David Tolley, and Petri Ahokangas

Weather Data for Energy Analytics
By Jonathan Black, Alex Hofmann, Tao Hong, Joseph Roberts, and Pu Wang

Big Data Analytics in China’s Electric Power Industry
By Chongqing Kang, Yi Wang, Yusheng Xue, Gang Mu, and Ruijin Liao

Training Energy Data Scientists
By Tao Hong, David Wenzhong Gao, Tom Laing, Dale Kruchten, and Jorge Calzada

Artículos de Mayo/Junio de 2018

Visualización de "big data" de energía
Por Rob J. Hyndman, Xueqin (Amy) Liu y Pierre Pinson

Sincrofasores en la distribución
Por Hamed Mohsenian-Rad, Emma Stewart y Ed Cortez

Análisis de "big data" para el intercambio flexible de energía
Por Furong Li, Ran Li, Zhipeng Zhang, Mark Dale, David Tolley y Petri Ahokangas

Datos meteorológicos para el análisis de energía
Por Jonathan Black, Alex Hofmann, Tao Hong, Joseph Roberts y Pu Wang

Análisis de "big data" en la industria de la potencia eléctrica de china
Por Chongqing Kang, Yi Wang, Yusheng Xue, Gang Mu y Ruijin Liao

Formación de científicos de datos de energía
Por Tao Hong, David Wenzhong Gao, Tom Laing, Dale Kruchten y Jorge Calzada

Saturday, June 23, 2018

Call For Papers: Food and Agriculture Forecasting | International Journal of Forecasting

International Journal of Forecasting

Special Section on Food and Agriculture Forecasting

The fast growing world population brings a critical challenge to humanity: how to ensure adequate supply and access to safe, healthy food. Accurate forecasts provide valuable information to help in formulating national food and agricultural policies, and to help agriculture companies and farmers adjust their business strategies. Such forecasts cover production, consumption, stocks, trade and prices of major field crops (e.g., corn, sorghum, barley, oats, wheat, rice, soybeans, and cotton) and livestock (e.g., beef, pork, poultry and eggs, and dairy). This special section is to collect high-quality research that involves theoretical and practical aspects of forecasting in food and agriculture. Specifically, it encourages papers that inspire actionable insights and/or make methodological breakthroughs in this area.

Potential topics include but are not limited to:

  • Forecasting methodologies in food and agriculture
  • Major field crops forecasting
  • Livestock forecasting 
  • Agri-food products forecasting 
  • Forecasting in vegetables, fruits and other agriculture commodities
  • Agriculture commodities futures market forecasting
  • Natural resources forecasting in agriculture and food industry 
  • Water and energy forecasting in agriculture 
  • Climate forecasting in agriculture

Submission deadline: 31 December 2018

To submit a paper for consideration for the Special Section, please upload your paper online and include a cover letter clearly indicating that the paper is for the special issue “Food and Agriculture Forecasting”. The webpage for online submission is mc.manuscriptcentral.com/ijf. Instructions for authors are provided at www.forecasters.org/ijf/authors. All papers will follow IJF’s double-blind refereeing process. For further information about the Special Section, please contact the guest editors.

Guest Editors

Jue Wang, Chinese Academy of Sciences, China (wjue@amss.ac.cn)
Tao Hong, University of North Carolina at Charlotte, USA (hong@uncc.edu)

Monday, June 18, 2018

Combining Probabilistic Load Forecasts

We often find simple averaging as a plausible solution for combining point forecasts. Combining probabilistic forecasts is not that trivial. The literature on combining probabilistic load forecasts is rather limited. Previously, we developed a Quantile Regression Averaging (QRA) method to generate probabilistic load forecasts by combining point forecasts. This work is a follow-up, where we combine probabilistic load forecasts to generate a more accurate probabilistic forecast. The method we proposed here is a Constrained Quantile Regression Averaging (CQRA) method, where the parameters of a quantile regression model are non-negative and sum up to 1. We applied the method to loads at both the high-voltage level and the household level, showing better results than the benchmarks.
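As a rough illustration of the constrained quantile regression idea, the sketch below estimates combination weights for a single quantile with a linear program that minimizes the pinball loss subject to non-negative weights that sum to 1. The function name `cqra_weights`, the use of SciPy's `linprog`, and the synthetic data are all assumptions made for this example; the paper's actual implementation and data handling may differ.

```python
import numpy as np
from scipy.optimize import linprog


def cqra_weights(F, y, q):
    """Weights for combining the q-quantile forecasts in the columns of F.

    Minimizes the pinball loss of F @ w against y, subject to w >= 0 and
    sum(w) == 1. The residual y - F @ w is split into positive and negative
    parts u_plus and u_minus, both nonnegative.
    """
    n, k = F.shape
    I = np.eye(n)
    # Decision vector: [w (k weights), u_plus (n), u_minus (n)].
    c = np.concatenate([np.zeros(k), q * np.ones(n), (1.0 - q) * np.ones(n)])
    # F w + u_plus - u_minus = y, and the weights sum to one.
    A_eq = np.vstack([
        np.hstack([F, I, -I]),
        np.concatenate([np.ones(k), np.zeros(2 * n)])[None, :],
    ])
    b_eq = np.concatenate([y, [1.0]])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:k]


# Synthetic example (hypothetical): three individual quantile forecasts,
# with the "actual" load built as a 0.3/0.7 mix of the first two.
t = np.arange(20.0)
F = np.column_stack([100 + 10 * np.sin(t), 100 + 10 * np.cos(t), 90 + t])
y = 0.3 * F[:, 0] + 0.7 * F[:, 1]

weights = cqra_weights(F, y, q=0.9)
print(weights)
```

In a full application, the same program would be solved once per quantile level, with each column of `F` holding one model's forecast of that quantile.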

Among my papers published so far, this one has the shortest title.

Yi Wang, Ning Zhang, Yushi Tan, Tao Hong, Daniel Kirschen, and Chongqing Kang, "Combining probabilistic load forecasts," IEEE Transactions on Smart Grid, in press, available online. (arXiv; IEEE Xplore).

Combining Probabilistic Load Forecasts

Yi Wang, Ning Zhang, Yushi Tan, Tao Hong, Daniel Kirschen, and Chongqing Kang


Probabilistic load forecasts provide comprehensive information about future load uncertainties. In recent years, many methodologies and techniques have been proposed for probabilistic load forecasting. Forecast combination, a widely recognized best practice in point forecasting literature, has never been formally adopted to combine probabilistic load forecasts. This paper proposes a constrained quantile regression averaging (CQRA) method to create an improved ensemble from several individual probabilistic forecasts. We formulate the CQRA parameter estimation problem as a linear program with the objective of minimizing the pinball loss and the constraints that the parameters are nonnegative and summing up to one. We demonstrate the effectiveness of the proposed method using two publicly available datasets, the ISO New England data and Irish smart meter data. Comparing with the best individual probabilistic forecast, the ensemble can reduce the pinball score by 4.39% on average. The proposed ensemble also demonstrates superior performance over nine other benchmark ensembles.