Friday, April 8, 2016

Statistical Learning and Machine Learning

Machine learning is a field that studies how to make computers (machines) learn from data to make predictions or support data-driven decisions. The term "machine learning" started to become popular in the 1990s, and then was somewhat surpassed, if not replaced, by "analytics". In the past few years, riding the buzzword wave of big data and deep learning, the field of machine learning has been regaining momentum. Since the 1980s, machine learning techniques have appeared in most papers in the load forecasting literature.

Statistical learning, on the other hand, is not a familiar term to many people. I first got to know this term during my graduate school days, when I was reading the book The Elements of Statistical Learning. Since many techniques and methodologies introduced in this book can be applied to forecasting, I'm using this book as a reference book for my forecasting course this semester.

As of today, I still don't quite understand the difference between the two terms. It seems to me that statisticians like to use the term "statistical learning", while computer scientists and engineers are used to the term "machine learning". Maybe statistical learning has more statistical rigor, while machine learning places more emphasis on the algorithmic aspect. Another discussion about the difference between the two can be found on Wikipedia.

Early this year, I registered for two online courses from Stanford University.
I thought it would be a good investment of my time for three reasons:
  • As a researcher in energy forecasting, I want to refresh my memory on machine/statistical learning;
  • As an instructor, I want to leverage the materials from other relevant courses when preparing for my forecasting course;
  • As a Graduate Program Director for a top-ranked online MS program, I want to see how Stanford University sets up its online courses.
I started both courses at almost the same time, though I was quickly hooked on the statistical learning one and gave up the other course. While both courses are taught by world-class professors and equipped with state-of-the-art online teaching technologies, the one by Trevor and Rob fits my taste much better:
  • The instruction style is very down-to-earth. These two statistics professors are very good at explaining the theory intuitively without involving much math.
  • The subtitles are awesome.
  • Most of the quiz questions test the understanding of concepts, so I don't have to pull out my computer to run R code.
  • The guest speakers bring valuable perspectives and insights about the subject. 
  • The design of the progress bar is very simple and effective. Because of my busy schedule and other commitments, I almost quit the course several times. That progress bar helped me stay focused and on track.
There are other nice features this course offers that I never had time to try, such as a free textbook and an online discussion forum. I'm sure they are useful to those who can devote more time than I could. In January and early February, I was watching videos and answering the quiz questions late at night before going to sleep every day. After reaching about 35% overall progress, I was distracted by other commitments. Then in early April, I went through a few more lectures to pass the 50% bar.

I will most likely take this course again, or at least rewatch the 15 hours of video lectures. I would like to recommend this course to the energy forecasting community as well :)



  1. Most helpful. Thank you.

  2. Thank you very much for posting your experience and recommending the online course!

