Predictive analysis and modelling football results using machine learning approach for English Premier League

Cited by: 91
Authors
Baboota, Rahul [1]
Kaur, Harleen [2]
Affiliations
[1] Guru Gobind Singh Indraprastha Univ, New Delhi, India
[2] Jamia Hamdard, Sch Engn Sci & Technol, Dept Comp Sci & Engn, New Delhi, India
Keywords
Machine learning; Feature engineering; Data mining; Predictive analysis; Random forest; Support vector machines (SVM); Ranked probability score (RPS); Gradient boosting; MATCH; PROBABILITY; SCORES;
DOI
10.1016/j.ijforecast.2018.01.003
Chinese Library Classification (CLC) number
F [Economics];
Discipline classification code
02;
Abstract
The introduction of artificial intelligence has given us the ability to build predictive systems with unprecedented accuracy. Machine learning is being used in virtually all areas in one way or another, due to its extreme effectiveness. One such area where predictive systems have gained a lot of popularity is the prediction of football match results. This paper demonstrates our work on the building of a generalized predictive model for predicting the results of the English Premier League. Using feature engineering and exploratory data analysis, we create a feature set for determining the most important factors for predicting the results of a football match, and consequently create a highly accurate predictive system using machine learning. We demonstrate the strong dependence of our models' performances on important features. Our best model using gradient boosting achieved a performance of 0.2156 on the ranked probability score (RPS) metric for game weeks 6 to 38 for the English Premier League aggregated over two seasons (2014-2015 and 2015-2016), whereas the betting organizations that we consider (Bet365 and Pinnacle Sports) obtained an RPS value of 0.2012 for the same period. Since a lower RPS value represents a higher predictive accuracy, our model was not able to outperform the bookmaker's predictions, despite obtaining promising results. © 2018 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
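The abstract compares models by ranked probability score, where lower is better. As a rough illustration only, the sketch below computes the standard RPS for a single match with three ordered outcomes (home win, draw, away win). The function name and the example probabilities are hypothetical and are not taken from the paper, which reports RPS averaged over many matches.

```python
from itertools import accumulate


def ranked_probability_score(probs, outcome_index):
    """RPS for one match.

    probs: forecast probabilities over ordered outcomes,
           assumed here to be (home win, draw, away win).
    outcome_index: index of the outcome that actually occurred.
    """
    r = len(probs)
    # One-hot vector for the observed outcome.
    observed = [1.0 if i == outcome_index else 0.0 for i in range(r)]
    # Cumulative forecast and cumulative observation.
    cum_p = list(accumulate(probs))
    cum_o = list(accumulate(observed))
    # The last cumulative difference is always zero (both sum to 1),
    # so only the first r - 1 terms are summed.
    squared = [(p - o) ** 2 for p, o in zip(cum_p[:-1], cum_o[:-1])]
    return sum(squared) / (r - 1)


# Illustrative forecast (made-up numbers): 50% home, 30% draw, 20% away.
forecast = (0.5, 0.3, 0.2)
rps_home_win = ranked_probability_score(forecast, 0)
rps_away_win = ranked_probability_score(forecast, 2)
```

Because the score works on cumulative probabilities, a forecast that favours the home side is penalised more when the away side wins than when the match is drawn, which is why RPS suits ordered outcomes such as football results.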
Pages: 741-755
Number of pages: 15