Evaluating soccer match prediction models: a deep learning approach and feature optimization for gradient-boosted trees

Cited by: 0
Authors
Yeung, Calvin [1]
Bunker, Rory [1]
Umemoto, Rikuhei [1]
Fujii, Keisuke [1, 2, 3]
Affiliations
[1] Nagoya Univ, Grad Sch Informat, Nagoya, Japan
[2] RIKEN, Ctr Adv Intelligence Project, Osaka, Japan
[3] Japan Sci & Technol Agcy, PRESTO, Urawa, Saitama, Japan
Keywords
Sports; Match result prediction; Match outcome forecasting; Football; Regression; Outcomes; Scores
DOI
10.1007/s10994-024-06608-w
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Machine learning models have become increasingly popular for predicting the results of soccer matches; however, the lack of publicly available benchmark datasets has made model evaluation challenging. The 2023 Soccer Prediction Challenge required the prediction of match results first in terms of the exact goals scored by each team and second in terms of the probabilities of a win, draw, and loss. The original training set of matches and features provided for the competition was augmented with additional matches played between 4 April and 13 April 2023, that is, the period after the training set ended but before the first matches to be predicted (on which performance was evaluated). A CatBoost model was employed using pi-ratings as features, which were initially identified as the optimal choice for calculating the win/draw/loss probabilities. Notably, deep learning models have frequently been disregarded in this task. Therefore, in this study, we aimed to assess the performance of a deep learning model and to determine the optimal feature set for a gradient-boosted tree model. The model was trained on the most recent 5 years of data, and three training and validation sets were used in a hyperparameter grid search. The results on the validation sets show that our model had strong performance and stability compared with previously published models from the 2017 Soccer Prediction Challenge for win/draw/loss prediction. Our model ranked 16th in the 2023 Soccer Prediction Challenge with an RPS of 0.2195.
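The abstract reports the result as an RPS without defining it. Assuming RPS here denotes the ranked probability score in its usual form for ordered outcomes (the abstract does not spell this out), a minimal sketch of the calculation for a single match could look as follows. The function name and the (win, draw, loss) ordering are illustrative choices, not taken from the paper.

```python
from itertools import accumulate
from typing import Sequence


def ranked_probability_score(probs: Sequence[float], outcome: Sequence[int]) -> float:
    """Ranked probability score for one match (lower is better).

    Assumed form: RPS = 1/(r-1) * sum_{i=1}^{r-1} (sum_{j=1}^{i} (p_j - e_j))^2,
    where p is the forecast and e the one-hot observed outcome over r ordered
    categories. The ordering (home win, draw, home loss) is an assumption.
    """
    if len(probs) != len(outcome) or len(probs) < 2:
        raise ValueError("probs and outcome must have the same length (>= 2)")
    # Cumulative differences between forecast and observed distributions.
    cumulative = list(accumulate(p - e for p, e in zip(probs, outcome)))
    # The last cumulative term is ~0 when both vectors sum to 1, so it is skipped.
    return sum(d * d for d in cumulative[:-1]) / (len(probs) - 1)


if __name__ == "__main__":
    # A uniform forecast for a match that ended in a home win.
    print(round(ranked_probability_score([1 / 3, 1 / 3, 1 / 3], [1, 0, 0]), 4))  # 0.2778
```

Under this definition, a competition-level figure such as the one quoted in the abstract would be the mean of the per-match scores over all predicted matches.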
Pages: 7541-7564
Number of pages: 24