A data- and knowledge-driven framework for developing machine learning models to predict soccer match outcomes

Authors
Berrar, Daniel [1, 2]
Lopes, Philippe [3]
Dubitzky, Werner
Affiliations
[1] Open Univ, Sch Math & Stat, Machine Learning Res Grp, Milton Keynes, England
[2] Tokyo Inst Technol, Sch Engn, Dept Informat & Commun Engn, Tokyo, Japan
[3] Univ Evry Paris Saclay, Sport & Exercise Sci Dept, Lab Biol Exercice Performance & Sante LBEPS, Evry Courcouronnes, France
Keywords
2023 soccer prediction challenge; k-NN; Ordinal forests; Naive Bayes; Neural networks; Outcome prediction; Soccer analytics; Super league; Association football
DOI
10.1007/s10994-024-06625-9
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The 2023 Soccer Prediction Challenge invited the machine learning community to develop innovative methods to predict the outcomes of 736 future soccer matches. The Challenge included two tasks. Task 1 was to forecast the exact match score, i.e., the number of goals scored by each team. Task 2 was to predict the match outcome as a probability vector over the three possible result categories: victory of the home team, draw, and victory of the away team. Here, we present a new data- and knowledge-driven framework for building machine learning models from readily available data to predict soccer match outcomes. A key component of this framework is an innovative approach to modeling interdependent time series data of competing entities. Using this framework, we developed various predictive models based on k-nearest neighbors, artificial neural networks, naive Bayes, and ordinal forests, which we applied to the two tasks of the 2023 Soccer Prediction Challenge. Among all submissions to the Challenge, our machine learning models based on k-nearest neighbors and neural networks achieved top performance. Our main insights from the Challenge are that relatively simple learning algorithms perform remarkably well compared to more complex algorithms, and that the key to successful predictions lies in how well soccer domain knowledge can be incorporated into the modeling process.
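To make the Task 2 output format concrete, the following minimal sketch shows one way a k-nearest-neighbors model could turn past matches into a probability vector over home win, draw, and away win. It is an illustration only, not the authors' method: the two-number feature tuples, the `knn_outcome_probabilities` function, the additive smoothing, and the toy match history are all assumptions made for this example.

```python
import math
from collections import Counter

# Order of the probability vector: home win, draw, away win.
OUTCOMES = ("home_win", "draw", "away_win")


def knn_outcome_probabilities(query, history, k=3, smoothing=1.0):
    """Return a probability vector over OUTCOMES for one match.

    query:   tuple of numeric features (hypothetical, e.g. two team ratings).
    history: list of (features, outcome) pairs from past matches.
    """
    # Keep the k past matches closest to the query in Euclidean distance.
    nearest = sorted(history, key=lambda item: math.dist(query, item[0]))[:k]
    counts = Counter(outcome for _, outcome in nearest)
    # Additive smoothing (an assumption here) keeps every outcome above zero.
    total = len(nearest) + smoothing * len(OUTCOMES)
    return [(counts[o] + smoothing) / total for o in OUTCOMES]


# Toy, made-up history: (home rating, away rating) -> observed outcome.
history = [
    ((1.8, 0.9), "home_win"),
    ((1.7, 1.0), "home_win"),
    ((1.2, 1.2), "draw"),
    ((0.9, 1.6), "away_win"),
    ((1.0, 1.9), "away_win"),
]

probs = knn_outcome_probabilities((1.6, 1.0), history, k=3)
print(dict(zip(OUTCOMES, probs)))
```

The three values always sum to 1, so the result can be read directly as a forecast over the three result categories. How the features are derived from match data is where domain knowledge would enter, and that part is deliberately left out of this sketch.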
Pages: 8165-8204
Number of pages: 40