The Prevalence of Errors in Machine Learning Experiments

Cited by: 6
Authors
Shepperd, Martin [1 ]
Guo, Yuchen [2 ]
Li, Ning [3 ]
Arzoky, Mahir [1 ]
Capiluppi, Andrea [1 ]
Counsell, Steve [1 ]
Destefanis, Giuseppe [1 ]
Swift, Stephen [1 ]
Tucker, Allan [1 ]
Yousefi, Leila [1 ]
Affiliations
[1] Brunel Univ London, London, England
[2] Xi An Jiao Tong Univ, Xian, Peoples R China
[3] Northwestern Polytech Univ, Xian, Peoples R China
Keywords
Classifier; Computational experiment; Reliability; Error
DOI
10.1007/978-3-030-33607-3_12
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Context: Conducting experiments is central to machine learning research, where they are used to benchmark, evaluate and compare learning algorithms. Consequently, it is important that we conduct reliable, trustworthy experiments. Objective: We investigate the incidence of errors in a sample of machine learning experiments in the domain of software defect prediction. Our focus is simple arithmetical and statistical errors. Method: We analyse 49 papers describing 2456 individual experimental results, drawn from a previously undertaken systematic review comparing supervised and unsupervised defect prediction classifiers. We extract the confusion matrices and test for relevant constraints, e.g., that the marginal probabilities must sum to one. We also check for multiple statistical significance testing errors. Results: We find that a total of 22 out of 49 papers contain demonstrable errors. Of these, 7 were statistical and 16 related to confusion matrix inconsistency (one paper contained both classes of error). Conclusions: Whilst some errors may be of a relatively trivial nature, e.g., transcription errors, their presence does not engender confidence. We strongly urge researchers to follow open science principles so that errors can be more easily detected and corrected, and thus, as a community, we can reduce this worryingly high error rate in our computational experiments.
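The kind of consistency check the abstract describes can be illustrated with a short sketch. This is not the authors' actual tooling; `check_consistency` and its tolerance parameter are hypothetical names chosen for illustration. The idea is simply to recompute standard metrics from a paper's reported confusion matrix and flag any published value that disagrees beyond rounding.

```python
def check_consistency(tp, fp, fn, tn, reported, tol=0.005):
    """Recompute standard metrics from a confusion matrix and flag
    reported values that disagree beyond a rounding tolerance.

    `reported` maps metric name -> value as published in a paper.
    Returns a list of (metric, recomputed, reported) discrepancies.
    Hypothetical helper, not the authors' actual checking code.
    """
    n = tp + fp + fn + tn  # marginal counts must sum to the sample size
    derived = {
        "accuracy": (tp + tn) / n,
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
        "recall": tp / (tp + fn) if tp + fn else float("nan"),
    }
    p, r = derived["precision"], derived["recall"]
    derived["f1"] = 2 * p * r / (p + r) if p + r else float("nan")
    return [(m, derived[m], v) for m, v in reported.items()
            if m in derived and abs(derived[m] - v) > tol]
```

For example, a matrix with tp=40, fp=10, fn=20, tn=30 implies recall = 40/60 ≈ 0.667, so a paper reporting recall = 0.75 for that matrix would be flagged as inconsistent.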
Pages: 102-109 (8 pages)