The quest for the reliability of machine learning models in binary classification on tabular data

Authors
Vitor Cirilo Araujo Santos
Lucas Cardoso
Ronnie Alves
Affiliations
[1] Federal University of Pará
[2] PPGCC
[3] Vale Institute of Technology
Abstract
In this paper we explore the reliability of machine learning (ML) model contexts. Several evaluation procedures are commonly used to validate a model (precision, F1 score, and others); however, these procedures measure only the number of correct answers the model produces, not the learning itself. As a result, they cannot assess whether a model learned from elements that make sense in the context in which it is applied. A model could therefore achieve good results in the training stage but poor results when it needs to generalize. When many different models achieve similar performance, the model with the highest number of hits in training is not necessarily the best. We therefore created a methodology based on Item Response Theory that identifies whether an ML context is unreliable, providing an additional, different form of validation for ML models.
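The point that an equal number of hits can hide different learning behaviour can be illustrated with a minimal sketch. The abstract does not say which Item Response Theory model the authors use, so the code below assumes the standard two-parameter logistic (2PL) model, treats test instances as items and classifiers as respondents, and uses made-up item parameters and response patterns. It is not the authors' methodology, only an illustration of the kind of signal IRT can add beyond a hit count.

```python
import math

def p_correct(theta, a, b):
    """2PL item characteristic curve: probability that a respondent with
    ability theta answers an item with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical items (test instances) as (discrimination, difficulty):
# one easy, one medium, one hard. These values are assumptions.
ITEMS = [(1.5, -2.0), (1.5, 0.0), (1.5, 2.0)]

def log_likelihood(responses, theta):
    """Log-likelihood of a 0/1 response pattern at a given ability."""
    total = 0.0
    for r, (a, b) in zip(responses, ITEMS):
        p = p_correct(theta, a, b)
        total += math.log(p) if r == 1 else math.log(1.0 - p)
    return total

def best_fit(responses):
    """Grid-search the ability that best explains the responses and
    return (theta, log-likelihood at that theta)."""
    grid = [i / 10.0 for i in range(-40, 41)]
    theta = max(grid, key=lambda t: log_likelihood(responses, t))
    return theta, log_likelihood(responses, theta)

# Two hypothetical classifiers with the same number of hits (2 of 3).
model_a = [1, 1, 0]  # correct on easy and medium, wrong on hard
model_b = [0, 1, 1]  # wrong on easy, correct on medium and hard

for name, responses in (("A", model_a), ("B", model_b)):
    theta, ll = best_fit(responses)
    print(f"model {name}: hits={sum(responses)} "
          f"theta={theta:.1f} log-likelihood={ll:.3f}")
```

Both patterns have the same hit count, yet the pattern that misses the easy item while solving the hard one fits the assumed 2PL model worse. A gap of this kind is one possible indicator that a result deserves closer inspection, which a count of correct answers alone cannot provide.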