The quest for the reliability of machine learning models in binary classification on tabular data

Authors
Vitor Cirilo Araujo Santos
Lucas Cardoso
Ronnie Alves
Affiliations
[1] Federal University of Pará
[2] PPGCC
[3] Vale Institute of Technology
Abstract
In this paper, we explore the reliability of the contexts of machine learning (ML) models. Several evaluation procedures are commonly used to validate a model (precision, F1 score, and others). However, these procedures are not linked to the evaluation of learning itself, only to the number of correct answers the model produces. As a result, they cannot show whether a model learned through elements that make sense in the context in which it is applied. A model may therefore achieve good results in the training stage but poor results when it has to generalize. When many different models achieve similar performance, the model with the highest number of hits in training is not necessarily the best one. We therefore created a methodology based on Item Response Theory (IRT) that identifies whether an ML context is unreliable. This provides an additional and different form of validation for ML models.
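The abstract argues that count-based metrics such as precision and F1 cannot tell apart models that make the same number of mistakes on different instances, while IRT can. The sketch below illustrates that argument under stated assumptions. It is not the authors' methodology, whose details the abstract does not give.

It makes four assumptions:
- It uses the common three-parameter logistic (3PL) IRT model.
- It treats each classifier as a respondent and each test instance as an item.
- It uses hand-picked, hypothetical item parameters (`ITEMS`) instead of parameters estimated from data.
- Its toy labels and predictions are likewise made up, and every function name is illustrative.

```python
import math

# Hypothetical ground truth for 8 instances of a binary task.
Y_TRUE = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical 3PL item parameters (a, b, c) = (discrimination, difficulty, guessing).
# Instances 0 and 4 are hard, 3 and 7 are easy, the rest are medium.
ITEMS = [(1.5, 2.0, 0.2), (1.5, 0.0, 0.2), (1.5, 0.0, 0.2), (1.5, -2.0, 0.2),
         (1.5, 2.0, 0.2), (1.5, 0.0, 0.2), (1.5, 0.0, 0.2), (1.5, -2.0, 0.2)]


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1); depends only on TP/FP/FN counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def p_correct(theta, a, b, c):
    """3PL item response function: P(correct answer | ability theta)."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))


def log_likelihood(theta, pattern, items):
    """Log-likelihood of a 0/1 response pattern under the 3PL model."""
    total = 0.0
    for r, (a, b, c) in zip(pattern, items):
        p = p_correct(theta, a, b, c)
        total += math.log(p) if r else math.log(1 - p)
    return total


def estimate_ability(pattern, items):
    """Maximum-likelihood ability found by a simple grid search over [-4, 4]."""
    grid = [i / 100 for i in range(-400, 401)]
    return max(grid, key=lambda t: log_likelihood(t, pattern, items))


def response_pattern(y_true, y_pred):
    """Turn predictions into an IRT response pattern (1 = correct)."""
    return [int(t == p) for t, p in zip(y_true, y_pred)]


# Model A errs only on the hard instances; model B errs only on the easy ones.
pred_a = [0, 1, 1, 1, 1, 0, 0, 0]
pred_b = [1, 1, 1, 0, 0, 0, 0, 1]

f1_a, f1_b = f1_score(Y_TRUE, pred_a), f1_score(Y_TRUE, pred_b)
theta_a = estimate_ability(response_pattern(Y_TRUE, pred_a), ITEMS)
theta_b = estimate_ability(response_pattern(Y_TRUE, pred_b), ITEMS)
print(f"F1: A={f1_a:.2f}  B={f1_b:.2f}")  # prints "F1: A=0.75  B=0.75"
print(f"ability: A={theta_a:.2f}  B={theta_b:.2f}")
```

Both models reach the same F1 of 0.75, yet model A receives the higher estimated ability. Model A fails only on hard instances. Model B fails on easy instances while answering hard ones correctly, and the 3PL guessing term partly discounts such answers. In practice the item parameters would be estimated jointly from the responses of many models rather than fixed by hand as here.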