The quest for the reliability of machine learning models in binary classification on tabular data

Authors
Vitor Cirilo Araujo Santos
Lucas Cardoso
Ronnie Alves
Affiliations
[1] Federal University of Pará
[2] PPGCC
[3] Vale Institute of Technology
Abstract
In this paper we explore the reliability of contexts of machine learning (ML) models. Several evaluation procedures are commonly used to validate a model (precision, F1 score, and others); however, these procedures are not linked to the evaluation of learning itself, but only to the number of correct answers given by the model. This characteristic makes it impossible to assess whether a model was able to learn through elements that make sense in the context in which it is inserted. As a result, a model could achieve good results in the training stage but poor results when it needs to generalize. When many different models achieve similar performance, the model with the highest number of hits in training is not necessarily the best. Therefore, we created a methodology based on Item Response Theory that allows us to identify whether an ML context is unreliable, providing an extra and different validation for ML models.