The quest for the reliability of machine learning models in binary classification on tabular data

Cited by: 0

Authors:

Vitor Cirilo Araujo Santos

Lucas Cardoso

Ronnie Alves

Affiliations:

[1] Federal University of Pará

[2] PPGCC

[3] Vale Institute of Technology

Source:

Scientific Reports, Vol. 13

DOI: Not available

Abstract:

In this paper, we explore the reliability of machine learning (ML) model contexts. Several evaluation procedures are commonly used to validate a model (precision, F1 score, and others); however, these procedures are not linked to the evaluation of learning itself, but only to the number of correct answers the model produces. This makes it impossible to assess whether a model has learned from elements that make sense in the context in which it is inserted. As a result, a model could achieve good results in the training stage but poor results when it needs to generalize. When many different models achieve similar performance, the model with the highest number of hits in training is not necessarily the best one. We therefore created a methodology based on Item Response Theory (IRT) that allows us to identify whether an ML context is unreliable, providing an additional and different form of validation for ML models.
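The abstract does not state which IRT model the methodology uses, so the following is only an illustrative sketch under stated assumptions. It uses the standard three-parameter logistic (3PL) item characteristic curve, treating each test instance as an "item" with discrimination a, difficulty b, and guessing c, and each classifier as a "respondent" with ability θ. The guessing parameter is set to 0 here, so the curve reduces to the 2PL form. The item parameters, the two response vectors, and the function names (`p_correct_3pl`, `log_likelihood`, `estimate_ability`) are hypothetical. The sketch illustrates the point made above: two models with the same number of correct answers can receive different ability estimates, because IRT takes into account which items were answered correctly.

```python
import math

# Hypothetical item parameters (a = discrimination, b = difficulty, c = guessing),
# one tuple per test instance. Illustrative values only, not taken from the paper.
ITEMS = [(2.0, -1.0, 0.0), (2.0, 0.0, 0.0), (0.5, 1.0, 0.0), (0.5, 2.0, 0.0)]


def p_correct_3pl(theta: float, a: float, b: float, c: float) -> float:
    """3PL probability that a respondent of ability theta answers the item correctly."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta: float, responses: list[int], items) -> float:
    """Log-likelihood of a 0/1 response vector at ability theta."""
    total = 0.0
    for x, (a, b, c) in zip(responses, items):
        p = p_correct_3pl(theta, a, b, c)
        total += math.log(p) if x else math.log(1.0 - p)
    return total


def estimate_ability(responses: list[int], items, lo=-4.0, hi=4.0, step=0.01) -> float:
    """Maximum-likelihood ability estimate found by grid search over [lo, hi]."""
    n = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n + 1)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))


# Two hypothetical classifiers, each correct on 2 of the 4 instances.
model_a = [1, 1, 0, 0]  # correct on the two highly discriminating items
model_b = [0, 0, 1, 1]  # correct only on the two weakly discriminating items

theta_a = estimate_ability(model_a, ITEMS)
theta_b = estimate_ability(model_b, ITEMS)
print(f"model A: {sum(model_a)}/4 correct, theta = {theta_a:.2f}")
print(f"model B: {sum(model_b)}/4 correct, theta = {theta_b:.2f}")
```

Under these assumed parameters, model A receives a higher ability estimate than model B even though both have identical accuracy, which a hit-count metric alone cannot distinguish.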
