A Hybrid Model Focusing on Data Pre-Processing in Diabetes Diagnosis

Cited by: 0
Authors
Zeidi, Farnaz [1]
Azar, Lalah [1]
Arslan, Vasfiye [1]
Erol, Cigdem [2, 3]
Affiliations
[1] Istanbul Univ, Inst Sci, Div Informat, Istanbul, Turkey
[2] Istanbul Univ, Informat Dept, Istanbul, Turkey
[3] Istanbul Univ, Fac Sci, Dept Biol, Div Bot, Istanbul, Turkey
Keywords
Classification algorithms; diabetes diagnosis; hybrid model; K-means algorithm; normalization; outliers detection
DOI
10.1080/01969722.2022.2080338
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Diabetes mellitus is a common and serious disease that has been studied by many researchers. The Pima Indians Diabetes Dataset is one of the best-known datasets in this field. This study aims to increase the accuracy of machine learning algorithms in diagnosing the disease, and to reveal patterns that enable early diagnosis, by focusing on the pre-processing stages. In the pre-processing stage, the proposed hybrid model fills in missing values with KNN, compares six different normalization methods, and removes outliers with K-means. In the classification stage, four algorithms (C4.5, SVM, Naive Bayes, and KNN) were examined to find the best hybrid model. Performance was evaluated by accuracy. Compared with previous studies, the results showed higher accuracy: 98.3% for (KNN + n5 + K-means + SVM) and 99.1% for (KNN + n4/n3 + K-means + KNN). Finally, we offer concluding notes and some suggestions for further study.
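The pipeline named in the abstract (KNN imputation, normalization, K-means outlier removal, then a classifier) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not define the six normalization methods (n1 to n6) or the K-means outlier criterion, so min-max scaling and a distance-to-centroid cutoff are assumptions made here, as are all function names, the `factor` parameter, and the deterministic centroid initialization.

```python
import math
from collections import Counter


def knn_impute(rows, k=3):
    """Replace None with the mean of that feature over the k nearest rows."""
    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                # Distance over the features that both rows have observed.
                shared = [(a - b) ** 2 for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(sum(shared) / len(shared))
                    candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            new_row[j] = sum(nearest) / len(nearest)  # assumes a donor row exists
        filled.append(new_row)
    return filled


def min_max_normalize(rows):
    """Scale each column to [0, 1] (assumed stand-in for one of n1..n6)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)] for row in rows]


def kmeans(rows, k=2, iters=20):
    """Plain Lloyd iterations; the first k rows seed the centroids (simplification)."""
    centroids = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(r, centroids[c]))
                  for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def remove_outliers(rows, targets, k=2, factor=2.0):
    """Drop rows farther from their centroid than factor * mean distance (assumed rule)."""
    centroids, labels = kmeans(rows, k)
    dists = [math.dist(r, centroids[lab]) for r, lab in zip(rows, labels)]
    cutoff = factor * sum(dists) / len(dists)
    kept = [(r, t) for r, t, d in zip(rows, targets, dists) if d <= cutoff]
    return [r for r, _ in kept], [t for _, t in kept]


def knn_predict(train_rows, train_targets, query, k=3):
    """Majority vote among the k nearest training rows."""
    order = sorted(range(len(train_rows)),
                   key=lambda i: math.dist(train_rows[i], query))
    votes = Counter(train_targets[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def preprocess(rows, targets, k_impute=3):
    """Chain the three pre-processing steps in the order the abstract lists them."""
    rows = min_max_normalize(knn_impute(rows, k_impute))
    return remove_outliers(rows, targets)
```

Here `preprocess` returns the cleaned rows with their labels, which can then be passed to `knn_predict` or any other classifier. Swapping `min_max_normalize` for another scaler is how the different normalization variants would be compared in a setup like this.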
Pages: 1199-1211
Number of pages: 13