A Hybrid Model Focusing on Data Pre-Processing in Diabetes Diagnosis

Times cited: 0
Authors
Zeidi, Farnaz [1]
Azar, Lalah [1]
Arslan, Vasfiye [1]
Erol, Cigdem [2,3]
Affiliations
[1] Istanbul Univ, Inst Sci, Div Informat, Istanbul, Turkey
[2] Istanbul Univ, Informat Dept, Istanbul, Turkey
[3] Istanbul Univ, Fac Sci, Dept Biol, Div Bot, Istanbul, Turkey
Keywords
Classification algorithms; diabetes diagnosis; hybrid model; K-means algorithm; normalization; outliers detection
DOI
10.1080/01969722.2022.2080338
CLC classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Diabetes mellitus is a common and serious disease that has been studied by many researchers, and the Pima Indians Diabetes Dataset is one of the best-known datasets in this field. This study aims to increase the accuracy of machine learning algorithms in diagnosing the disease, and to reveal patterns that enable its early diagnosis, by focusing on the pre-processing stage. In that stage, the proposed hybrid model fills in missing values with KNN, compares six different normalization methods, and removes outliers with K-means. In the classification stage, four algorithms (C4.5, SVM, Naive Bayes, and KNN) were examined to find the best hybrid model, with performance evaluated by accuracy. Compared with previous studies, the models achieved higher accuracy: 98.3% for (KNN + n5 + K-means + SVM) and 99.1% for (KNN + n4/n3 + K-means + KNN), where n3, n4, and n5 denote normalization methods. Finally, we offer concluding remarks and suggestions for further study.
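The pipeline described in the abstract (KNN imputation, then normalization, then K-means outlier removal, then a classifier) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, min-max scaling stands in for just one of the six normalization methods (the abstract does not say which methods n3, n4, and n5 are), the outlier rule (drop samples farther than mean + z·std from their K-means centroid) is an assumed variant, and KNN is shown as the classifier with a simple hold-out accuracy.

```python
import math
import random
from statistics import mean, pstdev


def knn_impute(rows, k=3):
    """Replace each None with the mean of that feature over the k nearest rows
    that observe it; distance uses only features present in both rows.
    (Raises StatisticsError if no other row observes the feature.)"""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                sq = [(a - b) ** 2 for t, (a, b) in enumerate(zip(row, other))
                      if t != j and a is not None and b is not None]
                if sq:
                    candidates.append((math.sqrt(sum(sq)), other[j]))
            candidates.sort()
            filled[i][j] = mean(v for _, v in candidates[:k])
    return filled


def min_max_normalize(rows):
    """Scale every feature to [0, 1]; constant features map to 0."""
    cols = list(zip(*rows))
    lows, highs = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(r, lows, highs)] for r in rows]


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm with initial centroids sampled from the data."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels, centroids


def remove_outliers(points, targets, k=2, z=2.0, seed=0):
    """Assumed rule: drop samples whose distance to their K-means centroid
    exceeds mean + z * std of all sample-to-centroid distances."""
    labels, centroids = kmeans(points, k, seed=seed)
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    cutoff = mean(dists) + z * pstdev(dists)
    keep = [i for i, d in enumerate(dists) if d <= cutoff]
    return [points[i] for i in keep], [targets[i] for i in keep]


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k nearest training samples (Euclidean)."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)


def accuracy(train_x, train_y, test_x, test_y, k=3):
    """Fraction of test samples whose KNN prediction matches the true label."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_y)
```

In a full run the steps would be chained in the order the abstract lists them: impute, normalize, remove outliers, then split the data and classify. Because imputation comes first, the KNN distances in `knn_impute` are computed on raw feature scales; trying normalization before imputation is a reasonable variation to compare.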
Pages: 1199-1211
Page count: 13