A Hybrid Model Focusing on Data Pre-Processing in Diabetes Diagnosis

Authors
Zeidi, Farnaz [1]
Azar, Lalah [1]
Arslan, Vasfiye [1]
Erol, Cigdem [2, 3]
Affiliations
[1] Istanbul Univ, Inst Sci, Div Informat, Istanbul, Turkey
[2] Istanbul Univ, Informat Dept, Istanbul, Turkey
[3] Istanbul Univ, Fac Sci, Dept Biol, Div Bot, Istanbul, Turkey
Keywords
Classification algorithms; diabetes diagnosis; hybrid model; K-means algorithm; normalization; outliers detection
DOI
10.1080/01969722.2022.2080338
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Diabetes mellitus is a common and serious disease that has been studied by many researchers, and the Pima Indians Diabetes Dataset is one of the best-known datasets in this field. This study focuses on the pre-processing stages in order to increase the accuracy of machine learning algorithms in diagnosing the disease and to reveal the patterns that enable its early diagnosis. In the pre-processing stage, the proposed hybrid model fills in missing values with KNN, compares six different normalization methods, and removes outliers with K-means. In the classification stage, four algorithms (C4.5, SVM, Naive Bayes and KNN) were examined to find the best hybrid model. The performance of these models is evaluated on accuracy. Compared with previous studies, the proposed models achieved higher accuracy: 98.3% for (KNN + n5 + K-means + SVM) and 99.1% for (KNN + n4/n3 + K-means + KNN). Finally, we offer concluding remarks and some suggestions for further study.
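The pre-processing and classification stages named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: missing values are encoded as `None`, min-max scaling stands in for one of the six normalization methods (n3, n4 and n5 are not defined here), K-means outliers are taken to be points farther than `factor` times the mean distance to their own centroid, and K-means is seeded with the first `k` rows. All function names and parameters are hypothetical.

```python
import math

def dist(a, b, cols):
    """Euclidean distance between rows a and b over the given column indices."""
    return math.sqrt(sum((a[c] - b[c]) ** 2 for c in cols))

def knn_impute(rows, k=3):
    """Replace each None with the mean of that column over the k nearest rows.
    Distance uses only columns observed in both rows. Assumes every missing
    cell has at least one comparable row that observes the column."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            cands = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [c for c in range(len(row))
                          if row[c] is not None and other[c] is not None]
                if shared:
                    cands.append((dist(row, other, shared), other[j]))
            cands.sort(key=lambda t: t[0])
            nearest = [v for _, v in cands[:k]]
            filled[i][j] = sum(nearest) / len(nearest)
    return filled

def min_max(rows):
    """Scale every column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(r, lo, hi)] for r in rows]

def kmeans(rows, k=2, iters=20):
    """Plain Lloyd iterations, seeded with the first k rows (an assumption)."""
    cols = range(len(rows[0]))
    centroids = [list(r) for r in rows[:k]]
    assign = lambda r: min(range(k), key=lambda c: dist(r, centroids[c], cols))
    for _ in range(iters):
        labels = [assign(r) for r in rows]
        for c in range(k):
            members = [r for r, l in zip(rows, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return [assign(r) for r in rows], centroids

def remove_outliers(rows, targets, k=2, factor=2.0):
    """Drop rows lying farther than factor * mean distance from their centroid."""
    labels, centroids = kmeans(rows, k)
    cols = range(len(rows[0]))
    d = [dist(r, centroids[l], cols) for r, l in zip(rows, labels)]
    limit = {}
    for c in range(k):
        ds = [x for x, l in zip(d, labels) if l == c]
        limit[c] = factor * sum(ds) / len(ds) if ds else 0.0
    kept = [(r, t) for r, t, x, l in zip(rows, targets, d, labels) if x <= limit[l]]
    return [r for r, _ in kept], [t for _, t in kept]

def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k nearest training rows (use odd k for two classes)."""
    cols = range(len(x))
    nearest = sorted(zip(train_x, train_y), key=lambda p: dist(p[0], x, cols))[:k]
    votes = [y for _, y in nearest]
    return max(set(votes), key=votes.count)
```

Chaining `knn_impute`, `min_max` and `remove_outliers` before `knn_predict` mirrors the order of the stages in the abstract. In practice, the scaling parameters and cluster centroids would be fitted on the training split only, so that the reported accuracy is not inflated by information from the test data.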
Pages: 1199-1211
Page count: 13