Efficient prediction of early-stage diabetes using XGBoost classifier with random forest feature selection technique

Times cited: 8
Author
Gundogdu, Serdar [1]
Affiliation
[1] Dokuz Eylul Univ, Bergama Vocat Sch, Dept Comp Technol, Izmir, Turkiye
Keywords
COVID-19; Diabetes; Feature selection; MLR; Random forest; XGBoost; MACHINE; MELLITUS; MODELS; PLASMA
DOI
10.1007/s11042-023-15165-8
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Diabetes is one of the most common and serious diseases affecting human health. Early diagnosis and treatment are vital to prevent or delay diabetes-related complications. An automated diabetes detection system can assist physicians in diagnosing the disease early and reduce complications by providing fast and precise results. This study introduces a technique that combines multiple linear regression (MLR), random forest (RF), and XGBoost (XG) to diagnose diabetes from questionnaire data. In the proposed system, the combined MLR-RF algorithm is used for feature selection and XG is used for classification. The dataset consists of data from a diabetes hospital in Sylhet, Bangladesh, and contains 520 instances: 320 diabetic and 200 control instances. Classifier performance is measured in terms of accuracy (ACC), precision (PPV), recall (SEN, sensitivity), F1 score (F1), and the area under the receiver operating characteristic curve (AUC). The proposed system achieves an accuracy of 99.2%, an AUC of 99.3%, and a prediction time of 0.04825 seconds. The feature selection method reduces prediction time without affecting the accuracy of the four compared classifiers. These results compare favorably with those reported in other studies, and the proposed method can be used as an auxiliary tool for diagnosing diabetes.
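The five evaluation metrics named in the abstract have standard definitions in terms of the binary confusion matrix. The following is a minimal Python sketch of those definitions, not the author's evaluation code. It assumes labels are encoded as 1 = diabetic (positive class) and 0 = control, and the function names `confusion_counts`, `classification_metrics`, and `roc_auc` are hypothetical. AUC is computed here through its rank-based (Mann-Whitney) interpretation, the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, rather than by integrating a plotted ROC curve.

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), assuming labels 1 = diabetic (positive), 0 = control."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Compute ACC, PPV (precision), SEN (recall) and F1 from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    acc = (tp + tn) / (tp + fp + tn + fn)
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sen = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * ppv * sen / (ppv + sen) if ppv + sen else 0.0
    return {"ACC": acc, "PPV": ppv, "SEN": sen, "F1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative instance")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores for illustration only; not data from the study.
    y_true = [1, 1, 1, 0, 0]
    y_pred = [1, 1, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.6]
    print(classification_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

On the toy input, two of three diabetic instances are detected and one control is misclassified, so ACC is 3/5 and PPV, SEN, and F1 are each 2/3; five of the six positive/negative score pairs are correctly ordered, giving an AUC of 5/6. The pairwise AUC loop is quadratic in the number of instances, which is fine for a 520-instance dataset but would normally be replaced by a sort-based computation on larger data.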
Pages: 34163-34181
Number of pages: 19