Diabetes Prediction Using Ensembling of Different Machine Learning Classifiers

Cited by: 124
Authors
Hasan, Md. Kamrul [1]
Alam, Md. Ashraful [1]
Das, Dola [2]
Hossain, Eklas [3]
Hasan, Mahmudul [2]
Affiliations
[1] Khulna Univ Engn & Technol, Dept Elect & Elect Engn, Khulna 9203, Bangladesh
[2] Khulna Univ Engn & Technol, Dept Comp Sci & Engn, Khulna 9203, Bangladesh
[3] Oregon Inst Technol, Dept Elect Engn & Renewable Energy, Oregon Renewable Energy Ctr OREC, Klamath Falls, OR 97601 USA
Source
IEEE ACCESS | 2020 / Volume 8
Keywords
Diabetes prediction; ensembling classifier; machine learning; multilayer perceptron; missing values and outliers; Pima Indian Diabetic dataset; CROSS-VALIDATION; NEURAL-NETWORKS; MELLITUS; CLASSIFICATION; RISK
DOI
10.1109/ACCESS.2020.2989857
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Diabetes, a chronic illness, is a group of metabolic diseases caused by a high level of sugar in the blood over a long period. The risk and severity of diabetes can be reduced significantly if precise early prediction is possible. Robust and accurate prediction of diabetes is highly challenging because of the limited amount of labeled data and the presence of outliers (or missing values) in diabetes datasets. In this article, we propose a robust framework for diabetes prediction that employs outlier rejection, filling of missing values, data standardization, feature selection, K-fold cross-validation, different Machine Learning (ML) classifiers (k-nearest Neighbour, Decision Trees, Random Forest, AdaBoost, Naive Bayes, and XGBoost), and a Multilayer Perceptron (MLP). We also propose a weighted ensembling of the different ML models to improve diabetes prediction, where the weights are estimated from the corresponding Area Under the ROC Curve (AUC) of each ML model. AUC is chosen as the performance metric and is maximized during hyperparameter tuning using grid search. All experiments were conducted under the same experimental conditions using the Pima Indian Diabetes Dataset. Across these experiments, our proposed ensembling classifier is the best performing classifier, with sensitivity, specificity, false omission rate, diagnostic odds ratio, and AUC of 0.789, 0.934, 0.092, 66.234, and 0.950 respectively, which outperforms the state-of-the-art results by 2.00% in AUC. Our proposed framework for diabetes prediction outperforms the other methods discussed in the article. It can also provide better results on the same dataset, which can lead to better performance in diabetes prediction. Our source code for diabetes prediction is made publicly available.
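The AUC-weighted ensembling described in the abstract can be illustrated with a minimal sketch. The abstract only says that the weights are estimated from each model's AUC, so the normalization used here (each AUC divided by the sum of AUCs) is an assumption, not necessarily the authors' formula. The model names `model_a` and `model_b` and the toy labels and probabilities are hypothetical placeholders, not results from the paper.

```python
from typing import Dict, List, Sequence


def auc_score(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


def auc_weights(y_val: Sequence[int],
                val_probs: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Assumed rule: weight = model's validation AUC / sum of all AUCs."""
    aucs = {name: auc_score(y_val, p) for name, p in val_probs.items()}
    total = sum(aucs.values())
    return {name: a / total for name, a in aucs.items()}


def weighted_ensemble(probs: Dict[str, Sequence[float]],
                      weights: Dict[str, float]) -> List[float]:
    """Weighted average of the per-model positive-class probabilities."""
    n = len(next(iter(probs.values())))
    return [sum(weights[m] * probs[m][i] for m in probs) for i in range(n)]


# Hypothetical validation labels and predicted probabilities.
y_val = [0, 0, 1, 1, 0, 1]
val_probs = {
    "model_a": [0.10, 0.40, 0.35, 0.80, 0.20, 0.90],
    "model_b": [0.30, 0.60, 0.50, 0.70, 0.40, 0.55],
}

weights = auc_weights(y_val, val_probs)
ensemble = weighted_ensemble(val_probs, weights)
print(weights)
print(ensemble)
```

In practice the weights would be estimated on validation folds and then applied to predictions on held-out data; the same arrays are reused here only to keep the sketch short.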
Pages: 76516-76531
Page count: 16