Diabetes Prediction Using Ensembling of Different Machine Learning Classifiers

Cited by: 124
Authors
Hasan, Md. Kamrul [1]
Alam, Md. Ashraful [1]
Das, Dola [2]
Hossain, Eklas [3]
Hasan, Mahmudul [2]
Affiliations
[1] Khulna Univ Engn & Technol, Dept Elect & Elect Engn, Khulna 9203, Bangladesh
[2] Khulna Univ Engn & Technol, Dept Comp Sci & Engn, Khulna 9203, Bangladesh
[3] Oregon Inst Technol, Dept Elect Engn & Renewable Energy, Oregon Renewable Energy Ctr OREC, Klamath Falls, OR 97601 USA
Source
IEEE ACCESS | 2020 / Vol. 8
Keywords
Diabetes prediction; ensembling classifier; machine learning; multilayer perceptron; missing values and outliers; Pima Indian Diabetic dataset; CROSS-VALIDATION; NEURAL-NETWORKS; MELLITUS; CLASSIFICATION; RISK;
DOI
10.1109/ACCESS.2020.2989857
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Diabetes, a chronic illness, is a group of metabolic diseases characterized by a high level of sugar in the blood over a long period. The risk and severity of diabetes can be reduced significantly if precise early prediction is possible. Robust and accurate prediction of diabetes is highly challenging because of the limited amount of labeled data and the presence of outliers (or missing values) in diabetes datasets. In this paper, we propose a robust framework for diabetes prediction that employs outlier rejection, filling of missing values, data standardization, feature selection, K-fold cross-validation, several Machine Learning (ML) classifiers (k-nearest Neighbour, Decision Trees, Random Forest, AdaBoost, Naive Bayes, and XGBoost), and a Multilayer Perceptron (MLP). We also propose a weighted ensembling of the different ML models to improve diabetes prediction, where the weights are estimated from the corresponding Area Under ROC Curve (AUC) of each ML model. AUC is chosen as the performance metric and is maximized during hyperparameter tuning using grid search. All experiments were conducted under the same experimental conditions using the Pima Indian Diabetes Dataset. Across all experiments, the proposed ensembling classifier performs best, with sensitivity, specificity, false omission rate, diagnostic odds ratio, and AUC of 0.789, 0.934, 0.092, 66.234, and 0.950, respectively, outperforming the state-of-the-art results by 2.00% in AUC. The proposed framework outperforms the other methods discussed in the article. It also gives better results on the same dataset, which can lead to better performance in diabetes prediction. Our source code for diabetes prediction is made publicly available.
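The AUC-weighted ensembling described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so this example assumes each model's weight is its AUC divided by the sum of all AUCs, and that the ensemble output is the weighted average of the models' predicted probabilities. The function names and the toy data are hypothetical and are not taken from the authors' code.

```python
from typing import List, Sequence


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_weights(aucs: Sequence[float]) -> List[float]:
    """Assumed weighting rule: normalize the AUCs so the weights sum to one."""
    total = sum(aucs)
    return [a / total for a in aucs]


def weighted_ensemble(prob_lists: Sequence[Sequence[float]],
                      weights: Sequence[float]) -> List[float]:
    """Weighted average of per-model probabilities, sample by sample."""
    n_samples = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists))
        for i in range(n_samples)
    ]


# Hypothetical toy data: true labels and two models' predicted probabilities.
labels = [0, 0, 1, 1, 0, 1]
probs_a = [0.2, 0.4, 0.7, 0.9, 0.6, 0.8]
probs_b = [0.3, 0.6, 0.5, 0.8, 0.2, 0.7]

# In practice the AUCs would be measured on held-out validation folds,
# not on the same samples the ensemble is later evaluated on.
aucs = [auc_score(labels, probs_a), auc_score(labels, probs_b)]
weights = auc_weights(aucs)
ensemble = weighted_ensemble([probs_a, probs_b], weights)
```

Normalizing by the sum keeps the ensemble output in the [0, 1] range whenever the individual model outputs are probabilities, so it can be thresholded in the same way as a single classifier's output.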
Pages: 76516 - 76531
Page count: 16
Related papers
50 in total
  • [31] Diabetes Disease Prediction Using Machine Learning Algorithms
    Lyngdoh, Arwatki Chen
    Choudhury, Nurul Amin
    Moulik, Soumen
    2020 IEEE-EMBS CONFERENCE ON BIOMEDICAL ENGINEERING AND SCIENCES (IECBES 2020): LEADING MODERN HEALTHCARE TECHNOLOGY ENHANCING WELLNESS, 2021, : 517 - 521
  • [32] Comparative Analysis of Diabetes Prediction Using Machine Learning
    David, S. Alex
    Varsha, V.
    Ravali, Y.
    Saranya, N. Naga Amrutha
    SOFT COMPUTING FOR SECURITY APPLICATIONS, ICSCS 2022, 2023, 1428 : 155 - 163
  • [33] Diabetes prediction model using machine learning techniques
    Modak, Sandip Kumar Singh
    Jha, Vijay Kumar
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (13) : 38523 - 38549
  • [34] Diabetes Prediction Using Machine Learning Algorithms and Ontology
    El Massari H.
    Sabouri Z.
    Mhammedi S.
    Gherabi N.
    Journal of ICT Standardization, 2022, 10 (02): : 319 - 338
  • [35] Diabetes prediction model using machine learning techniques
    Sandip Kumar Singh Modak
    Vijay Kumar Jha
    Multimedia Tools and Applications, 2024, 83 : 38523 - 38549
  • [36] Prediction of blood supply in vestibular schwannomas using radiomics machine learning classifiers
    Song, Dixiang
    Zhai, Yixuan
    Tao, Xiaogang
    Zhao, Chao
    Wang, Minkai
    Wei, Xinting
    SCIENTIFIC REPORTS, 2021, 11 (01)
  • [37] COVID-19 Pandemic Prediction and Forecasting Using Machine Learning Classifiers
    Sultana, Jabeen
    Singha, Anjani Kumar
    Siddiqui, Shams Tabrez
    Nagalaxmi, Guthikonda
    Sriram, Anil Kumar
    Pathak, Nitish
    INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2022, 32 (02): : 1007 - 1024
  • [38] Stroke Treatment Prediction Using Features Selection Methods and Machine Learning Classifiers
    Chourib, I.
    Guillard, G.
    Farah, I. R.
    Solaiman, B.
    IRBM, 2022, 43 (06) : 678 - 686
  • [39] Prediction of blood supply in vestibular schwannomas using radiomics machine learning classifiers
    Dixiang Song
    Yixuan Zhai
    Xiaogang Tao
    Chao Zhao
    Minkai Wang
    Xinting Wei
    Scientific Reports, 11
  • [40] Stock market prediction using machine learning classifiers and social media, news
    Wasiat Khan
    Mustansar Ali Ghazanfar
    Muhammad Awais Azam
    Amin Karami
    Khaled H. Alyoubi
    Ahmed S. Alfakeeh
    Journal of Ambient Intelligence and Humanized Computing, 2022, 13 : 3433 - 3456