Diabetes Prediction Using Ensembling of Different Machine Learning Classifiers

Cited by: 124
Authors
Hasan, Md. Kamrul [1]
Alam, Md. Ashraful [1]
Das, Dola [2]
Hossain, Eklas [3]
Hasan, Mahmudul [2]
Affiliations
[1] Khulna Univ Engn & Technol, Dept Elect & Elect Engn, Khulna 9203, Bangladesh
[2] Khulna Univ Engn & Technol, Dept Comp Sci & Engn, Khulna 9203, Bangladesh
[3] Oregon Inst Technol, Dept Elect Engn & Renewable Energy, Oregon Renewable Energy Ctr OREC, Klamath Falls, OR 97601 USA
Source
IEEE ACCESS | 2020 / Volume 8
Keywords
Diabetes prediction; ensembling classifier; machine learning; multilayer perceptron; missing values and outliers; Pima Indian Diabetic dataset; CROSS-VALIDATION; NEURAL-NETWORKS; MELLITUS; CLASSIFICATION; RISK;
DOI
10.1109/ACCESS.2020.2989857
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Diabetes, a chronic illness, is a group of metabolic diseases caused by a high level of sugar in the blood over a long period. The risk and severity of diabetes can be reduced significantly if precise early prediction is possible. Robust and accurate prediction of diabetes is highly challenging because of the limited amount of labeled data and the presence of outliers (or missing values) in diabetes datasets. In this paper, we propose a robust framework for diabetes prediction that employs outlier rejection, filling of missing values, data standardization, feature selection, K-fold cross-validation, different Machine Learning (ML) classifiers (k-nearest Neighbour, Decision Trees, Random Forest, AdaBoost, Naive Bayes, and XGBoost), and a Multilayer Perceptron (MLP). We also propose a weighted ensembling of different ML models to improve the prediction of diabetes, where the weights are estimated from the corresponding Area Under the ROC Curve (AUC) of each ML model. AUC is chosen as the performance metric and is maximized during hyperparameter tuning using grid search. All experiments were conducted under the same experimental conditions using the Pima Indian Diabetes Dataset. Across all experiments, our proposed ensembling classifier performs best, with sensitivity, specificity, false omission rate, diagnostic odds ratio, and AUC of 0.789, 0.934, 0.092, 66.234, and 0.950, respectively, outperforming the state-of-the-art results by 2.00% in AUC. Our proposed framework for diabetes prediction outperforms the other methods discussed in the article. It can also provide better results on the same dataset, which can lead to better performance in diabetes prediction. Our source code for diabetes prediction is made publicly available.
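The AUC-weighted ensembling described in the abstract can be illustrated with a minimal sketch. It is not the authors' released code. It assumes one plausible reading of the abstract: each model's weight is its validation AUC divided by the sum of all AUCs, and the ensemble output is the weighted average of the models' predicted probabilities. The function names (`roc_auc`, `auc_weights`, `weighted_ensemble`) and the toy scores are hypothetical.

```python
from typing import Dict, List, Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


def auc_weights(
    y_val: Sequence[int], val_probs: Dict[str, Sequence[float]]
) -> Dict[str, float]:
    """Assumed weighting rule: weight_i = AUC_i / sum of all AUCs."""
    aucs = {name: roc_auc(y_val, p) for name, p in val_probs.items()}
    total = sum(aucs.values())
    return {name: a / total for name, a in aucs.items()}


def weighted_ensemble(
    probs: Dict[str, Sequence[float]], weights: Dict[str, float]
) -> List[float]:
    """Weighted average of per-model positive-class probabilities."""
    n = len(next(iter(probs.values())))
    return [sum(weights[m] * probs[m][i] for m in probs) for i in range(n)]


# Toy validation labels and per-model probabilities (made up for illustration).
y_val = [0, 0, 1, 1, 0, 1]
val_probs = {
    "model_a": [0.10, 0.40, 0.35, 0.80, 0.20, 0.90],
    "model_b": [0.30, 0.60, 0.40, 0.70, 0.50, 0.55],
}

weights = auc_weights(y_val, val_probs)
ensemble_probs = weighted_ensemble(val_probs, weights)
print(weights)
print(ensemble_probs)
```

In practice the weights would be estimated on held-out folds and applied to test-set probabilities. Reusing the same samples for both steps, as the toy example does for brevity, would bias the estimate.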
Pages: 76516-76531
Number of pages: 16