Prediction of diabetes disease using an ensemble of machine learning multi-classifier models

Cited by: 10
Authors
Abnoosian, Karlo [1]
Farnoosh, Rahman [2]
Behzadi, Mohammad Hassan [1]
Affiliations
[1] Islamic Azad Univ, Dept Stat, Sci & Res Branch, Tehran, Iran
[2] Iran Univ Sci & Technol, Sch Math, Tehran, Iran
Keywords
Diabetes disease prediction; Machine learning classifiers; Ensemble machine learning models; Decision tree; Random forest; Feature selection
DOI
10.1186/s12859-023-05465-z
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background and objective: Diabetes is a life-threatening chronic disease with a growing global prevalence, necessitating early diagnosis and treatment to prevent severe complications. Machine learning has emerged as a promising approach for diabetes diagnosis, but challenges such as limited labeled data, frequent missing values, and dataset imbalance hinder the development of accurate prediction models. Therefore, a novel framework is required to address these challenges and improve performance.
Methods: In this study, we propose an innovative pipeline-based multi-classification framework to predict diabetes in three classes: diabetic, non-diabetic, and prediabetes, using the imbalanced Iraqi Patient Dataset of Diabetes. Our framework incorporates various pre-processing techniques, including duplicate sample removal, attribute conversion, missing value imputation, data normalization and standardization, feature selection, and k-fold cross-validation. Furthermore, we implement multiple machine learning models, such as k-NN, SVM, DT, RF, AdaBoost, and GNB, and introduce a weighted ensemble approach based on the Area Under the Receiver Operating Characteristic Curve (AUC) to address dataset imbalance. Performance optimization is achieved through grid search and Bayesian optimization for hyper-parameter tuning.
Results: Our proposed model outperforms other machine learning models, including k-NN, SVM, DT, RF, AdaBoost, and GNB, in predicting diabetes. The model achieves high average accuracy, precision, recall, F1-score, and AUC values of 0.9887, 0.9861, 0.9792, 0.9851, and 0.999, respectively.
Conclusion: Our pipeline-based multi-classification framework demonstrates promising results in accurately predicting diabetes using an imbalanced dataset of Iraqi diabetic patients. The proposed framework addresses the challenges associated with limited labeled data, missing values, and dataset imbalance, leading to improved prediction performance. This study highlights the potential of machine learning techniques in diabetes diagnosis and management, and the proposed framework can serve as a valuable tool for accurate prediction and improved patient care. Further research can build upon our work to refine and optimize the framework and explore its applicability in diverse datasets and populations.
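The abstract names an AUC-based weighted ensemble but does not give the weighting formula. The following is a minimal sketch of one plausible reading, assuming each base model's weight is its validation AUC divided by the sum of all AUCs, and that the ensemble averages class probabilities with those weights. The function names, the class order, and the toy numbers are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

# Assumed class order for the three-class problem described in the abstract.
CLASSES = ("non-diabetic", "prediabetes", "diabetic")


def auc_weights(aucs: Sequence[float]) -> List[float]:
    """Normalize per-model validation AUCs so the weights sum to 1 (assumed scheme)."""
    total = sum(aucs)
    if total <= 0:
        raise ValueError("AUC values must sum to a positive number")
    return [a / total for a in aucs]


def weighted_ensemble_proba(
    probas: Sequence[Sequence[Sequence[float]]], aucs: Sequence[float]
) -> List[List[float]]:
    """Average class probabilities across models using AUC-based weights.

    probas[m][i][c] is the probability model m assigns to class c for sample i.
    """
    weights = auc_weights(aucs)
    n_samples = len(probas[0])
    n_classes = len(probas[0][0])
    combined = [[0.0] * n_classes for _ in range(n_samples)]
    for w, model_proba in zip(weights, probas):
        for i, row in enumerate(model_proba):
            for c, p in enumerate(row):
                combined[i][c] += w * p
    return combined


def predict(
    probas: Sequence[Sequence[Sequence[float]]], aucs: Sequence[float]
) -> List[str]:
    """Pick the class with the highest weighted probability for each sample."""
    combined = weighted_ensemble_proba(probas, aucs)
    return [CLASSES[max(range(len(row)), key=row.__getitem__)] for row in combined]


# Toy example: two hypothetical base models, two samples, made-up numbers.
toy_probas = [
    [[0.7, 0.2, 0.1], [0.2, 0.3, 0.5]],  # model A
    [[0.4, 0.5, 0.1], [0.1, 0.2, 0.7]],  # model B
]
toy_aucs = [0.9, 0.6]  # normalized weights: 0.6 and 0.4
labels = predict(toy_probas, toy_aucs)
```

In practice the probability rows would come from fitted classifiers (for example k-NN, SVM, or RF) and the AUCs from held-out folds; weighting by AUC rather than accuracy is a common choice when classes are imbalanced, since accuracy can be dominated by the majority class.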
Pages: 24
Related papers
50 in total
  • [1] Prediction of diabetes disease using an ensemble of machine learning multi-classifier models
    Karlo Abnoosian
    Rahman Farnoosh
    Mohammad Hassan Behzadi
    [J]. BMC Bioinformatics, 24
  • [2] Early Prediction of Diabetes Using an Ensemble of Machine Learning Models
    Dutta, Aishwariya
    Hasan, Md Kamrul
    Ahmad, Mohiuddin
    Awal, Md Abdul
    Islam, Md Akhtarul
    Masud, Mehedi
    Meshref, Hossam
    [J]. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH, 2022, 19 (19)
  • [3] Optimizing multi-classifier fusion for seabed sediment classification using machine learning
    Anokye, Michael
    Cui, Xiaodong
    Yang, Fanlin
    Wang, Ping
    Sun, Yuewen
    Ma, Hadong
    Amoako, Emmanuel Oduro
    [J]. INTERNATIONAL JOURNAL OF DIGITAL EARTH, 2024, 17 (01)
  • [4] Multi-classifier ensemble based on dynamic weights
    Ren, Fuji
    Li, Yanqiu
    Hu, Min
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 (16) : 21083 - 21107
  • [5] Multi-classifier ensemble based on dynamic weights
    Fuji Ren
    Yanqiu Li
    Min Hu
    [J]. Multimedia Tools and Applications, 2018, 77 : 21083 - 21107
  • [6] A Machine Learning Ensemble Classifier for Prediction of Brain Strokes
    Mostafa, Samaa A.
    Elzanfaly, Doaa S.
    Yakoub, Ahmed E.
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (12) : 258 - 266
  • [7] Predictive analysis by ensemble classifier with machine learning models
    Chaya, Jagtap D.
    Usha, Rani N.
    [J]. International Journal of Computers and Applications, 2023, 45 (01) : 19 - 26
  • [8] Webshell detection based on multi-classifier ensemble model
    Lian, Wenjuan
    Fan, Qi
    Shi, Dandan
    An, Qili
    Jia, Bin
    [J]. Journal of Computers (Taiwan), 2020, 31 (01) : 242 - 252
  • [9] Extraction of Larch Plantation Based on Multi-Classifier Ensemble
    Ma, Ting
    Li, Chonggui
    Tang, Fuquan
    Lü, Jie
    [J]. Linye Kexue/Scientia Silvae Sinicae, 2021, 57 (11) : 105 - 118
  • [10] A Multi-Classifier for DDoS Attacks Using Stacking Ensemble Deep Neural Network
    Sayed, Moinul Islam
    Sayem, Ibrahim Mohammed
    Saha, Sajal
    Haque, Anwar
    [J]. 2022 INTERNATIONAL WIRELESS COMMUNICATIONS AND MOBILE COMPUTING, IWCMC, 2022 : 1125 - 1130