A practical framework for early detection of diabetes using ensemble machine learning models

Cited by: 2
Authors
Saihood, Qusay [1]
Sonuc, Emrullah [1]
Affiliations
[1] Karabuk Univ, Dept Comp Engn, Karabuk, Turkiye
Keywords
Machine learning; ensemble learning; diabetes diagnosis; classification; ARTIFICIAL-INTELLIGENCE; PREDICTION;
DOI
10.55730/1300-0632.4013
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The diagnosis of diabetes, a prevalent global health condition, is crucial for preventing severe complications. In recent years, there has been a growing effort to develop intelligent diagnostic systems for diabetes using machine learning (ML) algorithms. Despite these efforts, achieving high accuracy with such systems remains a significant challenge. Recent advances in ensemble ML methods offer promising opportunities for early detection of diabetes, as they are known to be faster and more cost-effective than traditional approaches. This study therefore proposes a practical three-stage framework for diagnosing diabetes. The first stage, data preprocessing, covers handling missing values, identifying outliers, balancing the data, normalizing the data, and selecting relevant features. In the second stage, the hyperparameters of the ML algorithms are tuned using grid search to improve their performance. In the final stage, the framework employs ensemble techniques such as bagging, boosting, and stacking to combine multiple ML algorithms and further enhance their predictive capability. The open-access Pima Indians Diabetes Database was used to test the performance of the proposed models. The experimental results indicate that ensemble methods outperform individual ML models in diagnosing diabetes. Stacking achieved the best accuracy among the ensemble methods, with the stacked random forest (RF) and support vector machine (SVM) model attaining an accuracy of 97.50%. Among the bagging methods, the RF model yielded the highest accuracy (97.20%), while among the boosting methods, the eXtreme Gradient Boosting (XGB) model achieved the highest accuracy (97.10%). Moreover, a comparison confirms that the proposed framework outperforms other ML models. The study demonstrates that ensemble methods are crucial for accurate diabetes diagnosis, enabling early detection through efficient preprocessing and calibrated models.
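The three stages summarized in the abstract can be sketched roughly as follows, assuming scikit-learn. This is only an illustration, not the authors' implementation: it uses a synthetic 8-feature dataset rather than the Pima Indians Diabetes Database, it omits the outlier handling, class balancing, and feature selection steps, and the imputation strategy, parameter grid, and logistic-regression meta-learner are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for an 8-feature tabular dataset (NOT the Pima data).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=0
)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan  # inject some missing values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Stage 3: stack a random forest and an SVM. The logistic-regression
# meta-learner is an assumption; the abstract does not name one.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=0)),
        ("svm", SVC()),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)

# Stage 1 (partial): median imputation and standardization, fitted inside
# the pipeline so that no test-fold statistics leak into training.
pipeline = Pipeline(
    [
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
        ("stack", stack),
    ]
)

# Stage 2: grid search over a small, purely illustrative parameter grid.
param_grid = {
    "stack__rf__n_estimators": [50, 100],
    "stack__svm__C": [0.1, 1.0],
}
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print("best parameters:", search.best_params_)
print("held-out accuracy:", test_accuracy)
```

Because the data here is synthetic, the printed accuracy says nothing about the 97.50% reported for the stacked RF and SVM model. The sketch only shows how preprocessing, tuning, and stacking might be chained in one estimator.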
Pages: 722-738
Number of pages: 18