A practical framework for early detection of diabetes using ensemble machine learning models

Cited by: 2
Authors
Saihood, Qusay [1 ]
Sonuc, Emrullah [1 ]
Affiliations
[1] Karabuk Univ, Dept Comp Engn, Karabuk, Turkiye
Keywords
Machine learning; ensemble learning; diabetes diagnosis; classification; artificial intelligence; prediction
DOI
10.55730/1300-0632.4013
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The diagnosis of diabetes, a prevalent global health condition, is crucial for preventing severe complications. In recent years, there has been a growing effort to develop intelligent diagnostic systems for diabetes using machine learning (ML) algorithms. Despite these efforts, achieving high accuracy with such systems remains a significant challenge. Recent advances in ensemble ML methods offer promising opportunities for early detection of diabetes, as they are known to be faster and more cost-effective than traditional approaches. This study therefore proposes a practical three-stage framework for diagnosing diabetes. The first stage, data preprocessing, covers handling missing values, identifying outliers, balancing the data, normalizing the data, and selecting relevant features. In the second stage, the hyperparameters of the ML algorithms are tuned using grid search to improve their performance. In the final stage, the framework employs ensemble techniques (bagging, boosting, and stacking) to combine multiple ML algorithms and further enhance their predictive capability. The open-access Pima Indians Diabetes Database was used to test the performance of the proposed models. The experimental results indicate that ensemble methods outperform individual ML models in diagnosing diabetes. Stacking achieved the best accuracy among the ensemble methods, with the stacked random forest (RF) and support vector machine (SVM) model attaining an accuracy of 97.50%. Among the bagging methods, the RF model yielded the highest accuracy (97.20%), and among the boosting methods, the eXtreme Gradient Boosting (XGB) model yielded the highest accuracy (97.10%). A comparison with other ML models reported in the literature confirms that the proposed framework outperforms them.
The study has demonstrated that ensemble methods are crucial for accurate diabetes diagnosis, enabling early detection through efficient preprocessing and calibrated models.
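The three stages named in the abstract (preprocessing, grid search, ensembling) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: the k-nearest-neighbour base learner stands in for the RF, SVM, and XGB models; median imputation and min-max scaling are just one possible choice for each preprocessing task; leave-one-out accuracy is an assumed grid-search score; and the toy rows are synthetic, not Pima data. Outlier handling, class balancing, feature selection, boosting, and stacking are omitted for brevity.

```python
import random
from collections import Counter
from statistics import median

def impute_median(rows):
    """Stage 1a: replace None entries with the column median of observed values."""
    cols = list(zip(*rows))
    meds = [median(v for v in col if v is not None) for col in cols]
    return [[m if v is None else v for v, m in zip(row, meds)] for row in rows]

def min_max_scale(rows):
    """Stage 1b: rescale each column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in rows]

def knn_predict(train_X, train_y, x, k):
    """Majority label among the k nearest training points (squared Euclidean)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def loo_accuracy(X, y, k):
    """Leave-one-out accuracy of k-NN, used here as the grid-search score."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)

def grid_search_k(X, y, grid):
    """Stage 2: return the k from the grid with the best leave-one-out accuracy."""
    return max(grid, key=lambda k: loo_accuracy(X, y, k))

def bagged_predict(X, y, x, k, n_models=15, seed=0):
    """Stage 3 (bagging only): majority vote of k-NN models fit on bootstrap resamples."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        votes[knn_predict([X[i] for i in idx], [y[i] for i in idx], x, k)] += 1
    return votes.most_common(1)[0][0]

# Synthetic toy rows (two features, None marks a missing value); not Pima data.
raw_X = [[1.0, 2.0], [1.5, None], [2.0, 2.5], [1.2, 1.8],
         [8.0, 9.0], [8.5, 9.5], [None, 8.8], [9.0, 9.2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

X = min_max_scale(impute_median(raw_X))
best_k = grid_search_k(X, y, grid=[1, 3, 5])
prediction = bagged_predict(X, y, X[0], best_k)
print(best_k, prediction)
```

In practice a library such as scikit-learn would likely replace these hand-written helpers; the sketch only shows how the output of each stage feeds the next.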
Pages: 722-738
Page count: 18