A practical framework for early detection of diabetes using ensemble machine learning models

Cited by: 2
Authors
Saihood, Qusay [1]
Sonuc, Emrullah [1]
Affiliations
[1] Karabuk Univ, Dept Comp Engn, Karabuk, Turkiye
Keywords
Machine learning; ensemble learning; diabetes diagnosis; classification; ARTIFICIAL-INTELLIGENCE; PREDICTION
DOI
10.55730/1300-0632.4013
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The diagnosis of diabetes, a prevalent global health condition, is crucial for preventing severe complications. In recent years, there has been a growing effort to develop intelligent diagnostic systems for diabetes using machine learning (ML) algorithms. Despite these efforts, achieving high accuracy with such systems remains a significant challenge. Recent advances in ensemble ML methods offer promising opportunities for early detection of diabetes, as they are known to be faster and more cost-effective than traditional approaches. Therefore, this study proposes a practical three-stage framework for diagnosing diabetes. The first stage, data preprocessing, covers handling missing values, identifying outliers, balancing the data, normalizing the data, and selecting relevant features. In the second stage, the hyperparameters of the ML algorithms are tuned using grid search to improve their performance. In the final stage, the framework employs ensemble techniques such as bagging, boosting, and stacking to combine multiple ML algorithms and further enhance their predictive capability. The open-access Pima Indians Diabetes Database was used to test the performance of the proposed models. The experimental results indicate that ensemble methods are superior to individual ML models in diagnosing diabetes. The stacking method achieved the best accuracy among the ensemble methods, with the stacked random forest (RF) and support vector machine (SVM) model attaining an accuracy of 97.50%. Among the bagging methods, the RF model yielded the highest accuracy (97.20%), while among the boosting methods, the eXtreme Gradient Boosting (XGB) model achieved the highest accuracy (97.10%). Moreover, a comparison confirms that the proposed framework outperforms other ML models.
The study demonstrates that ensemble methods are crucial for accurate diabetes diagnosis, enabling early detection through efficient preprocessing and calibrated models.
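The three stages described in the abstract can be illustrated with a minimal sketch using scikit-learn. This is not the authors' implementation. It assumes synthetic stand-in data rather than the Pima Indians Diabetes Database, covers only median imputation and min-max normalization from the preprocessing stage (outlier handling, class balancing, and feature selection are omitted), and uses a logistic regression meta-learner, which the abstract does not specify. The parameter grid values are likewise illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Assumption: synthetic binary-classification data with 8 features,
# used only as a stand-in for a real diabetes dataset.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=0
)

# Simulate roughly 5% missing values so the imputation step has work to do.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Stage 3 building block: stack a random forest and an SVM.
# The logistic regression meta-learner is an assumption of this sketch.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=0)),
        ("svm", SVC(random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)

# Stage 1 (partial): impute missing values, then scale features to [0, 1].
model = Pipeline(
    [
        ("impute", SimpleImputer(strategy="median")),
        ("scale", MinMaxScaler()),
        ("stack", stack),
    ]
)

# Stage 2: grid search over a small, illustrative hyperparameter grid.
param_grid = {
    "stack__rf__n_estimators": [50, 100],
    "stack__svm__C": [0.1, 1.0, 10.0],
}
search = GridSearchCV(model, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print("Best parameters:", best_params)
print("Held-out accuracy:", round(test_accuracy, 3))
```

Placing the preprocessing steps inside the pipeline means they are refit on each cross-validation fold, which avoids leaking statistics from validation data into training. Accuracy on this synthetic data says nothing about the figures reported in the abstract.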
Pages: 722-738
Page count: 18