Comparison of Statistical Logistic Regression and RandomForest Machine Learning Techniques in Predicting Diabetes

Cited by: 27
Authors
Daghistani, Tahani [1]
Alshammari, Riyad [1]
Affiliations
[1] King Saud Bin Abdulaziz Univ Hlth Sci KSAU HS, King Abdullah Int Med Res Ctr KAIMRC, Coll Publ Hlth & Hlth Informat, Hlth Informat Dept,Minist Natl Guard Hlth Affairs, Riyadh, Saudi Arabia
Keywords
diabetes; predictive model; machine learning; RandomForest; logistic regression
DOI
10.12720/jait.11.2.78-83
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Diabetes is a global healthcare concern and one of the leading health challenges in Saudi Arabia. Its prevalence is expected to rise, and early identification of individuals at high risk remains a significant challenge. This study compares the RandomForest machine learning algorithm with the Logistic Regression algorithm for predicting diabetes. We analyzed 66,325 records extracted from the Ministry of National Guard Health Affairs (MNGHA) databases in Saudi Arabia between 2013 and 2015. Both algorithms were applied to predict diabetes from 18 risk factors. The two algorithms were compared on precision, recall, true positive rate, false positive rate, F-measure, and area under the ROC curve (AUC). The overall prevalence of diabetes in the data set was 64.47%. Males represented 55.50% of the data set and females 44.50%. For the RandomForest (RF) model, the precision, recall, true positive rate, false positive rate, and F-measure for predicting diabetes were 0.883, 0.88, 0.88, 0.188, and 0.876, respectively, while for the Logistic Regression model they were 0.692, 0.703, 0.703, 0.454, and 0.675, respectively. The AUC was 0.944 for the RF model and 0.708 for the Logistic Regression model, indicating higher predictive performance for RF. Across the metrics evaluated, the RF algorithm outperformed Logistic Regression in predicting diabetes.
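For readers unfamiliar with the evaluation criteria named in the abstract, the following is a minimal sketch of how precision, recall (which is the same quantity as the true positive rate), false positive rate, and F-measure are defined for a binary classifier. The `ConfusionMatrix` class, the `binary_metrics` function, and the counts used are hypothetical illustrations and are not taken from the study; the abstract does not state how its reported values were computed or averaged.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier; the positive class is 'diabetic'."""
    tp: int  # diabetic, predicted diabetic
    fp: int  # non-diabetic, predicted diabetic
    fn: int  # diabetic, predicted non-diabetic
    tn: int  # non-diabetic, predicted non-diabetic


def binary_metrics(cm: ConfusionMatrix) -> Dict[str, float]:
    """Return the standard metrics derived from a binary confusion matrix."""
    precision = cm.tp / (cm.tp + cm.fp)
    recall = cm.tp / (cm.tp + cm.fn)       # recall and true positive rate coincide
    fp_rate = cm.fp / (cm.fp + cm.tn)
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "tp_rate": recall,
        "fp_rate": fp_rate,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only, NOT data from the study.
    example = ConfusionMatrix(tp=80, fp=20, fn=10, tn=90)
    for name, value in binary_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

Because recall and true positive rate are the same quantity, the two values reported for each model in the abstract are identical (0.88 for RF and 0.703 for Logistic Regression).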
Pages: 78-83
Number of pages: 6