Comparison of Statistical Logistic Regression and RandomForest Machine Learning Techniques in Predicting Diabetes

Cited by: 27
Authors
Daghistani, Tahani [1]
Alshammari, Riyad [1]
Affiliations
[1] King Saud Bin Abdulaziz Univ Hlth Sci KSAU HS, King Abdullah Int Med Res Ctr KAIMRC, Coll Publ Hlth & Hlth Informat, Hlth Informat Dept, Minist Natl Guard Hlth Affairs, Riyadh, Saudi Arabia
Keywords
diabetes; predictive model; machine learning; RandomForest; logistic regression
DOI
10.12720/jait.11.2.78-83
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Diabetes is a global healthcare concern and one of the leading health challenges in Saudi Arabia. Its prevalence is expected to rise, so early identification of individuals at high risk of diabetes is an important challenge. This study compares the RandomForest machine learning algorithm with the Logistic Regression algorithm for predicting diabetes. We analyzed 66,325 records extracted from the Ministry of National Guard Health Affairs (MNGHA) databases in Saudi Arabia between 2013 and 2015. Both algorithms were applied to predict diabetes based on 18 risk factors. The two algorithms were compared on precision, recall, true positive rate, false positive rate, F-measure, and area under the ROC curve (AUC). The overall prevalence of diabetes in the data set is 64.47%. Males represent 55.50% of the data set and females 44.50%. For the RandomForest (RF) model, the precision, recall, true positive rate, false positive rate, and F-measure for predicting diabetes were 0.883, 0.88, 0.88, 0.188, and 0.876, respectively, while the corresponding values for the Logistic Regression model were only 0.692, 0.703, 0.703, 0.454, and 0.675. The AUC was 0.944 for the RF model and 0.708 for the Logistic Regression model, indicating higher predictive performance for RF. The RF algorithm showed superior performance over Logistic Regression in predicting diabetes across the evaluation metrics.
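The evaluation criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `binary_metrics` and `auc_from_scores`, and all counts and scores below, are hypothetical and chosen only for illustration. It computes the metrics from the confusion-matrix counts of a single positive class, and it shows why the abstract reports identical recall and true positive rate values, since the two are the same quantity. Whether the published figures are per-class or averaged across classes is not stated in the abstract.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute precision, recall (= true positive rate), false positive
    rate and F-measure from confusion-matrix counts (hypothetical helper)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0   # identical to TPR
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "tpr": recall,
        "fpr": fpr,
        "f_measure": f_measure,
    }


def auc_from_scores(labels, scores):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs in which the positive case gets the higher score (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, not taken from the study.
m = binary_metrics(tp=80, fp=20, tn=70, fn=30)
auc = auc_from_scores([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(m)
print(auc)
```

The pairwise AUC above is quadratic in the number of records, which is fine for a toy example but would be replaced by a rank-based computation on a data set of the size described in the abstract.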
Pages: 78-83
Number of pages: 6