Comparison of Statistical Logistic Regression and RandomForest Machine Learning Techniques in Predicting Diabetes

Cited by: 27
Authors
Daghistani, Tahani [1 ]
Alshammari, Riyad [1 ]
Affiliation
[1] King Saud Bin Abdulaziz Univ Hlth Sci KSAU HS, King Abdullah Int Med Res Ctr KAIMRC, Coll Publ Hlth & Hlth Informat, Hlth Informat Dept, Minist Natl Guard Hlth Affairs, Riyadh, Saudi Arabia
Keywords
diabetes; predictive model; machine learning; RandomForest; logistic regression;
DOI
10.12720/jait.11.2.78-83
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Diabetes is a global healthcare concern and one of the leading health challenges in Saudi Arabia. Its prevalence is anticipated to rise, and early identification of individuals at high risk of diabetes remains a significant challenge. This study compares the RandomForest machine learning algorithm with the Logistic Regression algorithm for predicting diabetes. We analyzed 66,325 records extracted from the Ministry of National Guard Health Affairs (MNGHA) databases in Saudi Arabia between 2013 and 2015. Both algorithms were applied to predict diabetes based on 18 risk factors. The two algorithms were compared on precision, recall, True Positive Rate, False Positive Rate, F-measure, and area under the ROC curve (AUC). The overall prevalence of diabetes in the data set is 64.47%. Males represent 55.50% of the data set and females 44.50%. For the RandomForest (RF) model, the precision, recall, True Positive Rate, False Positive Rate, and F-measure for predicting diabetes were 0.883, 0.88, 0.88, 0.188, and 0.876, respectively, while those for the Logistic Regression model were only 0.692, 0.703, 0.703, 0.454, and 0.675, respectively. The AUC was 0.944 for the RF model and 0.708 for the Logistic Regression model, which demonstrates higher predictive performance for RF than for Logistic Regression. Across these metrics, the RF algorithm showed superior performance over the Logistic Regression technique in predicting diabetes.
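As a reading aid for the metrics named in the abstract, the following is a minimal Python sketch of how precision, recall (which is the same quantity as the True Positive Rate), False Positive Rate, F-measure, and AUC are conventionally defined for a binary classifier. The function names `binary_metrics` and `auc`, the confusion-matrix counts, and the scores are all hypothetical illustrations; they are not taken from the study, and the abstract does not say how the reported values were computed or averaged.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard metrics from binary confusion-matrix counts."""
    precision = tp / (tp + fp)          # share of predicted positives that are correct
    recall = tp / (tp + fn)             # share of actual positives found (= True Positive Rate)
    fpr = fp / (fp + tn)                # share of actual negatives flagged as positive
    f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return {
        "precision": precision,
        "recall": recall,
        "tpr": recall,
        "fpr": fpr,
        "f_measure": f_measure,
    }


def auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the rank-based, Mann-Whitney formulation).
    """
    wins = sum(
        (p > n) + 0.5 * (p == n)
        for p in scores_pos
        for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical counts and scores, for illustration only (not the study's data).
metrics = binary_metrics(tp=80, fp=20, fn=20, tn=80)
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(metrics)
print(example_auc)
```

Because recall and True Positive Rate share one definition, the two always coincide, which is why the abstract reports the same value for both (0.88 for RF, 0.703 for Logistic Regression).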
Pages: 78-83
Number of pages: 6