A comparison of machine learning algorithms and traditional regression-based statistical modeling for predicting hypertension incidence in a Canadian population

Cited by: 11
Authors
Chowdhury, Mohammad Ziaul Islam [1, 2, 3]
Leung, Alexander A. [1, 4]
Walker, Robin L. [1, 5]
Sikdar, Khokan C. [6]
O'Beirne, Maeve [2]
Quan, Hude [1]
Turin, Tanvir C. [1, 2]
Affiliations
[1] Univ Calgary, Dept Community Hlth Sci, 3280 Hosp Drive NW, Calgary, AB T2N 4Z6, Canada
[2] Univ Calgary, Dept Family Med, 3330 Hosp Drive NW, Calgary, AB T2N 4N1, Canada
[3] Univ Calgary, Dept Psychiat, 3280 Hosp Drive NW, Calgary, AB T2N 4Z6, Canada
[4] Univ Calgary, Dept Med, 3280 Hosp Drive NW, Calgary, AB T2N 4Z6, Canada
[5] Alberta Hlth Serv, Primary Hlth Care Integrat Network, Primary Hlth Care, Calgary, AB, Canada
[6] Alberta Hlth Serv, Hlth Status Assessment Surveillance & Reporting, Publ Hlth Surveillance & Infrastructure, Prov Populat & Publ Hlth, 10101 Southport Rd SW, Calgary, AB T2W 3N2, Canada
Keywords
RISK PREDICTION; IMPUTATION; HEALTH; AGE;
DOI
10.1038/s41598-022-27264-x
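Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07; 0710; 09;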
Abstract
Risk prediction models are frequently used to identify individuals at risk of developing hypertension. This study evaluates several machine learning algorithms and compares their predictive performance with that of the conventional Cox proportional hazards (PH) model for predicting hypertension incidence from survival data. We analyzed 18,322 participants and 24 candidate features from the large Alberta's Tomorrow Project (ATP) to develop the prediction models. To select the top features, we applied five feature selection methods: two filter-based methods (univariate Cox p-value and C-index); two embedded methods (random survival forest and least absolute shrinkage and selection operator (Lasso)); and one constraint-based method (the statistically equivalent signature (SES)). Five machine learning algorithms were developed to predict hypertension incidence: the penalized regressions Ridge, Lasso, and Elastic Net (EN), plus random survival forest (RSF) and gradient boosting (GB), alongside the conventional Cox PH model. The predictive performance of the models was assessed using the C-index. The machine learning algorithms performed similarly to the conventional Cox PH model: average C-indexes were 0.78, 0.78, 0.78, 0.76, 0.76, and 0.77 for Ridge, Lasso, EN, RSF, GB, and Cox PH, respectively. The important features associated with each model are also presented. Our findings demonstrate little difference in predictive performance between machine learning algorithms and the conventional Cox PH regression model in predicting hypertension incidence. In a moderately sized dataset with a reasonable number of features, conventional regression-based models perform similarly to machine learning algorithms, with good predictive accuracy.
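The abstract reports model performance as a C-index. As a minimal sketch of how such a concordance index can be computed on right-censored data, the Python below implements Harrell's C under simplifying assumptions: pairs with tied follow-up times are skipped, and tied risk scores count as half-concordant. The function name `harrell_c_index` and the toy data are illustrative assumptions only; they are not taken from the study, which does not describe its implementation here.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had the event and its
    follow-up time is strictly shorter than subject j's. The pair is
    concordant when subject i also has the higher predicted risk.
    Assumption: pairs with tied times are ignored for simplicity.
    """
    n = len(times)
    if not (n == len(events) == len(risk_scores)):
        raise ValueError("times, events and risk_scores must have equal length")

    concordant = 0.0
    comparable = 0
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


# Toy data (hypothetical, not from the ATP cohort):
# follow-up time, event indicator (1 = hypertension, 0 = censored), predicted risk.
toy_times = [2.0, 4.0, 6.0, 8.0]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.8, 0.1]
toy_c = harrell_c_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of risk against observed event times, which is the scale on which the reported values of 0.76 to 0.78 should be read.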
Pages: 13