A comparison of machine learning algorithms and traditional regression-based statistical modeling for predicting hypertension incidence in a Canadian population

Cited by: 11

Authors:

Chowdhury, Mohammad Ziaul Islam [1,2,3]

Leung, Alexander A. [1,4]

Walker, Robin L. [1,5]

Sikdar, Khokan C. [6]

O'Beirne, Maeve [2]

Quan, Hude [1]

Turin, Tanvir C. [1,2]

Affiliations:

[1] Univ Calgary, Dept Community Hlth Sci, 3280 Hosp Drive NW, Calgary, AB T2N 4Z6, Canada

[2] Univ Calgary, Dept Family Med, 3330 Hosp Drive NW, Calgary, AB T2N 4N1, Canada

[3] Univ Calgary, Dept Psychiat, 3280 Hosp Drive NW, Calgary, AB T2N 4Z6, Canada

[4] Univ Calgary, Dept Med, 3280 Hosp Drive NW, Calgary, AB T2N 4Z6, Canada

[5] Alberta Hlth Serv, Primary Hlth Care Integrat Network, Primary Hlth Care, Calgary, AB, Canada

[6] Alberta Hlth Serv, Hlth Status Assessment Surveillance & Reporting, Publ Hlth Surveillance & Infrastructure, Prov Populat & Publ Hlth, 10101 Southport Rd SW, Calgary, AB T2W 3N2, Canada

Source:

SCIENTIFIC REPORTS | 2023 / Vol. 13 / Issue 1

Keywords:

RISK PREDICTION; IMPUTATION; HEALTH; AGE;

DOI:

10.1038/s41598-022-27264-x

Chinese Library Classification (CLC) codes:

O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];

Subject classification codes:

07 ; 0710 ; 09 ;

Abstract:

Risk prediction models are frequently used to identify individuals at risk of developing hypertension. This study evaluates different machine learning algorithms and compares their predictive performance with that of the conventional Cox proportional hazards (PH) model for predicting hypertension incidence from survival data. We analyzed data on 24 candidate features from 18,322 participants in the large Alberta's Tomorrow Project (ATP) to develop the prediction models. To select the top features, we applied five feature selection methods: two filter-based (univariate Cox p-value and C-index), two embedded (random survival forest and least absolute shrinkage and selection operator [Lasso]), and one constraint-based (the statistically equivalent signature [SES]). Five machine learning algorithms were developed to predict hypertension incidence: three penalized regressions (Ridge, Lasso, and Elastic Net [EN]), random survival forest (RSF), and gradient boosting (GB). A conventional Cox PH model was developed alongside them for comparison. The predictive performance of the models was assessed using the C-index. The machine learning algorithms performed similarly to the conventional Cox PH model: average C-indexes were 0.78, 0.78, 0.78, 0.76, 0.76, and 0.77 for Ridge, Lasso, EN, RSF, GB, and Cox PH, respectively. The important features associated with each model are also presented. Our findings show little difference in predictive performance between machine learning algorithms and the conventional Cox PH regression model in predicting hypertension incidence. In a moderately sized dataset with a reasonable number of features, conventional regression-based models perform similarly to machine learning algorithms, with good predictive accuracy.
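The C-index used above to compare the models measures how well a model's risk scores rank participants by time to event. As an illustration only, the following pure-Python sketch computes Harrell's C-index for right-censored data. It is not the authors' implementation; the function name `harrell_c_index` and the toy data are hypothetical. It assumes that a higher score means higher predicted risk (as with a Cox linear predictor) and, for simplicity, skips pairs with tied follow-up times.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(times: Sequence[float],
                    events: Sequence[int],
                    risk_scores: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data (sketch).

    times       -- follow-up time for each participant
    events      -- 1 if the event (e.g. incident hypertension) was observed, 0 if censored
    risk_scores -- model output; ASSUMPTION: higher score = higher predicted risk
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("inputs must have the same length")

    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter (or equal) follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Usable only if the earlier time is an observed event; a censored
        # earlier time says nothing about who developed the outcome first.
        # Pairs with tied times are skipped in this simplified version.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier event was ranked as higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical toy data: 5 participants, 2 of them censored.
    times = [2, 4, 5, 7, 9]
    events = [1, 1, 0, 1, 0]
    risk = [0.9, 0.7, 0.8, 0.3, 0.1]
    print(harrell_c_index(times, events, risk))  # 0.875 (7 of 8 comparable pairs concordant)
```

Under this definition, 0.5 corresponds to random ranking and 1.0 to perfect ranking, so an average C-index of about 0.78 means the models ordered roughly 78% of comparable participant pairs correctly.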


Pages: 13
