Improving cardiovascular risk prediction through machine learning modelling of irregularly repeated electronic health records

Cited by: 6
Authors
Li, Chaiquan [1]
Liu, Xiaofei [1]
Shen, Peng [2]
Sun, Yexiang [2]
Zhou, Tianjing [1]
Chen, Weiye [1]
Chen, Qi [2]
Lin, Hongbo [2]
Tang, Xun [1, 3]
Gao, Pei [1, 3, 4]
Affiliations
[1] Peking Univ Hlth Sci Ctr, Sch Publ Hlth, Dept Epidemiol & Biostat, 38 Xueyuan Rd, Beijing 100191, Peoples R China
[2] Yinzhou Dist Ctr Dis Control & Prevent, Xueshi Rd 1221, Ningbo 315199, Peoples R China
[3] Peking Univ, Key Lab Epidemiol Major Dis, Minist Educ, 38 Xueyuan Rd, Beijing 100191, Peoples R China
[4] Peking Univ Clin Res Inst, Ctr Real World Evidence Evaluat, 38 Xueyuan Rd, Beijing 100191, Peoples R China
Source
Funding
National Natural Science Foundation of China
Keywords
Prediction; Preventive Cardiology; Risk; CHRONIC KIDNEY-DISEASE; BLOOD-PRESSURE; PRIMARY-CARE; VALIDATION; CHOLESTEROL; POPULATION; MORTALITY; COHORT;
DOI
10.1093/ehjdh/ztad058
Chinese Library Classification number
R5 [Internal Medicine]
Subject classification code
1002; 100201
Abstract
Aims: Existing electronic health records (EHRs) often contain abundant but irregularly spaced longitudinal measurements of risk factors. This study aimed to leverage such data to improve risk prediction for atherosclerotic cardiovascular disease (ASCVD) by applying machine learning (ML) algorithms, which can enable automatic screening of the population.
Methods and results: A total of 215 744 Chinese adults aged 40 to 79 years without a history of cardiovascular disease (6081 cases) were included from an EHR-based longitudinal cohort study. To keep the model interpretable, the predictors used were demographic characteristics, medication treatment, and repeatedly measured records of lipids, glycaemia, obesity, blood pressure, and renal function. The primary outcome was ASCVD, defined as non-fatal acute myocardial infarction, coronary heart disease death, or fatal or non-fatal stroke. eXtreme Gradient Boosting (XGBoost) and Least Absolute Shrinkage and Selection Operator (LASSO) regression models were derived to predict 5-year ASCVD risk. In the validation set, compared with the refitted Chinese guideline-recommended Cox model (i.e. the China-PAR), the XGBoost model had a significantly higher C-statistic of 0.792 (difference in C-statistic: 0.011, 0.006-0.017, P < 0.001), with similar results for LASSO regression (difference in C-statistic: 0.008, 0.005-0.011, P < 0.001). The XGBoost model showed the best calibration (men: D-x = 0.598, P = 0.75; women: D-x = 1.867, P = 0.08). Moreover, the risk distribution produced by the ML algorithms differed from that of the conventional model. The net reclassification improvements of XGBoost and LASSO over the Cox model were 3.9% (1.4-6.4%) and 2.8% (0.7-4.9%), respectively.
Conclusion: Machine learning algorithms applied to irregular, repeated real-world data could improve cardiovascular risk prediction. They showed significantly better reclassification performance, identifying the high-risk population more accurately.
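The net reclassification improvement reported in the abstract can be illustrated with a minimal sketch. The abstract does not say how risk categories were defined or how censoring was handled, so everything below is an assumption: a single hypothetical high-risk cutoff of 10%, a binary event indicator, and toy data invented purely for illustration. The function name `categorical_nri` is likewise hypothetical and is not taken from the paper.

```python
from typing import Sequence


def categorical_nri(
    old_risk: Sequence[float],
    new_risk: Sequence[float],
    event: Sequence[bool],
    threshold: float = 0.10,  # hypothetical high-risk cutoff, not from the paper
) -> float:
    """Two-category net reclassification improvement of new_risk over old_risk.

    NRI = [P(up | event) - P(down | event)]
        + [P(down | non-event) - P(up | non-event)]
    where "up" means moving from low risk to high risk under the new model.
    Censoring is ignored here; this is a simplification.
    """
    up_events = down_events = n_events = 0
    up_nonevents = down_nonevents = n_nonevents = 0

    for old, new, had_event in zip(old_risk, new_risk, event):
        old_high = old >= threshold
        new_high = new >= threshold
        moved_up = new_high and not old_high
        moved_down = old_high and not new_high
        if had_event:
            n_events += 1
            up_events += moved_up
            down_events += moved_down
        else:
            n_nonevents += 1
            up_nonevents += moved_up
            down_nonevents += moved_down

    if n_events == 0 or n_nonevents == 0:
        raise ValueError("need at least one event and one non-event")

    nri_events = (up_events - down_events) / n_events
    nri_nonevents = (down_nonevents - up_nonevents) / n_nonevents
    return nri_events + nri_nonevents


# Toy data (invented): predicted 5-year risks from a baseline and a new model.
baseline = [0.05, 0.12, 0.08, 0.15, 0.03, 0.11]
updated = [0.11, 0.14, 0.04, 0.09, 0.02, 0.12]
had_event = [True, True, False, False, False, True]

print(round(categorical_nri(baseline, updated, had_event), 3))  # 0.667
```

In the toy data, one of three events moves up into the high-risk group and one of three non-events moves down, giving 1/3 + 1/3. A positive value means the new model moves people in the right direction on balance; a real analysis of cohort data would also need to account for censored follow-up.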
Pages: 30-40
Number of pages: 11