Validation of risk prediction models applied to longitudinal electronic health record data for the prediction of major cardiovascular events in the presence of data shifts

Cited by: 8
Authors
Li, Yikuan [1, 2]
Salimi-Khorshidi, Gholamreza [1, 2]
Rao, Shishir [1, 2]
Canoy, Dexter [1, 2, 3]
Hassaine, Abdelaali [1, 2]
Lukasiewicz, Thomas [4]
Rahimi, Kazem [1, 2, 3]
Mamouei, Mohammad [1, 2]
Affiliations
[1] Univ Oxford, Oxford Martin Sch, Deep Med, Hayes House,75 George St, Oxford OX1 2BQ, England
[2] Univ Oxford, Nuffield Dept Womens & Reprod Hlth, Med Sci Div, Oxford, England
[3] Oxford Univ Hosp NHS Fdn Trust, NIHR Oxford Biomed Res Ctr, Oxford, England
[4] Univ Oxford, Dept Comp Sci, Oxford, England
Source
Funding
UK Research and Innovation;
Keywords
Cardiovascular disease risk; Heart Failure; Stroke; Coronary heart disease; Predictive modelling; Data shifts; PROFILE;
DOI
10.1093/ehjdh/ztac061
Chinese Library Classification number
R5 [Internal Medicine];
Discipline classification code
1002; 100201;
Abstract
Aims
Deep learning has dominated predictive modelling across different fields, but in medicine it has been met with mixed reception. In clinical practice, simple, statistical models and risk scores continue to inform cardiovascular disease risk predictions. This is due in part to the knowledge gap about how deep learning models perform in practice when they are subject to dynamic data shifts; a key criterion that common internal validation procedures do not address. We evaluated the performance of a novel deep learning model, BEHRT, under data shifts and compared it with several ML-based and established risk models.

Methods and results
Using linked electronic health records of 1.1 million patients across England aged at least 35 years between 1985 and 2015, we replicated three established statistical models for predicting 5-year risk of incident heart failure, stroke, and coronary heart disease. The results were compared with a widely accepted machine learning model (random forests), and a novel deep learning model (BEHRT). In addition to internal validation, we investigated how data shifts affect model discrimination and calibration. To this end, we tested the models on cohorts from (i) distinct geographical regions; (ii) different periods. Using internal validation, the deep learning models substantially outperformed the best statistical models by 6%, 8%, and 11% in heart failure, stroke, and coronary heart disease, respectively, in terms of the area under the receiver operating characteristic curve.

Conclusion
The performance of all models declined as a result of data shifts; despite this, the deep learning models maintained the best performance in all risk prediction tasks. Updating the model with the latest information can improve discrimination but if the prior distribution changes, the model may remain miscalibrated.

Graphical Abstract
Design and main results of the model evaluation in the presence of data shift.
EHR, electronic health records; HES, hospital episode statistics; HF, heart failure; CHD, coronary heart disease; CPH, COX proportional hazard; ML, machine learning; DL, deep learning; RF, random forest.
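
The distinction the abstract draws between discrimination and calibration under a change in the prior distribution can be illustrated with a toy calculation. The sketch below is not the authors' evaluation code: the helper names (`auroc`, `calibration_gap`) and the two tiny cohorts are hypothetical, chosen only to show that a shift in event rate can leave the area under the ROC curve unchanged while the mean predicted risk drifts away from the observed event rate.

```python
def auroc(labels, scores):
    """Probability that a random case outranks a random non-case (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def calibration_gap(labels, scores):
    """Mean predicted risk minus observed event rate (0 = calibrated in the large)."""
    return sum(scores) / len(scores) - sum(labels) / len(labels)


# Hypothetical development cohort: event rate 0.5, mean predicted risk 0.5.
dev_labels = [0, 0, 1, 0, 1, 1]
dev_scores = [0.2, 0.3, 0.4, 0.5, 0.7, 0.9]

# Hypothetical shifted cohort: every non-case appears twice, so the event
# rate falls to 1/3 while the model's scores for each patient are unchanged.
shift_labels = [0, 0, 0, 0, 1, 0, 0, 1, 1]
shift_scores = [0.2, 0.2, 0.3, 0.3, 0.4, 0.5, 0.5, 0.7, 0.9]

dev_auc, shift_auc = auroc(dev_labels, dev_scores), auroc(shift_labels, shift_scores)
dev_gap, shift_gap = (calibration_gap(dev_labels, dev_scores),
                      calibration_gap(shift_labels, shift_scores))

print(f"AUROC: dev={dev_auc:.3f} shifted={shift_auc:.3f}")
print(f"Calibration gap: dev={dev_gap:.3f} shifted={shift_gap:.3f}")
```

In this toy setting the ranking of patients is untouched, so discrimination is identical in both cohorts, but the model over-predicts risk in the shifted cohort because the underlying event rate has dropped.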
Pages: 535-547
Number of pages: 13