Data mining for censored time-to-event data: a Bayesian network model for predicting cardiovascular risk from electronic health record data

Cited by: 50
Authors
Bandyopadhyay, Sunayan [1]
Wolfson, Julian [2]
Vock, David M. [2]
Vazquez-Benitez, Gabriela [4]
Adomavicius, Gediminas [3]
Elidrisi, Mohamed [1]
Johnson, Paul E. [3]
O'Connor, Patrick J. [4]
Affiliations
[1] Univ Minnesota, Dept Comp Sci, Minneapolis, MN 55455 USA
[2] Univ Minnesota, Sch Publ Hlth, Div Biostat, Minneapolis, MN 55455 USA
[3] Univ Minnesota, Carlson Sch Management, Dept Informat & Decis Sci, Minneapolis, MN 55455 USA
[4] HealthPartners Inst Educ & Res, Bloomington, MN USA
Keywords
Bayesian networks; Electronic health data; Survival analysis; Mining censored data; Inverse probability of censoring weights; Risk prediction; Medical decision support; GENETIC ALGORITHMS; DECISION-SUPPORT; UNITED-KINGDOM; BLOOD-PRESSURE; HEART-DISEASE; SURVIVAL; REGRESSION; VALIDATION; SCORE; MANAGEMENT;
DOI
10.1007/s10618-014-0386-6
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Models for predicting the risk of cardiovascular (CV) events based on individual patient characteristics are important tools for managing patient care. Most current and commonly used risk prediction models have been built from carefully selected epidemiological cohorts. However, the homogeneity and limited size of such cohorts restrict the predictive power and generalizability of these risk models to other populations. Electronic health data (EHD) from large health care systems provide access to data on large, heterogeneous, and contemporaneous patient populations. The unique features and challenges of EHD, including missing risk factor information, non-linear relationships between risk factors and CV event outcomes, and differing effects from different patient subgroups, demand novel machine learning approaches to risk model development. In this paper, we present a machine learning approach based on Bayesian networks trained on EHD to predict the probability of having a CV event within 5 years. In such data, event status may be unknown for some individuals, as the event time is right-censored due to disenrollment and incomplete follow-up. Since many traditional data mining methods are not well-suited for such data, we describe how to modify both modeling and assessment techniques to account for censored observation times. We show that our approach can lead to better predictive performance than the Cox proportional hazards model (i.e., a regression-based approach commonly used for censored, time-to-event data) or a Bayesian network with ad hoc approaches to right-censoring. Our techniques are motivated by and illustrated on data from a large US Midwestern health care system.
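The abstract does not spell out how censoring is handled, but the keywords name inverse probability of censoring weights (IPCW). The following is a minimal Python sketch of one generic form of IPCW for a fixed prediction horizon, not the authors' implementation. The function names, the toy data, and the use of a Kaplan-Meier estimate for the censoring distribution are all assumptions made for illustration. Subjects censored before the horizon get weight zero, and subjects with known status are up-weighted by the inverse of the estimated probability of remaining uncensored.

```python
def censoring_survival(times, events, t, left_limit=False):
    """Kaplan-Meier estimate of G(t) = P(C > t), the probability of being
    uncensored past t. With left_limit=True, returns G(t-) instead.
    Assumption: ties between events and censorings get no special handling."""
    g = 1.0
    censor_times = sorted({u for u, e in zip(times, events) if e == 0})
    for u in censor_times:
        if u > t or (left_limit and u == t):
            break
        at_risk = sum(1 for s in times if s >= u)
        censored = sum(1 for s, e in zip(times, events) if s == u and e == 0)
        g *= 1.0 - censored / at_risk
    return g


def ipcw_labels_and_weights(times, events, horizon):
    """Return (labels, weights) for the binary outcome 'event by horizon'.
    label is 1 (event by horizon), 0 (event-free at horizon), or None
    (censored before horizon, status unknown, weight 0)."""
    labels, weights = [], []
    for t, e in zip(times, events):
        if e == 1 and t <= horizon:
            # Event observed in time: weight by 1 / G(t-).
            labels.append(1)
            weights.append(1.0 / censoring_survival(times, events, t, True))
        elif t > horizon:
            # Followed past the horizon without an event: weight by 1 / G(horizon).
            labels.append(0)
            weights.append(1.0 / censoring_survival(times, events, horizon))
        else:
            # Censored before the horizon: outcome unknown, excluded.
            labels.append(None)
            weights.append(0.0)
    return labels, weights


# Hypothetical toy data: follow-up time in years, event indicator (1 = CV event).
times = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0]
events = [1, 0, 1, 0, 1, 0]
labels, weights = ipcw_labels_and_weights(times, events, horizon=5.0)

# Weighted estimate of the 5-year event probability.
risk = sum(w for y, w in zip(labels, weights) if y == 1) / sum(weights)
print(labels, weights, risk)
```

Weights of this kind can be passed to any learner that accepts per-sample weights, so the censored subjects are not simply discarded or treated as event-free.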
Pages: 1033-1069
Page count: 37
Related papers
50 in total
  • [1] Data mining for censored time-to-event data: a Bayesian network model for predicting cardiovascular risk from electronic health record data
    Sunayan Bandyopadhyay
    Julian Wolfson
    David M. Vock
    Gabriela Vazquez-Benitez
    Gediminas Adomavicius
    Mohamed Elidrisi
    Paul E. Johnson
    Patrick J. O’Connor
    Data Mining and Knowledge Discovery, 2015, 29 : 1033 - 1069
  • [2] Differentiable sorting for censored time-to-event data
    Vauvelle, Andre
    Wild, Benjamin
    Eils, Roland
    Denaxas, Spiros
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [3] A Bayesian model for time-to-event data with informative censoring
    Kaciroti, Niko A.
    Raghunathan, Trivellore E.
    Taylor, Jeremy M. G.
    Julius, Stevo
    BIOSTATISTICS, 2012, 13 (02) : 341 - 354
  • [4] Approximation of Bayesian models for time-to-event data
    Catalano, Marta
    Lijoi, Antonio
    Prunster, Igor
    ELECTRONIC JOURNAL OF STATISTICS, 2020, 14 (02) : 3366 - 3395
  • [5] Learning from Longitudinal Data in Electronic Health Record and Genetic Data to Improve Cardiovascular Event Prediction
    Zhao, Juan
    Feng, QiPing
    Wu, Patrick
    Lupu, Roxana A.
    Wilke, Russell A.
    Wells, Quinn S.
    Denny, Joshua C.
    Wei, Wei-Qi
    SCIENTIFIC REPORTS, 2019, 9 (1)
  • [6] Learning from Longitudinal Data in Electronic Health Record and Genetic Data to Improve Cardiovascular Event Prediction
    Juan Zhao
    QiPing Feng
    Patrick Wu
    Roxana A. Lupu
    Russell A. Wilke
    Quinn S. Wells
    Joshua C. Denny
    Wei-Qi Wei
    Scientific Reports, 9
  • [7] An ensemble method for interval-censored time-to-event data
    Yao, Weichi
    Frydman, Halina
    Simonoff, Jeffrey S.
    BIOSTATISTICS, 2021, 22 (01) : 198 - 213
  • [8] Estimation of a Concordance Probability for Doubly Censored Time-to-Event Data
    Hayashi K.
    Shimizu Y.
    Statistics in Biosciences, 2018, 10 (3) : 546 - 567
  • [9] A Bayesian semiparametric partially PH model for clustered time-to-event data
    Nipoti, Bernardo
    Jara, Alejandro
    Guindani, Michele
    SCANDINAVIAN JOURNAL OF STATISTICS, 2018, 45 (04) : 1016 - 1035
  • [10] Predicting Hospitalizations From Electronic Health Record Data
    Morawski, Kyle
    Dvorkis, Yoni
    Monsen, Craig B.
    AMERICAN JOURNAL OF MANAGED CARE, 2020, 26 (01) : E7 - +