Data mining for censored time-to-event data: a Bayesian network model for predicting cardiovascular risk from electronic health record data

Cited by: 50
Authors
Bandyopadhyay, Sunayan [1]
Wolfson, Julian [2]
Vock, David M. [2]
Vazquez-Benitez, Gabriela [4]
Adomavicius, Gediminas [3]
Elidrisi, Mohamed [1]
Johnson, Paul E. [3]
O'Connor, Patrick J. [4]
Affiliations
[1] Univ Minnesota, Dept Comp Sci, Minneapolis, MN 55455 USA
[2] Univ Minnesota, Sch Publ Hlth, Div Biostat, Minneapolis, MN 55455 USA
[3] Univ Minnesota, Carlson Sch Management, Dept Informat & Decis Sci, Minneapolis, MN 55455 USA
[4] HealthPartners Inst Educ & Res, Bloomington, MN USA
Keywords
Bayesian networks; Electronic health data; Survival analysis; Mining censored data; Inverse probability of censoring weights; Risk prediction; Medical decision support; Genetic algorithms; Decision support; United Kingdom; Blood pressure; Heart disease; Survival; Regression; Validation; Score; Management
DOI
10.1007/s10618-014-0386-6
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Models that predict the risk of cardiovascular (CV) events from individual patient characteristics are important tools for managing patient care. Most current, commonly used risk prediction models have been built from carefully selected epidemiological cohorts. However, the homogeneity and limited size of such cohorts restrict the predictive power of these models and their generalizability to other populations. Electronic health data (EHD) from large health care systems provide access to data on large, heterogeneous, and contemporaneous patient populations. The distinctive features and challenges of EHD, including missing risk factor information, non-linear relationships between risk factors and CV event outcomes, and effects that differ across patient subgroups, call for novel machine learning approaches to risk model development. In this paper, we present a machine learning approach, based on Bayesian networks trained on EHD, that predicts the probability of a CV event within 5 years. In such data, event status may be unknown for some individuals because their event times are right-censored owing to disenrollment or incomplete follow-up. Since many traditional data mining methods are not well suited to such data, we describe how to modify both modeling and assessment techniques to account for censored observation times. We show that our approach can yield better predictive performance than the Cox proportional hazards model (a regression-based approach commonly used for censored time-to-event data) or a Bayesian network with ad hoc handling of right-censoring. Our techniques are motivated by, and illustrated on, data from a large US Midwestern health care system.
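The key technical issue in the abstract is that patients censored before the 5-year horizon have unknown event status. The keywords name inverse probability of censoring weights (IPCW) as the remedy; a common fixed-horizon form drops those patients and up-weights the others by the inverse probability of remaining uncensored. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the censoring distribution G(t) = P(C > t) is estimated by an unstratified reverse Kaplan-Meier and that ties place events before censorings. The function names (`censoring_survival`, `ipcw_weights`, `weighted_event_rate`) are hypothetical. The weighted counts it produces are the kind of quantity that could fill a conditional probability table or a weighted evaluation metric.

```python
# Hypothetical helpers: a sketch of fixed-horizon IPCW, not the paper's code.
from bisect import bisect_left, bisect_right
from collections import defaultdict


def censoring_survival(times, events):
    """Reverse Kaplan-Meier estimate of G(t) = P(C > t).

    Censored records (event == 0) are treated as the 'failures'.
    Returns (jump_times, surv), where surv[k] is G just after jump_times[k].
    """
    pairs = sorted(zip(times, events))  # events sort before censorings at ties
    n = len(pairs)
    jump_times, surv = [], []
    g, i = 1.0, 0
    while i < n:
        t = pairs[i][0]
        at_risk = n - i  # records whose observed time is >= t
        censored = 0
        while i < n and pairs[i][0] == t:
            if pairs[i][1] == 0:
                censored += 1
            i += 1
        if censored:
            g *= 1.0 - censored / at_risk
            jump_times.append(t)
            surv.append(g)
    return jump_times, surv


def g_at(jump_times, surv, t, left=False):
    """Step-function lookup: G(t), or G(t-) = P(C >= t) when left=True."""
    k = bisect_left(jump_times, t) if left else bisect_right(jump_times, t)
    return surv[k - 1] if k > 0 else 1.0


def ipcw_weights(times, events, tau):
    """Binary 'event by tau' labels with inverse-probability-of-censoring weights.

    - event observed at t <= tau   -> label 1, weight 1 / G(t-)
    - followed event-free past tau -> label 0, weight 1 / G(tau)
    - censored at or before tau    -> label None (unknown), weight 0
    """
    jt, sv = censoring_survival(times, events)
    labels, weights = [], []
    for t, e in zip(times, events):
        if e == 1 and t <= tau:
            labels.append(1)
            weights.append(1.0 / g_at(jt, sv, t, left=True))
        elif t > tau:
            labels.append(0)
            weights.append(1.0 / g_at(jt, sv, tau))
        else:
            labels.append(None)
            weights.append(0.0)
    return labels, weights


def weighted_event_rate(labels, weights, groups):
    """IPCW-weighted estimate of P(event by tau | group).

    Weighted counts like these are one way to fill a conditional
    probability table for a binary outcome node in a Bayesian network.
    """
    num, den = defaultdict(float), defaultdict(float)
    for y, w, g in zip(labels, weights, groups):
        if y is None:
            continue  # zero-weight record: status unknown, contributes nothing
        num[g] += w * y
        den[g] += w
    return {g: num[g] / den[g] for g in den if den[g] > 0}


if __name__ == "__main__":
    times = [1, 2, 3, 4, 6, 7]   # toy follow-up times in years
    events = [1, 0, 1, 0, 0, 1]  # 1 = CV event observed, 0 = censored
    labels, weights = ipcw_weights(times, events, tau=5)
    print(labels)  # [1, None, 1, None, 0, 0]
    print(weighted_event_rate(labels, weights, ["all"] * len(times)))
```

On this toy data the single-group estimate is 0.375 (up to floating-point rounding). In this example it matches one minus the Kaplan-Meier event-free probability at year 5, 1 - (5/6)(3/4), which makes a quick sanity check. Replacing `"all"` with a parent-variable configuration gives per-configuration estimates.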
Pages: 1033-1069
Number of pages: 37