Risk prediction with imperfect survival outcome information from electronic health records

Citations: 2
Authors
Hou, Jue [1]
Chan, Stephanie F. [1]
Wang, Xuan [1]
Cai, Tianxi [1,2]
Affiliations
[1] Harvard TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA 02115 USA
[2] Harvard Med Sch, Dept Biomed Informat, Boston, MA USA
Keywords
current status data; measurement error; risk prediction; semisupervised learning; FAILURE-TIME REGRESSION; TRANSFORMATION MODELS; EFFICIENT ESTIMATION; HEART-FAILURE; INTERVAL; OBESITY; INDEX
DOI
10.1111/biom.13599
Chinese Library Classification number
Q [Biological Sciences]
Subject classification code
07; 0710; 09
Abstract
Readily available proxies for the time of disease onset, such as the time of the first diagnostic code, can lead to substantial risk prediction error when analyses rely on poor proxies. Because detailed documentation is lacking and manual annotation is labor intensive, it is often feasible to ascertain, for only a small subset of patients, the current status of the disease at a follow-up time rather than the exact onset time. In this paper, we aim to develop risk prediction models for the onset time that efficiently leverage both a small number of labels on the current status and a large number of unlabeled observations on imperfect proxies. Under a semiparametric transformation model for onset and a highly flexible measurement error model for the proxy onset time, we propose a semisupervised risk prediction method that efficiently combines information from proxies and limited labels. Starting from an initial estimator based solely on the labeled subset, we perform a one-step correction with the full data, augmenting against a mean-zero rank correlation score derived from the proxies. We establish the consistency and asymptotic normality of the proposed semisupervised estimator and provide a resampling procedure for interval estimation. Simulation studies demonstrate that the proposed estimator performs well in finite samples. We illustrate the proposed estimator by developing a genetic risk prediction model for obesity using data from the Mass General Brigham Healthcare Biobank.
Pages: 190-202
Page count: 13
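The data setting described in the abstract can be made concrete with a small simulation. The sketch below is illustrative only and is not the authors' method or code: the exponential onset model (a proportional hazards model, which is one member of the transformation-model family), the multiplicative log-normal proxy noise, the parameter values, and the function name `simulate_ehr_cohort` are all assumptions. It shows what is observed: every patient has a covariate, a follow-up time, and an imperfect proxy, while only a small labeled subset has the current status indicator, and the exact onset time is never recorded.

```python
import math
import random


def simulate_ehr_cohort(n, n_labeled, beta=0.5, seed=0):
    """Simulate covariates, noisy proxies, and current status labels (illustrative only)."""
    rng = random.Random(seed)
    records = []
    for i in range(n):
        z = rng.gauss(0.0, 1.0)  # hypothetical risk factor
        # Assumed onset model: exponential time with hazard exp(beta * z).
        t = rng.expovariate(math.exp(beta * z))
        c = rng.uniform(0.5, 3.0)  # follow-up (monitoring) time
        # Assumed proxy: onset time distorted by multiplicative noise,
        # e.g. a first diagnostic code recorded early or late.
        proxy = t * math.exp(rng.gauss(0.3, 0.5))
        # Current status is ascertained only for the first n_labeled patients.
        delta = int(t <= c) if i < n_labeled else None
        # The exact onset time t is deliberately not stored: it is unobserved.
        records.append({"z": z, "c": c, "proxy": proxy, "delta": delta})
    return records


cohort = simulate_ehr_cohort(n=1000, n_labeled=50)
labeled = [r for r in cohort if r["delta"] is not None]
print(len(cohort), "patients,", len(labeled), "with current status labels")
```

A semisupervised estimator of the kind the abstract describes would use the `delta` labels from the small subset together with the `proxy` values from the whole cohort; the estimation step itself is not sketched here.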