A strategy for validation of variables derived from large-scale electronic health record data

Cited by: 14
Authors
Liu, Lin [1, 2]
Bustamante, Ranier [2]
Earles, Ashley [3]
Demb, Joshua [2]
Messer, Karen [2]
Gupta, Samir [1, 2]
Affiliations
[1] VA San Diego Healthcare Syst, 3500 La Jolla Village Dr, San Diego, CA 92161 USA
[2] Univ Calif San Diego, 9500 Gilman Dr, La Jolla, CA 92093 USA
[3] Vet Med Res Fdn, 3350 La Jolla Village Dr, San Diego, CA 92161 USA
Funding
National Institutes of Health (USA)
Keywords
Electronic phenotyping; Large-scale electronic health records; Data abstraction validation; Sample size; Positive predictive value; Negative predictive value; IDENTIFY PATIENTS; SAMPLE-SIZE; CODING ALGORITHM; DISEASE; ASTHMA;
DOI
10.1016/j.jbi.2021.103879
Chinese Library Classification
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Purpose: Standardized approaches for rigorous validation of phenotyping from large-scale electronic health record (EHR) data have not been widely reported. We propose a methodologically rigorous and efficient approach to guide such validation, the San Diego Approach to Variable Validation (SDAVV), which includes strategies for sampling cases and controls, determining sample sizes, estimating algorithm performance, and terminating the validation process. Methods: We propose sample size formulae, to be applied before chart review, based on pre-specified critical lower bounds for positive predictive value (PPV) and negative predictive value (NPV). We also propose a stepwise strategy for iterative algorithm development/validation cycles, updating sample sizes for data abstraction until both PPV and NPV achieve target performance. Results: We applied the SDAVV to a Department of Veterans Affairs study in which we created two phenotyping algorithms, one for distinguishing normal colonoscopy cases from abnormal colonoscopy controls and one for identifying aspirin exposure. For identifying normal colonoscopy cases, estimated PPV and NPV both reached 0.970 (95% confidence lower bound 0.915), with estimated sensitivity of 0.963 and specificity of 0.975. The phenotyping algorithm for identifying aspirin exposure reached a PPV of 0.990 (95% lower bound 0.950) and an NPV of 0.980 (95% lower bound 0.930), with sensitivity and specificity of 0.960 and 1.000, respectively. Conclusions: A structured approach for prospectively developing and validating phenotyping algorithms from large-scale EHR data can be successfully implemented, and should be considered to improve the quality of "big data" research.
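The abstract describes sizing the chart review so that PPV and NPV can be shown to exceed pre-specified critical lower bounds, but it does not reproduce the formulae themselves. The following is only a minimal sketch of that general idea, not the SDAVV formulae: it assumes a one-sided Wilson score lower confidence bound, and the function names (`wilson_lower_bound`, `min_charts_to_review`) and the example inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_lower_bound(confirmed: int, n: int, confidence: float = 0.95) -> float:
    """One-sided Wilson score lower confidence bound for a proportion.

    Assumption: the Wilson interval is used here for illustration only;
    the paper's own interval method is not stated in the abstract.
    """
    if n <= 0 or not 0 <= confirmed <= n:
        raise ValueError("need n > 0 and 0 <= confirmed <= n")
    z = NormalDist().inv_cdf(confidence)  # one-sided critical value
    p_hat = confirmed / n
    centre = p_hat + z * z / (2 * n)
    margin = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)


def min_charts_to_review(anticipated: float, critical_lower: float,
                         confidence: float = 0.95, max_n: int = 100_000) -> int:
    """Smallest n whose lower bound clears `critical_lower`, assuming the
    observed PPV (or NPV) matches `anticipated`, rounded down to whole charts.
    """
    if not 0 < critical_lower < anticipated <= 1:
        raise ValueError("need 0 < critical_lower < anticipated <= 1")
    for n in range(1, max_n + 1):
        confirmed = int(anticipated * n)  # floor, so the estimate is conservative
        if wilson_lower_bound(confirmed, n, confidence) >= critical_lower:
            return n
    raise ValueError("no sample size up to max_n meets the target")


if __name__ == "__main__":
    # Hypothetical inputs: anticipated PPV 0.97, critical lower bound 0.90.
    n = min_charts_to_review(anticipated=0.97, critical_lower=0.90)
    print(f"charts to review among algorithm-positive records: {n}")
    print(f"lower bound at that n: {wilson_lower_bound(int(0.97 * n), n):.3f}")
```

In an iterative development/validation cycle of the kind the abstract outlines, the same check would be repeated for NPV on algorithm-negative records, and abstraction would stop once both lower bounds meet their targets.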
Pages: 8