A strategy for validation of variables derived from large-scale electronic health record data

Cited by: 14

Authors:

Liu, Lin ^{[1,2]}

Bustamante, Ranier ^{[2]}

Earles, Ashley ^{[3]}

Demb, Joshua ^{[2]}

Messer, Karen ^{[2]}

Gupta, Samir ^{[1,2]}

Affiliations:

[1] VA San Diego Healthcare Syst, 3500 La Jolla Village Dr, San Diego, CA 92161 USA

[2] Univ Calif San Diego, 9500 Gilman Dr, La Jolla, CA 92093 USA

[3] Vet Med Res Fdn, 3350 La Jolla Village Dr, San Diego, CA 92161 USA

Source:

JOURNAL OF BIOMEDICAL INFORMATICS | 2021 / Volume 121

Funding:

National Institutes of Health (USA);

Keywords:

Electronic phenotyping; Large-scale electronic health records; Data abstraction validation; Sample size; Positive predictive value; Negative predictive value; IDENTIFY PATIENTS; SAMPLE-SIZE; CODING ALGORITHM; DISEASE; ASTHMA;

DOI:

10.1016/j.jbi.2021.103879

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Purpose: Standardized approaches for rigorous validation of phenotyping from large-scale electronic health record (EHR) data have not been widely reported. We propose a methodologically rigorous and efficient approach to guide such validation, hereafter referred to as the San Diego Approach to Variable Validation (SDAVV). It includes strategies for sampling cases and controls, determining sample sizes, estimating algorithm performance, and terminating the validation process. Methods: We propose sample size formulae, to be used prior to chart review, based on pre-specified critical lower bounds for positive predictive value (PPV) and negative predictive value (NPV). We also propose a stepwise strategy for iterative algorithm development/validation cycles, updating sample sizes for data abstraction until both PPV and NPV achieve target performance. Results: We applied the SDAVV to a Department of Veterans Affairs study in which we created two phenotyping algorithms, one for distinguishing normal colonoscopy cases from abnormal colonoscopy controls and one for identifying aspirin exposure. For identifying normal colonoscopy cases, estimated PPV and NPV both reached 0.970 (95% confidence lower bound of 0.915), with estimated sensitivity of 0.963 and specificity of 0.975. The phenotyping algorithm for identifying aspirin exposure reached a PPV of 0.990 (95% lower bound of 0.950) and an NPV of 0.980 (95% lower bound of 0.930), with sensitivity of 0.960 and specificity of 1.000. Conclusions: A structured approach for prospectively developing and validating phenotyping algorithms from large-scale EHR data can be successfully implemented, and should be considered to improve the quality of "big data" research.
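
The abstract refers to sample size formulae tied to pre-specified critical lower bounds for PPV and NPV, but the formulae themselves are not reproduced in this record. The following is only a minimal sketch of the general idea, not the authors' method: it assumes an exact (Clopper-Pearson) one-sided lower confidence bound for a proportion, and searches for the first number of reviewed charts at which that bound would clear a chosen critical value if the observed PPV (or NPV) matched an anticipated value. The function names `exact_lower_bound` and `min_reviews`, the `n_max` cap, and the choice of interval are all assumptions made for illustration.

```python
from math import comb, floor


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def exact_lower_bound(k: int, n: int, alpha: float = 0.05) -> float:
    """One-sided Clopper-Pearson lower bound for the proportion k/n.

    Solves P(X >= k | p) = alpha for p by bisection (assumed approach,
    not taken from the paper).
    """
    if k == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_upper_tail(k, n, mid) < alpha:
            lo = mid  # tail probability still below alpha: bound lies higher
        else:
            hi = mid
    return lo


def min_reviews(expected: float, critical: float,
                alpha: float = 0.05, n_max: int = 2000):
    """First n at which the lower bound exceeds `critical`, assuming the
    observed proportion is floor(expected * n) / n. Returns None if no
    such n is found up to the hypothetical cap n_max."""
    for n in range(1, n_max + 1):
        k = floor(expected * n + 1e-9)  # conservative count of confirmed charts
        if exact_lower_bound(k, n, alpha) > critical:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical inputs: 97 of 100 algorithm-positive charts confirmed.
    print(exact_lower_bound(97, 100, alpha=0.05))
    # Hypothetical planning question: anticipated PPV 0.97, critical bound 0.90.
    print(min_reviews(0.97, 0.90))
```

Because the count of confirmed charts is an integer, the bound does not increase smoothly with n, so the first qualifying n is best read as a rough planning figure. The same calculation applies to NPV by counting confirmed negatives among the algorithm-negative charts.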


Pages: 8

