A maximum likelihood approach to electronic health record phenotyping using positive and unlabeled patients

Cited by: 11
Authors
Zhang, Lingjiao [1]
Ding, Xiruo [2]
Ma, Yanyuan [3]
Muthu, Naveen [4]
Ajmal, Imran [2]
Moore, Jason H. [1]
Herman, Daniel S. [2]
Chen, Jinbo [1]
Affiliations
[1] Univ Penn, Perelman Sch Med, Dept Biostat Epidemiol & Informat, Philadelphia, PA 19104 USA
[2] Univ Penn, Dept Pathol & Lab Med, Philadelphia, PA USA
[3] Penn State Univ, Dept Stat, Philadelphia, PA USA
[4] Univ Penn, Dept Biomed & Hlth Informat, Philadelphia, PA 19104 USA
Keywords
electronic health record; phenotyping; maximum likelihood; anchor variable; phenotype prevalence; PRIMARY ALDOSTERONISM; PRIMARY-CARE; PREVALENCE; CHALLENGES; QUALITY; MODELS; IMPACT;
DOI
10.1093/jamia/ocz170
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Objective: Phenotyping patients using electronic health record (EHR) data conventionally requires labeled cases and controls. Assigning labels requires manual medical chart review and therefore is labor intensive. For some phenotypes, identifying gold-standard controls is prohibitive. We developed an accurate EHR phenotyping approach that does not require labeled controls. Materials and Methods: Our framework relies on a random subset of cases, which can be specified using an anchor variable that has excellent positive predictive value and sensitivity independent of predictors. We proposed a maximum likelihood approach that efficiently leverages data from the specified cases and unlabeled patients to develop logistic regression phenotyping models, and compared model performance with existing algorithms. Results: Our method outperformed the existing algorithms on predictive accuracy in Monte Carlo simulation studies, application to identify hypertension patients with hypokalemia requiring oral supplementation using a simulated anchor, and application to identify primary aldosteronism patients using real-world cases and anchor variables. Our method additionally generated consistent estimates of 2 important parameters, phenotype prevalence and the proportion of true cases that are labeled. Discussion: Upon identification of an anchor variable that is scalable and transferable to different practices, our approach should facilitate development of scalable, transferable, and practice-specific phenotyping models. Conclusions: Our proposed approach enables accurate semiautomated EHR phenotyping with minimal manual labeling and therefore should greatly facilitate EHR clinical decision support and research.
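
The idea in the abstract can be illustrated with a minimal sketch of a positive-and-unlabeled likelihood. This is not the authors' code and may differ from their exact likelihood. It assumes the anchor S never flags a non-case (perfect positive predictive value) and flags each true case with a constant probability c that does not depend on the predictors, so that P(S=1 | x) = c * P(Y=1 | x) with a logistic model for P(Y=1 | x). The function names `neg_log_lik` and `fit_pu_logistic`, the simulated data, and the parameter values are all hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit


def neg_log_lik(theta, X, s):
    """Negative log-likelihood of anchor labels s under the assumed PU model.

    theta = (intercept, slopes..., logit(c)), where c = P(S=1 | Y=1).
    """
    beta, c = theta[:-1], expit(theta[-1])
    p = expit(beta[0] + X @ beta[1:])        # P(Y=1 | x), logistic phenotype model
    q = np.clip(c * p, 1e-12, 1 - 1e-12)     # P(S=1 | x), since P(S=1 | Y=0) = 0 is assumed
    return -np.sum(s * np.log(q) + (1 - s) * np.log1p(-q))


def fit_pu_logistic(X, s):
    """Maximize the likelihood numerically; returns (beta_hat, c_hat, prevalence_hat)."""
    theta0 = np.zeros(X.shape[1] + 2)        # intercept + slopes + logit(c)
    res = minimize(neg_log_lik, theta0, args=(X, s), method="BFGS")
    beta_hat, c_hat = res.x[:-1], expit(res.x[-1])
    # Prevalence estimate: average predicted phenotype probability over all patients.
    prevalence_hat = expit(beta_hat[0] + X @ beta_hat[1:]).mean()
    return beta_hat, c_hat, prevalence_hat


# Simulated (hypothetical) data: two predictors, anchor flags about 40% of true cases.
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 2))
y = rng.binomial(1, expit(-1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1]))  # latent phenotype
s = y * rng.binomial(1, 0.4, size=n)                              # observed anchor labels
beta_hat, c_hat, prevalence_hat = fit_pu_logistic(X, s)
```

In this sketch only `X` and `s` are used for fitting; the latent `y` is simulated solely so that the estimates of the labeled proportion `c_hat` and the prevalence `prevalence_hat` can be compared against known values. Separating c from the intercept relies on the logistic form being correct, so estimates can be unstable when predictors are weak.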
Pages: 119-126
Number of pages: 8