Comparative analysis, applications, and interpretation of electronic health record-based stroke phenotyping methods

Cited by: 8
Authors
Thangaraj, Phyllis M. [1, 2]
Kummer, Benjamin R. [3]
Lorberbaum, Tal [1, 2]
Elkind, Mitchell S. V. [4, 5]
Tatonetti, Nicholas P. [1, 2]
Affiliations
[1] Columbia Univ, Dept Biomed Informat, 622 W 168th St,PH 20, New York, NY 10032 USA
[2] Columbia Univ, Dept Syst Biol, New York, NY 10027 USA
[3] Icahn Sch Med Mt Sinai, Dept Neurol, New York, NY USA
[4] Columbia Univ, Vagelos Coll Phys & Surg, Dept Neurol, New York, NY USA
[5] Columbia Univ, Mailman Sch Publ Hlth, Dept Epidemiol, New York, NY USA
Keywords
Phenotyping algorithms; Acute ischemic stroke; Machine learning; Electronic health record studies; BIG DATA; DIAGNOSIS; MODELS; RISK
DOI
10.1186/s13040-020-00230-x
Chinese Library Classification number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Background: Accurate identification of acute ischemic stroke (AIS) patient cohorts is essential for a wide range of clinical investigations. Automated phenotyping methods that leverage electronic health records (EHRs) represent a fundamentally new approach to cohort identification, one that avoids the current laborious and ungeneralizable generation of phenotyping algorithms. We systematically compared and evaluated the ability of machine learning algorithms and case-control combinations to phenotype AIS patients using data from an EHR.

Materials and methods: Using structured patient data from the EHR at a tertiary-care hospital system, we built and evaluated machine learning models to identify patients with AIS based on 75 different case-control and classifier combinations. We then estimated the prevalence of AIS patients across the EHR. Finally, we externally validated the ability of the models to detect AIS patients without AIS diagnosis codes using the UK Biobank.

Results: Across all models, with minimal feature processing, the mean AUROC for detecting AIS was 0.963 ± 0.0520 and the average precision score was 0.790 ± 0.196. Classifiers trained with cases with AIS diagnosis codes and controls with no cerebrovascular disease codes had the best average F1 score (0.832 ± 0.0383). In the external validation, the top probabilities from a model-predicted AIS cohort were significantly enriched for AIS patients without AIS diagnosis codes (60- to 150-fold over expected).

Conclusions: Our findings support machine learning algorithms as a generalizable way to accurately identify AIS patients without using process-intensive manual feature curation. When a set of AIS patients is unavailable, diagnosis codes may be used to train classifier models.
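
The abstract reports enrichment as a fold change "over expected" but does not say how the expected value was computed. As a rough illustration only, the sketch below assumes that "expected" means the baseline prevalence of cases in the whole cohort, and that enrichment is the case fraction among the top-k model-ranked patients divided by that prevalence. The function names and the toy scores and labels are hypothetical and are not taken from the paper.

```python
def rank_by_score(scores, labels):
    """Return labels reordered from highest to lowest model score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def fold_enrichment(ranked_labels, k, prevalence):
    """Case fraction among the top-k ranked patients divided by the
    expected fraction (assumed here to be the cohort-wide prevalence)."""
    if not 0 < k <= len(ranked_labels):
        raise ValueError("k must be between 1 and the cohort size")
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    observed = sum(ranked_labels[:k]) / k
    return observed / prevalence


# Toy cohort (hypothetical): 1 = true case, 0 = non-case.
scores = [0.95, 0.90, 0.80, 0.40, 0.30, 0.20, 0.10, 0.05, 0.02, 0.01]
labels = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

ranked = rank_by_score(scores, labels)
prevalence = sum(labels) / len(labels)  # baseline case fraction in the cohort
enrichment = fold_enrichment(ranked, k=2, prevalence=prevalence)
print(f"Fold enrichment in top 2: {enrichment:.2f}")
```

Under this definition, a value above 1 means the top-ranked patients contain proportionally more cases than the cohort as a whole. The paper's own calculation may differ in how the expected rate is defined.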
Pages: 14
Related papers
50 in total
  • [1] Comparative analysis, applications, and interpretation of electronic health record-based stroke phenotyping methods
    Phyllis M. Thangaraj
    Benjamin R. Kummer
    Tal Lorberbaum
    Mitchell S. V. Elkind
    Nicholas P. Tatonetti
    [J]. BioData Mining, 13
  • [2] Challenges in and Opportunities for Electronic Health Record-Based Data Analysis and Interpretation
    Kim, Michelle Kang
    Rouphael, Carol
    McMichael, John
    Welch, Nicole
    Dasarathy, Srinivasan
    [J]. GUT AND LIVER, 2024, 18 (02) : 201 - 208
  • [3] Bias of Inaccurate Disease Mentions in Electronic Health Record-based Phenotyping
    Kagawa, Rina
    Shinohara, Emiko
    Imari, Takeshi
    Kawazoe, Yoshimasa
    Ohe, Kazuhiko
    [J]. INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS, 2019, 124 : 90 - 96
  • [4] Granite: Diversified, Sparse Tensor Factorization for Electronic Health Record-Based Phenotyping
    Henderson, Jette
    Ho, Joyce C.
    Kho, Abel N.
    Denny, Joshua C.
    Malin, Bradley A.
    Sun, Jimeng
    Ghosh, Joydeep
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI), 2017, : 214 - 223
  • [5] An Automated, Electronic Health Record-Based Algorithm to Classify Ischemic Stroke Etiology
    Lee, Ho-Joon
    Schwamm, Lee
    Kamel, Hooman
    Sansing, Lauren
    Krishnaswamy, Smita
    Zhao, Hongyu
    Krumholz, Harlan
    Sharma, Richa
    [J]. ANNALS OF NEUROLOGY, 2022, 92 : S83 - S84
  • [6] A National, Electronic Health Record-Based Study of Perinatal Hemorrhagic and Ischemic Stroke
    Fraser, Stuart
    Levy, Samantha M.
    Talebi, Yashar
    Savitz, Sean I.
    Zha, Alicia
    Zhu, Gen
    Wu, Hulin
    [J]. JOURNAL OF CHILD NEUROLOGY, 2023, 38 (3-4) : 206 - 215
  • [7] An Automated, Electronic Health Record-based Algorithm To Classify Ischemic Stroke Etiology
    Sharma, Richa
    Lee, Ho-Joon
    Schwamm, Lee H.
    Kamel, Hooman
    Sansing, Lauren H.
    Kim, Jennifer
    Zhao, Hongyu
    Krumholz, Harlan M.
    Sharma, Richa
    [J]. STROKE, 2022, 53
  • [8] Novel Electronic Medical Record-Based Stroke Registry System
    Chang, Chien-Hung
    Lee, Tsong-Hai
    Chang, Yeu-Jhy
    Chang, Ku-Chou
    Shieh, Mengkai
    Shieh, Yao
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS (ICCE), 2014, : 187 - 188
  • [9] Electronic Medical Record-Based Case Phenotyping for the Charlson Conditions: Scoping Review
    Lee, Seungwon
    Doktorchik, Chelsea
    Martin, Elliot Asher
    D'Souza, Adam Giles
    Eastwood, Cathy
    Shaheen, Abdel Aziz
    Naugler, Christopher
    Lee, Joon
    Quan, Hude
    [J]. JMIR MEDICAL INFORMATICS, 2021, 9 (02)
  • [10] Measures of SES for Electronic Health Record-based Research
    Casey, Joan A.
    Pollak, Jonathan
    Glymour, M. Maria
    Mayeda, Elizabeth R.
    Hirsch, Annemarie G.
    Schwartz, Brian S.
    [J]. AMERICAN JOURNAL OF PREVENTIVE MEDICINE, 2018, 54 (03) : 430 - 439