Machine learning approaches for electronic health records phenotyping: a methodical review

Cited by: 40
Authors
Yang, Siyue [1 ]
Varghese, Paul [2 ]
Stephenson, Ellen [3 ]
Tu, Karen [3 ]
Gronsbell, Jessica [1 ,3 ,4 ,5 ]
Affiliations
[1] Univ Toronto, Dept Stat Sci, Toronto, ON, Canada
[2] Verily Life Sci, Cambridge, MA USA
[3] Univ Toronto, Dept Family & Community Med, Toronto, ON, Canada
[4] Univ Toronto, Dept Comp Sci, Toronto, ON, Canada
[5] Univ Toronto, Dept Stat Sci, 700 Univ Ave, Toronto, ON M5G 1Z5, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
electronic health records; phenotyping; cohort identification; machine learning; CLINICAL-TRIALS; INFORMATION; VALIDATION; ALGORITHMS; EXTRACTION; SELECTION; MODEL; TEXT
DOI
10.1093/jamia/ocac216
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Objective
Accurate and rapid phenotyping is a prerequisite to leveraging electronic health records for biomedical research. While early phenotyping relied on rule-based algorithms curated by experts, machine learning (ML) approaches have emerged as an alternative to improve scalability across phenotypes and healthcare settings. This study evaluates ML-based phenotyping with respect to (1) the data sources used, (2) the phenotypes considered, (3) the methods applied, and (4) the reporting and evaluation methods used.

Materials and methods
We searched PubMed and Web of Science for articles published between 2018 and 2022. After screening 850 articles, we recorded 37 variables on 100 studies.

Results
Most studies utilized data from a single institution and included information in clinical notes. Although chronic conditions were most commonly considered, ML also enabled the characterization of nuanced phenotypes such as social determinants of health. Supervised deep learning was the most popular ML paradigm, while semi-supervised and weakly supervised learning were applied to expedite algorithm development and unsupervised learning to facilitate phenotype discovery. ML approaches did not uniformly outperform rule-based algorithms, but deep learning offered a marginal improvement over traditional ML for many conditions.

Discussion
Despite the progress in ML-based phenotyping, most articles focused on binary phenotypes and few articles evaluated external validity or used multi-institution data. Study settings were infrequently reported and analytic code was rarely released.

Conclusion
Continued research in ML-based phenotyping is warranted, with emphasis on characterizing nuanced phenotypes, establishing reporting and evaluation standards, and developing methods to accommodate misclassified phenotypes due to algorithm errors in downstream applications.
Pages: 367–381
Number of pages: 15