Machine learning approaches for electronic health records phenotyping: a methodical review

Cited by: 40
Authors
Yang, Siyue [1]
Varghese, Paul [2]
Stephenson, Ellen [3]
Tu, Karen [3]
Gronsbell, Jessica [1, 3, 4, 5]
Affiliations
[1] Univ Toronto, Dept Stat Sci, Toronto, ON, Canada
[2] Verily Life Sci, Cambridge, MA USA
[3] Univ Toronto, Dept Family & Community Med, Toronto, ON, Canada
[4] Univ Toronto, Dept Comp Sci, Toronto, ON, Canada
[5] Univ Toronto, Dept Stat Sci, 700 Univ Ave, Toronto, ON M5G 1Z5, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
electronic health records; phenotyping; cohort identification; machine learning; CLINICAL-TRIALS; INFORMATION; VALIDATION; ALGORITHMS; EXTRACTION; SELECTION; MODEL; TEXT
DOI
10.1093/jamia/ocac216
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Objective: Accurate and rapid phenotyping is a prerequisite to leveraging electronic health records for biomedical research. While early phenotyping relied on rule-based algorithms curated by experts, machine learning (ML) approaches have emerged as an alternative to improve scalability across phenotypes and healthcare settings. This study evaluates ML-based phenotyping with respect to (1) the data sources used, (2) the phenotypes considered, (3) the methods applied, and (4) the reporting and evaluation methods used.
Materials and methods: We searched PubMed and Web of Science for articles published between 2018 and 2022. After screening 850 articles, we recorded 37 variables on 100 studies.
Results: Most studies utilized data from a single institution and included information in clinical notes. Although chronic conditions were most commonly considered, ML also enabled the characterization of nuanced phenotypes such as social determinants of health. Supervised deep learning was the most popular ML paradigm, while semi-supervised and weakly supervised learning were applied to expedite algorithm development and unsupervised learning to facilitate phenotype discovery. ML approaches did not uniformly outperform rule-based algorithms, but deep learning offered a marginal improvement over traditional ML for many conditions.
Discussion: Despite the progress in ML-based phenotyping, most articles focused on binary phenotypes and few articles evaluated external validity or used multi-institution data. Study settings were infrequently reported and analytic code was rarely released.
Conclusion: Continued research in ML-based phenotyping is warranted, with emphasis on characterizing nuanced phenotypes, establishing reporting and evaluation standards, and developing methods to accommodate misclassified phenotypes due to algorithm errors in downstream applications.
Pages: 367-381
Number of pages: 15