Machine learning approaches for electronic health records phenotyping: a methodical review

Times cited: 40
Authors
Yang, Siyue [1]
Varghese, Paul [2]
Stephenson, Ellen [3]
Tu, Karen [3]
Gronsbell, Jessica [1,3,4,5]
Affiliations
[1] Univ Toronto, Dept Stat Sci, Toronto, ON, Canada
[2] Verily Life Sci, Cambridge, MA USA
[3] Univ Toronto, Dept Family & Community Med, Toronto, ON, Canada
[4] Univ Toronto, Dept Comp Sci, Toronto, ON, Canada
[5] Univ Toronto, Dept Stat Sci, 700 Univ Ave, Toronto, ON M5G 1Z5, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
electronic health records; phenotyping; cohort identification; machine learning; CLINICAL-TRIALS; INFORMATION; VALIDATION; ALGORITHMS; EXTRACTION; SELECTION; MODEL; TEXT
DOI
10.1093/jamia/ocac216
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Objective: Accurate and rapid phenotyping is a prerequisite to leveraging electronic health records for biomedical research. While early phenotyping relied on rule-based algorithms curated by experts, machine learning (ML) approaches have emerged as an alternative to improve scalability across phenotypes and healthcare settings. This study evaluates ML-based phenotyping with respect to (1) the data sources used, (2) the phenotypes considered, (3) the methods applied, and (4) the reporting and evaluation methods used.

Materials and methods: We searched PubMed and Web of Science for articles published between 2018 and 2022. After screening 850 articles, we recorded 37 variables on 100 studies.

Results: Most studies utilized data from a single institution and included information in clinical notes. Although chronic conditions were most commonly considered, ML also enabled the characterization of nuanced phenotypes such as social determinants of health. Supervised deep learning was the most popular ML paradigm, while semi-supervised and weakly supervised learning were applied to expedite algorithm development and unsupervised learning to facilitate phenotype discovery. ML approaches did not uniformly outperform rule-based algorithms, but deep learning offered a marginal improvement over traditional ML for many conditions.

Discussion: Despite the progress in ML-based phenotyping, most articles focused on binary phenotypes and few articles evaluated external validity or used multi-institution data. Study settings were infrequently reported and analytic code was rarely released.

Conclusion: Continued research in ML-based phenotyping is warranted, with emphasis on characterizing nuanced phenotypes, establishing reporting and evaluation standards, and developing methods to accommodate misclassified phenotypes due to algorithm errors in downstream applications.
Pages: 367-381
Number of pages: 15