Machine learning approaches for electronic health records phenotyping: a methodical review

Times cited: 40
Authors
Yang, Siyue [1 ]
Varghese, Paul [2 ]
Stephenson, Ellen [3 ]
Tu, Karen [3 ]
Gronsbell, Jessica [1 ,3 ,4 ,5 ]
Affiliations
[1] Univ Toronto, Dept Stat Sci, Toronto, ON, Canada
[2] Verily Life Sci, Cambridge, MA USA
[3] Univ Toronto, Dept Family & Community Med, Toronto, ON, Canada
[4] Univ Toronto, Dept Comp Sci, Toronto, ON, Canada
[5] Univ Toronto, Dept Stat Sci, 700 Univ Ave, Toronto, ON M5G 1Z5, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
electronic health records; phenotyping; cohort identification; machine learning; clinical trials; information; validation; algorithms; extraction; selection; model; text
DOI
10.1093/jamia/ocac216
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Objective: Accurate and rapid phenotyping is a prerequisite to leveraging electronic health records for biomedical research. While early phenotyping relied on rule-based algorithms curated by experts, machine learning (ML) approaches have emerged as an alternative to improve scalability across phenotypes and healthcare settings. This study evaluates ML-based phenotyping with respect to (1) the data sources used, (2) the phenotypes considered, (3) the methods applied, and (4) the reporting and evaluation methods used.
Materials and methods: We searched PubMed and Web of Science for articles published between 2018 and 2022. After screening 850 articles, we recorded 37 variables on 100 studies.
Results: Most studies utilized data from a single institution and included information in clinical notes. Although chronic conditions were most commonly considered, ML also enabled the characterization of nuanced phenotypes such as social determinants of health. Supervised deep learning was the most popular ML paradigm, while semi-supervised and weakly supervised learning were applied to expedite algorithm development and unsupervised learning to facilitate phenotype discovery. ML approaches did not uniformly outperform rule-based algorithms, but deep learning offered a marginal improvement over traditional ML for many conditions.
Discussion: Despite the progress in ML-based phenotyping, most articles focused on binary phenotypes and few articles evaluated external validity or used multi-institution data. Study settings were infrequently reported and analytic code was rarely released.
Conclusion: Continued research in ML-based phenotyping is warranted, with emphasis on characterizing nuanced phenotypes, establishing reporting and evaluation standards, and developing methods to accommodate misclassified phenotypes due to algorithm errors in downstream applications.
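The concluding point about misclassified phenotypes can be made concrete with a small sketch of how phenotyping errors distort a downstream estimate. This is not a method from the review: it assumes non-differential misclassification with known sensitivity and specificity (for example, estimated against a chart-reviewed validation sample) and uses the classical Rogan-Gladen prevalence correction, a technique chosen here purely for illustration. The function names and all numbers are hypothetical.

```python
def apparent_prevalence(true_prev: float, sens: float, spec: float) -> float:
    """Expected fraction of patients an imperfect phenotyping algorithm labels as cases.

    Assumes misclassification does not depend on other patient characteristics.
    """
    return sens * true_prev + (1.0 - spec) * (1.0 - true_prev)


def rogan_gladen(apparent: float, sens: float, spec: float) -> float:
    """Rogan-Gladen estimator: invert apparent_prevalence to recover true prevalence."""
    denom = sens + spec - 1.0
    if denom <= 0:
        raise ValueError("correction requires sensitivity + specificity > 1")
    corrected = (apparent + spec - 1.0) / denom
    # Sampling noise can push the raw estimate outside [0, 1]; clip it.
    return min(max(corrected, 0.0), 1.0)


if __name__ == "__main__":
    # Hypothetical values, for illustration only (not results from the review).
    obs = apparent_prevalence(0.10, sens=0.80, spec=0.95)
    print(round(obs, 3), round(rogan_gladen(obs, sens=0.80, spec=0.95), 3))  # prints: 0.125 0.1
```

With a true prevalence of 10%, the algorithm's false positives among the 90% of non-cases inflate the naive estimate to about 12.5%; the correction recovers the true value only when the assumed sensitivity and specificity are themselves accurate.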
Pages: 367-381
Page count: 15
Related papers
(50 in total)
  • [41] A review of approaches to identifying patient phenotype cohorts using electronic health records
    Shivade, Chaitanya
    Raghavan, Preethi
    Fosler-Lussier, Eric
    Embi, Peter J.
    Elhadad, Noemie
    Johnson, Stephen B.
    Lai, Albert M.
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2014, 21 (02) : 221 - 230
  • [42] THE USE OF MACHINE LEARNING IN ELECTRONIC HEALTH RECORDS DISEASE ANALYSIS: AN UPDATED PERSPECTIVE
    Cossio, C. M.
    Gilardino, R.
    VALUE IN HEALTH, 2022, 25 (07) : S576 - S577
  • [43] Machine learning identifies long COVID patterns from electronic health records
    Wang, Fei
    NATURE MEDICINE, 2023, 29 (01) : 47 - 48
  • [44] A machine learning approach to leveraging electronic health records for enhanced omics analysis
    Mataraso, Samson J.
    Espinosa, Camilo A.
    Seong, David
    Reincke, S. Momsen
    Berson, Eloise
    Reiss, Jonathan D.
    Kim, Yeasul
    Ghanem, Marc
    Shu, Chi-Hung
    James, Tomin
    Tan, Yuqi
    Shome, Sayane
    Stelzer, Ina A.
    Feyaerts, Dorien
    Wong, Ronald J.
    Shaw, Gary M.
    Angst, Martin S.
    Gaudilliere, Brice
    Stevenson, David K.
    Aghaeepour, Nima
    NATURE MACHINE INTELLIGENCE, 2025, 7 (02) : 293 - 306
  • [45] Machine learning for suicide risk prediction in children and adolescents with electronic health records
    Su, Chang
    Aseltine, Robert
    Doshi, Riddhi
    Chen, Kun
    Rogers, Steven C.
    Wang, Fei
    TRANSLATIONAL PSYCHIATRY, 10
  • [46] Predicting the Risk of Inpatient Hypoglycemia With Machine Learning Using Electronic Health Records
    Ruan, Yue
    Bellot, Alexis
    Moysova, Zuzana
    Tan, Garry D.
    Lumb, Alistair
    Davies, Jim
    van der Schaar, Mihaela
    Rea, Rustam
    DIABETES CARE, 2020, 43 (07) : 1504 - 1511
  • [47] Using Electronic Health Records and Machine Learning to Predict Incident Psychiatric Hospitalization
    DeFerio, Joseph
    Banerjee, Samprit
    Alexopoulos, George
    Pathak, Jyotishman
    BIOLOGICAL PSYCHIATRY, 2020, 87 (09) : S68 - S69
  • [48] Postprediction Inference for Clinical Characteristics Extracted With Machine Learning on Electronic Health Records
    Sondhi, Arjun
    Rich, Alexander S.
    Wang, Siruo
    Leek, Jeffery T.
    JCO CLINICAL CANCER INFORMATICS, 2023, 7 : e2200174