Using Machine Learning to Identify Health Outcomes from Electronic Health Record Data

Cited by: 46
Authors
Wong, Jenna [1, 2]
Murray Horwitz, Mara [1, 2]
Zhou, Li [3, 4]
Toh, Sengwee [1, 2]
Affiliations
[1] Harvard Med Sch, Dept Populat Med, 401 Pk Dr,Suite 401 East, Boston, MA 02215 USA
[2] Harvard Pilgrim Hlth Care Inst, 401 Pk Dr,Suite 401 East, Boston, MA 02215 USA
[3] Brigham & Womens Hosp, Div Gen Internal Med & Primary Care, 75 Francis St, Boston, MA 02115 USA
[4] Harvard Med Sch, Boston, MA 02115 USA
Funding
US Agency for Healthcare Research and Quality
Keywords
Electronic health records; Machine learning; Health outcomes; Phenotyping; Cohort identification; MEDICAL-RECORDS; NEURAL-NETWORKS; TASK-FORCE; BIG DATA; CLASSIFICATION; TEXT; GUIDELINE; DIAGNOSIS; CRITERIA; COLLEGE;
DOI
10.1007/s40471-018-0165-9
Chinese Library Classification (CLC) number
R1 [Preventive Medicine, Hygiene]
Discipline classification code
1004; 120402
Abstract
Purpose of Review: Electronic health records (EHRs) contain valuable data for identifying health outcomes, but these data also present numerous challenges when creating computable phenotyping algorithms. Machine learning methods could help with some of these challenges. In this review, we discuss four common scenarios that researchers may find helpful for thinking critically about when and for what tasks machine learning may be used to identify health outcomes from EHR data.

Recent Findings: We first consider the conditions in which machine learning may be especially useful with respect to two dimensions of a health outcome: (1) the characteristics of its diagnostic criteria and (2) the format in which its diagnostic data are usually stored within EHR systems. In the first dimension, we propose that for health outcomes with diagnostic criteria involving many clinical factors, vague definitions, or subjective interpretations, machine learning may be useful for modeling the complex diagnostic decision-making process from a vector of clinical inputs to identify individuals with the health outcome. In the second dimension, we propose that for health outcomes where diagnostic information is largely stored in unstructured formats such as free text or images, machine learning may be useful for extracting and structuring this information as part of a natural language processing system or an image recognition task. We then consider these two dimensions jointly to define four common scenarios of health outcomes. For each scenario, we discuss the potential uses for machine learning, first assuming accurate and complete EHR data and then relaxing these assumptions to accommodate the limitations of real-world EHR systems. We illustrate these four scenarios using concrete examples and describe how recent studies have used machine learning to identify these health outcomes from EHR data.

Summary: Machine learning has great potential to improve the accuracy and efficiency of health outcome identification from EHR systems, especially under certain conditions. To promote the use of machine learning in EHR-based phenotyping tasks, future work should prioritize efforts to increase the transportability of machine learning algorithms for use in multi-site settings.
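The two dimensions described in the abstract cross to give four scenarios. The sketch below is one possible reading of that framework under the abstract's ideal-data assumption, not the authors' implementation. The names `OutcomeProfile` and `candidate_ml_uses` are hypothetical, and the role descriptions paraphrase the abstract.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OutcomeProfile:
    """Hypothetical container for the two dimensions named in the abstract."""
    complex_criteria: bool   # many clinical factors, vague definitions, or subjective interpretation
    unstructured_data: bool  # diagnostic data stored mostly as free text or images


def candidate_ml_uses(profile: OutcomeProfile) -> List[str]:
    """Return the machine learning roles the abstract proposes for this profile.

    Assumes accurate and complete EHR data; the abstract notes that real-world
    data limitations may change the picture.
    """
    uses = []
    if profile.unstructured_data:
        uses.append("extract and structure diagnostic information (NLP or image recognition)")
    if profile.complex_criteria:
        uses.append("model the diagnostic decision from a vector of clinical inputs")
    return uses


# Enumerate the four scenarios formed by crossing the two dimensions.
for complex_criteria in (False, True):
    for unstructured_data in (False, True):
        profile = OutcomeProfile(complex_criteria, unstructured_data)
        uses = candidate_ml_uses(profile)
        print(profile, "->", uses or ["none suggested by these two dimensions alone"])
```

An empty result for the simple-criteria, structured-data case reflects only the ideal-data assumption. The abstract indicates that the potential uses are revisited once that assumption is relaxed.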
Pages: 331 - 342
Page count: 12
Related papers
  • [1] Using Machine Learning to Identify Health Outcomes from Electronic Health Record Data
    Jenna Wong
    Mara Murray Horwitz
    Li Zhou
    Sengwee Toh
    [J]. Current Epidemiology Reports, 2018, 5 : 331 - 342
  • [2] Using Natural Language Processing and Machine Learning to Identify Opioids in Electronic Health Record Data
    McDermott, Sean P.
    Wasan, Ajay D.
    [J]. JOURNAL OF PAIN RESEARCH, 2023, 16 : 2133 - 2140
  • [3] Machine Learning Model to Accurately Identify Rheumatoid Arthritis Patients Using Raw Electronic Health Record Data
    Gilvaz, Vinit
    Reginato, Anthony
    Dalal, Deepan
    Crough, Brad
    [J]. ARTHRITIS & RHEUMATOLOGY, 2022, 74 : 2766 - 2767
  • [4] Machine Learning to Identify Patients at High Risk for Peripheral Arterial Disease From Electronic Health Record Data
    Sonderman, Mark
    Farber-Eger, Eric
    Aday, Aaron W.
    Freiberg, Matthew S.
    Beckman, Joshua A.
    Wells, Quinn
    [J]. CIRCULATION, 2020, 142
  • [5] Classifying Pseudogout Using Machine Learning Approaches with Electronic Health Record Data
    Tedeschi, Sara K.
    Cai, Tianrun
    He, Zeling
    Ahuja, Yuri
    Hong, Chuan
    Yates, Katherine
    Dahal, Kumar
    Xu, Chang
    Lyu, Houchen
    Yoshida, Kazuki
    Solomon, Daniel
    Cai, Tianxi
    Liao, Katherine
    [J]. ARTHRITIS & RHEUMATOLOGY, 2019, 71
  • [6] Potential Biases in Machine Learning Algorithms Using Electronic Health Record Data
    Gianfrancesco, Milena A.
    Tamang, Suzanne
    Yazdany, Jinoos
    Schmajuk, Gabriela
    [J]. JAMA INTERNAL MEDICINE, 2018, 178 (11) : 1544 - 1547
  • [7] Classifying Pseudogout Using Machine Learning Approaches With Electronic Health Record Data
    Tedeschi, Sara K.
    Cai, Tianrun
    He, Zeling
    Ahuja, Yuri
    Hong, Chuan
    Yates, Katherine A.
    Dahal, Kumar
    Xu, Chang
    Lyu, Houchen
    Yoshida, Kazuki
    Solomon, Daniel H.
    Cai, Tianxi
    Liao, Katherine P.
    [J]. ARTHRITIS CARE & RESEARCH, 2021, 73 (03) : 442 - 448
  • [8] Identification of postoperative complications using electronic health record data and machine learning
    Bronsert, Michael
    Singh, Abhinav B.
    Henderson, William G.
    Hammermeister, Karl
    Meguid, Robert A.
    Colborn, Kathryn L.
    [J]. AMERICAN JOURNAL OF SURGERY, 2020, 220 (01) : 114 - 119
  • [9] Development of a Machine Learning Model Using Electronic Health Record Data to Identify Antibiotic Use Among Hospitalized Patients
    Moehring, Rebekah W.
    Phelan, Matthew
    Lofgren, Eric
    Nelson, Alicia
    Dodds Ashley, Elizabeth
    Anderson, Deverick J.
    Goldstein, Benjamin A.
    [J]. JAMA NETWORK OPEN, 2021, 4 (03)
  • [10] Predicting Severe Sepsis from the Electronic Health Record Using Machine Learning
    Gallant, S.
    Culliton, P.
    Levinson, M.
    Ehresman, A.
    Wherry, J.
    Steingrub, J.
    [J]. AMERICAN JOURNAL OF RESPIRATORY AND CRITICAL CARE MEDICINE, 2018, 197