Using Machine Learning to Identify Health Outcomes from Electronic Health Record Data

Cited by: 46
Authors
Wong, Jenna [1, 2]
Murray Horwitz, Mara [1, 2]
Zhou, Li [3, 4]
Toh, Sengwee [1, 2]
Affiliations
[1] Harvard Med Sch, Dept Populat Med, 401 Pk Dr,Suite 401 East, Boston, MA 02215 USA
[2] Harvard Pilgrim Hlth Care Inst, 401 Pk Dr,Suite 401 East, Boston, MA 02215 USA
[3] Brigham & Womens Hosp, Div Gen Internal Med & Primary Care, 75 Francis St, Boston, MA 02115 USA
[4] Harvard Med Sch, Boston, MA 02115 USA
Funding
Agency for Healthcare Research and Quality (USA)
Keywords
Electronic health records; Machine learning; Health outcomes; Phenotyping; Cohort identification; MEDICAL-RECORDS; NEURAL-NETWORKS; TASK-FORCE; BIG DATA; CLASSIFICATION; TEXT; GUIDELINE; DIAGNOSIS; CRITERIA; COLLEGE;
DOI
10.1007/s40471-018-0165-9
Chinese Library Classification number
R1 [Preventive Medicine, Hygiene]
Discipline classification code
1004; 120402
Abstract
Purpose of Review: Electronic health records (EHRs) contain valuable data for identifying health outcomes, but these data also present numerous challenges when creating computable phenotyping algorithms. Machine learning methods could help with some of these challenges. In this review, we discuss four common scenarios that researchers may find helpful for thinking critically about when and for what tasks machine learning may be used to identify health outcomes from EHR data.

Recent Findings: We first consider the conditions in which machine learning may be especially useful with respect to two dimensions of a health outcome: (1) the characteristics of its diagnostic criteria and (2) the format in which its diagnostic data are usually stored within EHR systems. In the first dimension, we propose that for health outcomes with diagnostic criteria involving many clinical factors, vague definitions, or subjective interpretations, machine learning may be useful for modeling the complex diagnostic decision-making process from a vector of clinical inputs to identify individuals with the health outcome. In the second dimension, we propose that for health outcomes where diagnostic information is largely stored in unstructured formats such as free text or images, machine learning may be useful for extracting and structuring this information as part of a natural language processing system or an image recognition task. We then consider these two dimensions jointly to define four common scenarios of health outcomes. For each scenario, we discuss the potential uses for machine learning, first assuming accurate and complete EHR data and then relaxing these assumptions to accommodate the limitations of real-world EHR systems. We illustrate these four scenarios using concrete examples and describe how recent studies have used machine learning to identify these health outcomes from EHR data.

Summary: Machine learning has great potential to improve the accuracy and efficiency of health outcome identification from EHR systems, especially under certain conditions. To promote the use of machine learning in EHR-based phenotyping tasks, future work should prioritize efforts to increase the transportability of machine learning algorithms for use in multi-site settings.
Pages: 331-342
Page count: 12
Related papers
50 in total
  • [31] Prediction of incident myocardial infarction using machine learning applied to harmonized electronic health record data
    Mandair, Divneet
    Tiwari, Premanand
    Simon, Steven
    Colborn, Kathryn L.
    Rosenberg, Michael A.
    [J]. BMC MEDICAL INFORMATICS AND DECISION MAKING, 2020, 20 (01)
  • [33] Identifying Stroke Patients At Risk For Atrial Fibrillation Using Electronic Health Record Data And Machine Learning
    Su, Tongli
    Hasan, S. M. Shafiul
    Nahab, Fadi B.
    Hu, Xiao
    [J]. STROKE, 2023, 54
  • [34] Prediction of Recurrent Atherosclerotic Cardiovascular Disease Risk Using Machine Learning and Electronic Health Record Data
    Sarraju, Ashish
    Ward, Andrew
    Chung, Sukyung
    Li, Jiang
    Scheinker, David
    Rodriguez, Fatima
    [J]. CIRCULATION, 2020, 142
  • [35] Use of electronic health record data and machine learning to identify candidates for HIV pre-exposure prophylaxis: a modelling study
    Marcus, Julia L.
    Hurley, Leo B.
    Krakower, Douglas S.
    Alexeeff, Stacey
    Silverberg, Michael J.
    Volk, Jonathan E.
    [J]. LANCET HIV, 2019, 6 (10) : E688 - E695
  • [36] Using electronic health record data to identify comparator populations for comparative effectiveness research
    Ramsey, Scott D.
    Adamson, Blythe J.
    Wang, Xiaoliang
    Bargo, Danielle
    Baxi, Shrujal S.
    Ghosh, Shuhag
    Meropol, Neal J.
    [J]. JOURNAL OF MEDICAL ECONOMICS, 2020, 23 (12) : 1618 - 1622
  • [37] Machine learning applied to electronic health record data in home healthcare: A scoping review
    Hobensack, Mollie
    Song, Jiyoun
    Scharp, Danielle
    Bowles, Kathryn H.
    Topaz, Maxim
    [J]. INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS, 2023, 170
  • [38] Machine Learning for Prediction in Electronic Health Data
    Rose, Sherri
    [J]. JAMA NETWORK OPEN, 2018, 1 (04)
  • [39] Using the Electronic Health Record to Identify Subjects with Rheumatic Disease
    Taxter, Alysha
    Basiaga, Matthew
    Pooni, Rajdeep
    Pinotti, Caitlan
    Buckley, Lisa
    [J]. ARTHRITIS & RHEUMATOLOGY, 2023, 75 : 37 - 40
  • [40] Assessing Resident Cataract Surgical Outcomes Using Electronic Health Record Data
    Xiao, Grace
    Srikumaran, Divya
    Sikder, Shameema
    Woreta, Fasika
    Boland, Michael V.
    [J]. OPHTHALMOLOGY SCIENCE, 2023, 3 (02)