EHR phenotyping via jointly embedding medical concepts and words into a unified vector space

Cited by: 23
Authors
Bai, Tian [1 ]
Chanda, Ashis Kumar [1 ]
Egleston, Brian L. [2 ]
Vucetic, Slobodan [1 ]
Affiliations
[1] Temple Univ, Dept Comp & Informat Sci, Philadelphia, PA 19122 USA
[2] Temple Univ, Fox Chase Canc Ctr, Philadelphia, PA 19122 USA
Funding
National Institutes of Health (NIH)
Keywords
Electronic health records; Distributed representation; Natural language processing; Healthcare; CLAIMS DATA; PERFORMANCE; MORTALITY; RATES;
DOI
10.1186/s12911-018-0672-0
Chinese Library Classification
R-058
Abstract
Background: There has been increasing interest in learning low-dimensional vector representations of medical concepts from Electronic Health Records (EHRs). Vector representations of medical concepts facilitate exploratory analysis and predictive modeling of EHR data to gain insights about patterns of care and health outcomes. EHRs contain structured data, such as diagnostic codes and laboratory tests, as well as unstructured free-text data in the form of clinical notes, which provide more detail about the condition and treatment of patients.
Methods: In this work, we propose a method that jointly learns vector representations of medical concepts and words. This is achieved by a novel learning scheme based on the word2vec model. Our model learns these relationships by integrating clinical notes with the sets of accompanying medical codes and by defining joint contexts for each observed word and medical code.
Results: In our experiments, we learned joint representations from MIMIC-III data. Using the learned representations of words and medical codes, we evaluated phenotypes for 6 diseases discovered by our method and by a baseline method. The experimental results show that for each of the 6 diseases our method finds highly relevant words. We also show that our representations can be very useful when predicting the reason for the next visit.
Conclusions: The jointly learned representations of medical concepts and words capture not only similarity between codes or between words themselves, but also similarity between codes and words. They can be used to extract phenotypes of different diseases. The representations learned by the joint model are also useful for constructing patient features.
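The joint-context idea described in the Methods section can be sketched as follows. This is a minimal illustration of pairing each visit's medical codes with the words of its clinical note so both land in one embedding space, not the authors' implementation; the `joint_context_pairs` function, the window size, and the example code/word tokens are all assumptions for illustration.

```python
# Hedged sketch of joint context construction for a skip-gram-style model.
# Words keep their usual sliding-window word contexts; each medical code
# (which has no position in the note) is paired with every note word, so
# codes and words are trained into a single vector space.
from itertools import product


def joint_context_pairs(codes, words, window=2):
    """Generate (target, context) training pairs from one visit's
    medical codes and the tokens of its clinical note."""
    pairs = []
    # Word-word pairs from a sliding window, as in standard word2vec.
    for i, w in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if i != j:
                pairs.append((w, words[j]))
    # Code-word pairs in both directions: a code's context is the whole note.
    for c, w in product(codes, words):
        pairs.append((c, w))
        pairs.append((w, c))
    return pairs


# Illustrative visit: one diagnosis code and a three-token note fragment.
visit_codes = ["ICD9_428.0"]  # hypothetical token for heart failure
note_words = ["shortness", "of", "breath"]
pairs = joint_context_pairs(visit_codes, note_words, window=1)
```

These pairs would then feed a standard skip-gram objective with negative sampling; the key design point is only that code tokens and word tokens share one vocabulary and one embedding matrix.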
Pages: 11
Related Papers
7 records
  • [1] EHR phenotyping via jointly embedding medical concepts and words into a unified vector space
    Tian Bai
    Ashis Kumar Chanda
    Brian L. Egleston
    Slobodan Vucetic
    [J]. BMC Medical Informatics and Decision Making, 18
  • [2] Mirroring Vector Space Embedding for New Words
    Kim, Jihye
    Jeong, Ok-Ran
    [J]. IEEE ACCESS, 2021, 9 : 99954 - 99967
  • [3] Joint Learning of Representations of Medical Concepts and Words from EHR Data
    Bai, Tian
    Chanda, Ashis Kumar
    Egleston, Brian L.
    Vucetic, Slobodan
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2017, : 764 - 769
  • [4] Enriching Unsupervised User Embedding via Medical Concepts
    Huang, Xiaolei
    Dernoncourt, Franck
    Dredze, Mark
    [J]. CONFERENCE ON HEALTH, INFERENCE, AND LEARNING, VOL 174, 2022, 174 : 63 - 78
  • [5] Embedding Words in Non-Vector Space with Unsupervised Graph Learning
    Ryabinin, Max
    Popov, Sergei
    Prokhorenkova, Liudmila
    Voita, Elena
    [J]. PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 7317 - 7331
  • [6] Use of word and graph embedding to measure semantic relatedness between Unified Medical Language System concepts
    Mao, Yuqing
    Fung, Kin Wah
    [J]. JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2020, 27 (10) : 1538 - 1546
  • [7] Free-text medical document retrieval via phrase-based vector space model
    Mao, WL
    Chu, WW
    [J]. AMIA 2002 SYMPOSIUM, PROCEEDINGS: BIOMEDICAL INFORMATICS: ONE DISCIPLINE, 2002, : 489 - 493