Use of word and graph embedding to measure semantic relatedness between Unified Medical Language System concepts

Cited by: 20
Authors
Mao, Yuqing [1]
Fung, Kin Wah [1]
Affiliations
[1] NIH, Natl Lib Med, Bldg 10, Bethesda, MD 20892 USA
Funding
National Institutes of Health (NIH)
Keywords
UMLS; semantic relatedness; medical terminologies; deep learning; word embedding; graph embedding; similarity
DOI
10.1093/jamia/ocaa136
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Objective: The study sought to explore the use of deep learning techniques to measure the semantic relatedness between Unified Medical Language System (UMLS) concepts.

Materials and Methods: Concept sentence embeddings were generated for UMLS concepts by applying the word embedding models BioWordVec and various flavors of BERT to concept sentences formed by concatenating UMLS terms. Graph embeddings were generated by a graph convolutional network and 4 knowledge graph embedding models, using graphs built from UMLS hierarchical relations. Semantic relatedness was measured by the cosine between the concepts' embedding vectors. Performance was compared with 2 traditional path-based measurements (shortest path and Leacock-Chodorow) and with cui2vec, publicly available concept embeddings generated from large biomedical corpora. The concept sentence embeddings were also evaluated on a word sense disambiguation (WSD) task. Reference standards included the semantic relatedness and semantic similarity datasets from the University of Minnesota, concept pairs generated from the Standardized MedDRA Queries, and the MeSH (Medical Subject Headings) WSD corpus.

Results: Sentence embeddings generated by BioWordVec outperformed all other methods used individually in semantic relatedness measurements. Graph convolutional network graph embedding uniformly outperformed path-based measurements and was better than some word embeddings for the Standardized MedDRA Queries dataset. Combining word and graph embeddings achieved the best performance on all datasets. For WSD, the enhanced versions of BERT outperformed BioWordVec.

Conclusions: Word and graph embedding techniques can be used to harness terms and relations in the UMLS to measure semantic relatedness between concepts. Concept sentence embedding outperforms path-based measurements and cui2vec, and can be further enhanced by combining it with graph embedding.
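The abstract measures relatedness as the cosine between two concepts' embedding vectors and compares it with two path-based baselines. The following minimal Python sketch illustrates those measures on a toy is-a hierarchy. It is not the authors' code. Several assumptions apply. The concept labels and vectors are hypothetical. Path length is counted in nodes (edge distance + 1), as in common implementations such as NLTK's WordNet interface. The Leacock-Chodorow depth D (`max_depth`) is supplied by the caller. The `combined` helper, a weighted average of the word and graph cosines, is a hypothetical stand-in because the abstract does not say how the two embeddings were combined.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def build_graph(edges):
    """Undirected adjacency sets built from (parent, child) hierarchy edges."""
    graph = {}
    for parent, child in edges:
        graph.setdefault(parent, set()).add(child)
        graph.setdefault(child, set()).add(parent)
    return graph


def shortest_path_edges(graph, start, goal):
    """Breadth-first search: edges on the shortest path, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def path_similarity(graph, a, b):
    """Shortest-path measure: 1 / (nodes on the path) = 1 / (edges + 1)."""
    d = shortest_path_edges(graph, a, b)
    return None if d is None else 1.0 / (d + 1)


def leacock_chodorow(graph, a, b, max_depth):
    """-log(p / 2D); p = nodes on the shortest path, D = hierarchy depth (assumed given)."""
    d = shortest_path_edges(graph, a, b)
    if d is None:
        return None
    return -math.log((d + 1) / (2.0 * max_depth))


def combined(word_u, word_v, graph_u, graph_v, weight=0.5):
    """Hypothetical fusion: weighted average of word- and graph-embedding cosines."""
    return weight * cosine(word_u, word_v) + (1 - weight) * cosine(graph_u, graph_v)


if __name__ == "__main__":
    # Hypothetical toy hierarchy; labels are illustrative, not real UMLS concepts.
    g = build_graph([
        ("Disease", "Heart disease"),
        ("Disease", "Lung disease"),
        ("Heart disease", "Myocardial infarction"),
        ("Heart disease", "Angina"),
    ])
    print(path_similarity(g, "Myocardial infarction", "Angina"))
    print(leacock_chodorow(g, "Myocardial infarction", "Angina", max_depth=3))
    print(cosine([0.2, 0.7, 0.1], [0.3, 0.6, 0.0]))
```

Breadth-first search is enough here because hierarchy edges are unweighted. Edge direction is ignored, so the shortest path may pass through a shared descendant as well as a shared ancestor. Variants that restrict paths to go through a common ancestor could give different values.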
Pages: 1538-1546
Number of pages: 9