Collocation analysis for UMLS knowledge-based word sense disambiguation

Cited by: 6
Authors
Jimeno-Yepes, Antonio [1]
McInnes, Bridget T. [2]
Aronson, Alan R. [1]
Affiliations
[1] Natl Lib Med, Bethesda, MD 20894 USA
[2] Univ Minnesota Twin Cities, Dept Pharmacol, Minneapolis, MN 55455 USA
Source
BMC BIOINFORMATICS | 2011 / Vol. 12
Keywords
Semantic Type; Ambiguous Word; Unified Medical Language System; Word Sense Disambiguation; Semantic Group
DOI
10.1186/1471-2105-12-S3-S4
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Background: The effectiveness of knowledge-based word sense disambiguation (WSD) approaches depends in part on the information available in the reference knowledge resource. Off the shelf, these resources are not optimized for WSD and might lack terms needed to model the context properly. In addition, they might include noisy terms that contribute to false positives in the disambiguation results.

Methods: We analyzed several collocation types that could improve the performance of knowledge-based disambiguation methods. Collocations are obtained by extracting candidate collocations from MEDLINE and then assigning them to one of the senses of an ambiguous word. We performed this assignment using either semantic group profiles or a knowledge-based disambiguation method. In addition to collocations, we used second-order features from a previously implemented approach. Specifically, we measured the effect of these collocations on two knowledge-based WSD methods. The first method, AEC, uses knowledge from the UMLS to collect examples from MEDLINE, which are then used to train a Naive Bayes approach. The second method, MRD, builds a profile for each candidate sense based on the UMLS and compares the profile to the context of the ambiguous word. We used two WSD test sets containing disambiguation cases that are mapped to UMLS concepts. The first one, the NLM WSD set, was developed manually by several domain experts and contains words that occur with high frequency in MEDLINE. The second one, the MSH WSD set, was developed automatically using the MeSH indexing in MEDLINE. It contains a larger set of words and covers a larger number of UMLS semantic types.

Results: The results indicate an improvement after the use of collocations, although the approaches perform differently depending on the data set. In the NLM WSD set, the improvement is larger for the MRD disambiguation method using second-order features. Assignment of collocations to a candidate sense based on UMLS semantic group profiles is more effective in the AEC method. In the MSH WSD set, the increment in performance is modest for all the methods. Collocations combined with the MRD disambiguation method have the best performance. The MRD disambiguation method and second-order features provide an insignificant change in performance. The AEC disambiguation method gives a modest improvement in performance. Assignment of collocations to a candidate sense based on knowledge-based methods has better performance.

Conclusions: Collocations improve the performance of knowledge-based disambiguation methods, although results vary depending on the test set and method used. Generally, the AEC method is sensitive to query drift. Using AEC, just a few selected terms provide a large improvement in disambiguation performance. The MRD method handles noisy terms better but requires a larger set of terms to improve performance.
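The abstract describes the MRD method only in outline: a profile is built for each candidate sense and compared to the context of the ambiguous word. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes bag-of-words profiles and cosine similarity as the comparison, neither of which is stated in the abstract; the sense labels, profile texts, and function names (`bag_of_words`, `cosine`, `disambiguate`) are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase, whitespace-tokenized term counts (a simplifying assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(context, sense_profiles):
    """Return the sense whose profile is most similar to the context."""
    ctx = bag_of_words(context)
    return max(
        sense_profiles,
        key=lambda sense: cosine(ctx, bag_of_words(sense_profiles[sense])),
    )


# Toy, made-up profiles for the ambiguous word "cold" (not taken from the UMLS).
profiles = {
    "cold_temperature": "low temperature weather freezing exposure",
    "common_cold": "viral infection nasal cough virus symptoms",
}
context = "patients with cold reported cough and nasal congestion"
best = disambiguate(context, profiles)
print(best)
```

In this sketch, adding collocations would amount to appending extra terms to the text of the relevant sense profile before the comparison is made.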
Pages: 12
Related papers
50 in total
  • [1] Collocation analysis for UMLS knowledge-based word sense disambiguation
    Antonio Jimeno-Yepes
    Bridget T McInnes
    Alan R Aronson
    BMC Bioinformatics, 12
  • [2] Using Context Information for Knowledge-Based Word Sense Disambiguation
    Simov, Kiril
    Osenova, Petya
    Popov, Alexander
    ARTIFICIAL INTELLIGENCE: METHODOLOGY, SYSTEMS, AND APPLICATIONS, AIMSA 2016, 2016, 9883 : 130 - 139
  • [3] Knowledge-based biomedical word sense disambiguation: comparison of approaches
    Antonio J Jimeno-Yepes
    Alan R Aronson
    BMC Bioinformatics, 11
  • [4] Knowledge-Based Word Sense Disambiguation Using Topic Models
    Chaplot, Devendra Singh
    Salakhutdinov, Ruslan
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 5062 - 5069
  • [5] Knowledge-based biomedical word sense disambiguation: comparison of approaches
    Jimeno-Yepes, Antonio J.
    Aronson, Alan R.
    BMC BIOINFORMATICS, 2010, 11
  • [6] Word sense disambiguation based on context selection using knowledge-based word similarity
    Kwon, Sunjae
    Oh, Dongsuk
    Ko, Youngjoong
    INFORMATION PROCESSING & MANAGEMENT, 2021, 58 (04)
  • [7] Unsupervised Word Sense Disambiguation based on Word Embedding and Collocation
    Han, Shangzhuang
    Shirai, Kiyoaki
    ICAART: PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE - VOL 2, 2021, : 1218 - 1225
  • [8] Word clustering for collocation-based word sense disambiguation
    Jin, Peng
    Sun, Xu
    Wu, Yunfang
    Yu, Shiwen
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2007, 4394 : 267 - +
  • [9] Knowledge-Based Method for Word Sense Disambiguation by Using Hindi WordNet
    Sharma, Pooja
    Joshi, Nisheeth
    ENGINEERING TECHNOLOGY & APPLIED SCIENCE RESEARCH, 2019, 9 (02) : 3985 - 3989
  • [10] Knowledge-Based Biomedical Word Sense Disambiguation with Neural Concept Embeddings
    Sabbir, A. K. M.
    Jimeno-Yepes, Antonio
    Kavuluru, Ramakanth
    2017 IEEE 17TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING (BIBE), 2017, : 163 - 170