Unsupervised Word Sense Disambiguation Using Word Embeddings

Cited by: 0
Authors
Moradi, Behzad [1]
Ansari, Ebrahim [1, 2]
Zabokrtsky, Zdenek [2]
Affiliations
[1] Inst Adv Studies Basic Sci IASBS, Zanjan, Iran
[2] Charles Univ Prague, Inst Formal & Appl Linguist, Prague, Czech Republic
DOI
10.23919/fruct48121.2019.8981526
Chinese Library Classification
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Word sense disambiguation is the task of assigning the correct sense to a polysemous word in the context in which it appears. In recent years, word embeddings have been applied successfully to many NLP tasks. Because they capture distributional semantics, recent work has increasingly focused on using word embeddings to disambiguate words. In this paper, a novel unsupervised method is proposed that disambiguates words of a first language by deploying a trained word embedding model of a second language, using only a bilingual dictionary. While the translated words are useful clues for the disambiguation process, the main idea of this work is to use the information provided by the English translations of the surrounding words to disambiguate Persian words with a trained English word2vec model, a well-known word embedding model. Each translation of the polysemous word is compared against the word embeddings of the translated surrounding words to calculate word similarity scores, and the translation most similar to the vectors of the translated surrounding words is selected as the correct one. The method requires only a raw corpus and a bilingual dictionary to disambiguate the word in question. Experimental results on a manually created test dataset demonstrate the accuracy of the proposed method.
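The selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy three-dimensional vectors, and the choice to average cosine similarity over the context words are all assumptions made for illustration. A real setup would load trained English word2vec vectors and take candidate translations from a bilingual dictionary. The example uses the Persian word "شیر", which can mean "milk", "lion", or "tap".

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pick_translation(candidates, context_translations, embeddings):
    """Return the candidate translation closest to the translated context.

    Assumption: the score of a candidate is its mean cosine similarity to
    the context vectors; the abstract does not specify the aggregation.
    Words missing from the embedding table are skipped.
    """
    context_vecs = [embeddings[w] for w in context_translations if w in embeddings]
    if not context_vecs:
        return None
    best, best_score = None, float("-inf")
    for cand in candidates:
        if cand not in embeddings:
            continue
        sims = [cosine(embeddings[cand], vec) for vec in context_vecs]
        score = sum(sims) / len(sims)
        if score > best_score:
            best, best_score = cand, score
    return best


# Hypothetical toy vectors standing in for trained English word2vec embeddings.
TOY_EMBEDDINGS = {
    "milk": [0.9, 0.1, 0.0],
    "lion": [0.0, 0.9, 0.2],
    "tap": [0.1, 0.0, 0.9],
    "drink": [0.8, 0.2, 0.1],
    "glass": [0.7, 0.0, 0.3],
}

# Candidate English translations of the ambiguous Persian word, and English
# translations of the words surrounding it in the sentence.
chosen = pick_translation(["milk", "lion", "tap"], ["drink", "glass"], TOY_EMBEDDINGS)
print(chosen)
```

With these toy vectors, "milk" lies closest to the context words "drink" and "glass", so it is the sense selected.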
Pages: 228-233
Page count: 6
Related papers
  • [1] Domain Adaptation for Word Sense Disambiguation Using Word Embeddings
    Komiya, Kanako
    Suzuki, Shota
    Sasaki, Minoru
    Shinnou, Hiroyuki
    Okumura, Manabu
    [J]. COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2017), PT I, 2018, 10761 : 195 - 206
  • [2] Biomedical Word Sense Disambiguation with Word Embeddings
    Antunes, Rui
    Matos, Sergio
    [J]. 11TH INTERNATIONAL CONFERENCE ON PRACTICAL APPLICATIONS OF COMPUTATIONAL BIOLOGY & BIOINFORMATICS, 2017, 616 : 273 - 279
  • [3] Unsupervised Word Sense Disambiguation Using The WWW
    Klapaftis, Ioannis P.
    Manandhar, Suresh
    [J]. STAIRS 2006, 2006, 142 : 174 - 183
  • [4] Word Sense Disambiguation for 158 Languages using Word Embeddings Only
    Logacheva, Varvara
    Teslenko, Denis
    Shelmanov, Artem
    Remus, Steffen
    Ustalov, Dmitry
    Kutuzov, Andrey
    Artemova, Ekaterina
    Biemann, Chris
    Ponzetto, Simone Paolo
    Panchenko, Alexander
    [J]. PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 5943 - 5952
  • [5] Supervised word sense disambiguation using new features based on word embeddings
    Sadi, Majid Fahandezi
    Ansari, Ebrahim
    Afsharchi, Mohsen
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2019, 37 (01) : 1467 - 1476
  • [6] An unsupervised method for word sense disambiguation
    Rahman, Nazreena
    Borah, Bhogeswar
    [J]. JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2022, 34 (09) : 6643 - 6651
  • [7] Word Embeddings of Monosemous Words in Dictionary for Word Sense Disambiguation
    Sasaki, Minoru
    [J]. SEMAPRO 2018: THE TWELFTH INTERNATIONAL CONFERENCE ON ADVANCES IN SEMANTIC PROCESSING, 2018, : 4 - 7
  • [8] Unsupervised Korean Word Sense Disambiguation using CoreNet
    Han, Kijong
    Nam, Sangha
    Kim, Jiseong
    Hahm, Younggyun
    Choi, Key-Sun
    [J]. PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 1023 - 1026
  • [9] Unsupervised word sense disambiguation using WordNet relatives
    Seo, HC
    Chung, HJ
    Rim, HC
    Myaeng, SH
    Kim, SH
    [J]. COMPUTER SPEECH AND LANGUAGE, 2004, 18 (03) : 253 - 273
  • [10] Embeddings for Word Sense Disambiguation: An Evaluation Study
    Iacobacci, Ignacio
    Pilehvar, Mohammad Taher
    Navigli, Roberto
    [J]. PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2016, : 897 - 907