A Probabilistic Model Based on n-Grams for Bilingual Word Sense Disambiguation

Cited: 0
Authors
Vilarino, Darnes [1 ]
Pinto, David [1 ]
Tovar, Mireya [1 ]
Balderas, Carlos [1 ]
Beltran, Beatriz [1 ]
Affiliation
[1] Benemerita Univ Autonoma Puebla, Fac Comp Sci, Puebla, Mexico
Keywords
Bilingual word sense disambiguation; Naive Bayes classifier; Parallel corpus
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Word Sense Disambiguation (WSD) is considered one of the most important problems in Natural Language Processing. Difficult as monolingual WSD already is, its bilingual version is considerably more complex: it is necessary not only to find a correct translation, but that translation must also respect the contextual sense of the original sentence (in the source language) in order to select the correct sense (in the target language) of the source word. In this paper we propose a model based on n-grams (3-grams and 5-grams) that significantly outperforms the results we previously presented at the cross-lingual word sense disambiguation task of the SemEval-2 forum. We use a naive Bayes classifier to estimate the probability of a target sense (in the target language) given a sentence containing the ambiguous word (in the source language). For this purpose we use a bilingual statistical dictionary, computed with Giza++ over the EUROPARL parallel corpus, to obtain the probability that a source word translates to a given target word (which is taken to be the correct sense of the source word, expressed in the other language). The results are compared with those of an international competition, where the approach achieves good performance.
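The scoring the abstract describes can be illustrated with a minimal naive-Bayes sketch: each candidate target-language sense is scored by its prior times the product of translation probabilities of the context words given that sense, the latter playing the role of the Giza++ bilingual statistical dictionary. All data structures, names, and the smoothing constant below are assumptions for illustration, not the paper's actual implementation.

```python
import math

def score_senses(context_words, senses, prior, trans_prob, alpha=1e-6):
    """Rank candidate target-language senses for an ambiguous source word.

    Naive Bayes in log space:
        log P(s | w1..wn) ∝ log P(s) + sum_i log P(wi | s)
    where P(wi | s) would come from a bilingual statistical dictionary
    (e.g. translation probabilities estimated with Giza++ on EUROPARL).
    `alpha` smooths unseen (word, sense) pairs so log() never fails.
    """
    scores = {}
    for s in senses:
        log_p = math.log(prior.get(s, alpha))
        for w in context_words:
            log_p += math.log(trans_prob.get((w, s), alpha))
        scores[s] = log_p
    best = max(scores, key=scores.get)
    return best, scores

# Toy example with hypothetical dictionary entries (not from the paper):
# disambiguating English "bank" into Spanish senses.
prior = {"banco_institucion": 0.6, "orilla_rio": 0.4}
trans_prob = {
    ("money", "banco_institucion"): 0.3,
    ("river", "orilla_rio"): 0.4,
}
best, _ = score_senses(
    ["money"], ["banco_institucion", "orilla_rio"], prior, trans_prob
)
```

With the context word "money", the institution sense wins because its translation probability dominates the smoothed near-zero probability of the river sense.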
Pages: 82-91
Page count: 10