Evaluating n-gram Models for a Bilingual Word Sense Disambiguation Task

Cited by: 0
Authors
Pinto, David [1]
Vilarino, Darnes [1]
Balderas, Carlos [1]
Tovar, Mireya [1]
Beltran, Beatriz [1]
Affiliation
[1] Benemerita Univ Autonoma Puebla, Fac Ciencias Computac, Puebla, Mexico
Source
COMPUTACION Y SISTEMAS | 2011 / Vol. 15 / No. 2
Keywords
Bilingual word sense disambiguation; machine translation; parallel corpus; Naive Bayes classifier
DOI
Not available
CLC classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The problem of Word Sense Disambiguation (WSD) consists of selecting the correct sense of an ambiguous word in a given context. WSD is already difficult, and its bilingual version is considerably more complex. In the bilingual setting, finding a translation of the source word is not enough. The translation must also reflect the contextual sense of the original sentence (in the source language), so that the correct sense of the source word is selected in the target language. In this paper we present a probabilistic model for bilingual WSD based on n-grams (2-grams, 3-grams, 5-grams, and k-grams, where k is the length of the sentence S). The aim is to analyze how the system behaves under different representations of a given sentence containing an ambiguous word. We use a Naive Bayes classifier to determine the probability of the target sense (in the target language) given a sentence that contains an ambiguous word (in the source language). For this purpose, we use a bilingual statistical dictionary computed with Giza++ from the EUROPARL parallel corpus. On average, the representation model based on 5-grams with mutual information showed the best performance.
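The abstract's decision rule, choosing the target-language sense with a Naive Bayes classifier over an n-gram representation of the sentence, can be sketched as follows. This is a minimal illustration under assumptions that are not taken from the paper. The n-gram is taken as a contiguous window containing the ambiguous word, centred where the sentence allows. `lex_prob[t][w]` is a toy stand-in for an entry of the Giza++ bilingual dictionary. Unseen pairs receive a fixed `floor` probability instead of proper smoothing. The helper names `ngram_window` and `naive_bayes_sense` are hypothetical, and all probabilities and the English-Spanish example ("bank" as "banco" or "orilla") are invented. The classifier picks the candidate t that maximizes log P(t) + Σ log P(w | t) over the window words. The mutual-information variant mentioned above is not reproduced.

```python
import math
from typing import Dict, List, Sequence


def ngram_window(tokens: Sequence[str], target_index: int, n: int) -> List[str]:
    """Return a contiguous n-token window that contains tokens[target_index].

    Assumption: the window is centred on the ambiguous word where possible and
    shifted to stay inside the sentence; n >= len(tokens) yields the k-gram
    (the whole sentence).
    """
    k = len(tokens)
    if n >= k:
        return list(tokens)
    start = target_index - (n - 1) // 2
    start = max(0, min(start, k - n))  # keep the window within the sentence
    return list(tokens[start:start + n])


def naive_bayes_sense(
    context: Sequence[str],
    candidates: Sequence[str],
    prior: Dict[str, float],
    lex_prob: Dict[str, Dict[str, float]],
    floor: float = 1e-6,
) -> str:
    """Return the candidate t maximising log P(t) + sum over w of log P(w | t).

    lex_prob[t][w] is a hypothetical stand-in for a bilingual dictionary entry;
    `floor` replaces real smoothing for pairs missing from the table.
    """
    def log_score(t: str) -> float:
        score = math.log(prior.get(t, floor))
        for w in context:
            score += math.log(lex_prob.get(t, {}).get(w, floor))
        return score

    return max(candidates, key=log_score)


# Toy, invented values: English "bank" with two Spanish candidate translations.
TOKENS = "we walked along the river bank at dusk".split()
TARGET = TOKENS.index("bank")
PRIOR = {"banco": 0.6, "orilla": 0.4}
LEX_PROB = {
    "banco": {"bank": 0.3, "money": 0.05},
    "orilla": {"bank": 0.2, "river": 0.05},
}

if __name__ == "__main__":
    for n in (2, 3, 5, len(TOKENS)):
        window = ngram_window(TOKENS, TARGET, n)
        print(n, window, naive_bayes_sense(window, ["banco", "orilla"], PRIOR, LEX_PROB))
```

With these toy values, the 2-gram window ("bank", "at") lets the prior favour "banco". Every window that reaches "river" (3-gram, 5-gram, whole sentence) selects "orilla". This illustrates why the choice of sentence representation can change the chosen sense.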
Pages: 209-220
Number of pages: 12