Evaluating n-gram Models for a Bilingual Word Sense Disambiguation Task

Cited by: 0
Authors
Pinto, David [1]
Vilarino, Darnes [1]
Balderas, Carlos [1]
Tovar, Mireya [1]
Beltran, Beatriz [1]
Affiliations
[1] Benemerita Univ Autonoma Puebla, Fac Ciencias Computac, Puebla, Mexico
Source
COMPUTACION Y SISTEMAS | 2011, Vol. 15, No. 2
Keywords
Bilingual word sense disambiguation; machine translation; parallel corpus; Naive Bayes classifier;
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Word Sense Disambiguation (WSD) is the task of selecting the correct sense of an ambiguous word in a given context. Although monolingual WSD is already difficult, its bilingual version is considerably more complex: it is necessary not only to find a correct translation, but that translation must also reflect the contextual sense of the original sentence (in the source language) so that the correct sense of the source word is chosen in the target language. In this paper we present a probabilistic model for bilingual WSD based on n-grams (2-grams, 3-grams, 5-grams, and k-grams, where k is the length of the sentence S). The aim is to analyze how the system behaves under different representations of a sentence containing an ambiguous word. We use a Naive Bayes classifier to estimate the probability of a sense in the target language given a sentence that contains the ambiguous word in the source language. For this purpose we employ a bilingual statistical dictionary computed with Giza++ over the EUROPARL parallel corpus. On average, the representation based on 5-grams combined with mutual information achieved the best performance.
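As a rough illustration of the classification scheme sketched in the abstract, the Python snippet below shows a Naive Bayes style scorer choosing between candidate translations of an ambiguous source word. It is only a minimal sketch under assumed data: the sense priors, the toy translation dictionary (standing in for the bilingual statistical dictionary the paper builds with Giza++ from EUROPARL), the smoothing constant, and all function names are illustrative and not taken from the paper.

```python
from math import log

# Minimal sketch (not the authors' implementation) of the scoring idea in the
# abstract: a Naive Bayes classifier that selects the target-language sense
# (translation) whose translation probabilities best explain the source-language
# context. Every dictionary entry, probability, and the smoothing constant here
# is an assumption used purely for illustration.

def ngrams(tokens, n):
    """Return the n-grams (as tuples) of a tokenized sentence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def score_sense(sense, prior, context, p_src_given_sense, smooth=1e-6):
    """log P(sense) + sum over context words w of log P(w | sense)."""
    score = log(prior[sense])
    for w in context:
        score += log(p_src_given_sense[sense].get(w, smooth))
    return score

# Hypothetical Spanish translations of the ambiguous English word "bank".
prior = {"banco": 0.6, "orilla": 0.4}
p_src_given_sense = {
    "banco":  {"money": 0.20, "account": 0.15, "deposit": 0.10},
    "orilla": {"river": 0.25, "water": 0.15, "shore": 0.10},
}

sentence = "she opened an account at the bank to deposit money".split()
context = [w for w in sentence if w != "bank"]

best = max(prior, key=lambda s: score_sense(s, prior, context, p_src_given_sense))
print(best)                      # -> "banco" with this illustrative dictionary
print(ngrams(sentence, 2)[:3])   # bigram (2-gram) view of the same sentence
```

In the paper's setting the context would be represented by the n-grams of the sentence (2-grams up to k-grams) rather than by individual words as above, and, as the abstract reports, the 5-gram representation weighted by mutual information performed best on average.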
Pages: 209-220
Page count: 12