Stemming to improve translation lexicon creation form bitexts

Cited by: 15
Authors
Fattah, MA [1]
Ren, FJ [1]
Kuroiwa, S [1]
Affiliation
[1] Univ Tokushima, Fac Engn, Tokushima 7708506, Japan
Keywords
multilingual dictionaries; English/Arabic translation; multilingual thesaurus; stemming
DOI
10.1016/j.ipm.2005.07.002
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Arabic is a morphologically rich language that presents significant challenges to many natural language processing applications, because a single word often conveys a complex meaning that can be decomposed into several morphemes (i.e., prefix, stem, suffix). Segmenting words into morphemes can improve the extraction of English/Arabic translation pairs from parallel texts. This paper describes two algorithms, and their combination, for automatically extracting an English/Arabic bilingual dictionary from parallel texts available in the Internet archive, using an Arabic light stemmer as a preprocessing step. Before the Arabic light stemmer was applied, the overall system precision and recall were 88.6% and 81.5%, respectively; after it was applied to the Arabic documents, they rose to 91.6% and 82.6%, respectively. The algorithms have variables whose values can be changed to control the system's precision and recall. As with most such systems, the accuracy of our system is directly proportional to the number of sentence pairs used. However, our system can also extract translation pairs from a very small parallel corpus: it can extract translations from as few as two sentences in one language and two sentences in the other, provided the system's requirements are met. Moreover, the system can extract word pairs that are translations of each other, as well as synonyms and explanations of a word in the other language. By controlling the system variables, we could achieve 100% precision for the output bilingual dictionary, at the cost of low recall. © 2005 Elsevier Ltd. All rights reserved.
Pages: 1003-1016
Number of pages: 14
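The abstract mentions two ideas that a small example may make more concrete: light stemming as a preprocessing step, and the trade-off between precision and recall when a system variable is tightened. The following is only a minimal sketch under stated assumptions, not the authors' stemmer or extraction algorithms. The affix lists, the `light_stem`, `select`, and `precision_recall` helpers, the scores, and the toy gold dictionary are all hypothetical and chosen purely for illustration.

```python
# Hypothetical affix lists, loosely in the spirit of Arabic light stemming.
# They are NOT the lists used in the paper.
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل", "و")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي")


def light_stem(word, min_len=3):
    """Strip at most one prefix and one suffix, keeping at least min_len letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word


def select(scored_pairs, threshold):
    """Keep candidate translation pairs whose score reaches the threshold."""
    return {pair for pair, score in scored_pairs.items() if score >= threshold}


def precision_recall(extracted, gold):
    """Precision and recall of an extracted pair set against a gold dictionary."""
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy candidate pairs with made-up association scores (assumption).
scored = {
    ("book", "كتاب"): 0.9,
    ("house", "بيت"): 0.8,
    ("pen", "كتاب"): 0.4,   # a wrong pairing
    ("pen", "قلم"): 0.3,
}
gold = {("book", "كتاب"), ("house", "بيت"), ("pen", "قلم")}

loose = precision_recall(select(scored, 0.2), gold)   # all four pairs kept
strict = precision_recall(select(scored, 0.5), gold)  # only the two top pairs kept
```

In this toy data, two surface forms of the same noun ("والكتاب" and "الكتاب") reduce to one stem, so their co-occurrence evidence would be pooled rather than split. Raising the threshold from 0.2 to 0.5 moves precision from 0.75 to 1.0 while recall drops from 1.0 to about 0.67, which mirrors the kind of trade-off the abstract describes.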