Stemming to improve translation lexicon creation form bitexts

Cited by: 15
Authors
Fattah, MA [1]
Ren, FJ [1]
Kuroiwa, S [1]
Affiliations
[1] Univ Tokushima, Fac Engn, Tokushima 7708506, Japan
Keywords
multilingual dictionaries; English/Arabic translation; multilingual thesaurus; stemming
DOI
10.1016/j.ipm.2005.07.002
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Arabic is a morphologically rich language that presents significant challenges to many natural language processing applications because a word often conveys complex meanings decomposable into several morphemes (i.e., prefix, stem, suffix). Segmenting words into morphemes can improve the extraction of English/Arabic translation pairs from parallel texts. This paper describes two algorithms, and their combination, for automatically extracting an English/Arabic bilingual dictionary from parallel texts in the Internet archive, using an Arabic light stemmer as a preprocessing step. Before the light stemmer was applied, overall system precision and recall were 88.6% and 81.5%, respectively; after it was applied to the Arabic documents, they rose to 91.6% and 82.6%. The algorithms have variables whose values can be changed to control the system's precision and recall. As with most such systems, accuracy is directly proportional to the number of sentence pairs used. However, the system can extract translation pairs from a very small parallel corpus: as few as two sentences in one language and two in the other, provided the system's requirements are met. Moreover, it can extract word pairs that are translations of each other, synonyms, and explanations of a word in the other language. By tuning the system variables, 100% precision can be achieved for the output bilingual dictionary, at the cost of low recall. (c) 2005 Elsevier Ltd. All rights reserved.
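To make the preprocessing step concrete, the following is a minimal sketch of what an Arabic light stemmer does: it strips at most one common prefix and one common suffix so that surface variants of a word collapse to a shared form before translation pairs are counted. The affix lists, the `min_len` guard, and the function name `light_stem` are illustrative assumptions; the abstract does not specify the stemmer the authors used.

```python
# Illustrative affix lists (an assumption, not the paper's actual stemmer).
# Longer prefixes come first so that "وال" is tried before "ال" or "و".
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي"]


def light_stem(word, min_len=3):
    """Strip at most one prefix and one suffix, keeping at least min_len letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word


# Three surface forms of "book": bare, with the definite article,
# and with the conjunction plus the definite article.
variants = ["كتاب", "الكتاب", "والكتاب"]
stems = {light_stem(w) for w in variants}
print(len(stems))  # number of distinct forms left after stemming
```

With the three variants mapped to one form, co-occurrence counts against an English word are pooled rather than split across spellings, which is one plausible reason stemming could raise both precision and recall on a small corpus.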
Pages: 1003-1016
Page count: 14