Stemming to improve translation lexicon creation form bitexts

Cited by: 15
Authors
Fattah, MA [1]
Ren, FJ [1]
Kuroiwa, S [1]
Affiliations
[1] Univ Tokushima, Fac Engn, Tokushima 7708506, Japan
Keywords
multilingual dictionaries; English/Arabic translation; multilingual thesaurus; stemming;
DOI
10.1016/j.ipm.2005.07.002
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Arabic is a morphologically rich language that presents significant challenges to many natural language processing applications, because a word often conveys complex meanings that can be decomposed into several morphemes (i.e., prefix, stem, suffix). By segmenting words into morphemes, we can improve the extraction of English/Arabic translation pairs from parallel texts. This paper describes two algorithms, and their combination, for automatically extracting an English/Arabic bilingual dictionary from parallel texts in the Internet archive, using an Arabic light stemmer as a preprocessing step. Without the Arabic light stemmer, the overall system precision and recall were 88.6% and 81.5%, respectively; after applying the light stemmer to the Arabic documents, they increased to 91.6% and 82.6%, respectively. The algorithms have variables whose values can be changed to control the system's precision and recall. As with most such systems, the accuracy of our system is directly proportional to the number of sentence pairs used. However, our system can extract translation pairs from a very small parallel corpus: it can extract translations from as few as two sentences in one language and two sentences in the other, provided the system's requirements are met. Moreover, the system can extract word pairs that are translations of each other, synonyms, and explanations of a word in the other language. By controlling the system variables, we could achieve 100% precision for the output bilingual dictionary, at the cost of low recall. (c) 2005 Elsevier Ltd. All rights reserved.
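To make the two ideas in the abstract concrete, the following is a minimal sketch of (a) light stemming as a preprocessing step and (b) how precision and recall of an extracted dictionary could be computed against a reference. It is not the authors' stemmer or evaluation code: the affix lists, the `MIN_STEM_LEN` threshold, the function names, and the toy word pairs are all illustrative assumptions.

```python
# Illustrative affix lists (an assumption, NOT the stemmer used in the paper).
# Longer prefixes come first so that they are tried before their substrings.
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]
MIN_STEM_LEN = 3  # hypothetical threshold: never shorten a word below 3 letters


def light_stem(word: str) -> str:
    """Strip at most one prefix and at most one suffix from an Arabic word."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[: -len(suffix)]
            break
    return word


def precision_recall(extracted: set, gold: set) -> tuple:
    """Precision = correct / extracted, recall = correct / gold."""
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Three surface forms that collapse to the same stem after light stemming.
    for form in ["الكتاب", "والكتاب", "كتابها"]:
        print(form, "->", light_stem(form))

    # Toy (made-up) extracted pairs versus a toy reference dictionary.
    extracted = {("book", "كتاب"), ("pen", "قلم"), ("house", "باب")}
    gold = {("book", "كتاب"), ("pen", "قلم"), ("house", "بيت"), ("door", "باب")}
    print(precision_recall(extracted, gold))
```

Collapsing inflected forms onto one stem increases the co-occurrence counts available for each Arabic word, which is one plausible reason stemming can raise both precision and recall on small corpora. Raising an acceptance threshold on candidate pairs would typically shrink `extracted`, trading recall for precision, as the abstract describes.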
Pages: 1003-1016
Number of pages: 14