Extracting translations from comparable corpora for Cross-Language Information Retrieval using the language modeling framework

Cited by: 19
Authors
Rahimi, Razieh [1]
Shakery, Azadeh [1, 2]
King, Irwin [3]
Affiliations
[1] Univ Tehran, Coll Engn, Sch Elect & Comp Engn, POB 14395-515, Tehran, Iran
[2] Inst Res Fundamental Sci IPM, Sch Comp Sci, POB 19395-5746, Tehran, Iran
[3] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Shatin, Hong Kong, Peoples R China
Keywords
Translation model; Bilingual lexicon; Comparable corpora; Cross-Language Information Retrieval; Language modeling framework; CORPUS
DOI
10.1016/j.ipm.2015.08.001
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
A main challenge in Cross-Language Information Retrieval (CLIR) is to estimate a proper translation model from available translation resources, since translation quality directly affects retrieval performance. Among different translation resources, we focus on obtaining translation models from comparable corpora, because they provide appropriate translations for both languages and domains with limited linguistic resources. In this paper, we employ a two-step approach to build an effective translation model from comparable corpora, without requiring any additional linguistic resources, for the CLIR task. In the first step, translations are extracted by deriving correlations between source-target word pairs. These correlations are used to estimate word translation probabilities in the second step. We propose a language modeling approach for the first step, where modeling based on probability distributions provides two key advantages. First, our approach can be tuned more easily than previous, heuristically adjusted work. Second, it provides a principled basis for integrating additional lexical and translational relations to improve the accuracy of translations from comparable corpora. As an illustration, we integrate monolingual relations of word co-occurrences into the process of translation extraction, which helps to extract more reliable translations for low-frequency words in a comparable corpus. Experimental results on an English-Persian comparable corpus show that our method outperforms previous approaches in terms of both translation quality and CLIR performance. Indeed, the proposed method is naturally applicable to any comparable corpus, regardless of its languages. In addition, we demonstrate the significant impact of word translation probabilities, estimated in the second step of our approach, on the performance of CLIR. (C) 2015 Elsevier Ltd. All rights reserved.
Pages: 299-318
Page count: 20