Extracting translations from comparable corpora for Cross-Language Information Retrieval using the language modeling framework

Cited by: 19
Authors
Rahimi, Razieh [1 ]
Shakery, Azadeh [1 ,2 ]
King, Irwin [3 ]
Affiliations
[1] Univ Tehran, Coll Engn, Sch Elect & Comp Engn, POB 14395-515, Tehran, Iran
[2] Inst Res Fundamental Sci IPM, Sch Comp Sci, POB 19395-5746, Tehran, Iran
[3] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Shatin, Hong Kong, Peoples R China
Keywords
Translation model; Bilingual lexicon; Comparable corpora; Cross-Language Information Retrieval; Language modeling framework; CORPUS;
DOI
10.1016/j.ipm.2015.08.001
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
A main challenge in Cross-Language Information Retrieval (CLIR) is to estimate a proper translation model from available translation resources, since translation quality directly affects retrieval performance. Among different translation resources, we focus on obtaining translation models from comparable corpora, because they provide appropriate translations for both languages and domains with limited linguistic resources. In this paper, we employ a two-step approach to build an effective translation model from comparable corpora for the CLIR task, without requiring any additional linguistic resources. In the first step, translations are extracted by deriving correlations between source-target word pairs. In the second step, these correlations are used to estimate word translation probabilities. We propose a language modeling approach for the first step, where modeling based on probability distributions provides two key advantages. First, our approach can be tuned more easily than previous, heuristically adjusted methods. Second, it provides a principled basis for integrating additional lexical and translational relations to improve the accuracy of translations extracted from comparable corpora. As an example, we integrate monolingual word co-occurrence relations into the translation extraction process, which helps to extract more reliable translations for low-frequency words in a comparable corpus. Experimental results on an English-Persian comparable corpus show that our method outperforms previous approaches in terms of both translation quality and CLIR performance. Moreover, the proposed method is naturally applicable to any comparable corpus, regardless of its languages. In addition, we demonstrate the significant impact of the word translation probabilities estimated in the second step of our approach on CLIR performance. (C) 2015 Elsevier Ltd. All rights reserved.
Pages: 299 - 318
Page count: 20
Related papers
50 in total
  • [41] Cross-Language Information Retrieval in Web application
    Yu, SF
    Li, ZZ
    Thomassen, W
    ICCC2004: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION VOL 1 AND 2, 2004, : 1198 - 1202
  • [42] Cross-language information retrieval: the way ahead
    Gey, FC
    Kando, N
    Peters, C
    INFORMATION PROCESSING & MANAGEMENT, 2005, 41 (03) : 415 - 431
  • [43] Combining evidence for cross-language information retrieval
    Kamps, J
    Monz, C
    de Rijke, M
    ADVANCES IN CROSS-LANGUAGE INFORMATION RETRIEVAL, 2003, 2785 : 111 - 126
  • [44] Disambiguation strategies for Cross-Language Information Retrieval
    Hiemstra, D
    de Jong, F
    RESEARCH AND ADVANCED TECHNOLOGY FOR DIGITAL LIBRARIES, PROCEEDINGS, 1999, 1696 : 274 - 293
  • [45] Matching meaning for cross-language information retrieval
    Wang, Jianqiang
    Oard, Douglas W.
    INFORMATION PROCESSING & MANAGEMENT, 2012, 48 (04) : 631 - 653
  • [46] Influence of WSD on cross-language information retrieval
    Kang, IS
    Na, SH
    Lee, JH
    NATURAL LANGUAGE PROCESSING - IJCNLP 2004, 2005, 3248 : 358 - 366
  • [47] Different approaches to cross-language information retrieval
    Kraaij, W
    Pohlmann, R
    COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS 2000, 2001, (37) : 97 - 110
  • [48] Cross-language information retrieval using EuroWordNet and word sense disambiguation
    Clough, P
    Stevenson, M
    ADVANCES IN INFORMATION RETRIEVAL, PROCEEDINGS, 2004, 2997 : 327 - 337
  • [49] Query disambiguation for Cross-Language Information Retrieval using Web directories
    Kimura, F
    Maeda, A
    Miyazaki, J
    Uemura, S
    INTERNATIONAL WORKSHOP ON CHALLENGES IN WEB INFORMATION RETRIEVAL AND INTEGRATION, PROCEEDINGS, 2005, : 151 - 156
  • [50] Using cross-language information retrieval methods for bilingual search of the web
    Shim, Sung J.
    International Conference on Computational Intelligence for Modelling, Control & Automation Jointly with International Conference on Intelligent Agents, Web Technologies & Internet Commerce, Vol 2, Proceedings, 2006, : 19 - 23