Extracting translations from comparable corpora for Cross-Language Information Retrieval using the language modeling framework

Cited by: 19

Authors:

Rahimi, Razieh [1]

Shakery, Azadeh [1,2]

King, Irwin [3]

Affiliations:

[1] Univ Tehran, Coll Engn, Sch Elect & Comp Engn, POB 14395-515, Tehran, Iran

[2] Inst Res Fundamental Sci IPM, Sch Comp Sci, POB 19395-5746, Tehran, Iran

[3] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Shatin, Hong Kong, Peoples R China

Source:

INFORMATION PROCESSING & MANAGEMENT | 2016 / Vol. 52 / Issue 02

Keywords:

Translation model; Bilingual lexicon; Comparable corpora; Cross-Language Information Retrieval; Language modeling framework; CORPUS;

DOI:

10.1016/j.ipm.2015.08.001

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

A main challenge in Cross-Language Information Retrieval (CLIR) is to estimate a proper translation model from available translation resources, since translation quality directly affects retrieval performance. Among different translation resources, we focus on obtaining translation models from comparable corpora, because they provide appropriate translations for both languages and domains with limited linguistic resources. In this paper, we employ a two-step approach to build an effective translation model from comparable corpora, without requiring any additional linguistic resources, for the CLIR task. In the first step, translations are extracted by deriving correlations between source-target word pairs. In the second step, these correlations are used to estimate word translation probabilities. We propose a language modeling approach for the first step, where modeling based on probability distributions provides two key advantages. First, our approach can be tuned more easily than previous work, which was adjusted heuristically. Second, it provides a principled basis for integrating additional lexical and translational relations to improve the accuracy of translations from comparable corpora. As an illustration, we integrate monolingual relations of word co-occurrences into the process of translation extraction, which helps to extract more reliable translations for low-frequency words in a comparable corpus. Experimental results on an English-Persian comparable corpus show that our method outperforms the previous approaches in terms of both translation quality and the performance of CLIR. Indeed, the proposed method is naturally applicable to any comparable corpus, regardless of its languages. In addition, we demonstrate the significant impact of word translation probabilities, estimated in the second step of our approach, on the performance of CLIR. (C) 2015 Elsevier Ltd. All rights reserved.
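The two-step shape described in the abstract (first score source-target word pairs, then turn the scores into translation probabilities) can be illustrated with a minimal sketch. This is not the authors' language modeling estimator: as a stand-in for step 1 it uses a plain Dice coefficient over document-level co-occurrence in aligned comparable documents, and step 2 simply normalizes the top-scoring candidates. The function name `extract_translation_model`, the `top_k` parameter, and the toy English and romanized Persian tokens are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def extract_translation_model(aligned_pairs, top_k=3):
    """Hypothetical helper: build p(target | source) from aligned comparable documents.

    aligned_pairs is a list of (source_tokens, target_tokens) tuples, one per
    pair of comparable documents.
    """
    src_df, tgt_df, joint_df = Counter(), Counter(), Counter()
    for src_tokens, tgt_tokens in aligned_pairs:
        src_set, tgt_set = set(src_tokens), set(tgt_tokens)
        src_df.update(src_set)   # number of pairs whose source side contains s
        tgt_df.update(tgt_set)   # number of pairs whose target side contains t
        joint_df.update((s, t) for s in src_set for t in tgt_set)

    # Step 1: correlation of each source-target word pair.
    # Dice coefficient, used here only as a simple stand-in score.
    scores = defaultdict(dict)
    for (s, t), c in joint_df.items():
        scores[s][t] = 2.0 * c / (src_df[s] + tgt_df[t])

    # Step 2: keep the top_k candidates per source word and normalize
    # their scores so they sum to one.
    model = {}
    for s, candidates in scores.items():
        best = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        total = sum(score for _, score in best)
        model[s] = {t: score / total for t, score in best}
    return model


# Toy aligned documents (invented data, romanized Persian on the target side).
pairs = [
    (["book", "library"], ["ketab", "ketabkhane"]),
    (["book", "author"], ["ketab", "nevisande"]),
    (["river", "water"], ["rud", "ab"]),
    (["water", "sea"], ["ab", "darya"]),
]
model = extract_translation_model(pairs)
best_for_book = max(model["book"], key=model["book"].get)
```

In this toy run the highest-probability candidate for "book" is "ketab", because the two words co-occur in every document pair that contains either of them. A probabilistic estimator such as the one the abstract proposes would replace the Dice score in step 1, while the overall two-step structure would stay the same.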

Pages: 299-318

Page count: 20
