Corpus-based cross-language information retrieval in retrieval of highly relevant documents

Times cited: 10

Authors
Talvensaari, Tuomas
Juhola, Martti
Laurikkala, Jorma
Järvelin, Kalervo
Affiliations
[1] Univ Tampere, Dept Comp Sci, FIN-33014 Tampere, Finland
[2] Univ Tampere, Dept Informat Studies, FIN-33014 Tampere, Finland
Keywords
Information retrieval systems
DOI
10.1002/asi.20495
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Information retrieval systems' ability to retrieve highly relevant documents has become more and more important in the age of extremely large collections, such as the World Wide Web (WWW). The authors' aim was to find out how well corpus-based cross-language information retrieval (CLIR) performs in retrieving highly relevant documents. They created a Finnish-Swedish comparable corpus from two loosely related document collections and used it as a source of knowledge for query translation. Finnish test queries were translated into Swedish and run against a Swedish test collection. Graded relevance assessments were used in evaluating the results, and three relevance criterion levels (liberal, regular, and stringent) were applied. The runs were also evaluated with generalized recall and precision, which weight the retrieved documents according to their relevance level. The performance of the Comparable Corpus Translation system (COCOT) was compared to that of a dictionary-based query translation program; the two translation methods were also combined. The results indicate that corpus-based CLIR performs particularly well with highly relevant documents. In average precision, COCOT even matched the monolingual baseline on the highest relevance level. The performance of the different query translation methods was further analyzed by finding out reasons for poor rankings of highly relevant documents.
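As a rough illustration of the evaluation setup described in the abstract, the Python sketch below collapses graded judgments to binary relevance at three criterion levels, computes precision, recall, and average precision at each level, and computes generalized precision and recall, where each retrieved document contributes a relevance weight rather than a 0/1 hit. The four-point grade scale, the mapping of liberal/regular/stringent to grade thresholds, the weight values, and all function names are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Dict, Sequence, Set, Tuple

# Hypothetical grade scale: 0 = irrelevant, 1 = marginally, 2 = fairly,
# 3 = highly relevant. The threshold per criterion level is an assumption.
THRESHOLDS: Dict[str, int] = {"liberal": 1, "regular": 2, "stringent": 3}


def relevant_set(grades: Dict[str, int], level: str) -> Set[str]:
    """Documents that count as relevant at the given criterion level."""
    threshold = THRESHOLDS[level]
    return {doc for doc, grade in grades.items() if grade >= threshold}


def binary_precision_recall(
    ranking: Sequence[str], grades: Dict[str, int], level: str, k: int
) -> Tuple[float, float]:
    """Precision and recall at cutoff k with graded judgments collapsed to 0/1."""
    relevant = relevant_set(grades, level)
    retrieved = ranking[:k]
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(
    ranking: Sequence[str], grades: Dict[str, int], level: str
) -> float:
    """Non-interpolated average precision at one criterion level."""
    relevant = relevant_set(grades, level)
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document's rank
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant)


def generalized_precision_recall(
    ranking: Sequence[str],
    grades: Dict[str, int],
    weights: Dict[int, float],
    k: int,
) -> Tuple[float, float]:
    """Generalized precision and recall: each retrieved document contributes
    its relevance weight (between 0 and 1) instead of a binary hit.
    Unjudged documents are treated as grade 0."""
    retrieved = ranking[:k]
    gained = sum(weights[grades.get(doc, 0)] for doc in retrieved)
    total = sum(weights[grade] for grade in grades.values())
    g_precision = gained / len(retrieved) if retrieved else 0.0
    g_recall = gained / total if total else 0.0
    return g_precision, g_recall
```

For example, with the ranking `["d1", "d2", "d3", "d4"]`, grades `{"d1": 3, "d2": 0, "d3": 1, "d4": 2, "d5": 3}`, and weights `{0: 0.0, 1: 1/3, 2: 2/3, 3: 1.0}`, binary precision at k = 4 is 0.25 at the stringent level but 0.75 at the liberal level, while generalized precision is 0.5. The generalized measures give partial credit for marginally and fairly relevant documents, which the stringent binary measure ignores entirely.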
Pages: 322-334
Number of pages: 13