Corpus-based cross-language information retrieval in retrieval of highly relevant documents

Cited by: 10
Authors
Talvensaari, Tuomas
Juhola, Martti
Laurikkala, Jorma
Jarvelin, Kalervo
Affiliations
[1] Univ Tampere, Dept Comp Sci, FIN-33014 Tampere, Finland
[2] Univ Tampere, Dept Informat Studies, FIN-33014 Tampere, Finland
Keywords
Information retrieval systems;
DOI
10.1002/asi.20495
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Information retrieval systems' ability to retrieve highly relevant documents has become more and more important in the age of extremely large collections, such as the World Wide Web (WWW). The authors' aim was to find out how corpus-based cross-language information retrieval (CLIR) manages in retrieving highly relevant documents. They created a Finnish-Swedish comparable corpus from two loosely related document collections and used it as a source of knowledge for query translation. Finnish test queries were translated into Swedish and run against a Swedish test collection. Graded relevance assessments were used in evaluating the results, and three relevance criterion levels (liberal, regular, and stringent) were applied. The runs were also evaluated with generalized recall and precision, which weight the retrieved documents according to their relevance level. The performance of the Comparable Corpus Translation system (COCOT) was compared to that of a dictionary-based query translation program; the two translation methods were also combined. The results indicate that corpus-based CLIR performs particularly well with highly relevant documents. In average precision, COCOT even matched the monolingual baseline on the highest relevance level. The performance of the different query translation methods was further analyzed by finding out reasons for poor rankings of highly relevant documents.
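The two evaluation styles mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes a hypothetical 0-3 graded relevance scale, assumes that the liberal, regular, and stringent criteria correspond to minimum grades of 1, 2, and 3, and assumes that generalized precision and recall weight each document by its grade divided by the maximum grade. All function names, document identifiers, and grades below are illustrative.

```python
from typing import Dict, List

# Assumed mapping of relevance criteria to minimum grades (hypothetical).
CRITERIA = {"liberal": 1, "regular": 2, "stringent": 3}


def precision_at_criterion(retrieved: List[str], grades: Dict[str, int],
                           min_grade: int) -> float:
    """Binary precision: a document counts as relevant if its grade >= min_grade."""
    if not retrieved:
        return 0.0
    hits = sum(1 for doc in retrieved if grades.get(doc, 0) >= min_grade)
    return hits / len(retrieved)


def generalized_precision(retrieved: List[str], grades: Dict[str, int],
                          max_grade: int = 3) -> float:
    """Mean relevance weight (grade / max_grade) over the retrieved documents."""
    if not retrieved:
        return 0.0
    gained = sum(grades.get(doc, 0) / max_grade for doc in retrieved)
    return gained / len(retrieved)


def generalized_recall(retrieved: List[str], grades: Dict[str, int],
                       max_grade: int = 3) -> float:
    """Share of the total relevance weight that was retrieved.

    Assumes the retrieved list contains no duplicate documents.
    """
    total = sum(g / max_grade for g in grades.values())
    if total == 0:
        return 0.0
    gained = sum(grades.get(doc, 0) / max_grade for doc in retrieved)
    return gained / total


if __name__ == "__main__":
    # Hypothetical assessments; unlisted documents are treated as grade 0.
    grades = {"d1": 3, "d2": 1, "d3": 2, "d4": 3}
    retrieved = ["d1", "d5", "d3", "d2"]
    for name, min_grade in CRITERIA.items():
        print(name, precision_at_criterion(retrieved, grades, min_grade))
    print("gP", generalized_precision(retrieved, grades))
    print("gR", generalized_recall(retrieved, grades))
```

Under the binary criteria, the same result list scores lower as the threshold rises, whereas the generalized measures give partial credit to marginally relevant documents without requiring a threshold.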
Pages: 322-334
Number of pages: 13