Using corpus-based approaches in a system for multilingual information retrieval

Cited by: 13
Authors
Braschler, M [1]
Schäuble, P [1]
Affiliations
[1] Eurospider Informat Technol AG, CH-8006 Zurich, Switzerland
Source
INFORMATION RETRIEVAL | 2000 / Vol. 3 / Issue 03
Keywords
multilingual information retrieval; cross-language information retrieval; corpus-based approaches; document alignments;
DOI
10.1023/A:1026525127581
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
We present a system for multilingual information retrieval that allows users to formulate queries in their preferred language and retrieve relevant information from a collection containing documents in multiple languages. The system is based on a process of document-level alignments, where documents of different languages are paired according to their similarity. The resulting mapping allows us to produce a multilingual comparable corpus. Such a corpus has multiple interesting applications. It allows us to build a data structure for query translation in cross-language information retrieval (CLIR). We also perform pseudo-relevance feedback on the alignments to improve our retrieval results. Finally, multiple retrieval runs can be merged into one unified result list. The resulting system is inexpensive, adaptable to domain-specific collections and new languages, and has performed very well at the TREC-7 conference CLIR system comparison.
Pages: 273 - 284
Number of pages: 12
Related papers
50 items in total
  • [1] Using Corpus-Based Approaches in a System for Multilingual Information Retrieval
    Martin Braschler
    Peter Schäuble
    [J]. Information Retrieval, 2000, 3 : 273 - 284
  • [2] Corpus-based semantic role approach in information retrieval
    Moreda, Paloma
    Navarro, Borja
    Palomar, Manuel
    [J]. DATA & KNOWLEDGE ENGINEERING, 2007, 61 (03) : 467 - 483
  • [3] Neural Approaches to Multilingual Information Retrieval
    Lawrie, Dawn
    Yang, Eugene
    Oard, Douglas W.
    Mayfield, James
    [J]. ADVANCES IN INFORMATION RETRIEVAL, ECIR 2023, PT I, 2023, 13980 : 521 - 536
  • [4] Corpus-based Approaches to ELT
    Curado Fuentes, Alejandro
    [J]. IBERICA, 2011, (21) : 174 - 177
  • [5] Corpus-based cross-language information retrieval in retrieval of highly relevant documents
    Talvensaari, Tuomas
    Juhola, Martti
    Laurikkala, Jorma
    Jarvelin, Kalervo
    [J]. JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2007, 58 (03) : 322 - 334
  • [6] An axiomatic approach to corpus-based cross-language information retrieval
    Rahimi, Razieh
    Montazeralghaem, Ali
    Shakery, Azadeh
    [J]. INFORMATION RETRIEVAL JOURNAL, 2020, 23 (03) : 191 - 215
  • [7] Complementing WordNet with Roget's and corpus-based thesauri for information retrieval
    Mandala, R
    Tokunaga, T
    Tanaka, H
    [J]. NINTH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS, 1999 : 94 - 101
  • [8] An axiomatic approach to corpus-based cross-language information retrieval
    Razieh Rahimi
    Ali Montazeralghaem
    Azadeh Shakery
    [J]. Information Retrieval Journal, 2020, 23 : 191 - 215
  • [9] Multilingual information retrieval system
    Hong, Z
    Syin, C
    Lia, KF
    [J]. MULTIMEDIA STORAGE AND ARCHIVING SYSTEMS, 1996, 2916 : 33 - 44
  • [10] Corpus-based Error Detection in a Multilingual Medical Thesaurus
    Andrade, Roosewelt L.
    Pacheco, Edson
    Cancian, Pindaro S.
    Nohama, Percy
    Schulz, Stefan
    [J]. MEDINFO 2007: PROCEEDINGS OF THE 12TH WORLD CONGRESS ON HEALTH (MEDICAL) INFORMATICS, PTS 1 AND 2: BUILDING SUSTAINABLE HEALTH SYSTEMS, 2007, 129 : 529 - +