Neural Approaches to Multilingual Information Retrieval

Cited by: 3
Authors
Lawrie, Dawn [1]
Yang, Eugene [1]
Oard, Douglas W. [1,2]
Mayfield, James [1]
Affiliations
[1] Johns Hopkins Univ, HLTCOE, Baltimore, MD 21211 USA
[2] Univ Maryland, College Pk, MD 20742 USA
Keywords
Multilingual ad-hoc retrieval; ColBERT-X; DPR-X; Multilingual training of MPLM
DOI
10.1007/978-3-031-28244-7_33
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Providing access to information across languages has been a goal of Information Retrieval (IR) for decades. While progress has been made on Cross Language IR (CLIR) where queries are expressed in one language and documents in another, the multilingual (MLIR) task to create a single ranked list of documents across many languages is considerably more challenging. This paper investigates whether advances in neural document translation and pretrained multilingual neural language models enable improvements in the state of the art over earlier MLIR techniques. The results show that although combining neural document translation with neural ranking yields the best Mean Average Precision (MAP), 98% of that MAP score can be achieved with an 84% reduction in indexing time by using a pretrained XLM-R multilingual language model to index documents in their native language, and that 2% difference in effectiveness is not statistically significant. Key to achieving these results for MLIR is to fine-tune XLM-R using mixed-language batches from neural translations of MS MARCO passages.
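The abstract compares systems by Mean Average Precision (MAP) over a single ranked list that mixes documents from several languages. As a minimal sketch of how that metric is computed, assuming binary relevance judgments, the following may help; the function names and document IDs are hypothetical and this is not the authors' evaluation code.

```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list under binary relevance."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only at relevant documents.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of per-query average precision over all judged queries."""
    scores = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical IDs; the language prefixes only illustrate a mixed-language list.
runs = {
    "q1": ["en_1", "zh_7", "fa_3", "ru_2"],
    "q2": ["ru_9", "zh_4"],
}
qrels = {
    "q1": {"en_1", "fa_3"},
    "q2": {"zh_4"},
}
map_score = mean_average_precision(runs, qrels)
```

Because the list is scored as a whole, a relevant document in any language counts equally at its rank, which is what makes the MLIR setting stricter than ranking each language separately.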
Pages: 521-536
Page count: 16