Match Your Words! A Study of Lexical Matching in Neural Information Retrieval

Cited by: 7
Authors
Formal, Thibault [1, 2]
Piwowarski, Benjamin [2, 3]
Clinchant, Stephane [1]
Affiliations
[1] Naver Labs Europe, Meylan, France
[2] Sorbonne Univ, Inst Intelligent Syst & Robot, UMR 7222, Paris, France
[3] CNRS, Paris, France
Source
Keywords
Neural Information Retrieval; BERT; Lexical matching
DOI
10.1007/978-3-030-99739-7_14
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Neural Information Retrieval models hold the promise to replace lexical matching models, e.g. BM25, in modern search engines. While their capabilities have fully shone on in-domain datasets like MS MARCO, they have recently been challenged on out-of-domain zero-shot settings (BEIR benchmark), questioning their actual generalization capabilities compared to bag-of-words approaches. Particularly, we wonder if these shortcomings could (partly) be the consequence of the inability of neural IR models to perform lexical matching off-the-shelf. In this work, we propose a measure of discrepancy between the lexical matching performed by any (neural) model and an "ideal" one. Based on this, we study the behavior of different state-of-the-art neural IR models, focusing on whether they are able to perform lexical matching when it's actually useful, i.e. for important terms. Overall, we show that neural IR models fail to properly generalize term importance on out-of-domain collections or terms almost unseen during training.
Pages: 120 - 127
Page count: 8
Related papers
50 in total
  • [1] Having Your Cake and Eating it Too: Training Neural Retrieval for Language Inference without Losing Lexical Match
    Yadav, Vikas
    Bethard, Steven
    Surdeanu, Mihai
    [J]. PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, : 1625 - 1628
  • [2] COIL: Revisit Exact Lexical Match in Information Retrieval with Contextualized Inverted List
    Gao, Luyu
    Dai, Zhuyun
    Callan, Jamie
    [J]. 2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 3030 - 3042
  • [3] LEXICAL STORAGE AND RETRIEVAL OF PREFIXED WORDS
    TAFT, M
    FORSTER, KI
    [J]. JOURNAL OF VERBAL LEARNING AND VERBAL BEHAVIOR, 1975, 14 (06) : 638 - 647
  • [4] A neural basis for lexical retrieval
    Damasio, H
    Grabowski, TJ
    Tranel, D
    Hichwa, RD
    Damasio, AR
    [J]. NATURE, 1996, 380 (6574) : 499 - 505
  • [5] LEXICAL STORAGE AND RETRIEVAL OF POLYMORPHEMIC AND POLYSYLLABIC WORDS
    TAFT, M
    FORSTER, KI
    [J]. JOURNAL OF VERBAL LEARNING AND VERBAL BEHAVIOR, 1976, 15 (06) : 607 - 620
  • [6] Lexical entailment for information retrieval
    Clinchant, Stephane
    Goutte, Cyril
    Gaussier, Eric
    [J]. ADVANCES IN INFORMATION RETRIEVAL, 2006, 3936 : 217 - 228
  • [7] LEXICAL AMBIGUITY AND INFORMATION-RETRIEVAL
    KROVETZ, R
    CROFT, WB
    [J]. ACM TRANSACTIONS ON INFORMATION SYSTEMS, 1992, 10 (02) : 115 - 141
  • [8] Computing with words in Information Retrieval
    Berzal, F
    Martín-Bautista, MJ
    Vila, MA
    Larsen, HL
    [J]. JOINT 9TH IFSA WORLD CONGRESS AND 20TH NAFIPS INTERNATIONAL CONFERENCE, PROCEEDINGS, VOLS. 1-5, 2001, : 3088 - 3092
  • [9] Lexical and Syntactic knowledge for Information Retrieval
    Ferrandez, Antonio
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2011, 47 (05) : 692 - 705
  • [10] HyperLex: lexical cartography for information retrieval
    Véronis, J
    [J]. COMPUTER SPEECH AND LANGUAGE, 2004, 18 (03) : 223 - 252