Pre-training Methods in Information Retrieval

Cited by: 21
Authors
Fan, Yixing [1]; Xie, Xiaohui [2]; Cai, Yinqiong [1]; Chen, Jia [2]; Ma, Xinyu [1]; Li, Xiangsheng [2]; Zhang, Ruqing [1]; Guo, Jiafeng [1]
Affiliations
[1] Chinese Acad Sci, ICT, Beijing, Peoples R China
[2] Tsinghua Univ, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
MODELS
DOI
10.1561/1500000100
Chinese Library Classification
TP [automation technology, computer technology]
Discipline Code
0812
Abstract
The core of information retrieval (IR) is to identify relevant information from large-scale resources and return it as a ranked list in response to a user's information need. In recent years, the resurgence of deep learning has greatly advanced this field and led to a hot topic named NeuIR (i.e., neural information retrieval), especially the paradigm of pre-training methods (PTMs). Owing to sophisticated pre-training objectives and huge model sizes, pre-trained models can learn universal language representations from massive textual data, which are beneficial to the ranking task of IR. Recently, a large number of works dedicated to the application of PTMs in IR have been introduced to improve retrieval performance. Considering the rapid progress of this direction, this survey aims to provide a systematic review of pre-training methods in IR. Specifically, we present an overview of PTMs applied in different components of an IR system, including the retrieval component, the re-ranking component, and other components. In addition, we introduce PTMs specifically designed for IR and summarize available datasets as well as benchmark leaderboards. Moreover, we discuss some open challenges and highlight several promising directions, in the hope of inspiring and facilitating more work on these topics in future research.
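The abstract describes the standard two-stage pipeline the survey is organized around: a first-stage retrieval component selects candidates from a large corpus, and a re-ranking component reorders them. The following is a minimal toy sketch of that pipeline in plain Python; the corpus, the term-overlap retriever, and the cosine-similarity re-ranker are illustrative placeholders (in the systems the survey covers, a pre-trained model such as BERT would supply the retrieval and re-ranking scores).

```python
from collections import Counter
import math

# Tiny illustrative corpus (hypothetical documents, not from the survey).
corpus = {
    "d1": "pre-training methods for neural information retrieval",
    "d2": "cooking recipes for a quick dinner",
    "d3": "dense retrieval with pre-trained language models",
}

def tokenize(text):
    return text.lower().split()

def tf_score(query, doc):
    # First stage: simple term-overlap score, a cheap stand-in for BM25.
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    return sum(min(q[t], d[t]) for t in q)

def retrieve(query, k=2):
    # Retrieval component: take the top-k candidates from the whole corpus.
    ranked = sorted(corpus, key=lambda i: tf_score(query, corpus[i]), reverse=True)
    return ranked[:k]

def rerank(query, candidates):
    # Re-ranking component: cosine similarity over term-count vectors here;
    # a PTM-based cross-encoder would replace this in a real system.
    def cosine(a, b):
        num = sum(a[t] * b[t] for t in a)
        den = math.sqrt(sum(v * v for v in a.values())) * \
              math.sqrt(sum(v * v for v in b.values()))
        return num / den if den else 0.0
    q = Counter(tokenize(query))
    return sorted(candidates,
                  key=lambda i: cosine(q, Counter(tokenize(corpus[i]))),
                  reverse=True)

hits = rerank("pre-training retrieval", retrieve("pre-training retrieval"))
```

The design point the survey's structure reflects: the first stage must be cheap enough to scan the full collection, while the second stage can afford an expensive model because it only sees a handful of candidates.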
Pages: 178-317
Page count: 140
Related Papers (50 in total)
  • [1] Webformer: Pre-training with Web Pages for Information Retrieval
    Guo, Yu; Ma, Zhengyi; Mao, Jiaxin; Qian, Hongjin; Zhang, Xinyu; Jiang, Hao; Cao, Zhao; Dou, Zhicheng
    [J]. PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, : 1502 - 1512
  • [2] Condenser: a Pre-training Architecture for Dense Retrieval
    Gao, Luyu; Callan, Jamie
    [J]. 2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 981 - 993
  • [3] Pre-Training for Mathematics-Aware Retrieval
    Reusch, Anja
    [J]. PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, : 3496 - 3496
  • [4] Pre-training Methods for Neural Machine Translation
    Wang, Mingxuan; Li, Lei
    [J]. ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING: TUTORIAL ABSTRACTS, 2021, : 21 - 25
  • [5] REALM: Retrieval-Augmented Language Model Pre-Training
    Guu, Kelvin; Lee, Kenton; Tung, Zora; Pasupat, Panupong; Chang, Ming-Wei
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [6] SIMLM: Pre-training with Representation Bottleneck for Dense Passage Retrieval
    Wang, Liang; Yang, Nan; Huang, Xiaolong; Jiao, Binxing; Yang, Linjun; Jiang, Daxin; Majumder, Rangan; Wei, Furu
    [J]. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 2244 - 2258
  • [7] Comparing Evolutionary Methods for Reservoir Computing Pre-training
    Ferreira, Aida A.; Ludermir, Teresa B.
    [J]. 2011 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2011, : 283 - 290
  • [8] GeoLayoutLM: Geometric Pre-training for Visual Information Extraction
    Luo, Chuwei; Cheng, Changxu; Zheng, Qi; Yao, Cong
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 7092 - 7101
  • [9] Graph Neural Pre-training for Recommendation with Side Information
    Liu, Siwei; Meng, Zaiqiao; Macdonald, Craig; Ounis, Iadh
    [J]. ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2023, 41 (03)
  • [10] A Contrastive Pre-training Approach to Learn Discriminative Autoencoder for Dense Retrieval
    Ma, Xinyu; Zhang, Ruqing; Guo, Jiafeng; Fan, Yixing; Cheng, Xueqi
    [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 4314 - 4318