I3 Retriever: Incorporating Implicit Interaction in Pre-trained Language Models for Passage Retrieval

Cited by: 4
Authors
Dong, Qian [1 ,2 ,3 ]
Liu, Yiding [4 ]
Ai, Qingyao [1 ,2 ,3 ]
Li, Haitao [1 ,2 ,3 ]
Wang, Shuaiqiang [4 ]
Liu, Yiqun [1 ,2 ,3 ]
Yin, Dawei [4 ]
Ma, Shaoping [1 ,2 ,3 ]
Affiliations
[1] Tsinghua Univ, DCST, Beijing, Peoples R China
[2] Quan Cheng Lab, Beijing, Peoples R China
[3] Zhongguancun Lab, Beijing, Peoples R China
[4] Baidu Inc, Beijing, Peoples R China
Keywords
Learning to Rank; Language models; Semantic Matching;
DOI
10.1145/3583780.3614923
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Passage retrieval is a fundamental task in many information systems, such as web search and question answering, where both efficiency and effectiveness are critical concerns. In recent years, neural retrievers based on pre-trained language models (PLMs), such as dual-encoders, have achieved great success. Yet, studies have found that the performance of dual-encoders is often limited because they neglect the interaction information between queries and candidate passages. Therefore, various interaction paradigms have been proposed to improve the performance of vanilla dual-encoders. In particular, recent state-of-the-art methods often introduce late interaction during the model inference process. However, such late-interaction-based methods usually incur extensive computation and storage costs on large corpora. Despite their effectiveness, these efficiency and space-footprint concerns remain an important factor limiting the application of interaction-based neural retrieval models. To tackle this issue, we Incorporate Implicit Interaction into dual-encoders and propose the I3 retriever. Specifically, our implicit interaction paradigm leverages generated pseudo-queries to simulate query-passage interaction and is jointly optimized with the query and passage encoders in an end-to-end manner. It can be fully pre-computed and cached, and its inference process only involves a simple dot product of the query vector and the passage vector, making it as efficient as a vanilla dual-encoder. We conduct comprehensive experiments on the MS MARCO and TREC 2019 Deep Learning datasets, demonstrating the I3 retriever's superiority in terms of both effectiveness and efficiency. Moreover, the proposed implicit interaction is compatible with specialized pre-training and knowledge distillation for passage retrieval, which brings a new state-of-the-art performance. The code is available at https://github.com/Deriq-Qian-Dong/III-Retriever.
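As a rough illustration of the inference pathway described in the abstract, the sketch below shows a dual-encoder in which pseudo-query vectors are folded into a cached passage representation offline, so online scoring remains a single dot product. This is a minimal sketch under stated assumptions, not the paper's implementation: the backbone checkpoint, the average-based fusion, and the helper names (encode, encode_passage_with_pseudo_queries) are illustrative, whereas the I3 retriever learns the interaction jointly with the query and passage encoders end-to-end.

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # backbone PLM; an assumption, not the paper's checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
query_encoder = AutoModel.from_pretrained(MODEL_NAME)
passage_encoder = AutoModel.from_pretrained(MODEL_NAME)

def encode(encoder, texts):
    # Return L2-normalized [CLS] embeddings for a list of texts.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        cls = encoder(**batch).last_hidden_state[:, 0]
    return F.normalize(cls, dim=-1)

def encode_passage_with_pseudo_queries(passage, pseudo_queries):
    # Offline step: fold pseudo-query vectors into the passage vector.
    # Averaging is a placeholder; the paper optimizes this fusion end-to-end.
    vecs = encode(passage_encoder, [passage] + pseudo_queries)
    return F.normalize(vecs.mean(dim=0, keepdim=True), dim=-1)

# Offline: pre-compute and cache the passage vector, pseudo-queries included.
passage = "The Eiffel Tower is a wrought-iron lattice tower in Paris, France."
pseudo_queries = ["where is the eiffel tower", "what is the eiffel tower made of"]
p_vec = encode_passage_with_pseudo_queries(passage, pseudo_queries)

# Online: scoring is a single dot product, exactly as in a vanilla dual-encoder.
q_vec = encode(query_encoder, ["which city is the eiffel tower in"])
score = (q_vec @ p_vec.T).item()
print(f"relevance score: {score:.4f}")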
Pages: 441-451
Page count: 11