Injecting the BM25 Score as Text Improves BERT-Based Re-rankers

Cited by: 10
Authors
Askari, Arian [1]
Abolghasemi, Amin [1]
Pasi, Gabriella [2]
Kraaij, Wessel [1]
Verberne, Suzan [1]
Affiliations
[1] Leiden Univ, Leiden Inst Adv Comp Sci, Leiden, Netherlands
[2] Univ Milano Bicocca, Dept Informat Syst & Commun, Milan, Italy
Funding
EU Horizon 2020;
Keywords
Injecting BM25; Two-stage retrieval; Transformer-based rankers; BM25; Combining lexical and neural rankers;
DOI
10.1007/978-3-031-28244-7_5
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
In this paper we propose a novel approach for combining first-stage lexical retrieval models and Transformer-based re-rankers: we inject the relevance score of the lexical model as a token in the middle of the input of the cross-encoder re-ranker. Prior work has shown that interpolating the relevance scores of lexical and BERT-based re-rankers does not consistently improve effectiveness. Our idea is motivated by the finding that BERT models can capture numeric information. We compare several representations of the BM25 score and inject them as text into the input of four different cross-encoders. We additionally analyze the effect for different query types, and investigate the effectiveness of our method for capturing exact-matching relevance. Evaluation on the MS MARCO Passage collection and the TREC DL collections shows that the proposed method significantly improves over all cross-encoder re-rankers as well as over the common interpolation methods. We show that the improvement is consistent for all query types. We also find an improvement in exact-matching capabilities over both BM25 and the cross-encoders. Our findings indicate that cross-encoder re-rankers can be improved, without additional computational burden or extra pipeline steps, by explicitly adding the output of the first-stage ranker to the model input, and that this effect is robust across models and query types.
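The core idea of the abstract, injecting the first-stage BM25 score as text into the cross-encoder input, can be sketched as below. The exact token template, score rounding, and separator placement are illustrative assumptions, not the paper's verbatim format; the paper itself compares several score representations.

```python
# Sketch (assumed format): inject the BM25 score of the first-stage
# retriever as a textual token between the query and the passage, so a
# cross-encoder sees "query [SEP] score [SEP] passage" as its input text.

def inject_bm25(query: str, passage: str, bm25_score: float) -> str:
    # Round the score to two decimals so it tokenizes into a short,
    # stable numeric string (the rounding choice is an assumption here).
    score_text = f"{bm25_score:.2f}"
    # Place the score in the middle of the input, between query and passage.
    return f"{query} [SEP] {score_text} [SEP] {passage}"

example = inject_bm25(
    "what is bm25",
    "BM25 is a ranking function used by search engines.",
    12.3456,
)
print(example)
# what is bm25 [SEP] 12.35 [SEP] BM25 is a ranking function used by search engines.
```

In practice the resulting string would be tokenized and fed to a cross-encoder (e.g. a BERT-based re-ranker) exactly like a normal query-passage pair, which is why the method adds no extra pipeline step or inference cost.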
Pages: 66-83 (18 pages)
Related papers
5 items in total
  • [1] Injecting the score of the first-stage retriever as text improves BERT-based re-rankers
    Askari, Arian
    Abolghasemi, Amin
    Pasi, Gabriella
    Kraaij, Wessel
    Verberne, Suzan
    [J]. DISCOVER COMPUTING, 2024, 27 (01)
  • [2] Incremental Clustering in Short Text Streams Based on BM25
    Xu, Lixin
    Chen, Guang
    Yang, Lei
    [J]. 2014 IEEE 3RD INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS (CCIS), 2014, : 8 - 12
  • [3] Multimodal Prediction of Social Responsiveness Score with BERT-Based Text Features
    Saga, Takeshi
    Tanaka, Hiroki
    Iwasaka, Hidemi
    Nakamura, Satoshi
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2022, E105D (03) : 578 - 586
  • [4] OnSeS: A Novel Online Short Text Summarization based on BM25 and Neural Network
    Niu, Jianwei
    Zhao, Qingjuan
    Wang, Lei
    Chen, Huan
    Atiquzzaman, Mohammed
    Peng, Fei
    [J]. 2016 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2016,
  • [5] BDNet: A BERT-based dual-path network for text-to-image cross-modal person re-identification
    Liu, Qiang
    He, Xiaohai
    Teng, Qizhi
    Qing, Linbo
    Chen, Honggang
    [J]. PATTERN RECOGNITION, 2023, 141