A Neural Model for Joint Document and Snippet Ranking in Question Answering for Large Document Collections

Times cited: 0

Authors:

Pappas, Dimitris [1,2]

Androutsopoulos, Ion [1]

Affiliations:

[1] Athens Univ Econ & Business, Dept Informat, Athens, Greece

[2] Res Ctr Athena, Inst Language & Speech Proc, Maroussi, Greece

Source:

59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1 | 2021

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Question answering (QA) systems for large document collections typically use pipelines that (i) retrieve possibly relevant documents, (ii) re-rank them, (iii) rank paragraphs or other snippets of the top-ranked documents, and (iv) select spans of the top-ranked snippets as exact answers. Pipelines are conceptually simple, but errors propagate from one component to the next, without later components being able to revise earlier decisions. We present an architecture for joint document and snippet ranking (the two middle stages), which leverages the intuition that relevant documents have good snippets and good snippets come from relevant documents. The architecture is general and can be used with any neural text relevance ranker. We experiment with two main instantiations of the architecture, based on POSITDRMM (PDRMM) and a BERT-based ranker. Experiments on biomedical data from BIOASQ show that our joint models vastly outperform the pipelines in snippet retrieval, the main goal for QA, while using fewer trainable parameters and remaining competitive in document retrieval. Furthermore, our joint PDRMM-based model is competitive with BERT-based models, despite using orders of magnitude fewer parameters. These claims are also supported by human evaluation on two test batches of BIOASQ. To test our key findings on another dataset, we modified the Natural Questions dataset so that it can also be used for document and snippet retrieval. Our joint PDRMM-based model again outperforms the corresponding pipeline in snippet retrieval on the modified Natural Questions dataset, even though it performs worse than the pipeline in document retrieval. We make our code and the modified Natural Questions dataset publicly available.
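The intuition behind joint ranking ("relevant documents have good snippets and good snippets come from relevant documents") can be illustrated with a toy sketch. It is not the authors' joint PDRMM-based or BERT-based model: the hypothetical function `joint_rank`, the use of each document's best snippet score as its document score, and the linear interpolation weight `alpha` are all assumptions chosen only to show how document and snippet scores can inform each other.

```python
from typing import Dict, List, Tuple


def joint_rank(
    snippet_scores: Dict[str, List[float]], alpha: float = 0.5
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, int, float]]]:
    """Toy joint document/snippet ranking (illustrative assumption, not the paper's model).

    snippet_scores maps a document id to the relevance scores of its snippets,
    as produced by any snippet-level relevance ranker.
    Returns (ranked documents, ranked snippets as (doc id, snippet index, score)).
    """
    # Assumption: a document is scored by its best snippet
    # ("relevant documents have good snippets"). Documents without snippets are skipped.
    doc_scores = {doc: max(s) for doc, s in snippet_scores.items() if s}
    ranked_docs = sorted(doc_scores.items(), key=lambda kv: kv[1], reverse=True)

    # Assumption: each snippet's final score interpolates its own score with its
    # document's score ("good snippets come from relevant documents").
    ranked_snippets = [
        (doc, i, alpha * s + (1 - alpha) * doc_scores[doc])
        for doc, scores in snippet_scores.items()
        for i, s in enumerate(scores)
    ]
    ranked_snippets.sort(key=lambda t: t[2], reverse=True)
    return ranked_docs, ranked_snippets


if __name__ == "__main__":
    docs, snippets = joint_rank({"d1": [0.9, 0.4], "d2": [0.45]})
    print(docs)
    print(snippets)
```

With `alpha = 0.5`, the second snippet of `d1` (own score 0.4) ends up ranked above the only snippet of `d2` (own score 0.45), because `d1` is the more relevant document. A pipeline that ranked snippets by their own scores alone would order them the other way round.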


Pages: 3896-3907

Page count: 12
