A Neural Model for Joint Document and Snippet Ranking in Question Answering for Large Document Collections

Authors
Pappas, Dimitris [1, 2]
Androutsopoulos, Ion [1]
Affiliations
[1] Athens Univ Econ & Business, Dept Informat, Athens, Greece
[2] Res Ctr Athena, Inst Language & Speech Proc, Maroussi, Greece
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Question answering (QA) systems for large document collections typically use pipelines that (i) retrieve possibly relevant documents, (ii) re-rank them, (iii) rank paragraphs or other snippets of the top-ranked documents, and (iv) select spans of the top-ranked snippets as exact answers. Pipelines are conceptually simple, but errors propagate from one component to the next, without later components being able to revise earlier decisions. We present an architecture for joint document and snippet ranking, the two middle stages, which leverages the intuition that relevant documents have good snippets and good snippets come from relevant documents. The architecture is general and can be used with any neural text relevance ranker. We experiment with two main instantiations of the architecture, based on POSITDRMM (PDRMM) and a BERT-based ranker. Experiments on biomedical data from BIOASQ show that our joint models vastly outperform the pipelines in snippet retrieval, the main goal for QA, with fewer trainable parameters, also remaining competitive in document retrieval. Furthermore, our joint PDRMM-based model is competitive with BERT-based models, despite using orders of magnitude fewer parameters. These claims are also supported by human evaluation on two test batches of BIOASQ. To test our key findings on another dataset, we modified the Natural Questions dataset so that it can also be used for document and snippet retrieval. Our joint PDRMM-based model again outperforms the corresponding pipeline in snippet retrieval on the modified Natural Questions dataset, even though it performs worse than the pipeline in document retrieval. We make our code and the modified Natural Questions dataset publicly available.
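The intuition stated in the abstract, that relevant documents have good snippets and good snippets come from relevant documents, can be illustrated with a minimal sketch. This is not the authors' architecture: the names `Document`, `joint_rank`, and `overlap_relevance`, the max-pooling of snippet scores into a document score, and the fixed mixing weights `w_snip` and `w_doc` are all assumptions made for illustration, and the toy word-overlap scorer merely stands in for a neural ranker such as PDRMM or BERT.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Document:
    doc_id: str
    snippets: List[str]


def overlap_relevance(question: str, text: str) -> float:
    """Toy stand-in for a neural relevance ranker (hypothetical):
    the fraction of question terms that also occur in the text."""
    q_terms = set(question.lower().split())
    t_terms = set(text.lower().split())
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


def joint_rank(
    question: str,
    docs: List[Document],
    relevance: Callable[[str, str], float] = overlap_relevance,
    w_snip: float = 0.5,  # assumed weight of the snippet's own score
    w_doc: float = 0.5,   # assumed weight of the parent document's score
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, str, float]]]:
    """Score documents from their snippets, then revise each snippet
    score using the score of the document it came from."""
    doc_scores: List[Tuple[str, float]] = []
    snippet_scores: List[Tuple[str, str, float]] = []
    for doc in docs:
        if not doc.snippets:
            continue
        initial = [relevance(question, s) for s in doc.snippets]
        # A document is as relevant as its best snippet (assumption).
        doc_score = max(initial)
        doc_scores.append((doc.doc_id, doc_score))
        # Good snippets come from relevant documents: mix both signals.
        for snippet, score in zip(doc.snippets, initial):
            revised = w_snip * score + w_doc * doc_score
            snippet_scores.append((doc.doc_id, snippet, revised))
    doc_scores.sort(key=lambda item: item[1], reverse=True)
    snippet_scores.sort(key=lambda item: item[2], reverse=True)
    return doc_scores, snippet_scores


if __name__ == "__main__":
    docs = [
        Document("d1", ["plasmodium parasites causes malaria",
                        "mosquitoes spread the parasites"]),
        Document("d2", ["the weather was cold",
                        "malaria vaccine trials"]),
    ]
    ranked_docs, ranked_snippets = joint_rank("what causes malaria", docs)
    print(ranked_docs[0][0])
    print(ranked_snippets[0][1])
```

In a trained joint model the mixing would be learned end to end rather than fixed by hand; the fixed weights here only make the two-way dependency between document and snippet scores visible.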
Pages: 3896-3907
Page count: 12