Natural Questions: A Benchmark for Question Answering Research

Cited by: 767
Authors
Kwiatkowski, Tom [1 ]
Palomaki, Jennimaria [1 ]
Redfield, Olivia [1 ]
Collins, Michael [1 ,2 ]
Parikh, Ankur [1 ]
Alberti, Chris [1 ]
Epstein, Danielle [1 ]
Polosukhin, Illia [1 ]
Devlin, Jacob [1 ]
Lee, Kenton [1 ]
Toutanova, Kristina [1 ]
Jones, Llion [1 ]
Kelcey, Matthew [1 ]
Chang, Ming-Wei [1 ]
Dai, Andrew M. [1 ]
Uszkoreit, Jakob [1 ]
Le, Quoc [1 ]
Petrov, Slav [1 ]
Affiliations
[1] Google Research, Mountain View, CA 94043, USA
[2] Columbia University, New York, NY 10027, USA
DOI
10.1162/tacl_a_00276
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We present the Natural Questions corpus, a question answering data set. Questions consist of real, anonymized, aggregated queries issued to the Google search engine. An annotator is presented with a question along with a Wikipedia page from the top 5 search results, and annotates a long answer (typically a paragraph) and a short answer (one or more entities) if present on the page, or marks null if no long/short answer is present. The public release consists of 307,373 training examples with single annotations; 7,830 examples with 5-way annotations for development data; and a further 7,842 examples with 5-way annotations sequestered as test data. We present experiments validating the quality of the data. We also describe an analysis of 25-way annotations on 302 examples, giving insights into human variability on the annotation task. We introduce robust metrics for the purposes of evaluating question answering systems; demonstrate high human upper bounds on these metrics; and establish baseline results using competitive methods drawn from related literature.
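As a rough illustration of the example format described above (a question paired with a Wikipedia page, an optional long answer, optional short answers, and single or 5-way annotations), here is a minimal Python sketch. The class and field names (NQExample, Annotation, has_gold_long_answer) and the at-least-2-of-5 aggregation threshold are illustrative assumptions, not the official NQ schema or evaluation code.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Annotation:
    """One annotator's judgment on a (question, Wikipedia page) pair."""
    long_answer: Optional[str]   # typically a paragraph; None means null
    short_answers: List[str]     # one or more entities; empty if none found

@dataclass
class NQExample:
    """A Natural Questions example as described in the abstract."""
    question: str                  # real, anonymized, aggregated search query
    wikipedia_page: str            # page drawn from the top 5 search results
    annotations: List[Annotation]  # single (train) or 5-way (dev/test)

def has_gold_long_answer(example: NQExample, threshold: int = 2) -> bool:
    """Aggregate multi-way annotations into one gold decision.

    Assumed rule, for illustration only: the example counts as having a
    long answer if at least `threshold` annotators marked one as present.
    """
    non_null = sum(a.long_answer is not None for a in example.annotations)
    return non_null >= threshold

# Usage: a hypothetical 5-way dev example where 3 of 5 annotators answered.
ex = NQExample(
    question="who founded google",
    wikipedia_page="Google",
    annotations=[
        Annotation("Google was founded in 1998 by ...", ["Larry Page", "Sergey Brin"]),
        Annotation("Google was founded in 1998 by ...", ["Larry Page", "Sergey Brin"]),
        Annotation(None, []),
        Annotation("Google was founded in 1998 by ...", ["Larry Page"]),
        Annotation(None, []),
    ],
)
assert has_gold_long_answer(ex)  # 3 of 5 are non-null, so the gold label is non-null
```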
Pages: 453-466
Page count: 14
Related Papers
50 records in total
  • [1] Natural Questions: A Benchmark for Question Answering Research
    Kwiatkowski, T.; Palomaki, J.; Redfield, O.; Collins, M.; Parikh, A.; Alberti, C.; Epstein, D.; Polosukhin, I.; Devlin, J.; Lee, K.; Toutanova, K.; Jones, L.; Kelcey, M.; Chang, M.-W.; Dai, A. M.; Uszkoreit, J.; Le, Q.; Petrov, S.
    [J]. Transactions of the Association for Computational Linguistics, 2019, 7: 453-466
  • [2] TempQuestions: A Benchmark for Temporal Question Answering
    Jia, Zhen; Abujabal, Abdalghani; Roy, Rishiraj Saha; Stroetgen, Jannik; Weikum, Gerhard
    [J]. Companion Proceedings of the World Wide Web Conference 2018 (WWW 2018), 2018: 1057-1062
  • [3] Routing Questions for Collaborative Answering in Community Question Answering
    Chang, Shuo; Pal, Aditya
    [J]. 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2013: 500-507
  • [4] Question and Answer Classification in Czech Question Answering Benchmark Dataset
    Kusnirakova, Dasa; Medved, Marek; Horak, Ales
    [J]. Proceedings of the 11th International Conference on Agents and Artificial Intelligence (ICAART), Vol. 2, 2019: 701-706
  • [5] KoBBQ: Korean Bias Benchmark for Question Answering
    Jin, Jiho; Kim, Jiseon; Lee, Nayeon; Yoo, Haneul; Oh, Alice; Lee, Hwaran
    [J]. Transactions of the Association for Computational Linguistics, 2024, 12: 507-524
  • [7] Localized Questions in Medical Visual Question Answering
    Tascon-Morales, Sergio; Marquez-Neila, Pablo; Sznitman, Raphael
    [J]. Medical Image Computing and Computer Assisted Intervention, MICCAI 2023, Pt. II, 2023, 14221: 361-370
  • [8] AgXQA: A benchmark for advanced Agricultural Extension question answering
    Kpodo, Josue; Kordjamshidi, Parisa; Nejadhashemi, A. Pouyan
    [J]. Computers and Electronics in Agriculture, 2024, 225
  • [10] The SciQA Scientific Question Answering Benchmark for Scholarly Knowledge
    Auer, Soeren; Barone, Dante A. C.; Bartz, Cassiano; Cortes, Eduardo G.; Jaradeh, Mohamad Yaser; Karras, Oliver; Koubarakis, Manolis; Mouromtsev, Dmitry; Pliukhin, Dmitrii; Radyush, Daniil; Shilin, Ivan; Stocker, Markus; Tsalapati, Eleni
    [J]. Scientific Reports, 2023, 13 (1)