VQuAnDa: Verbalization QUestion ANswering DAtaset

Cited by: 9
Authors
Kacupaj, Endri [1]
Zafar, Hamid [1]
Lehmann, Jens [1,2]
Maleshkova, Maria [1]
Affiliations
[1] Univ Bonn, Bonn, Germany
[2] Fraunhofer IAIS, Dresden, Germany
Source
SEMANTIC WEB (ESWC 2020) | 2020 / Vol. 12123
Keywords
Verbalization; Question Answering; Knowledge Graph; Dataset;
DOI
10.1007/978-3-030-49461-2_31
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
Question Answering (QA) systems over Knowledge Graphs (KGs) aim to provide a concise answer to a given natural language question. Despite the significant evolution of QA methods in recent years, some core lines of work still lag behind. This is especially true for methods and datasets that support the verbalization of answers in natural language. Specifically, to the best of our knowledge, none of the existing Question Answering datasets provides verbalization data for its question-query pairs. Hence, we aim to fill this gap by providing VQuAnDa, the first QA dataset that includes the verbalization of each answer. We base VQuAnDa on a commonly used large-scale QA dataset, LC-QuAD, to support compatibility and continuity with previous work. We complement the dataset with baseline scores against which future training and evaluation work can be measured, obtained with a set of standard sequence-to-sequence models, and we share the results of the experiments. This resource enables researchers to train and evaluate a variety of models for generating answer verbalizations.
Pages: 531-547
Page count: 17
Related papers
50 items in total
  • [21] Building a benchmark dataset for the Kurdish news question answering
    Saeed, Ari M.
    [J]. DATA IN BRIEF, 2024, 57
  • [22] EgoVQA - An Egocentric Video Question Answering Benchmark Dataset
    Fan, Chenyou
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 4359 - 4366
  • [23] OVQA: A Clinically Generated Visual Question Answering Dataset
    Huang, Yefan
    Wang, Xiaoli
    Liu, Feiyan
    Huang, Guofeng
    [J]. PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, : 2924 - 2938
  • [24] A Large Visual Question Answering Dataset for Cultural Heritage
    Asprino, Luigi
    Bulla, Luana
    Marinucci, Ludovica
    Mongiovi, Misael
    Presutti, Valentina
    [J]. MACHINE LEARNING, OPTIMIZATION, AND DATA SCIENCE (LOD 2021), PT II, 2022, 13164 : 193 - 197
  • [25] A dataset for medical instructional video classification and question answering
    Gupta, Deepak
    Attal, Kush
    Demner-Fushman, Dina
    [J]. SCIENTIFIC DATA, 2023, 10 (01)
  • [26] ToolQA: A Dataset for LLM Question Answering with External Tools
    Zhuang, Yuchen
    Yu, Yue
    Wang, Kuan
    Sun, Haotian
    Zhang, Chao
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [27] DAWQAS: A Dataset for Arabic Why Question Answering System
    Ismail, Walaa Saber
    Homsi, Masun Nabhan
    [J]. ARABIC COMPUTATIONAL LINGUISTICS, 2018, 142 : 123 - 131
  • [28] QASC: A Dataset for Question Answering via Sentence Composition
    Khot, Tushar
    Clark, Peter
    Guerquin, Michal
    Jansen, Peter
    Sabharwal, Ashish
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 8082 - 8090
  • [29] A dataset for medical instructional video classification and question answering
    Deepak Gupta
    Kush Attal
    Dina Demner-Fushman
    [J]. Scientific Data, 10
  • [30] MultiSpanQA: A Dataset for Multi-Span Question Answering
    Li, Haonan
    Vasardani, Maria
    Tomko, Martin
    Baldwin, Timothy
    [J]. NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 1250 - 1260