VQuAnDa: Verbalization QUestion ANswering DAtaset

Cited by: 9

Authors:

Kacupaj, Endri [1]

Zafar, Hamid [1]

Lehmann, Jens [1,2]

Maleshkova, Maria [1]

Affiliations:

[1] Univ Bonn, Bonn, Germany

[2] Fraunhofer IAIS, Dresden, Germany

Source:

SEMANTIC WEB (ESWC 2020) | 2020 / Vol. 12123

Keywords:

Verbalization; Question Answering; Knowledge Graph; Dataset;

DOI:

10.1007/978-3-030-49461-2_31

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Question Answering (QA) systems over Knowledge Graphs (KGs) aim to provide a concise answer to a given natural language question. Despite the significant evolution of QA methods over the past years, some core lines of work are still lagging behind. This is especially true for methods and datasets that support the verbalization of answers in natural language. Specifically, to the best of our knowledge, none of the existing Question Answering datasets provide any verbalization data for the question-query pairs. Hence, we aim to fill this gap by providing VQuAnDa, the first QA dataset that includes the verbalization of each answer. We base VQuAnDa on a commonly used large-scale QA dataset, LC-QuAD, to support compatibility and continuity with previous work. We complement the dataset with baseline scores for measuring future training and evaluation work, obtained by using a set of standard sequence-to-sequence models and sharing the results of the experiments. This resource empowers researchers to train and evaluate a variety of models to generate answer verbalizations.


Pages: 531-547

Number of pages: 17
