FQuAD: French Question Answering Dataset

Cited by: 0
Authors
d'Hoffschmidt, Martin [1]
Belblidia, Wacim [1]
Heinrich, Quentin [1]
Brendle, Tom [1]
Vidal, Maxime [2]
Affiliations
[1] Illuin Technology, Paris, France
[2] Swiss Federal Institute of Technology, Zurich, Switzerland
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recent advances in language modeling have improved state-of-the-art results on many Natural Language Processing tasks. Among them, Reading Comprehension has made significant progress over the past few years. However, most results are reported in English, since labeled resources in other languages, such as French, remain scarce. In the present work, we introduce the French Question Answering Dataset (FQuAD). FQuAD is a native French Reading Comprehension dataset of questions and answers on a set of Wikipedia articles; it consists of 25,000+ samples in version 1.0 and 60,000+ samples in version 1.1. We train a baseline model that achieves an F1 score of 92.2 and an exact match ratio of 82.1 on the test set. To track the progress of French Question Answering models, we propose a leaderboard, and we have made version 1.0 of our dataset freely available at https://illuin-tech.github.io/FQuAD-explorer/.
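The baseline is reported with two extractive QA metrics, F1 and exact match (EM). As an illustration only, here is a minimal sketch of SQuAD-style scoring: normalize both strings, compare them exactly for EM, and compute token-overlap precision/recall for F1, taking the maximum over the reference answers and averaging over questions. It assumes the general approach of the SQuAD v1.1 evaluation script; the French article list, the apostrophe handling, and the helper names (`normalize_answer`, `exact_match`, `f1_score`, `evaluate`) are hypothetical choices, not FQuAD's documented evaluation code.

```python
import re
import string
from collections import Counter

# Assumption: punctuation set extended with common French typographic marks.
PUNCT = set(string.punctuation) | {"«", "»", "’", "“", "”"}
# Hypothetical French article list (SQuAD v1.1 strips English "a|an|the").
ARTICLES = re.compile(r"\b(le|la|les|l|un|une|des)\b")


def normalize_answer(text: str) -> str:
    """Lowercase, split elisions, drop punctuation and articles, collapse spaces."""
    text = text.lower()
    # Turn elided articles such as "l'" into separate tokens before removing punctuation.
    text = text.replace("'", " ").replace("’", " ")
    text = "".join(ch for ch in text if ch not in PUNCT)
    text = ARTICLES.sub(" ", text)
    return " ".join(text.split())


def exact_match(prediction: str, ground_truth: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))


def f1_score(prediction: str, ground_truth: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    gt_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gt_tokens)  # multiset intersection
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions: dict, references: dict) -> tuple:
    """predictions: id -> answer; references: id -> list of gold answers.

    Returns (EM, F1) as percentages, using the best-matching gold answer per question.
    """
    em_total = f1_total = 0.0
    for qid, answers in references.items():
        pred = predictions.get(qid, "")
        em_total += max(exact_match(pred, a) for a in answers)
        f1_total += max(f1_score(pred, a) for a in answers)
    n = len(references)
    return 100.0 * em_total / n, 100.0 * f1_total / n


if __name__ == "__main__":
    preds = {"q1": "la tour Eiffel", "q2": "Lyon"}
    golds = {"q1": ["Tour Eiffel"], "q2": ["Paris"]}
    print(evaluate(preds, golds))  # (50.0, 50.0)
```

Because F1 rewards partial token overlap while EM does not, F1 is typically the higher of the two, which is consistent with the 92.2 vs. 82.1 gap reported above.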
Pages: 1193-1208
Number of pages: 16