FQuAD: French Question Answering Dataset

Cited by: 0
Authors
d'Hoffschmidt, Martin [1]
Belblidia, Wacim [1]
Heinrich, Quentin [1]
Brendle, Tom [1]
Vidal, Maxime [2]
Affiliations
[1] Illuin Technol, Paris, France
[2] Swiss Fed Inst Technol, Zurich, Switzerland
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent advances in the field of language modeling have improved state-of-the-art results on many Natural Language Processing tasks. Among them, Reading Comprehension has made significant progress over the past few years. However, most results are reported in English since labeled resources available in other languages, such as French, remain scarce. In the present work, we introduce the French Question Answering Dataset (FQuAD). FQuAD is a French Native Reading Comprehension dataset of questions and answers on a set of Wikipedia articles that consists of 25,000+ samples for the 1.0 version and 60,000+ samples for the 1.1 version. We train a baseline model which achieves an F1 score of 92.2 and an exact match ratio of 82.1 on the test set. In an effort to track the progress of French Question Answering models, we propose a leaderboard, and we have made the 1.0 version of our dataset freely available at https://illuin-tech.github.io/FQuAD-explorer/.
Pages: 1193-1208
Page count: 16
Related papers
50 in total
  • [1] FQuAD2.0: French Question Answering and Learning When You Don't Know
    Heinrich, Quentin
    Viaud, Gautier
    Belblidia, Wacim
    [J]. LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 2205 - 2214
  • [2] Project PIAF: Building a Native French Question-Answering Dataset
    Keraron, Rachel
    Lancrenon, Guillaume
    Bras, Mathilde
    Allary, Frederic
    Moyse, Gilles
    Scialom, Thomas
    Soriano-Morales, Edmundo-Pavel
    Staiano, Jacopo
    [J]. PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 5481 - 5490
  • [3] Automatic question answering for multiple stakeholders, the epidemic question answering dataset
    Travis R. Goodwin
    Dina Demner-Fushman
    Kyle Lo
    Lucy Lu Wang
    Hoa T. Dang
    Ian M. Soboroff
    [J]. Scientific Data, 9
  • [4] Automatic question answering for multiple stakeholders, the epidemic question answering dataset
    Goodwin, Travis R.
    Demner-Fushman, Dina
    Lo, Kyle
    Wang, Lucy Lu
    Dang, Hoa T.
    Soboroff, Ian M.
    [J]. SCIENTIFIC DATA, 2022, 9 (01)
  • [5] PQuAD: A Persian question answering dataset
    Darvishi, Kasra
    Shahbodaghkhan, Newsha
    Abbasiantaeb, Zahra
    Momtazi, Saeedeh
    [J]. COMPUTER SPEECH AND LANGUAGE, 2023, 80
  • [6] Slovak Dataset for Multilingual Question Answering
    Hladek, Daniel
    Stas, Jan
    Juhar, Jozef
    Koctur, Tomas
    [J]. IEEE ACCESS, 2023, 11 : 32869 - 32881
  • [7] VQuAnDa: Verbalization QUestion ANswering DAtaset
    Kacupaj, Endri
    Zafar, Hamid
    Lehmann, Jens
    Maleshkova, Maria
    [J]. SEMANTIC WEB (ESWC 2020), 2020, 12123 : 531 - 547
  • [8] LLQA - Lifelog Question Answering Dataset
    Tran, Ly-Duyen
    Thanh Cong Ho
    Lan Anh Pham
    Binh Nguyen
    Gurrin, Cathal
    Zhou, Liting
    [J]. MULTIMEDIA MODELING (MMM 2022), PT I, 2022, 13141 : 217 - 228
  • [9] A question answering system for French
    Perret, L
    [J]. MULTILINGUAL INFORMATION ACCESS FOR TEXT, SPEECH AND IMAGES, 2005, 3491 : 392 - 403
  • [10] Question and Answer Classification in Czech Question Answering Benchmark Dataset
    Kusnirakova, Dasa
    Medved, Marek
    Horak, Ales
    [J]. PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE (ICAART), VOL 2, 2019, : 701 - 706