ArabicaQA: A Comprehensive Dataset for Arabic Question Answering

Cited by: 0
Authors
Abdallah, Abdelrahman [1]
Kasem, Mahmoud [2]
Abdalla, Mahmoud [3]
Mahmoud, Mohamed [2]
Elkasaby, Mohamed [3]
Elbendary, Yasser [3]
Jatowt, Adam [1]
Affiliations
[1] Univ Innsbruck, Innsbruck, Austria
[2] Assiut Univ, Assiut, Egypt
[3] DISCO AI, Cairo, Egypt
Keywords
Arabic question answering; Question generation; Information retrieval; LLM
DOI
10.1145/3626772.3657889
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we address the significant gap in Arabic natural language processing (NLP) resources by introducing ArabicaQA, the first large-scale dataset for machine reading comprehension and open-domain question answering in Arabic. This comprehensive dataset, consisting of 89,095 answerable and 3,701 unanswerable questions created by crowdworkers to look similar to answerable ones, along with additional labels of open-domain questions, marks a crucial advancement in Arabic NLP resources. We also present AraDPR, the first dense passage retrieval model trained on the Arabic Wikipedia corpus, specifically designed to tackle the unique challenges of Arabic text retrieval. Furthermore, our study includes extensive benchmarking of large language models (LLMs) for Arabic question answering, critically evaluating their performance in the Arabic language context. In conclusion, ArabicaQA, AraDPR, and the benchmarking of LLMs in Arabic question answering offer significant advancements in the field of Arabic NLP. The dataset and code are publicly accessible for further research.
Pages: 2049-2059
Page count: 11
Related papers
50 in total
  • [1] DAWQAS: A Dataset for Arabic Why Question Answering System
    Ismail, Walaa Saber
    Homsi, Masun Nabhan
    [J]. ARABIC COMPUTATIONAL LINGUISTICS, 2018, 142 : 123 - 131
  • [2] A comprehensive survey of techniques for developing an Arabic question answering system
    Alkhurayyif, Yazeed
    Sait, Abdul Rahaman Wahab
    [J]. PEERJ COMPUTER SCIENCE, 2023, 9
  • [3] Arabic community question answering
    Nakov, Preslav
    Marquez, Lluis
    Moschitti, Alessandro
    Mubarak, Hamdy
    [J]. NATURAL LANGUAGE ENGINEERING, 2019, 25 (01) : 5 - 41
  • [4] Neural Arabic Question Answering
    Mozannar, Hussein
    El Hajal, Karl
    Maamary, Elie
    Hajj, Hazem
    [J]. FOURTH ARABIC NATURAL LANGUAGE PROCESSING WORKSHOP (WANLP 2019), 2019 : 108 - 118
  • [5] ECG-QA: A Comprehensive Question Answering Dataset Combined With Electrocardiogram
    Oh, Jungwoo
    Lee, Gyubok
    Bae, Seongsu
    Kwon, Joon-Myoung
    Choi, Edward
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [6] Question Classification for Arabic Question Answering Systems
    Al Chalabi, Hani Maluf
    Ray, Santosh Kumar
    Shaalan, Khaled
    [J]. 2015 INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY RESEARCH (ICTRC), 2015 : 310 - 313
  • [7] Automatic question answering for multiple stakeholders, the epidemic question answering dataset
    Travis R. Goodwin
    Dina Demner-Fushman
    Kyle Lo
    Lucy Lu Wang
    Hoa T. Dang
    Ian M. Soboroff
    [J]. Scientific Data, 9
  • [8] Automatic question answering for multiple stakeholders, the epidemic question answering dataset
    Goodwin, Travis R.
    Demner-Fushman, Dina
    Lo, Kyle
    Wang, Lucy Lu
    Dang, Hoa T.
    Soboroff, Ian M.
    [J]. SCIENTIFIC DATA, 2022, 9 (01)
  • [9] Arabic Question Answering Using Ontology
    Albarghothi, Ali
    Khater, Feras
    Shaalan, Khaled
    [J]. ARABIC COMPUTATIONAL LINGUISTICS (ACLING 2017), 2017, 117 : 183 - 191
  • [10] Arabic question answering system: a survey
    Tahani H. Alwaneen
    Aqil M. Azmi
    Hatim A. Aboalsamh
    Erik Cambria
    Amir Hussain
    [J]. Artificial Intelligence Review, 2022, 55 : 207 - 253