TweetQA: A Social Media Focused Question Answering Dataset

Times cited: 0
Authors
Xiong, Wenhan [1]
Wu, Jiawei [1]
Wang, Hong [1]
Kulkarni, Vivek [1]
Yu, Mo [2]
Chang, Shiyu [2]
Guo, Xiaoxiao [2]
Wang, William Yang [1]
Affiliations
[1] University of California, Santa Barbara, Santa Barbara, CA 93106, USA
[2] IBM Research, Yorktown Heights, NY, USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
With social media becoming an increasingly popular platform on which much news and many real-time events are reported, developing automated question answering systems is critical to the effectiveness of many applications that rely on real-time knowledge. While previous datasets have concentrated on question answering (QA) over formal text such as news and Wikipedia, we present the first large-scale dataset for QA over social media data. To ensure that the collected tweets are useful, we gather only tweets that journalists used when writing news articles. We then ask human annotators to write questions and answers based on these tweets. Unlike QA datasets such as SQuAD, in which the answers are extractive, we allow the answers to be abstractive. We show that two recently proposed neural models that perform well on formal text are limited in their performance when applied to our dataset. In addition, even a fine-tuned BERT model still lags behind human performance by a large margin. Our results thus point to the need for improved QA systems targeting social media text.
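The abstract contrasts extractive answers (a contiguous span copied from the context, as in SQuAD) with the abstractive answers this dataset allows. As an illustration only, and not code from the paper, the sketch checks whether an answer could have been extracted verbatim from a tweet. The helper names `tokenize` and `is_extractive`, the lowercase word-level tokenizer, and the example tweet are all hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase and keep only word characters,
    # so punctuation and case differences do not affect the span check.
    return re.findall(r"\w+", text.lower())


def is_extractive(context: str, answer: str) -> bool:
    """Return True if the answer's tokens occur as one contiguous span of the context."""
    ctx, ans = tokenize(context), tokenize(answer)
    if not ans:
        return False
    n = len(ans)
    # Slide a window of the answer's length over the context tokens.
    return any(ctx[i:i + n] == ans for i in range(len(ctx) - n + 1))


# Invented example tweet, for illustration only.
tweet = "Flights out of Denver are delayed due to a snowstorm, airport officials said."
print(is_extractive(tweet, "due to a snowstorm"))     # True: a verbatim span (extractive)
print(is_extractive(tweet, "because of heavy snow"))  # False: a paraphrase (abstractive)
```

An abstractive answer like the second one can be correct yet fail any span-based check, which is why span-matching setups designed for extractive datasets do not transfer directly to abstractive answers.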
Pages: 5020-5031
Page count: 12
Related papers
50 in total
  • [31] RuBQ 2.0: An Innovated Russian Question Answering Dataset
    Rybin, Ivan
    Korablinov, Vladislav
    Efimov, Pavel
    Braslavski, Pavel
    SEMANTIC WEB, ESWC 2021, 2021, 12731: 532-547
  • [32] Building a benchmark dataset for the Kurdish news question answering
    Saeed, Ari M.
    DATA IN BRIEF, 2024, 57
  • [33] A dataset for medical instructional video classification and question answering
    Gupta, Deepak
    Attal, Kush
    Demner-Fushman, Dina
    SCIENTIFIC DATA, 2023, 10 (01)
  • [34] OVQA: A Clinically Generated Visual Question Answering Dataset
    Huang, Yefan
    Wang, Xiaoli
    Liu, Feiyan
    Huang, Guofeng
    PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022: 2924-2938
  • [35] A Large Visual Question Answering Dataset for Cultural Heritage
    Asprino, Luigi
    Bulla, Luana
    Marinucci, Ludovica
    Mongiovi, Misael
    Presutti, Valentina
    MACHINE LEARNING, OPTIMIZATION, AND DATA SCIENCE (LOD 2021), PT II, 2022, 13164: 193-197
  • [36] DermaVQA: A Multilingual Visual Question Answering Dataset for Dermatology
    Yim, Wen-wai
    Fu, Yujuan
    Sun, Zhaoyi
    Ben Abacha, Asma
    Yetisgen, Meliha
    Xia, Fei
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT V, 2024, 15005: 209-219
  • [37] PersianQuAD: The Native Question Answering Dataset for the Persian Language
    Kazemi, Arefeh
    Mozafari, Jamshid
    Nematbakhsh, Mohammad Ali
    IEEE ACCESS, 2022, 10: 26045-26057
  • [39] TheoremQA: A Theorem-driven Question Answering Dataset
    Chen, Wenhu
    Yin, Ming
    Ku, Max
    Lu, Pan
    Wan, Yixin
    Ma, Xueguang
    Xu, Jianyu
    Wang, Xinyi
    Xia, Tony
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023: 7889-7901
  • [40] DAWQAS: A Dataset for Arabic Why Question Answering System
    Ismail, Walaa Saber
    Homsi, Masun Nabhan
    ARABIC COMPUTATIONAL LINGUISTICS, 2018, 142: 123-131