TWEETQA: A Social Media Focused Question Answering Dataset

Authors
Xiong, Wenhan [1 ]
Wu, Jiawei [1 ]
Wang, Hong [1 ]
Kulkarni, Vivek [1 ]
Yu, Mo [2 ]
Chang, Shiyu [2 ]
Guo, Xiaoxiao [2 ]
Wang, William Yang [1 ]
Affiliations
[1] Univ Calif Santa Barbara, Santa Barbara, CA 93106 USA
[2] IBM Res, Yorktown Hts, NY USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As social media grows increasingly popular as a venue where news and real-time events are reported, developing automated question answering systems is critical to the effectiveness of many applications that rely on real-time knowledge. While previous datasets have concentrated on question answering (QA) over formal text such as news and Wikipedia, we present the first large-scale dataset for QA over social media data. To ensure that the tweets we collect are useful, we gather only tweets used by journalists to write news articles. We then ask human annotators to write questions and answers based on these tweets. Unlike other QA datasets such as SQuAD, in which the answers are extractive, we allow the answers to be abstractive. We show that two recently proposed neural models that perform well on formal text are limited in their performance when applied to our dataset. In addition, even the fine-tuned BERT model still lags behind human performance by a large margin. Our results thus point to the need for improved QA systems targeting social media text.
Pages: 5020-5031
Number of pages: 12