TWEETQA: A Social Media Focused Question Answering Dataset

Authors
Xiong, Wenhan [1]
Wu, Jiawei [1]
Wang, Hong [1]
Kulkarni, Vivek [1]
Yu, Mo [2]
Chang, Shiyu [2]
Guo, Xiaoxiao [2]
Wang, William Yang [1]
Affiliations
[1] Univ Calif Santa Barbara, Santa Barbara, CA 93106 USA
[2] IBM Res, Yorktown Hts, NY USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As social media grows increasingly popular and many news stories and real-time events are reported on it, developing automated question answering systems is critical to the effectiveness of many applications that rely on real-time knowledge. While previous datasets have concentrated on question answering (QA) for formal text such as news and Wikipedia, we present the first large-scale dataset for QA over social media data. To ensure that the tweets we collected are useful, we only gather tweets used by journalists to write news articles. We then ask human annotators to write questions and answers based on these tweets. Unlike other QA datasets such as SQuAD, in which the answers are extractive, we allow the answers to be abstractive. We show that two recently proposed neural models that perform well on formal texts are limited in their performance when applied to our dataset. In addition, even the fine-tuned BERT model still lags behind human performance by a large margin. Our results thus point to the need for improved QA systems targeting social media text.
Pages: 5020-5031
Page count: 12