TweetQA: A Social Media Focused Question Answering Dataset

Authors
Xiong, Wenhan [1]
Wu, Jiawei [1]
Wang, Hong [1]
Kulkarni, Vivek [1]
Yu, Mo [2]
Chang, Shiyu [2]
Guo, Xiaoxiao [2]
Wang, William Yang [1]
Affiliations
[1] University of California, Santa Barbara, Santa Barbara, CA 93106, USA
[2] IBM Research, Yorktown Heights, NY, USA
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Social media has become an increasingly popular channel for reporting news and real-time events. As a result, automated question answering systems over this content are critical to the many applications that rely on real-time knowledge. Previous datasets have concentrated on question answering (QA) over formal text such as news articles and Wikipedia. We present the first large-scale dataset for QA over social media data. To ensure that the collected tweets are useful, we only gather tweets that journalists used when writing news articles. We then ask human annotators to write questions and answers based on these tweets. Other QA datasets such as SQuAD require extractive answers, but we allow answers to be abstractive. We show that two recently proposed neural models that perform well on formal text are limited in their performance on our dataset. In addition, even a fine-tuned BERT model still lags behind human performance by a large margin. Our results thus point to the need for improved QA systems targeting social media text.
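The abstract contrasts two kinds of answers. SQuAD-style answers are extractive, meaning they are a span copied verbatim from the passage. TweetQA's answers are abstractive, meaning free text written by the annotator.

The Python below is a minimal sketch of that distinction. It makes several assumptions:

- The helper names (`tokenize`, `is_extractive`, `lcs_length`, `rouge_l_f1`) are hypothetical.
- The regex tokenizer is a simplification.
- The example tweet is invented.

Because abstractive answers cannot be graded by exact span match, the sketch scores them with ROUGE-L, a token-overlap metric based on the longest common subsequence. This is one common choice for free-form answers. It is not a reproduction of the paper's official evaluation script.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase and split into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"\w+", text.lower())


def is_extractive(answer: str, context: str) -> bool:
    """True if the answer's tokens occur as one contiguous span of the context."""
    a, c = tokenize(answer), tokenize(context)
    if not a:
        return False
    return any(c[i:i + len(a)] == a for i in range(len(c) - len(a) + 1))


def lcs_length(x: list[str], y: list[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            curr.append(prev[j - 1] + 1 if xi == yj else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    cand, ref = tokenize(candidate), tokenize(reference)
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical tweet used only for illustration.
tweet = "Officials say the bridge will reopen on Monday after repairs."
print(is_extractive("on Monday", tweet))                         # True
print(is_extractive("next Monday, once repairs finish", tweet))  # False
print(round(rouge_l_f1("Monday", "on Monday"), 3))               # 0.667
```

The second answer is a reasonable reply to "When will the bridge reopen?", yet no span-extraction model could produce it. Overlap metrics still give such answers partial credit, which is the practical consequence of allowing abstractive answers.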
Pages: 5020-5031
Number of pages: 12