An augmented multilingual Twitter dataset for studying the COVID-19 infodemic

Authors
Christian E. Lopez
Caleb Gallemore
Affiliations
[1] Lafayette College, Department of Computer Science and Mechanical Engineering Department
[2] Lafayette College, International Affairs Program
Keywords
Twitter; COVID-19; Named Entity Recognition; Sentiment analysis
DOI
Not available
Abstract
This work presents an openly available dataset to facilitate researchers’ exploration and hypothesis testing about the social discourse of the COVID-19 pandemic. The dataset currently consists of over 2.2 billion tweets (count as of September 2021) from all over the world, in multiple languages. The tweets start on January 22, 2020, when the total number of reported COVID-19 cases worldwide was below 600. The dataset was collected using the Twitter API and by rehydrating tweets from other available datasets; data collection is ongoing as of the time of writing. To facilitate hypothesis testing and exploration of social discourse, the English and Spanish tweets have been augmented with state-of-the-art Twitter Sentiment and Named Entity Recognition algorithms. The dataset and the accompanying summary files allow researchers to avoid some computationally intensive analyses, facilitating more widespread use of social media data to gain insights into issues such as (mis)information diffusion, semantic networks, sentiments, and the evolution of COVID-19 discussions. In addition, the dataset provides an archive for social science researchers who want access to a dataset covering the entire duration of the pandemic.
Citation
Lopez, Christian E.; Gallemore, Caleb. An augmented multilingual Twitter dataset for studying the COVID-19 infodemic [J]. Social Network Analysis and Mining, 2021, 11 (01).