Design and analysis of a large-scale COVID-19 tweets dataset

Times cited: 102
Author
Lamsal, Rabindra [1]
Affiliation
[1] Jawaharlal Nehru Univ, Sch Comp & Syst Sci, New Delhi 110067, India
Keywords
Social computing; Crisis computing; Sentiment analysis; Network analysis; Twitter data; TWITTER; SENTIMENT; TIME
DOI
10.1007/s10489-020-02029-z
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
As of July 17, 2020, more than thirteen million people have been diagnosed with the novel coronavirus disease (COVID-19), and half a million have died from it. The World Health Organization declared the COVID-19 outbreak a pandemic on March 11, 2020. Since then, social media platforms have seen an exponential rise in pandemic-related content. Twitter data have previously proven indispensable for extracting situational awareness information during crises. This paper presents the COV19Tweets Dataset (Lamsal 2020a), a large-scale Twitter dataset of more than 310 million COVID-19-specific English-language tweets and their sentiment scores, together with its geo version, the GeoCOV19Tweets Dataset (Lamsal 2020b). The paper describes the design of both datasets in detail and analyzes the tweets they contain. The datasets are publicly released in the expectation that they will contribute to a better understanding of the spatial and temporal dimensions of public discourse about the ongoing pandemic. According to access statistics, the two datasets (Lamsal 2020a, 2020b) have been accessed more than 74.5k times collectively.
Pages: 2790-2804
Number of pages: 15