Mining Social Media Data to Predict COVID-19 Case Counts

Authors
Kazijevs, Maksims [1]
Akyelken, Furkan A. [1]
Samad, Manar D. [1]
Affiliation
[1] Tennessee State Univ, Dept Comp Sci, Nashville, TN 37203 USA
Funding
National Institutes of Health (NIH);
Keywords
pandemic prediction; social media; Twitter; LSTM; natural language processing;
DOI
10.1109/ICHI54592.2022.00027
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The unpredictability and unknowns surrounding the ongoing coronavirus disease (COVID-19) pandemic have taken an unprecedented toll on the lives and economies of all countries. Prior efforts have sought to predict COVID-19 case counts (CCC) from epidemiological data and online numerical data, which may enable early preventive measures to slow the spread of the disease. In this paper, we use state-of-the-art natural language processing (NLP) algorithms to numerically encode COVID-19-related tweets originating from eight cities in the United States and to predict city-specific CCC up to eight days in the future. We propose a city embedding that produces a time series representation of the daily tweets posted from a city; a custom long short-term memory (LSTM) model then uses this time series to predict case counts. The Universal Sentence Encoder yields the best normalized root mean squared error (NRMSE), 0.090 (0.039), averaged across all cities, when predicting CCC six days in the future. The R² scores for predicting CCC exceed 0.70 and are often above 0.8, which indicates a strong correlation between the actual and model-predicted CCC values. Our analyses show that the NRMSE and R² scores remain consistently robust across cities and across different numbers of time steps in the time series data. These results show that the LSTM model can learn the mapping between NLP-encoded tweet semantics and case counts, which suggests that social media text can be mined directly to identify the future course of the pandemic.
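The abstract does not specify how tweet embeddings are aggregated per day, how the time series is windowed, or which NRMSE normalization is used. The following Python sketch shows one plausible reading of that pipeline under stated assumptions. It assumes mean pooling of each day's tweet vectors into a single city-day vector, sliding windows of `steps` days paired with the case count `horizon` days after the window ends (for example `horizon=6` for the six-day result), and NRMSE normalized by the range of the actual values. All function names are hypothetical. The LSTM model itself is omitted, and this is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def city_embedding(daily_tweet_vectors: Sequence[Sequence[Sequence[float]]]) -> List[Vector]:
    """Mean-pool each day's tweet vectors into one vector per day.
    Mean pooling is an assumed aggregation, not taken from the paper."""
    series: List[Vector] = []
    for day in daily_tweet_vectors:
        if not day:
            raise ValueError("each day needs at least one encoded tweet")
        dim = len(day[0])
        series.append([sum(vec[i] for vec in day) / len(day) for i in range(dim)])
    return series


def make_windows(series: Sequence[Vector], counts: Sequence[float],
                 steps: int, horizon: int) -> Tuple[List[List[Vector]], List[float]]:
    """Pair `steps` consecutive daily vectors with the case count observed
    `horizon` days after the last day of the window (hypothetical layout)."""
    inputs, targets = [], []
    for start in range(len(series) - steps - horizon + 1):
        last = start + steps - 1  # index of the final day in the input window
        inputs.append(list(series[start:last + 1]))
        targets.append(counts[last + horizon])
    return inputs, targets


def nrmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """RMSE divided by the range of the actual values (one common NRMSE
    convention; the paper's normalization is an assumption here)."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse) / (max(actual) - min(actual))


def r2_score(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

Consider actual counts [10, 20, 30, 40] and predictions [12, 18, 33, 39]. The squared errors sum to 18, so the RMSE is sqrt(4.5), about 2.121. With range normalization, the NRMSE is about 0.071, and the R² score is 1 - 18/500 = 0.964. Normalizing by the mean or standard deviation instead would give different NRMSE values, so results are comparable only under the same convention.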
Pages: 104-111
Page count: 8