Predicting New Daily COVID-19 Cases and Deaths Using Search Engine Query Data in South Korea From 2020 to 2021: Infodemiology Study

Cited by: 10
Authors
Husnayain, Atina [1]
Shim, Eunha [2]
Fuad, Anis [3]
Su, Emily Chia-Yu [1, 4]
Affiliations
[1] Taipei Med Univ, Coll Med Sci & Technol, Grad Inst Biomed Informat, 172-1 Keelung Rd,Sec 2, Taipei 106, Taiwan
[2] Soongsil Univ, Dept Math, Seoul, South Korea
[3] Univ Gadjah Mada, Fac Med Publ Hlth & Nursing, Dept Biostat Epidemiol & Populat Hlth, Yogyakarta, Indonesia
[4] Taipei Med Univ Hosp, Clin Big Data Res Ctr, Taipei, Taiwan
Funding
National Research Foundation, Singapore;
Keywords
prediction; internet search; COVID-19; South Korea; infodemiology; TRENDS; POPULATION; VOLUMES;
DOI
10.2196/34178
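Chinese Library Classification number
R19 [Health care organization and services (health services administration)];
Subject classification code
Abstract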
Background: Given the ongoing COVID-19 pandemic situation, accurate predictions could greatly help in the health resource management for future waves. However, as a new entity, COVID-19's disease dynamics seemed difficult to predict. External factors, such as internet search data, need to be included in the models to increase their accuracy. However, it remains unclear whether incorporating online search volumes into models leads to better predictive performances for long-term prediction.

Objective: The aim of this study was to analyze whether search engine query data are important variables that should be included in the models predicting new daily COVID-19 cases and deaths in short- and long-term periods.

Methods: We used country-level case-related data, NAVER search volumes, and mobility data obtained from Google and Apple for the period of January 20, 2020, to July 31, 2021, in South Korea. Data were aggregated into four subsets: 3, 6, 12, and 18 months after the first case was reported. The first 80% of the data in all subsets were used as the training set, and the remaining data served as the test set. Generalized linear models (GLMs) with normal, Poisson, and negative binomial distribution were developed, along with linear regression (LR) models with lasso, adaptive lasso, and elastic net regularization. Root mean square error values were defined as a loss function and were used to assess the performance of the models. All analyses and visualizations [...]

Results: GLMs with different types of distribution functions may have been beneficial in predicting new daily COVID-19 cases and deaths in the early stages of the outbreak. Over longer periods, as the distribution of cases and deaths became more normally distributed, LR models with regularization may have outperformed the GLMs. This study also found that models performed better when predicting new daily deaths compared to new daily cases. In addition, an evaluation of feature effects in the models showed that NAVER search volumes were useful variables in predicting new daily COVID-19 cases, particularly in the first 6 months of the outbreak. Searches related to logistical needs, particularly for "thermometer" and "mask strap," showed higher feature effects in that period. For longer prediction periods, NAVER search volumes were still found to constitute an important variable, although with a lower feature effect. This finding suggests that search term use should be considered to maintain the predictive [...]

Conclusions: NAVER search volumes were important variables in short- and long-term prediction, with higher feature effects for predicting new daily COVID-19 cases in the first 6 months of the outbreak. Similar results were also found for death predictions.
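
The evaluation setup described in the Methods (first 80% of each time-ordered subset for training, the remainder for testing, with root mean square error as the performance measure) can be sketched as follows. This is a minimal illustration in Python and not the authors' code: the function names, the sample counts, and the training-mean baseline are all assumptions made for the example, and no model from the study is fitted here.

```python
import math


def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered sequence into a leading training part
    and a trailing test part, without shuffling."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical daily counts, for illustration only (not data from the study).
daily_cases = [10, 12, 15, 11, 14, 18, 20, 17, 19, 22]

train, test = chronological_split(daily_cases)

# Assumed naive baseline: predict the training mean for every test day.
baseline = sum(train) / len(train)
score = rmse(test, [baseline] * len(test))
print(f"train days: {len(train)}, test days: {len(test)}, RMSE: {score:.3f}")
```

Splitting by position rather than at random keeps the test period strictly after the training period, which matches the forecasting setting described in the abstract.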
Pages: 13
Related papers
50 items in total
  • [21] Information Usage and Compliance with Preventive Behaviors for COVID-19: A Longitudinal Study with Data from the JACSIS 2020/JASTIS 2021
    Kusama, Taro
    Kiuchi, Sakura
    Takeuchi, Kenji
    Ikeda, Takaaki
    Nakazawa, Noriko
    Kinugawa, Anna
    Osaka, Ken
    Tabuchi, Takahiro
    HEALTHCARE, 2022, 10 (03)
  • [22] Time series analysis of daily reported number of new positive cases of COVID-19 in Japan from January 2020 to February 2023
    Sumi, Ayako
    PLOS ONE, 2023, 18 (09)
  • [23] Predicting High-Risk Groups for COVID-19 Anxiety Using AdaBoost and Nomogram: Findings from Nationwide Survey in South Korea
    Byeon, Haewon
    APPLIED SCIENCES-BASEL, 2021, 11 (21)
  • [24] Consistency of Daily Number of Reported COVID-19 Cases in 191 Countries From 2020 to 2022: Comparative Analysis of 2 Major Data Sources
    Liu, Han
    Zong, Huiying
    Yang, Yang
    Schwebel, David C.
    Xie, Bin
    Ning, Peishan
    Rao, Zhenzhen
    Li, Li
    Hu, Guoqing
    JMIR PUBLIC HEALTH AND SURVEILLANCE, 2025, 11
  • [25] Prediction of COVID-19 New Cases Using Multiple Linear Regression Model Based on May to June 2020 Data in Ethiopia
    Argawu, Alemayehu Siffir
    Gobebo, Gizachew
    Bedane, Ketema
    Senbeto, Temesgen
    Lemessa, Reta
    Galdassa, Agassa
    JOURNAL OF PHARMACEUTICAL RESEARCH INTERNATIONAL, 2021, 33 (51A) : 54 - 63
  • [26] SARS-CoV-2 variants from COVID-19 positive cases in the Free State province, South Africa from July 2020 to December 2021
    Mwangi, Peter
    Okendo, Javan
    Mogotsi, Milton
    Ogunbayo, Ayodeji
    Adelabu, Olusesan
    Sondlane, Hlengiwe
    Maotoana, Makgotso
    Mahomed, Lutfiyya
    Morobadi, Molefi Daniel
    Vawda, Sabeehah
    von Gottberg, Anne
    Bhiman, Jinal
    Tegally, Houriiyah
    Wilkinson, Eduan
    Giandhari, Jennifer
    Pillay, Sureshnee
    Naidoo, Yeshnee
    Ramphal, Upasana
    de Oliveira, Tulio
    Bester, Armand
    Goedhals, Dominique
    Nyaga, Martin
    FRONTIERS IN VIROLOGY, 2022, 2
  • [27] Prevalence and Factors Associated with Olfactory Dysfunction in Individuals with COVID-19 in Brazil: A Study of 20,669 Cases from 2020 to 2021
    de Souza, Carlos Dornels Freire
    Magalhaes, Amanda Julia de Arruda
    Silva Nobre, Yasmin Vitoria
    Souza, Carlos Alberto
    do Nascimento, Andre Luis Oliveira
    de Faria, Luisa Robalinho
    Bezerra-Santos, Marcio
    Armstrong, Anderson da Costa
    Nicacio, Jandir Mendonca
    Gomes, Orlando Vieira
    do Carmo, Rodrigo Feliciano
    MEDICAL PRINCIPLES AND PRACTICE, 2024, 33 (02) : 164 - 172
  • [28] Predicting the weekly COVID-19 new cases using multilayer perceptron: An evidence from west Java, Indonesia
    Hidayat, Yuyun
    Pangestu, Dhika Surya
    Subiyanto
    Purwandari, Titi
    Sukono
    Saputra, Jumadil
    DECISION SCIENCE LETTERS, 2022, 11 (03) : 247 - 262
  • [29] Association of COVID-19 with skin diseases and relevant biologics: a cross-sectional study using nationwide claim data in South Korea
    Cho, S. I.
    Kim, Y. E.
    Jo, S. J.
    BRITISH JOURNAL OF DERMATOLOGY, 2021, 184 (02) : 296 - 303
  • [30] Excess deaths directly and indirectly attributable to COVID-19 using routinely reported mortality data, Bishkek, Kyrgyzstan, 2020: a cross-sectional study
    Bumburidi, Yekaterina
    Dzhalimbekova, Altynai
    Malisheva, Marina
    Moolenaar, Ronald L.
    Horth, Roberta
    Singer, Daniel
    Otorbaeva, Dinagul
    BMJ OPEN, 2023, 13 (07)