Predicting New Daily COVID-19 Cases and Deaths Using Search Engine Query Data in South Korea From 2020 to 2021: Infodemiology Study

Cited by: 10
Authors
Husnayain, Atina [1 ]
Shim, Eunha [2 ]
Fuad, Anis [3 ]
Su, Emily Chia-Yu [1 ,4 ]
Affiliations
[1] Taipei Med Univ, Coll Med Sci & Technol, Grad Inst Biomed Informat, 172-1 Keelung Rd,Sec 2, Taipei 106, Taiwan
[2] Soongsil Univ, Dept Math, Seoul, South Korea
[3] Univ Gadjah Mada, Fac Med Publ Hlth & Nursing, Dept Biostat Epidemiol & Populat Hlth, Yogyakarta, Indonesia
[4] Taipei Med Univ Hosp, Clin Big Data Res Ctr, Taipei, Taiwan
Funding
National Research Foundation, Singapore
Keywords
prediction; internet search; COVID-19; South Korea; infodemiology; TRENDS; POPULATION; VOLUMES;
DOI
10.2196/34178
Chinese Library Classification number
R19 [Health care organizations and services (health services administration)]
Abstract
Background: Given the ongoing COVID-19 pandemic, accurate predictions could greatly help health resource management for future waves. However, because COVID-19 was a new disease, its dynamics seemed difficult to predict. External factors, such as internet search data, need to be included in the models to increase their accuracy. It remains unclear, however, whether incorporating online search volumes into models leads to better predictive performance for long-term prediction.

Objective: The aim of this study was to analyze whether search engine query data are important variables that should be included in models predicting new daily COVID-19 cases and deaths over short- and long-term periods.

Methods: We used country-level case-related data, NAVER search volumes, and mobility data obtained from Google and Apple for the period of January 20, 2020, to July 31, 2021, in South Korea. Data were aggregated into four subsets: 3, 6, 12, and 18 months after the first case was reported. The first 80% of the data in all subsets were used as the training set, and the remaining data served as the test set. Generalized linear models (GLMs) with normal, Poisson, and negative binomial distributions were developed, along with linear regression (LR) models with lasso, adaptive lasso, and elastic net regularization. Root mean square error values were defined as a loss function and were used to assess the performance of the models. All analyses and visualizations ...

Results: GLMs with different types of distribution functions may have been beneficial in predicting new daily COVID-19 cases and deaths in the early stages of the outbreak. Over longer periods, as the distribution of cases and deaths became more normally distributed, LR models with regularization may have outperformed the GLMs. This study also found that models performed better when predicting new daily deaths compared to new daily cases. In addition, an evaluation of feature effects in the models showed that NAVER search volumes were useful variables in predicting new daily COVID-19 cases, particularly in the first 6 months of the outbreak. Searches related to logistical needs, particularly for "thermometer" and "mask strap," showed higher feature effects in that period. For longer prediction periods, NAVER search volumes were still found to constitute an important variable, although with a lower feature effect. This finding suggests that search term use should be considered to maintain the predictive ...

Conclusions: NAVER search volumes were important variables in short- and long-term prediction, with higher feature effects for predicting new daily COVID-19 cases in the first 6 months of the outbreak. Similar results were also found for death predictions.
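The evaluation setup described in the Methods (four time-window subsets, a chronological 80/20 split, and root mean square error as the loss) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the helper names (`add_months`, `chronological_split`, `rmse`), the exact cut-off dates for each subset, the synthetic daily counts, and the mean-value baseline are all assumptions made for the example. The study's GLM and regularized regression models are not reproduced here.

```python
from datetime import date, timedelta
import math

FIRST_CASE = date(2020, 1, 20)  # start of the study period, per the abstract
LAST_DAY = date(2021, 7, 31)    # end of the study period, per the abstract


def add_months(d, months):
    """Return the date `months` calendar months after `d` (day clamped to 28)."""
    total = d.month - 1 + months
    return date(d.year + total // 12, total % 12 + 1, min(d.day, 28))


def chronological_split(rows, train_fraction=0.8):
    """Split an ordered series so the earliest rows train and the latest rows test."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic placeholder series: one (date, count) row per day of the study period.
n_days = (LAST_DAY - FIRST_CASE).days + 1
series = [(FIRST_CASE + timedelta(days=i), 10 + i % 7) for i in range(n_days)]

for months in (3, 6, 12, 18):
    # Assumed cut-off: each subset ends `months` calendar months after the first case.
    cutoff = add_months(FIRST_CASE, months)
    subset = [count for day, count in series if day < cutoff]
    train, test = chronological_split(subset)
    # Placeholder model: predict the training mean for every test day.
    baseline = sum(train) / len(train)
    score = rmse(test, [baseline] * len(test))
    print(f"{months:>2} months: train={len(train)} test={len(test)} RMSE={score:.3f}")
```

Splitting by position rather than at random keeps every test observation later in time than every training observation, which matches the description of using the first 80% of each subset for training.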
Pages: 13