Predicting New Daily COVID-19 Cases and Deaths Using Search Engine Query Data in South Korea From 2020 to 2021: Infodemiology Study

Cited by: 10
Authors
Husnayain, Atina [1]
Shim, Eunha [2]
Fuad, Anis [3]
Su, Emily Chia-Yu [1, 4]
Affiliations
[1] Taipei Med Univ, Coll Med Sci & Technol, Grad Inst Biomed Informat, 172-1 Keelung Rd,Sec 2, Taipei 106, Taiwan
[2] Soongsil Univ, Dept Math, Seoul, South Korea
[3] Univ Gadjah Mada, Fac Med Publ Hlth & Nursing, Dept Biostat Epidemiol & Populat Hlth, Yogyakarta, Indonesia
[4] Taipei Med Univ Hosp, Clin Big Data Res Ctr, Taipei, Taiwan
Funding
National Research Foundation, Singapore;
Keywords
prediction; internet search; COVID-19; South Korea; infodemiology; TRENDS; POPULATION; VOLUMES;
DOI
10.2196/34178
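Chinese Library Classification number
R19 [Health care organization and services (health services management)];
Discipline classification code
Abstract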
Background: Given the ongoing COVID-19 pandemic, accurate predictions could greatly help health resource management for future waves. However, as a new disease, COVID-19 has dynamics that have seemed difficult to predict. External factors, such as internet search data, need to be included in models to increase their accuracy. It remains unclear, however, whether incorporating online search volumes into models leads to better predictive performance for long-term prediction.
Objective: The aim of this study was to analyze whether search engine query data are important variables that should be included in models predicting new daily COVID-19 cases and deaths over short- and long-term periods.
Methods: We used country-level case-related data, NAVER search volumes, and mobility data obtained from Google and Apple for the period of January 20, 2020, to July 31, 2021, in South Korea. Data were aggregated into four subsets: 3, 6, 12, and 18 months after the first case was reported. The first 80% of the data in each subset were used as the training set, and the remaining 20% served as the test set. Generalized linear models (GLMs) with normal, Poisson, and negative binomial distributions were developed, along with linear regression (LR) models with lasso, adaptive lasso, and elastic net regularization. Root mean square error values were defined as the loss function and were used to assess the performance of the models. All analyses and visualizations [...]
Results: GLMs with different types of distribution functions may have been beneficial in predicting new daily COVID-19 cases and deaths in the early stages of the outbreak. Over longer periods, as the distribution of cases and deaths became more normally distributed, LR models with regularization may have outperformed the GLMs. This study also found that models performed better when predicting new daily deaths than when predicting new daily cases. In addition, an evaluation of feature effects in the models showed that NAVER search volumes were useful variables in predicting new daily COVID-19 cases, particularly in the first 6 months of the outbreak. Searches related to logistical needs, particularly for "thermometer" and "mask strap," showed higher feature effects in that period. For longer prediction periods, NAVER search volumes were still found to constitute an important variable, although with a lower feature effect. This finding suggests that search term use should be considered to maintain the predictive [...]
Conclusions: NAVER search volumes were important variables in short- and long-term prediction, with higher feature effects for predicting new daily COVID-19 cases in the first 6 months of the outbreak. Similar results were also found for death predictions.
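The evaluation protocol described in the Methods (a chronological 80/20 split scored by root mean square error) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `chronological_split` and `rmse`, the invented daily counts, and the training-mean baseline are all assumptions made purely for illustration, and the GLM and regularized LR models themselves are not reproduced here.

```python
import math
from typing import List, Sequence, Tuple


def chronological_split(
    series: Sequence[float], train_fraction: float = 0.8
) -> Tuple[List[float], List[float]]:
    """Split a time-ordered series: the first part trains, the rest tests."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(series) * train_fraction)
    return list(series[:cut]), list(series[cut:])


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical daily case counts, invented for illustration only.
daily_cases = [10, 12, 15, 11, 14, 18, 20, 17, 21, 25]

train, test = chronological_split(daily_cases)

# Naive baseline (an assumption, not a model from the study):
# predict the training mean for every day in the test period.
baseline = sum(train) / len(train)
score = rmse(test, [baseline] * len(test))
print(f"train days: {len(train)}, test days: {len(test)}, RMSE: {score:.3f}")
```

Splitting by position rather than at random keeps the test period strictly after the training period, which matches the forecasting setting the abstract describes. Any fitted model would replace the constant baseline when computing the score.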
Pages: 13