Using search engine big data for predicting new HIV diagnoses

Cited by: 37
Authors
Young, Sean D. [1]
Zhang, Qingpeng [2]
Affiliations
[1] Univ Calif Los Angeles, Dept Family Med, Univ Calif Inst Predict Technol, Los Angeles, CA 90095 USA
[2] City Univ Hong Kong, Dept Syst Engn & Engn Management, Kowloon, Hong Kong, Peoples R China
Source
PLOS ONE | 2018 / Vol. 13 / Issue 07
DOI
10.1371/journal.pone.0199527
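Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];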
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Background: A large and growing body of "big data" is generated by internet search engines, such as Google. Because people often search for information about public health and medical issues, researchers may be able to use search engine data to monitor and predict public health problems, such as HIV. We sought to assess the feasibility of using Google search data to analyze and predict new HIV diagnoses in the United States.
Methods and findings: From 2007 to 2014, we collected search volume data on HIV-related Google search keywords across the United States. State-level data on new HIV diagnoses were collected from the Centers for Disease Control and Prevention (CDC) and AIDSVu.org. We developed a negative binomial model to predict HIV cases using a subset of significant predictor keywords identified by LASSO. The Google search data were combined with state-level HIV case reports provided by the CDC. We used historical data to train the model and predicted new HIV diagnoses from 2011 to 2014, with an average R² value of 0.99 between predicted and actual cases and an average root-mean-square error (RMSE) of 108.75.
Conclusions: Results indicate that Google Trends is a feasible tool to predict new cases of HIV at the state level. We discuss the implications of integrating visualization maps and tools based on these models into public health and HIV monitoring and surveillance.
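The two fit statistics quoted in the abstract, R² and RMSE, can be illustrated with a minimal sketch. The abstract does not say which definition of R² the authors used, so the coefficient of determination (1 - SS_res / SS_tot) is assumed here. The function names and case counts below are hypothetical, made up for illustration, and are not data from the study.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (an assumed definition)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical state-level case counts, invented for this example only.
actual_cases = [100, 200, 300, 400]
predicted_cases = [110, 190, 310, 390]

example_rmse = rmse(actual_cases, predicted_cases)
example_r2 = r_squared(actual_cases, predicted_cases)
print(f"RMSE = {example_rmse:.2f}, R^2 = {example_r2:.3f}")
```

Because RMSE is expressed in the same units as the outcome (here, cases), its size only makes sense relative to the magnitude of the counts being predicted, whereas R² is unitless.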
Pages: 8
Related papers
50 in total
  • [21] Search in Big Networks and Big Data
    Abdelrahman, Omer H.
    Gelenbe, E.
    [J]. ANALYTIC METHODS IN INTERDISCIPLINARY APPLICATIONS, 2015, 116 : 1 - 15
  • [22] Detecting China influenza using search engine data
    Li, Xiu-Ting
    Liu, Fan
    Dong, Ji-Chang
    Lü, Ben-Fu
    [J]. Xitong Gongcheng Lilun yu Shijian/System Engineering Theory and Practice, 2013, 33 (12) : 3028 - 3034
  • [23] Using Search Engine Data as a Tool to Predict Syphilis
    Young, Sean D.
    Torrone, Elizabeth A.
    Urata, John
    Aral, Sevgi O.
    [J]. EPIDEMIOLOGY, 2018, 29 (04) : 574 - 578
  • [24] Data Extraction for Search Engine Using Safe Matching
    Hong, Jer Lang
    Tan, Ee Xion
    Fauzi, Fariza
    [J]. AI 2011: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2011, 7106 : 759 - +
  • [25] Stock Turnover Prediction Using Search Engine Data
    Wang, Zhijin
    Huang, Yaohui
    Cai, Bing
    Ma, Rui
    Wang, Zongyue
    [J]. JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2021, 30 (07)
  • [26] PREDICTING FUTURE VISITORS OF RESTAURANTS USING BIG DATA
    Ma, Xu
    Tian, Yanshan
    Luo, Chu
    Zhang, Yuehui
    [J]. PROCEEDINGS OF 2018 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOL 1, 2018, : 269 - 274
  • [27] Predicting the ratings of Amazon products using Big Data
    Woo, Jongwook
    Mishra, Monika
    [J]. WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2021, 11 (03)
  • [28] Understanding Competition using Big Consumer Search Data
    Ringel, Daniel M.
    Skiera, Bernd
    [J]. 2014 47TH HAWAII INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES (HICSS), 2014, : 3129 - 3138
  • [29] Predicting Smoking Prevalence in Japan Using Search Volumes in an Internet Search Engine: Infodemiology Study
    Taira, Kazuya
    Itaya, Takahiro
    Fujita, Sumio
    [J]. JOURNAL OF MEDICAL INTERNET RESEARCH, 2022, 24 (12)
  • [30] New search engine
    Unknown
    [J]. ONLINE & CDROM REVIEW, 1998, 22 (04) : 299 - 299