Using search engine big data for predicting new HIV diagnoses

Cited by: 37
Authors
Young, Sean D. [1 ]
Zhang, Qingpeng [2 ]
Affiliations
[1] Univ Calif Los Angeles, Dept Family Med, Univ Calif Inst Predict Technol, Los Angeles, CA 90095 USA
[2] City Univ Hong Kong, Dept Syst Engn & Engn Management, Kowloon, Hong Kong, Peoples R China
Source
PLOS ONE | 2018 / Volume 13 / Issue 07
DOI
10.1371/journal.pone.0199527
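Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code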
07 ; 0710 ; 09 ;
Abstract
Background: A large and growing body of "big data" is generated by internet search engines, such as Google. Because people often search for information about public health and medical issues, researchers may be able to use search engine data to monitor and predict public health problems, such as HIV. We sought to assess the feasibility of using Google search data to analyze and predict new HIV diagnoses in the United States.
Methods and findings: We collected search volume data on HIV-related Google search keywords across the United States for 2007 to 2014. State-level data on new HIV diagnoses were collected from the Centers for Disease Control and Prevention (CDC) and AIDSVu.org. We developed a negative binomial model to predict HIV cases using a subset of significant predictor keywords identified by LASSO. The Google search data were combined with state-level HIV case reports provided by the CDC. We trained the model on historical data and predicted new HIV diagnoses for 2011 to 2014, obtaining an average R² of 0.99 between predicted and actual cases and an average root-mean-square error (RMSE) of 108.75.
Conclusions: Results indicate that Google Trends is a feasible tool for predicting new cases of HIV at the state level. We discuss the implications of integrating visualization maps and tools based on these models into public health and HIV monitoring and surveillance.
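The two accuracy figures quoted in the abstract, R² and RMSE, can be illustrated with a minimal sketch. The abstract does not say which definition of R² the authors used, so the coefficient of determination (1 - SS_res / SS_tot) is an assumption here; the function names and the case counts are hypothetical and are not taken from the paper.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical state-level counts, invented purely for illustration.
actual_cases = [100, 200, 300, 400]
predicted_cases = [110, 190, 310, 390]

print(f"RMSE: {rmse(actual_cases, predicted_cases):.2f}")
print(f"R^2:  {r_squared(actual_cases, predicted_cases):.3f}")
```

RMSE is expressed in the same units as the outcome (here, diagnoses per state), so a value such as 108.75 is only meaningful relative to the size of the state-level counts, whereas R² is unitless.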
Pages: 8