Data Mining From Web Search Queries: A Comparison of Google Trends and Baidu Index

Cited by: 74
Authors
Vaughan, Liwen [1, 2]
Chen, Yue [2]
Affiliations
[1] Univ Western Ontario, Fac Informat & Media Studies, London, ON N6A 5B7, Canada
[2] Dalian Univ Technol, Sch Publ Adm, Inst Sci Studies & S&T Management, WISELAB, Dalian 116085, Liaoning Provin, Peoples R China
Keywords
web mining; webometrics
DOI
10.1002/asi.23201
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Numerous studies have explored the possibility of uncovering information from web search queries, but few have examined the factors that affect web query data sources. We conducted a study that investigated this issue by comparing Google Trends and Baidu Index. Data from these two services are based on queries entered by users into Google and Baidu, two of the largest search engines in the world. We first compared the features and functions of the two services based on documents and extensive testing. We then carried out an empirical study that collected query volume data from the two sources. We found that data from both sources could be used to predict the quality of Chinese universities and companies. Despite technological differences between the two services, such as their different methods of language processing, the search volume data from the two were highly correlated, and combining the two data sources did not improve the predictive power of the data. However, there was a major difference between the two in terms of data availability: Baidu Index was able to provide more search volume data than Google Trends. Our analysis showed that this disadvantage of Google Trends was due to Google's smaller user base in China. The implication of this finding goes beyond China. Google's user bases in many countries are smaller than its user base in China, so search volume data related to those countries could suffer from the same limitation.
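The abstract reports that search volumes from the two services were highly correlated and that combining them did not improve prediction of quality. This record does not say which statistic the authors used, so the sketch below assumes Spearman rank correlation and uses invented numbers purely to show the mechanics. The helper names (`ranks`, `pearson`, `spearman`) and the rank-averaging way of combining the two sources are hypothetical illustrations, not the study's method or data.

```python
# Sketch: rank correlation between two search-volume sources.
# ASSUMPTION: Spearman's rho is used; all numbers are hypothetical placeholders,
# NOT data from the study.
import math


def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical query volumes for five entities (e.g., universities).
google = [120, 85, 40, 60, 10]     # stand-in for Google Trends values
baidu = [900, 500, 300, 700, 150]  # stand-in for Baidu Index values
quality = [5, 4, 2, 3, 1]          # hypothetical quality scores (higher = better)

# Hypothetical combination: average each entity's rank across the two sources.
combined = [(g + b) / 2 for g, b in zip(ranks(google), ranks(baidu))]

print("Google vs Baidu:    ", round(spearman(google, baidu), 3))
print("Google vs quality:  ", round(spearman(google, quality), 3))
print("Baidu vs quality:   ", round(spearman(baidu, quality), 3))
print("Combined vs quality:", round(spearman(combined, quality), 3))
```

In Python 3.12 and later, `statistics.correlation(x, y, method="ranked")` computes the same Spearman coefficient; the hand-written helpers are kept here only to make the ranking and tie-handling step explicit.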
Pages: 13 - 22
Page count: 10
Related papers
(50 in total)
  • [21] Search Concentration, Bias, and Parochialism: A Comparative Study of Google, Baidu, and Jike's Search Results From China
    Jiang, Min
    JOURNAL OF COMMUNICATION, 2014, 64 (06) : 1088 - 1110
  • [22] Framework for Web Content Mining Using Semantic Search and Natural Language Queries
    Shaikh, A. J.
    Kolhe, V. L.
    2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (ICCIC), 2013 : 280 - 284
  • [23] Public Interest in Acne on the Internet: Comparison of Search Information From Google Trends and Naver
    Park, Tae Heum
    Kim, Woo Il
    Park, Suyeon
    Ahn, Jaeouk
    Cho, Moon Kyun
    Kim, Sooyoung
    JOURNAL OF MEDICAL INTERNET RESEARCH, 2020, 22 (10)
  • [24] Google Web and Image Search Visibility Data for Online Store
    Strzelecki, Artur
    DATA, 2019, 4 (03)
  • [25] Challenges and Opportunities in One Health: Google Trends Search Data
    Wisnieski, Lauren
    Gruszynski, Karen
    Faulkner, Vina
    Shock, Barbara
    PATHOGENS, 2023, 12 (11)
  • [26] Do search queries predict violence against women? A forecasting model based on Google Trends
    Gonzalvez-Gallego, Nicolas
    Concepcion Perez-Carceles, Maria
    Nieto-Torrejon, Laura
    JOURNAL OF FORECASTING, 2024, 43 (05) : 1607 - 1614
  • [27] Patterns of search queries of diabetes-related terms: An infodemiological study using Google trends
    Koo, Malcolm
    Tsai, Kun-Wei
    Lin, Shih-Chun
    DIABETES RESEARCH AND CLINICAL PRACTICE, 2016, 120 : S209
  • [28] Seasonal Variation for Plantar Fasciitis: Evidence from Google Trends Search Query Data
    Hwang, Seok-Min
    Kim, Seok
    Hwang, Suk-Hyun
    HEALTHCARE, 2022, 10 (09)
  • [29] Seasonal trends in hypertension in Poland: evidence from Google search engine query data
    Platek, Anna E.
    Sierdzinski, Janusz
    Krzowski, Bartosz
    Szymanski, Filip M.
    KARDIOLOGIA POLSKA, 2018, 76 (03) : 637 - 641
  • [30] Semantic Discovery from Web Comparison Queries
    Zhong, Tingting
    Wu, Wensheng
    PROCEEDINGS OF THE 22ND ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM'13), 2013 : 1501 - 1504