Data Mining From Web Search Queries: A Comparison of Google Trends and Baidu Index

Cited by: 74
Authors
Vaughan, Liwen [1 ,2 ]
Chen, Yue [2 ]
Affiliations
[1] Univ Western Ontario, Fac Informat & Media Studies, London, ON N6A 5B7, Canada
[2] Dalian Univ Technol, Sch Publ Adm, Inst Sci Studies & S&T Management, WISELAB, Dalian 116085, Liaoning Provin, Peoples R China
Keywords
web mining; webometrics
DOI
10.1002/asi.23201
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Numerous studies have explored the possibility of uncovering information from web search queries but few have examined the factors that affect web query data sources. We conducted a study that investigated this issue by comparing Google Trends and Baidu Index. Data from these two services are based on queries entered by users into Google and Baidu, two of the largest search engines in the world. We first compared the features and functions of the two services based on documents and extensive testing. We then carried out an empirical study that collected query volume data from the two sources. We found that data from both sources could be used to predict the quality of Chinese universities and companies. Despite the differences between the two services in terms of technology, such as differing methods of language processing, the search volume data from the two were highly correlated and combining the two data sources did not improve the predictive power of the data. However, there was a major difference between the two in terms of data availability. Baidu Index was able to provide more search volume data than Google Trends did. Our analysis showed that the disadvantage of Google Trends in this regard was due to Google's smaller user base in China. The implication of this finding goes beyond China. Google's user bases in many countries are smaller than that in China, so the search volume data related to those countries could result in the same issue as that related to China.
Pages: 13-22
Number of pages: 10
Related papers
50 items in total
  • [31] From temporal data mining and web mining to temporal web mining
    Samia, M
    Conrad, S
    DATABASES AND INFORMATION SYSTEMS, 2005, 118 : 91 - 102
  • [32] A Comparison of Internet Search Trends and Sexually Transmitted Infection Rates Using Google Trends
    Johnson, Amy K.
    Mehta, Supriya D.
    SEXUALLY TRANSMITTED DISEASES, 2014, 41 (01) : 61 - 63
  • [33] Where you search is what you get: literature mining - Google Scholar versus Web of Science using a data set from a literature search in vegetation science
    Beckmann, Michael
    von Wehrden, Henrik
    JOURNAL OF VEGETATION SCIENCE, 2012, 23 (06) : 1197 - 1199
  • [34] A dengue fever predicting model based on Baidu search index data and climate data in South China
    Liu, Dan
    Guo, Songjing
    Zou, Mingjun
    Chen, Cong
    Deng, Fei
    Xie, Zhong
    Hu, Sheng
    Wu, Liang
    PLOS ONE, 2019, 14 (12):
  • [35] Facilitating web search with visualization and data mining techniques
    Lee, YJ
    KNOWLEDGE AND INFORMATION VISUALIZATION: SEARCHING FOR SYNERGIES, 2005, 3426 : 326 - 342
  • [36] Mondou: Web search engine with textual data mining
    Kawano, H
    1997 IEEE PACIFIC RIM CONFERENCE ON COMMUNICATIONS, COMPUTERS AND SIGNAL PROCESSING, VOLS 1 AND 2: PACRIM 10 YEARS - 1987-1997, 1997, : 402 - 405
  • [37] A Data Mining Method for Accurate Employment Search on the Web
    Muntean, Cristina Ioana
    Moldovan, Darie
    Veres, Ovidiu
    COMMUNICATION AND MANAGEMENT IN TECHNOLOGICAL INNOVATION AND ACADEMIC GLOBALIZATION, 2010, : 123 - 128
  • [38] Online Information on Electronic Cigarettes: Comparative Study of Relevant Websites From Baidu and Google Search Engines
    Chen, Ting
    Gentry, Sarah
    Qiu, Dechao
    Deng, Yan
    Notley, Caitlin
    Cheng, Guangwen
    Song, Fujian
    JOURNAL OF MEDICAL INTERNET RESEARCH, 2020, 22 (01)
  • [39] The COVID-19 infodemic in Brazil: trends in Google search data
    Harb, Maria da Penha
    Veiga e Silva, Lena
    Vijaykumar, Nandamudi
    da Silva, Marcelino Silva
    Lisboa Frances, Carlos Renato
    PEERJ, 2022, 10
  • [40] Detection of breaking news from Online web search queries
    Murata, Tsuyoshi
    NEW GENERATION COMPUTING, 2008, 26 (01) : 63 - 73