Query Smearing: Improving Classification Accuracy and Coverage of Search Results using Logs

Times cited: 0
Authors
Oztekin, B. Uygar [1]
Chiu, Andy [1]
Affiliation
[1] Google Inc, Mountain View, CA USA
DOI
10.1109/ISCIS.2009.5291837
Chinese Library Classification
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
High-dimensional concept spaces have various applications in web search, including personalized search, related page computation, diversity preservation, user interest inference, similarity computation, and advertisement targeting. Clustering and classification methods are common means of mapping documents and users into concept spaces. In most classification algorithms, precision (accuracy) and recall (coverage) tend to be competing aspects. In this paper, we introduce Query Smearing, an algorithm that can significantly improve both the accuracy and coverage of an existing classifier by leveraging information contained in fully anonymized search engine logs. Starting with a potentially incomplete seed classification, it expands the classification information to cover various items in search engine logs using a weighted majority voting scheme. The technique is similar to semi-supervised learning approaches and may be classified as one, but it differs notably from most such examples. In particular, initial labels are not fully trusted for accuracy or completeness (hence, after the first stage, they can be thrown away), and additional relationships between classified items are used extensively to guide the process. Empirical evaluation shows that our algorithm performs well under the following assumptions: i) the search engine log contains a sufficiently large number of query transactions, ii) most results of most queries are relevant and on-topic, and iii) a sufficient fraction of search results are classified in the seed classification, and those classifications are reasonably accurate (but not necessarily complete).
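The abstract gives no algorithmic detail beyond "weighted majority voting" over search logs, so the following Python sketch is only one plausible reading of the idea, not the authors' implementation. It assumes a hypothetical log format of `(query, result_urls)` pairs, a hypothetical seed mapping from URL to weighted category labels, and an illustrative `min_share` cut-off. Stage 1 lets each query inherit labels from its classified results; stage 2 relabels every URL from the queries alone, so the seed labels are no longer consulted.

```python
from collections import defaultdict


def normalize(votes):
    """Scale a {category: weight} mapping so that its weights sum to 1."""
    total = sum(votes.values())
    return {c: w / total for c, w in votes.items()} if total > 0 else {}


def smear(log, seed, min_share=0.3):
    """Two-stage weighted voting (illustrative sketch, not the paper's code).

    log       -- list of (query, [result_url, ...]) pairs (assumed format)
    seed      -- {url: {category: weight}}, possibly incomplete or noisy
    min_share -- hypothetical threshold: keep categories with at least this
                 share of a URL's normalized votes
    """
    # Stage 1: each query collects weighted votes from its seed-labeled results.
    query_votes = defaultdict(lambda: defaultdict(float))
    for query, results in log:
        for url in results:
            for category, weight in seed.get(url, {}).items():
                query_votes[query][category] += weight
    query_labels = {q: normalize(v) for q, v in query_votes.items()}

    # Stage 2: each URL is labeled from query labels only; the seed is unused.
    url_votes = defaultdict(lambda: defaultdict(float))
    for query, results in log:
        for url in results:
            for category, share in query_labels.get(query, {}).items():
                url_votes[url][category] += share

    url_labels = {}
    for url, votes in url_votes.items():
        kept = {c: s for c, s in normalize(votes).items() if s >= min_share}
        if kept:
            url_labels[url] = kept
    return query_labels, url_labels


# Toy data, invented for illustration only.
log = [
    ("jaguar speed", ["a.com", "b.com", "c.com"]),
    ("big cats", ["a.com", "c.com", "d.com"]),
]
seed = {
    "a.com": {"animals": 1.0},
    "b.com": {"animals": 1.0},
    "d.com": {"cars": 0.5},  # deliberately questionable seed label
}

query_labels, url_labels = smear(log, seed)
print(url_labels["c.com"])  # c.com had no seed label but receives one here
print(url_labels["d.com"])  # d.com's majority label can differ from its seed
```

In this toy run, `c.com` gains a label it lacked in the seed (coverage), and the majority vote for `d.com` shifts away from its seed label (accuracy), which mirrors the two improvements the abstract claims under its stated assumptions.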
Pages: 135-140
Page count: 6
Related papers
50 items in total
  • [41] Improving search via personalized query expansion using social media
    Zhou, Dong
    Lawless, Seamus
    Wade, Vincent
    [J]. INFORMATION RETRIEVAL, 2012, 15 (3-4) : 218 - 242
  • [42] Improving Result Diversity Using Query Term Proximity in Exploratory Search
    Singh, Vikram
    Dave, Mayank
    [J]. BIG DATA ANALYTICS (BDA 2019), 2019, 11932 : 67 - 87
  • [43] Named Entity Classification Using Search Engine's Query Suggestions
    Barua, Jayendra
    Patel, Dhaval
    [J]. ADVANCES IN INFORMATION RETRIEVAL, ECIR 2017, 2017, 10193 : 612 - 618
  • [44] Improving Accuracy of Pneumonia Classification Using Modified DenseNet
    Wang, Kai
    Jiang, Ping
    Kong, Dali
    Sun, Beibei
    Shen, Ting
    [J]. JOURNAL OF DIGITAL IMAGING, 2023, 36 (04) : 1507 - 1514
  • [45] Improving the Accuracy of the SVM Classification using the Parzen Classifier
    Demidova, Liliya
    Egin, Maksim
    [J]. 2018 7TH MEDITERRANEAN CONFERENCE ON EMBEDDED COMPUTING (MECO), 2018, : 203 - 206
  • [46] Improving Imputation Accuracy in Ordinal Data Using Classification
    Alam, Shafiq
    Dobbie, Gillian
    Sun, XiaoBin
    [J]. INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS (ISDA 2016), 2017, 557 : 45 - 56
  • [47] Improving Cancer Classification Accuracy Using Gene Pairs
    Chopra, Pankaj
    Lee, Jinseung
    Kang, Jaewoo
    Lee, Sunwon
    [J]. PLOS ONE, 2010, 5 (12):
  • [49] Improving the Accuracy of Tagging Recommender System by Using Classification
    Song, Jian
    He, Liang
    Lin, Xin
    [J]. 12TH INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY: ICT FOR GREEN GROWTH AND SUSTAINABLE DEVELOPMENT, VOLS 1 AND 2, 2010, : 387 - 391
  • [50] TRANSLATING NATURAL LANGUAGE UTTERANCES TO SEARCH QUERIES FOR SLU DOMAIN DETECTION USING QUERY CLICK LOGS
    Hakkani-Tuer, Dilek
    Tur, Gokhan
    Iyer, Rukmini
    Heck, Larry
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4953 - 4956