Query Smearing: Improving Classification Accuracy and Coverage of Search Results using Logs

Authors
Oztekin, B. Uygar [1]
Chiu, Andy [1]
Affiliations
[1] Google Inc, Mountain View, CA, USA
DOI
10.1109/ISCIS.2009.5291837
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
High-dimensional concept spaces have various applications in web search, including personalized search, related page computation, diversity preservation, user interest inference, similarity computation, and advertisement targeting. Clustering and classification methods are common means to map documents and users into concept spaces. In most classification algorithms, precision (accuracy) and recall (coverage) tend to be competing aspects. In this paper, we introduce Query Smearing, an algorithm that can significantly improve both the accuracy and coverage of an existing classifier by leveraging information contained in fully anonymized search engine logs. Starting with a potentially incomplete seed classification, it expands the classification information to cover various items in search engine logs using a weighted majority voting scheme. The technique is similar to semi-supervised learning approaches and may be classified as one, but it differs notably from most such examples. In particular, initial labels are not fully trusted for accuracy or completeness (hence, after the first stage, they can be thrown away), and additional relationships between classified items are used extensively to guide the process. Empirical evaluation shows that our algorithm performs well under the following assumptions: i) the search engine log contains a sufficiently large number of query transactions, ii) most results of most queries are relevant and on-topic, and iii) a sufficient fraction of search results are classified in the seed classification, and those classifications are reasonably accurate (but not necessarily complete).
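The abstract names the ingredients (a seed classification, search log transactions, weighted majority voting, and discarding the seed labels after the first stage) but gives no algorithmic detail. The following is therefore only a minimal sketch of how such a two-stage vote might look, not the authors' implementation. The function names `vote` and `smear`, the `min_share` threshold, the per-result weights, and the toy log are all assumptions made for illustration.

```python
from collections import defaultdict


def vote(ballots, min_share):
    """Weighted majority vote over (weight, {category: score}) ballots.

    Returns each category's share of the total weighted score, keeping
    only categories whose share is at least min_share (hypothetical rule).
    """
    totals = defaultdict(float)
    for weight, labels in ballots:
        for category, score in labels.items():
            totals[category] += weight * score
    mass = sum(totals.values())
    if mass == 0:
        return {}
    return {c: s / mass for c, s in totals.items() if s / mass >= min_share}


def smear(log, seed, min_share=0.5):
    """Two-stage label propagation over a search log (illustrative only).

    log:  list of (query, [(url, weight), ...]) transactions
    seed: {url: {category: score}}, possibly incomplete
    """
    # Stage 1: seed-classified results vote on the labels of each query.
    ballots_by_query = defaultdict(list)
    for query, results in log:
        for url, weight in results:
            if url in seed:
                ballots_by_query[query].append((weight, seed[url]))
    query_labels = {}
    for query, ballots in ballots_by_query.items():
        labels = vote(ballots, min_share)
        if labels:
            query_labels[query] = labels

    # Stage 2: queries vote on every result they returned. The seed is
    # not consulted here, mirroring "they can be thrown away".
    ballots_by_url = defaultdict(list)
    for query, results in log:
        if query in query_labels:
            for url, weight in results:
                ballots_by_url[url].append((weight, query_labels[query]))
    smeared = {}
    for url, ballots in ballots_by_url.items():
        labels = vote(ballots, min_share)
        if labels:
            smeared[url] = labels
    return smeared


# Toy data (invented): two seed-unclassified URLs gain labels from the log.
log = [
    ("jaguar speed", [("a.example/cats", 1.0), ("b.example/wildlife", 1.0),
                      ("c.example/unknown", 1.0)]),
    ("jaguar price", [("d.example/cars", 1.0), ("c.example/unknown", 0.5),
                      ("e.example/dealer", 1.0)]),
]
seed = {
    "a.example/cats": {"animals": 1.0},
    "b.example/wildlife": {"animals": 1.0},
    "d.example/cars": {"autos": 1.0},
}
result = smear(log, seed, min_share=0.5)
```

In this toy run the two URLs missing from `seed` receive labels purely from the queries that returned them, which is the coverage gain the abstract describes. The accuracy gain would come from the vote outweighing occasional wrong seed labels, which the sketch does not attempt to demonstrate.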
Pages: 135-140
Number of pages: 6