Caching Search Engine Results over Incremental Indices

Authors
Blanco, Roi [1]
Bortnikov, Edward
Junqueira, Flavio P. [1]
Lempel, Ronny
Telloli, Luca
Zaragoza, Hugo [1]
Affiliations
[1] Yahoo! Research, Barcelona, Spain
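Keywords
Search engine caching; Real-time indexing
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract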
A Web search engine must update its index periodically to incorporate changes to the Web. We argue in this paper that index updates fundamentally impact the design of search engine result caches, a performance-critical component of modern search engines. Index updates lead to the problem of cache invalidation: invalidating cached entries of queries whose results have changed. Naive approaches, such as flushing the entire cache upon every index update, lead to poor performance and, in fact, render caching futile when the frequency of updates is high. Solving the invalidation problem efficiently corresponds to predicting accurately which queries will produce different results if re-evaluated, given the actual changes to the index. To obtain this property, we propose a framework for developing invalidation predictors and define metrics to evaluate invalidation schemes. We describe concrete predictors using this framework and compare them against a baseline that uses a cache invalidation scheme based on time-to-live (TTL). Evaluation over Wikipedia documents using a query log from the Yahoo! search engine shows that selective invalidation of cached search results can lower the number of unnecessary query evaluations by as much as 30% compared to a baseline scheme, while returning results of similar freshness. In general, our predictors enable fewer unnecessary invalidations and fewer stale results than a TTL-only scheme for similar freshness of results.
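The abstract contrasts a TTL-only baseline with selective, predictor-driven invalidation, and scores schemes by unnecessary invalidations and stale results. The following is a minimal sketch of those ideas under stated assumptions: the names `ResultCache`, `invalidate_on_update`, and `evaluate_invalidation` are hypothetical, and the term-overlap rule used as the predictor is a deliberately simple stand-in, not one of the predictors proposed in the paper.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass
class CacheEntry:
    results: List[str]  # cached result list (document ids)
    stored_at: float    # clock value when the entry was computed


class ResultCache:
    """Hypothetical query-result cache: a TTL bound (the baseline idea)
    plus a hook for selective, predictor-driven invalidation."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries: Dict[str, CacheEntry] = {}

    def get(self, query: str) -> Optional[List[str]]:
        entry = self.entries.get(query)
        if entry is None:
            return None
        if self.clock() - entry.stored_at > self.ttl:
            # TTL expired: drop the entry so the query is re-evaluated.
            del self.entries[query]
            return None
        return entry.results

    def put(self, query: str, results: List[str]) -> None:
        self.entries[query] = CacheEntry(results, self.clock())

    def invalidate_on_update(self, updated_docs: Dict[str, Set[str]]) -> Set[str]:
        """updated_docs maps a changed document id to its set of terms.

        Assumed (simplistic) predictor: invalidate a cached query if a changed
        document already appears in its results, or contains all query terms.
        """
        invalidated = set()
        for query, entry in self.entries.items():
            terms = set(query.lower().split())
            for doc_id, doc_terms in updated_docs.items():
                if doc_id in entry.results or terms <= doc_terms:
                    invalidated.add(query)
                    break
        for query in invalidated:
            del self.entries[query]
        return invalidated


def evaluate_invalidation(predicted: Set[str], truly_changed: Set[str]) -> Dict[str, int]:
    """Count the two error types the abstract names (hypothetical helper)."""
    return {
        # invalidated although the results did not actually change
        "unnecessary_invalidations": len(predicted - truly_changed),
        # results changed but the entry was kept, so a stale result is served
        "stale_results": len(truly_changed - predicted),
    }


if __name__ == "__main__":
    now = [0.0]  # fake clock so TTL expiry is deterministic
    cache = ResultCache(ttl_seconds=60, clock=lambda: now[0])
    cache.put("yahoo research", ["d1", "d2"])
    cache.put("barcelona", ["d3"])
    print(cache.invalidate_on_update({"d9": {"yahoo", "research", "lab"}}))
    now[0] = 61.0
    print(cache.get("barcelona"))  # expired by TTL
```

In this sketch the TTL check and the predictor are independent: the predictor removes entries it believes an index update affected, while the TTL still bounds how long any entry that the predictor missed can be served. A predictor that invalidates too eagerly raises `unnecessary_invalidations`; one that misses changes raises `stale_results`.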
Pages: 82-89
Page count: 8