Caching Search Engine Results over Incremental Indices

Cited by: 0
Authors
Blanco, Roi [1]
Bortnikov, Edward
Junqueira, Flavio P. [1]
Lempel, Ronny
Telloli, Luca
Zaragoza, Hugo [1]
Affiliations
[1] Yahoo Res, Barcelona, Spain
Keywords
Search engine caching; Real-time indexing
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A Web search engine must update its index periodically to incorporate changes to the Web. We argue in this paper that index updates fundamentally impact the design of search engine result caches, a performance-critical component of modern search engines. Index updates lead to the problem of cache invalidation: invalidating cached entries of queries whose results have changed. Naive approaches, such as flushing the entire cache upon every index update, lead to poor performance and in fact, render caching futile when the frequency of updates is high. Solving the invalidation problem efficiently corresponds to predicting accurately which queries will produce different results if re-evaluated, given the actual changes to the index. To obtain this property, we propose a framework for developing invalidation predictors and define metrics to evaluate invalidation schemes. We describe concrete predictors using this framework and compare them against a baseline that uses a cache invalidation scheme based on time-to-live (TTL). Evaluation over Wikipedia documents using a query log from the Yahoo! search engine shows that selective invalidation of cached search results can lower the number of unnecessary query evaluations by as much as 30% compared to a baseline scheme, while returning results of similar freshness. In general, our predictors enable fewer unnecessary invalidations and fewer stale results compared to a TTL-only scheme for similar freshness of results.
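
To make the contrast between the TTL baseline and selective invalidation concrete, the following is a minimal sketch and not the authors' implementation. The class `ResultCache`, the method `invalidate_for_update`, and the helper `invalidation_errors` are hypothetical names, and the prediction rule (invalidate a cached query if the updated document already appears in its results or contains all of its terms) is an assumed heuristic chosen only for illustration. The abstract does not specify the paper's concrete predictors or metric definitions.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    results: list           # cached document ids for the query
    query_terms: frozenset  # lowercased query terms
    cached_at: float        # time the entry was stored


class ResultCache:
    """Result cache with a TTL baseline plus a hypothetical invalidation predictor."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.entries = {}

    def put(self, query, results, now):
        terms = frozenset(query.lower().split())
        self.entries[query] = CacheEntry(list(results), terms, now)

    def get(self, query, now):
        """Return cached results, or None on a miss or an expired entry (TTL baseline)."""
        entry = self.entries.get(query)
        if entry is None:
            return None
        if now - entry.cached_at > self.ttl:
            del self.entries[query]
            return None
        return entry.results

    def invalidate_for_update(self, doc_id, doc_terms):
        """Assumed heuristic: drop entries whose results may change after doc_id is indexed."""
        terms = {t.lower() for t in doc_terms}
        stale = [
            query
            for query, entry in self.entries.items()
            if doc_id in entry.results or entry.query_terms <= terms
        ]
        for query in stale:
            del self.entries[query]
        return stale


def invalidation_errors(predicted_stale, truly_stale):
    """Split prediction errors into unnecessary invalidations and missed stale entries."""
    predicted, truth = set(predicted_stale), set(truly_stale)
    unnecessary = predicted - truth  # invalidated, but results did not change
    missed = truth - predicted       # results changed, but entry stayed cached
    return unnecessary, missed


cache = ResultCache(ttl=60)
cache.put("barcelona weather", ["d1", "d2"], now=0)
cache.put("search caching", ["d3"], now=0)

# A new document arrives in the index; only the matching query is invalidated.
dropped = cache.invalidate_for_update("d9", ["Barcelona", "weather", "today"])
```

In this sketch, flushing the whole cache on the update would have discarded both entries, whereas the predictor drops only "barcelona weather" and leaves "search caching" to be served until its TTL expires. The two sets returned by `invalidation_errors` correspond loosely to the unnecessary invalidations and stale results that the abstract names as the quantities to trade off.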
Pages: 82-89
Number of pages: 8