Faster Compact Top-k Document Retrieval

Cited by: 12
Authors
Konow, Roberto [1 ]
Navarro, Gonzalo [1 ]
Affiliations
[1] Univ Chile, Dept Comp Sci, Santiago, Chile
Keywords
QUERIES;
DOI
10.1109/DCC.2013.43
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
An optimal index solving top-k document retrieval [Navarro and Nekrich, SODA'12] takes O(m + k) time for a pattern of length m, but its space is at least 80n bytes for a collection of n symbols. We reduce it to 1.5n-3n bytes, with O(m + (k + log log n) log log n) time, on typical texts. The index is up to 25 times faster than the best previous compressed solutions, and requires at most 5% more space in practice (and in some cases as little as one half). Apart from replacing classical by compressed data structures, our main idea is to replace suffix tree sampling by frequency thresholding to achieve compression.
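For orientation, the query that such an index answers can be sketched with a brute-force baseline. This is not the paper's index, and it uses no suffix tree or compressed structure; it only illustrates the problem statement under the assumption that relevance is the number of occurrences of the pattern in each document (term frequency). The function names `count_occurrences` and `top_k_documents` are hypothetical, chosen for this illustration.

```python
import heapq


def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, allowing overlaps."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Advance by one position so overlapping matches are counted too.
        start = text.find(pattern, start + 1)
    return count


def top_k_documents(docs: list[str], pattern: str, k: int) -> list[tuple[int, int]]:
    """Return up to k (doc_id, frequency) pairs, highest frequency first.

    Assumption: relevance is plain term frequency. Ties are broken by
    the smaller document id. Documents without a match are omitted.
    This scans every document, so it costs time linear in the collection
    size per query, which is what an index is meant to avoid.
    """
    scored = []
    for doc_id, doc in enumerate(docs):
        freq = count_occurrences(doc, pattern)
        if freq > 0:
            scored.append((freq, doc_id))
    best = heapq.nsmallest(k, scored, key=lambda t: (-t[0], t[1]))
    return [(doc_id, freq) for freq, doc_id in best]


if __name__ == "__main__":
    docs = ["abracadabra", "banana", "cabbage", "aaa"]
    print(top_k_documents(docs, "a", 2))
```

A linear scan like this touches all n symbols for every query, whereas the bounds quoted in the abstract depend only on the pattern length m, on k, and on a log log n factor.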
Pages: 351-360
Number of pages: 10