Space-efficient algorithms for document retrieval

Cited by: 0
Authors
Valimaki, Niko [1]
Makinen, Veli [1]
Affiliation
[1] Univ Helsinki, Dept Comp Sci, Helsinki, Finland
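Funding
Academy of Finland
Keywords
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract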
We study the Document Listing problem, where a collection $D$ of documents $d_1, \ldots, d_k$ of total length $\sum_i d_i = n$ is to be preprocessed so that one can later efficiently list all the $\mathit{ndoc}$ documents containing a given query pattern $P$ of length $m$ as a substring. Muthukrishnan (SODA 2002) gave an optimal solution to the problem: with $O(n)$ time preprocessing, one can answer the queries in $O(m + \mathit{ndoc})$ time. In this paper, we improve the space requirement of Muthukrishnan's solution from $O(n \log n)$ bits to $|CSA| + 2n + n \log k (1 + o(1))$ bits, where $|CSA| \le n \log |\Sigma| (1 + o(1))$ is the size of any suitable compressed suffix array (CSA), and $\Sigma$ is the underlying alphabet of the documents. The time requirement depends on the CSA used, but we can obtain, e.g., the optimal $O(m + \mathit{ndoc})$ time when $|\Sigma|, k = O(\mathrm{polylog}(n))$. For general $|\Sigma|, k$, the time requirement becomes $O(m \log |\Sigma| + \mathit{ndoc} \log k)$. Sadakane (ISAAC 2002) has developed a similar space-efficient variant of Muthukrishnan's solution; we obtain a better time requirement in most cases, but a slightly worse space requirement.
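To make the Document Listing problem concrete, the following is a minimal, uncompressed sketch of the classical approach attributed above to Muthukrishnan: a suffix array over the concatenated documents, a document array, and an array of previous occurrences queried by range minima. It is not the space-efficient structure of this paper. All names (`build_index`, `suffix_range`, `list_documents`, `SEP`) are hypothetical, the suffix array is built by naive sorting, and the range-minimum query is a linear scan, so the sketch does not achieve the $O(n)$ preprocessing or $O(m + \mathit{ndoc})$ query bounds; it only illustrates the listing logic.

```python
SEP = "\x00"  # assumed separator; must not occur in any document or pattern


def build_index(docs):
    """Build (text, suffix array, document array, previous-occurrence array)."""
    text = "".join(d + SEP for d in docs)
    # owner[i] = index of the document that text position i belongs to
    owner = []
    for doc_id, d in enumerate(docs):
        owner.extend([doc_id] * (len(d) + 1))
    # Naive suffix array construction (quadratic or worse; illustration only)
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    da = [owner[i] for i in sa]
    # c[i] = largest j < i with da[j] == da[i], or -1 if there is none
    last_seen = {}
    c = []
    for i, doc_id in enumerate(da):
        c.append(last_seen.get(doc_id, -1))
        last_seen[doc_id] = i
    return text, sa, da, c


def suffix_range(text, sa, pattern):
    """Half-open range [left, right) of suffixes that start with pattern."""
    m = len(pattern)

    def first_index(pred):
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            if pred(text[sa[mid]:sa[mid] + m]):
                hi = mid
            else:
                lo = mid + 1
        return lo

    return first_index(lambda s: s >= pattern), first_index(lambda s: s > pattern)


def list_documents(index, pattern):
    """Return the sorted ids of documents containing the non-empty pattern."""
    text, sa, da, c = index
    left, right = suffix_range(text, sa, pattern)
    result = []
    stack = [(left, right - 1)]
    while stack:
        l, r = stack.pop()
        if l > r:
            continue
        # Linear-scan range minimum; a real index would use an O(1) RMQ structure
        p = min(range(l, r + 1), key=c.__getitem__)
        if c[p] >= left:
            continue  # every document in [l, r] was already seen further left
        result.append(da[p])  # first occurrence of this document in the range
        stack.append((l, p - 1))
        stack.append((p + 1, r))
    return sorted(result)


index = build_index(["banana", "bandana", "cabana", "apple"])
print(list_documents(index, "ana"))
```

Each document is reported exactly once because, within the matching suffix range, only its leftmost occurrence has a previous-occurrence value that falls before the start of the range.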
Pages: 205 / +
Page count: 3