Compressed compact suffix arrays

Authors
Mäkinen, V
Navarro, G
Affiliations
[1] Univ Helsinki, Dept Comp Sci, FIN-00014 Helsinki, Finland
[2] Univ Chile, Dept Comp Sci, Santiago 2120, Chile
Keywords
DOI
Not available
Chinese Library Classification
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
The compact suffix array (CSA) is a space-efficient full-text index, which is fast in practice to search for patterns in a static text. Compared to other compressed suffix arrays (Grossi and Vitter, Sadakane, Ferragina and Manzini), the CSA is significantly larger (2.7 times the text size, as opposed to 0.6-0.8 of compressed suffix arrays). The space of the CSA includes that of the text, which the CSA needs separately available. Compressed suffix arrays, on the other hand, include the text, that is, they are self-indexes. Although compressed suffix arrays are very fast to determine the number of occurrences of a pattern, they are in practice very slow to report even a few occurrence positions or text contexts. In this aspect the CSA is much faster. In this paper we contribute to this space-time trade-off by introducing the Compressed CSA (CCSA), a self-index that improves the space usage of the CSA in exchange for search speed. We show that the occ occurrence positions of a pattern of length m in a text of length n can be reported in O((m + occ) log n) time using the CCSA, whose representation needs O(n(1 + H_k log n)) bits for any k, H_k being the k-th order empirical entropy of the text. In practice the CCSA takes 1.6 times the text size (and includes the text). This is still larger than current compressed suffix arrays, and similar in size to the LZ-index of Navarro. Search times are by far better than for self-indexes that take less space than the text, and competitive against the LZ-index and versions of compressed suffix arrays tailored to take 1.6 times the text size.
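The distinction the abstract draws between counting occurrences and reporting their positions can be illustrated with a plain, uncompressed suffix array. The sketch below is not the CSA or the CCSA of the paper; it is a minimal baseline, and the function names (`build_suffix_array`, `sa_range`, `count`, `locate`) are hypothetical names chosen for illustration. The construction is deliberately naive and only suitable for small inputs.

```python
def build_suffix_array(text: str) -> list[int]:
    # Naive construction: sort suffix start positions by the suffix text.
    # Fine for a small demonstration, far too slow for large texts.
    return sorted(range(len(text)), key=lambda i: text[i:])


def sa_range(text: str, sa: list[int], pattern: str) -> tuple[int, int]:
    # Return the half-open range [start, end) of suffix array entries
    # whose suffixes begin with `pattern`, using two binary searches
    # over the length-m prefixes of the sorted suffixes.
    m = len(pattern)

    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    hi = len(sa)
    while lo < hi:  # first suffix whose prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def count(text: str, sa: list[int], pattern: str) -> int:
    # Counting needs only the size of the range.
    start, end = sa_range(text, sa, pattern)
    return end - start


def locate(text: str, sa: list[int], pattern: str) -> list[int]:
    # Reporting reads the stored text positions inside the range.
    start, end = sa_range(text, sa, pattern)
    return sorted(sa[start:end])


text = "mississippi"
sa = build_suffix_array(text)
print(count(text, sa, "ssi"))   # number of occurrences
print(locate(text, sa, "ssi"))  # their starting positions in the text
```

Here reporting is cheap because every text position is stored explicitly. A compressed self-index generally does not keep all of those entries, so each reported position has to be recovered through extra work, which is consistent with the abstract's remark that counting is fast but reporting is slow in such indexes.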
Pages: 420-433
Page count: 14