Fully Compressed Suffix Trees

Cited by: 30
Authors
Russo, Luis M. S. [1, 2]
Navarro, Gonzalo [3]
Oliveira, Arlindo L. [1, 2]
Affiliations
[1] INESC ID, P-1000029 Lisbon, Portugal
[2] Univ Tecn Lisboa, Inst Super Tecn, P-1049001 Lisbon, Portugal
[3] Univ Chile, Dept Comp Sci, Santiago, Chile
Keywords
Text processing; pattern matching; string algorithms; suffix tree; data compression; compressed index; ARRAYS; INDEX;
DOI
10.1145/2000807.2000821
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Suffix trees are by far the most important data structure in stringology, with a myriad of applications in fields like bioinformatics and information retrieval. Classical representations of suffix trees require Θ(n log n) bits of space, for a string of size n. This is considerably more than the n log₂ σ bits needed for the string itself, where σ is the alphabet size. The size of suffix trees has been a barrier to their wider adoption in practice. Recent compressed suffix tree representations require just the space of the compressed string plus Θ(n) extra bits. This is already spectacular, but the linear extra bits are still unsatisfactory when σ is small, as in DNA sequences. In this article, we introduce the first compressed suffix tree representation that breaks this Θ(n)-bit space barrier. The Fully Compressed Suffix Tree (FCST) representation requires only sublinear space on top of the compressed text size, and supports a wide set of navigational operations in almost logarithmic time. This includes extracting arbitrary text substrings, so the FCST replaces the text using almost the same space as the compressed text. An essential ingredient of FCSTs is the lowest common ancestor (LCA) operation. We reveal important connections between LCAs and suffix tree navigation. We also describe how to make FCSTs dynamic, that is, support updates to the text. The dynamic FCST also supports several operations. In particular, it can build the static FCST within optimal space and polylogarithmic time per symbol. Our theoretical results are also validated experimentally, showing that FCSTs are very effective in practice as well.
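To make the space gap in the abstract concrete, the following is a rough back-of-the-envelope sketch, not a computation taken from the article. It assumes a constant factor of 1 in the Θ(n log n) bound, rounds logarithms up to whole bits, and uses hypothetical function names and an illustrative DNA-like input (n = 10^9, σ = 4).

```python
def ceil_log2(x: int) -> int:
    """Smallest k with 2**k >= x, for x >= 1 (exact integer arithmetic)."""
    return (x - 1).bit_length()

def classical_bits(n: int) -> int:
    """Illustrative n * ceil(log2 n) estimate for a pointer-based suffix tree.

    Assumption: constant factor 1; real implementations use several
    pointers per node, so the true figure is larger.
    """
    return n * ceil_log2(n)

def plain_text_bits(n: int, sigma: int) -> int:
    """n * ceil(log2 sigma) bits for the uncompressed string itself."""
    return n * ceil_log2(sigma)

if __name__ == "__main__":
    n, sigma = 10**9, 4  # hypothetical DNA-sized input
    tree = classical_bits(n)
    text = plain_text_bits(n, sigma)
    print(f"classical estimate: {tree} bits")
    print(f"plain text:         {text} bits")
    print(f"ratio:              {tree / text:.1f}x")
```

Under these assumptions the classical estimate is 15 times the size of the text, which illustrates why Θ(n) extra bits are still significant when σ is small.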
Pages: 34
Related papers
50 items in total
  • [1] Fully-compressed suffix trees
    Russo, Luis M. S.
    Navarro, Gonzalo
    Oliveira, Arlindo L.
    [J]. LATIN 2008: THEORETICAL INFORMATICS, 2008, 4957 : 362 - +
  • [2] Fast Fully-Compressed Suffix Trees
    Navarro, Gonzalo
    Russo, Luis M. S.
    [J]. 2014 DATA COMPRESSION CONFERENCE (DCC 2014), 2014, : 283 - 291
  • [3] Dynamic fully-compressed suffix trees
    Russo, Luis M. S.
    Navarro, Gonzalo
    Oliveira, Arlindo L.
    [J]. COMBINATORIAL PATTERN MATCHING, 2008, 5029 : 191 - +
  • [4] Compressed Property Suffix Trees
    Hon, Wing-Kai
    Patil, Manish
    Shah, Rahul
    Thankachan, Sharma V.
    [J]. 2011 DATA COMPRESSION CONFERENCE (DCC), 2011, : 123 - 132
  • [5] Practical Compressed Suffix Trees
    Canovas, Rodrigo
    Navarro, Gonzalo
    [J]. EXPERIMENTAL ALGORITHMS, PROCEEDINGS, 2010, 6049 : 94 - 105
  • [6] Compressed property suffix trees
    Hon, Wing-Kai
    Patil, Manish
    Shah, Rahul
    Thankachan, Sharma V.
    [J]. INFORMATION AND COMPUTATION, 2013, 232 : 10 - 18
  • [7] PFP Compressed Suffix Trees
    Boucher, Christina
    Cvacho, Ondřej
    Gagie, Travis
    Holub, Jan
    Manzini, Giovanni
    Navarro, Gonzalo
    Rossi, Massimiliano
    [J]. 2021 PROCEEDINGS OF THE SYMPOSIUM ON ALGORITHM ENGINEERING AND EXPERIMENTS, ALENEX, 2021, : 60 - 72
  • [8] Practical Compressed Suffix Trees
    Abeliuk, Andres
    Canovas, Rodrigo
    Navarro, Gonzalo
    [J]. ALGORITHMS, 2013, 6 (02) : 319 - 351
  • [9] Compressed Suffix Trees with Full Functionality
    Kunihiko Sadakane
    [J]. Theory of Computing Systems, 2007, 41 : 589 - 607
  • [10] Compressed suffix trees with full functionality
    Sadakane, Kunihiko
    [J]. THEORY OF COMPUTING SYSTEMS, 2007, 41 (04) : 589 - 607