Engineering a compressed suffix tree implementation

Cited by: 0
Authors
Valimaki, Niko [1]
Gerlach, Wolfgang [2]
Dixit, Kashyap [3]
Makinen, Veli [1]
Affiliations
[1] Univ Helsinki, Dept Comp Sci, Teollisuuskatu 23, SF-00510 Helsinki, Finland
[2] Univ Bielefeld, Technische Fakultat, Bielefeld, Germany
[3] Indian Inst Technol, Dept Comp Engn & Sci, Kanpur 208016, Uttar Pradesh, India
Funding
Academy of Finland
DOI
Not available
Chinese Library Classification
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
The suffix tree is one of the most important data structures in string algorithms and biological sequence analysis. Unfortunately, when it comes to implementing these algorithms and applying them to real genomic sequences, main memory size often becomes the bottleneck. This is easily explained by the fact that while a DNA sequence of length $n$ over the alphabet $\Sigma = \{A, C, G, T\}$ can be stored in $n \log |\Sigma| = 2n$ bits, its suffix tree occupies $O(n \log n)$ bits. In practice, the size difference easily reaches a factor of 50. We report on an implementation of the compressed suffix tree very recently proposed by Sadakane (Theory of Computing Systems, in press). The compressed suffix tree occupies space proportional to the text size, i.e. $O(n \log |\Sigma|)$ bits, and supports all typical suffix tree operations with at most a $\log n$ factor slowdown. Our experiments show that, e.g. on a 10 MB DNA sequence, the compressed suffix tree takes 10% of the space of a normal suffix tree. At the same time, a representative algorithm is slowed down by a factor of 30. Our implementation follows the original proposal in spirit, but some internal parts are tailored towards practical implementation. Our construction algorithm runs in $O(n \log n \log |\Sigma|)$ time and, while building the structure, uses nearly the same space as the final product: on the 10 MB DNA sequence, the maximum space usage during construction is only 1.4 times the final product size.
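The following Python sketch gives a rough illustration of the space gap described above. It packs a DNA string into $\lceil \log_2 |\Sigma| \rceil = 2$ bits per base and compares that with the $n \lceil \log_2 n \rceil$ bits needed merely to store one text position per suffix. It is not Sadakane's compressed suffix tree or the authors' code. The helpers `pack_dna`, `unpack_dna`, `ceil_log2`, and `space_bits` are hypothetical names, and the input is assumed to contain only A, C, G and T.

```python
# Minimal sketch: 2-bit packing of a DNA string versus the cost of storing
# one text position per suffix. All helper names are hypothetical, not taken
# from the described implementation. Assumes input uses only A, C, G, T.

ALPHABET = "ACGT"
CODE = {base: i for i, base in enumerate(ALPHABET)}


def ceil_log2(x: int) -> int:
    """Exact ceil(log2 x) for x >= 1, using integer arithmetic only."""
    return (x - 1).bit_length()


def pack_dna(seq: str) -> bytearray:
    """Store each base in ceil(log2 |Sigma|) = 2 bits, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return out


def unpack_dna(packed: bytes, n: int) -> str:
    """Decode the first n bases of a buffer produced by pack_dna."""
    return "".join(
        ALPHABET[(packed[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(n)
    )


def space_bits(n: int) -> tuple[int, int]:
    """Return (bits for the packed text, bits for n text positions).

    The second figure counts only n * ceil(log2 n) bits, one position per
    suffix; a pointer-based suffix tree also stores child and sibling
    pointers per node, so its real footprint is larger still.
    """
    text_bits = n * ceil_log2(len(ALPHABET))
    position_bits = n * ceil_log2(n)
    return text_bits, position_bits


if __name__ == "__main__":
    seq = "GATTACA"
    packed = pack_dna(seq)
    print(len(packed), unpack_dna(packed, len(seq)))  # 2 GATTACA

    n = 10 * 2**20  # assumption: "10 MB" read as 10 * 2^20 bases
    text_bits, position_bits = space_bits(n)
    # Sizes in MiB: packed text vs. positions alone.
    print(text_bits / 8 / 2**20, position_bits / 8 / 2**20)  # 2.5 30.0
```

This assumes that 10 MB means $10 \cdot 2^{20}$ bases. Under that reading, the packed text takes 2.5 MiB while the suffix positions alone take 30 MiB. That is a factor of 12 before any tree topology, edge labels or child pointers are counted.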
Pages: 217 / +
Number of pages: 3
Related papers
50 in total
  • [1] Compressed by the suffix tree
    Senft, Martin
    DCC 2006: DATA COMPRESSION CONFERENCE, PROCEEDINGS, 2006 : 183 - 192
  • [2] A Compressed Suffix Tree Based Implementation With Low Peak Memory Usage
    Nogueira Nunes, Daniel Saad
    Ayala-Rincon, Mauricio
    ELECTRONIC NOTES IN THEORETICAL COMPUTER SCIENCE, 2014, 302 : 73 - 94
  • [3] An(other) entropy-bounded compressed suffix tree
    Fischer, Johannes
    Makinen, Veli
    Navarro, Gonzalo
    COMBINATORIAL PATTERN MATCHING, 2008, 5029 : 152 - +
  • [4] Tree Contraction for Compressed Suffix Arrays on Modern Processors
    Yamamuro, Takeshi
    Onizuka, Makoto
    Honjo, Toshimori
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2015, PT II, 2015, 9050 : 363 - 378
  • [5] A Suffix Tree Or Not a Suffix Tree?
    Starikovskaya, Tatiana
    Vildhoj, Hjalte Wedel
    COMBINATORIAL ALGORITHMS, IWOCA 2014, 2015, 8986 : 338 - 350
  • [6] A suffix tree or not a suffix tree?
    Starikovskaya, Tatiana
    Vildhoj, Hjalte Wedel
    JOURNAL OF DISCRETE ALGORITHMS, 2015, 32 : 14 - 23
  • [7] Using the Sadakane Compressed Suffix Tree to Solve the All-Pairs Suffix-Prefix Problem
    Rachid, Maan Haj
    Malluhi, Qutaibah
    Abouelhoda, Mohamed
    BIOMED RESEARCH INTERNATIONAL, 2014, 2014
  • [8] Engineering a fast online persistent suffix tree construction
    Bedathur, SJ
    Haritsa, JR
    20TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2004 : 720 - 731
  • [9] A practical suffix-tree implementation for string searches
    Dorohonceanu, B
    Nevill-Manning, C
    DR DOBBS JOURNAL, 2000, 25 (07) : 133 - +
  • [10] Compressed suffix tree: a basis for genome-scale sequence analysis
    Valimaki, Niko
    Gerlach, Wolfgang
    Dixit, Kashyap
    Makinen, Veli
    BIOINFORMATICS, 2007, 23 (05) : 629 - 630