Compressed text databases with efficient query algorithms based on the compressed suffix array

Cited by: 0
Author
Sadakane, K [1]
Affiliation
[1] Tohoku Univ, Grad Sch Informat Sci, Dept Syst Informat Sci, Sendai, Miyagi 980, Japan
DOI
Not available
Chinese Library Classification number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
A compressed text database based on the compressed suffix array is proposed. The compressed suffix array of Grossi and Vitter occupies only $O(n)$ bits for a text of length $n$; however, it also requires the text itself, which occupies $O(n \log|\Sigma|)$ bits for the alphabet $\Sigma$. In contrast, our data structure does not use the text itself, and it supports the operations that matter for text databases: inverse, search, and decompress. Our algorithms can find the $occ$ occurrences of any substring $P$ of the text in $O(|P| \log n + occ \log^{\epsilon} n)$ time and decompress a part of the text of length $l$ in $O(l + \log^{\epsilon} n)$ time for any given $0 < \epsilon \le 1$. Our data structure occupies only $n\left(\frac{2}{\epsilon}\left(\frac{3}{2} + H_0 + 2\log H_0\right) + 2 + \frac{4\log^{\epsilon} n}{\log^{\epsilon} n - 1}\right) + o(n) + O(|\Sigma| \log|\Sigma|)$ bits, where $H_0 \le \log|\Sigma|$ is the order-0 entropy of the text. We also show the relationship with the opportunistic data structure of Ferragina and Manzini.
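The three operations named in the abstract (inverse, search, decompress) can be illustrated with a toy Python sketch. It assumes the standard $\Psi$-function view of a compressed suffix array, where $\Psi[i]$ is the rank of the suffix that starts one position after the suffix of rank $i$, together with a first-character table that replaces the text and sparse samples of $SA$ and $SA^{-1}$ every `step` text positions. The class `PsiSuffixArray`, the constant `SENTINEL`, and the parameter `step` are hypothetical names chosen for this illustration. $\Psi$ is stored as a plain list rather than in the paper's compressed encoding, so the sketch shows only the query logic, not the space bound.

```python
from bisect import bisect_right

SENTINEL = "\x00"  # hypothetical terminator; must not occur in the text


class PsiSuffixArray:
    """Toy sketch: answers queries from Psi, a first-character table and
    sparse samples, never from the text. Nothing here is actually compressed."""

    def __init__(self, text: str, step: int = 4):
        if SENTINEL in text:
            raise ValueError("text must not contain the sentinel")
        t = text + SENTINEL
        n = len(t)
        # Naive O(n^2 log n) suffix sorting, acceptable for a toy example.
        sa = sorted(range(n), key=lambda p: t[p:])
        isa = [0] * n
        for rank, pos in enumerate(sa):
            isa[pos] = rank
        self.n, self.step = n, step
        # Psi[i] = rank of the suffix starting one position after suffix SA[i].
        self.psi = [isa[(pos + 1) % n] for pos in sa]
        # Suffixes starting with chars[k] occupy ranks first[k] .. first[k+1]-1.
        self.chars = sorted(set(t))
        self.first = [sum(1 for ch in t if ch < c) for c in self.chars]
        # Samples: SA values whose text position is a multiple of step,
        # and SA^{-1} values at every step-th text position.
        self.sa_sample = {r: p for r, p in enumerate(sa) if p % step == 0}
        self.isa_sample = [isa[p] for p in range(0, n, step)]
        # t, sa and isa are locals and are discarded after construction.

    def char_of_rank(self, i: int) -> str:
        """First character of the suffix with rank i."""
        return self.chars[bisect_right(self.first, i) - 1]

    def locate(self, i: int) -> int:
        """SA[i]: follow Psi to a sampled rank, then subtract the steps taken."""
        steps = 0
        while i not in self.sa_sample:
            i = self.psi[i]
            steps += 1
        return (self.sa_sample[i] - steps) % self.n

    def inverse(self, j: int) -> int:
        """SA^{-1}[j]: start at the nearest sampled position <= j, apply Psi."""
        base = j // self.step
        i = self.isa_sample[base]
        for _ in range(j - base * self.step):
            i = self.psi[i]
        return i

    def decompress(self, j: int, length: int) -> str:
        """Return text[j : j + length] by walking Psi from rank SA^{-1}[j]."""
        i, out = self.inverse(j), []
        for _ in range(length):
            c = self.char_of_rank(i)
            if c == SENTINEL:
                break
            out.append(c)
            i = self.psi[i]
        return "".join(out)

    def _prefix(self, i: int, m: int) -> str:
        """First m characters of the suffix with rank i (shorter if it ends)."""
        out = []
        for _ in range(m):
            c = self.char_of_rank(i)
            out.append(c)
            if c == SENTINEL:
                break
            i = self.psi[i]
        return "".join(out)

    def search(self, pattern: str) -> tuple[int, int]:
        """Half-open rank range [lo, hi) of suffixes that start with pattern."""
        m, lo, hi = len(pattern), 0, self.n
        while lo < hi:  # first rank whose m-prefix >= pattern
            mid = (lo + hi) // 2
            if self._prefix(mid, m) < pattern:
                lo = mid + 1
            else:
                hi = mid
        start, hi = lo, self.n
        while lo < hi:  # first rank whose m-prefix > pattern
            mid = (lo + hi) // 2
            if self._prefix(mid, m) <= pattern:
                lo = mid + 1
            else:
                hi = mid
        return start, lo

    def occurrences(self, pattern: str) -> list[int]:
        lo, hi = self.search(pattern)
        return sorted(self.locate(i) for i in range(lo, hi))
```

For example, `PsiSuffixArray("mississippi").occurrences("ssi")` returns `[2, 5]`, and `decompress(4, 4)` on the same object returns `"issi"`. In this sketch a search performs $O(|P| \log n)$ character extractions through $\Psi$, and each located or decompressed position costs at most `step` extra $\Psi$ steps, so `step` loosely plays the role of the $\log^{\epsilon} n$ term in the stated bounds.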
Pages: 410-421
Page count: 12
Related papers
  • [1] Fibonacci Based Compressed Suffix Array
    Klein, Shmuel T.
    Shapira, Dana
    [J]. 2018 DATA COMPRESSION CONFERENCE (DCC 2018), 2018 : 415
  • [2] Faster Compressed Suffix Trees for Repetitive Text Collections
    Navarro, Gonzalo
    Ordonez, Alberto
    [J]. EXPERIMENTAL ALGORITHMS, SEA 2014, 2014, 8504 : 424 - 435
  • [3] New text indexing functionalities of the compressed suffix arrays
    Sadakane, K
    [J]. JOURNAL OF ALGORITHMS-COGNITION INFORMATICS AND LOGIC, 2003, 48 (02) : 294 - 313
  • [4] Compressed suffix arrays and suffix trees with applications to text indexing and string matching
    Grossi, R
    Vitter, JS
    [J]. SIAM JOURNAL ON COMPUTING, 2005, 35 (02) : 378 - 407
  • [5] A Qualitative Performance Comparison and Analysis of Suffix Array, FM-index and Compressed Suffix Array
    Wu, Jichuan
    Mao, Xin
    Lu, Songfeng
    [J]. 2012 INTERNATIONAL CONFERENCE ON FUTURE INFORMATION TECHNOLOGY AND MANAGEMENT SCIENCE & ENGINEERING (FITMSE 2012), 2012, 14 : 348 - 352
  • [6] A space-efficient solution to find the maximum overlap using a compressed suffix array
    Rachid, Maan Haj
    Malluhi, Qutaibah
    Abouelhoda, Mohamed
    [J]. 2014 MIDDLE EAST CONFERENCE ON BIOMEDICAL ENGINEERING (MECBME), 2014 : 329 - 333
  • [7] Space-efficient construction of compressed suffix trees
    Prezza, Nicola
    Rosone, Giovanna
    [J]. THEORETICAL COMPUTER SCIENCE, 2021, 852 : 138 - 156
  • [8] A Compressed Enhanced Suffix Array Supporting Fast String Matching
    Ohlebusch, Enno
    Gog, Simon
    [J]. STRING PROCESSING AND INFORMATION RETRIEVAL, PROCEEDINGS, 2009, 5721 : 51 - 62
  • [9] CODING METHODS FOR TEXT STRING SEARCH ON COMPRESSED DATABASES
    GOYAL, P
    [J]. INFORMATION SYSTEMS, 1983, 8 (03) : 231 - 233
  • [10] A Space and Time Efficient Algorithm for Constructing Compressed Suffix Arrays
    Hon, Wing-Kai
    Lam, Tak-Wah
    Sadakane, Kunihiko
    Sung, Wing-Kin
    Yiu, Siu-Ming
    [J]. ALGORITHMICA, 2007, 48 : 23 - 36