Dynamic Suffix Array with Polylogarithmic Queries and Updates

Cited by: 6
Authors
Kempa, Dominik [1]
Kociumaka, Tomasz [2]
Affiliations
[1] SUNY Stony Brook, Stony Brook, NY 11794 USA
[2] Univ Calif Berkeley, Berkeley, CA 94720 USA
Funding
National Science Foundation (NSF)
Keywords
Suffix array; text indexing; pattern matching; string synchronizing sets; dynamic data structures
DOI
10.1145/3519935.3520061
Chinese Library Classification
TP301 [Theory, Methods]
Discipline code
081202
Abstract
The suffix array SA[1..n] of a text T of length n is a permutation of {1, ..., n} describing the lexicographical ordering of suffixes of T and is considered to be one of the most important data structures for string processing, with dozens of applications in data compression, bioinformatics, and information retrieval. One of the biggest drawbacks of the suffix array is that it is very difficult to maintain under text updates: even a single character substitution can completely change the contents of the suffix array. Thus, the suffix array of a dynamic text is modelled using suffix array queries, which return the value SA[i] given any $i \in [1..n]$. Prior to this work, the fastest dynamic suffix array implementations were by Amir and Boneh, who showed how to answer suffix array queries in $\tilde{O}(k)$ time, where $k \in [1..n]$ is a trade-off parameter, with $\tilde{O}(n/k)$-time text updates [ISAAC 2020]. In a very recent preprint, they also provided a solution with $O(\log^5 n)$-time queries and $\tilde{O}(n^{2/3})$-time updates [arXiv 2021]. We propose the first data structure that supports both suffix array queries and text updates in $O(\mathrm{polylog}\, n)$ time (achieving $O(\log^4 n)$ and $O(\log^{3+o(1)} n)$ time, respectively). Our data structure is deterministic and the running times for all operations are worst-case. In addition to the standard single-character edits (character insertions, deletions, and substitutions), we support (also in $O(\log^{3+o(1)} n)$ time) the "cut-paste" operation that moves any (arbitrarily long) substring of T to any place in T. To achieve our result, we develop a number of new techniques which are of independent interest. These include a new flavor of dynamic locally consistent parsing, as well as a dynamic construction of string synchronizing sets with an extra local sparsity property; this significantly generalizes the sampling technique introduced at STOC 2019.
We complement our structure by a hardness result: unless the Online Matrix-Vector Multiplication (OMv) Conjecture fails, no data structure with $O(\mathrm{polylog}\, n)$-time suffix array queries can support the "copy-paste" operation in $O(n^{1-\epsilon})$ time for any $\epsilon > 0$.
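To make the objects and operations named in the abstract concrete, here is a minimal, naive Python sketch. It is not the authors' data structure: it simply recomputes the suffix array from scratch (roughly O(n^2 log n) time per call), which is exactly the cost a polylogarithmic dynamic structure avoids. Positions are 1-based to match SA[1..n]. The helper names (`suffix_array`, `sa_query`, `substitute`, `cut_paste`, `copy_paste`) and the convention for where a moved or copied block lands are hypothetical, chosen only for illustration.

```python
def suffix_array(text: str) -> list[int]:
    """Return SA[1..n] as a list of 1-based suffix start positions.

    Naive construction: sort start positions by the suffix they begin.
    Python compares str lexicographically, and a proper prefix sorts
    before any longer string, matching the usual suffix order.
    """
    return sorted(range(1, len(text) + 1), key=lambda i: text[i - 1:])


def sa_query(text: str, i: int) -> int:
    """Answer a suffix array query: return SA[i] for 1 <= i <= n.

    This sketch rebuilds the whole array on every query.
    """
    if not 1 <= i <= len(text):
        raise IndexError("i must lie in [1..n]")
    return suffix_array(text)[i - 1]


def substitute(text: str, pos: int, ch: str) -> str:
    """Replace T[pos] (1-based) with the single character ch."""
    return text[:pos - 1] + ch + text[pos:]


def cut_paste(text: str, i: int, j: int, p: int) -> str:
    """Move T[i..j] (1-based, inclusive) so it starts at position p of the result.

    Assumed (hypothetical) convention: remove the block first, then insert it
    into the remaining text; the text length n is preserved.
    """
    block = text[i - 1:j]
    rest = text[:i - 1] + text[j:]
    if not 1 <= p <= len(rest) + 1:
        raise IndexError("p out of range")
    return rest[:p - 1] + block + rest[p - 1:]


def copy_paste(text: str, i: int, j: int, p: int) -> str:
    """Insert a copy of T[i..j] so it starts at position p (1 <= p <= n + 1).

    Unlike cut_paste, the text grows by j - i + 1 characters.
    """
    block = text[i - 1:j]
    if not 1 <= p <= len(text) + 1:
        raise IndexError("p out of range")
    return text[:p - 1] + block + text[p - 1:]


if __name__ == "__main__":
    print(suffix_array("banana"))                      # [6, 4, 2, 1, 5, 3]
    print(suffix_array("aaaab"))                       # [1, 2, 3, 4, 5]
    print(suffix_array(substitute("aaaab", 5, "a")))   # [5, 4, 3, 2, 1]
    print(cut_paste("abcdef", 2, 3, 4))                # adebcf
    print(copy_paste("abcd", 2, 3, 1))                 # bcabcd
```

On `aaaab` the suffix array is [1, 2, 3, 4, 5]; substituting the final `b` with `a` yields [5, 4, 3, 2, 1], so a single substitution changes every entry except SA[3], a small instance of why the suffix array is hard to maintain explicitly. Note also that `cut_paste` keeps the text length fixed, whereas each `copy_paste` lengthens the text by the size of the copied substring.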
Pages: 1657-1670
Page count: 14