Dynamic Suffix Array with Polylogarithmic Queries and Updates

Cited by: 6
Authors
Kempa, Dominik [1]
Kociumaka, Tomasz [2]
Affiliations
[1] SUNY Stony Brook, Stony Brook, NY 11794 USA
[2] Univ Calif Berkeley, Berkeley, CA 94720 USA
Funding
National Science Foundation (USA)
Keywords
Suffix array; text indexing; pattern matching; string synchronizing sets; dynamic data structures;
DOI
10.1145/3519935.3520061
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The suffix array $\mathrm{SA}[1..n]$ of a text $T$ of length $n$ is a permutation of $\{1, \ldots, n\}$ describing the lexicographical ordering of suffixes of $T$ and is considered to be one of the most important data structures for string processing, with dozens of applications in data compression, bioinformatics, and information retrieval. One of the biggest drawbacks of the suffix array is that it is very difficult to maintain under text updates: even a single character substitution can completely change the contents of the suffix array. Thus, the suffix array of a dynamic text is modelled using suffix array queries, which return the value $\mathrm{SA}[i]$ given any $i \in [1..n]$. Prior to this work, the fastest dynamic suffix array implementations were by Amir and Boneh, who showed how to answer suffix array queries in $\tilde{O}(k)$ time, where $k \in [1..n]$ is a trade-off parameter, with $\tilde{O}(n/k)$-time text updates [ISAAC 2020]. In a very recent preprint, they also provided a solution with $O(\log^5 n)$-time queries and $\tilde{O}(n^{2/3})$-time updates [arXiv 2021]. We propose the first data structure that supports both suffix array queries and text updates in $O(\mathrm{polylog}\, n)$ time (achieving $O(\log^4 n)$ and $O(\log^{3+o(1)} n)$ time, respectively). Our data structure is deterministic and the running times for all operations are worst-case. In addition to the standard single-character edits (character insertions, deletions, and substitutions), we support (also in $O(\log^{3+o(1)} n)$ time) the "cut-paste" operation that moves any (arbitrarily long) substring of $T$ to any place in $T$. To achieve our result, we develop a number of new techniques which are of independent interest. This includes a new flavor of dynamic locally consistent parsing, as well as a dynamic construction of string synchronizing sets with an extra local sparsity property; this significantly generalizes the sampling technique introduced at STOC 2019.
We complement our structure by a hardness result: unless the Online Matrix-Vector Multiplication (OMv) Conjecture fails, no data structure with $O(\mathrm{polylog}\, n)$-time suffix array queries can support the "copy-paste" operation in $O(n^{1-\epsilon})$ time for any $\epsilon > 0$.
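
To make the definition of a suffix array query concrete, the following is a minimal Python sketch of a naive baseline. It is not the data structure proposed in the paper: it rebuilds the whole array by sorting suffixes on every query, which is far slower than polylogarithmic time. The function names `suffix_array`, `sa_query`, and `substitute` are hypothetical and chosen only for illustration. The example also shows how one character substitution can change every entry of the array.

```python
def suffix_array(text: str) -> list[int]:
    """Naive construction: sort the 1-indexed starting positions of all
    suffixes by the suffix they start. Illustrative only, not efficient."""
    n = len(text)
    return sorted(range(1, n + 1), key=lambda i: text[i - 1:])


def sa_query(text: str, i: int) -> int:
    """Return SA[i] for i in [1..n] by rebuilding the array from scratch
    (assumed baseline; the paper answers this without rebuilding)."""
    return suffix_array(text)[i - 1]


def substitute(text: str, pos: int, ch: str) -> str:
    """Replace the character at 1-indexed position pos with ch."""
    return text[:pos - 1] + ch + text[pos:]


before = "aaaa"
after = substitute(before, 4, "b")   # "aaab"

sa_before = suffix_array(before)     # [4, 3, 2, 1]
sa_after = suffix_array(after)       # [1, 2, 3, 4]

# Count how many entries of the suffix array changed after one substitution.
changed = sum(1 for x, y in zip(sa_before, sa_after) if x != y)
print(sa_before, sa_after, changed)
```

Here a single substitution in a text of length 4 changes all 4 entries, which is why a dynamic text is accessed through queries for individual values of SA[i] rather than by storing the array explicitly.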
Pages: 1657-1670
Page count: 14