Dynamic Suffix Array with Polylogarithmic Queries and Updates

Times cited: 6
Authors
Kempa, Dominik [1]
Kociumaka, Tomasz [2]
Affiliations
[1] SUNY Stony Brook, Stony Brook, NY 11794 USA
[2] University of California, Berkeley, Berkeley, CA 94720 USA
Funding
National Science Foundation (NSF)
Keywords
Suffix array; text indexing; pattern matching; string synchronizing sets; dynamic data structures
DOI
10.1145/3519935.3520061
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Subject classification code
081202
Abstract
The suffix array $SA[1..n]$ of a text $T$ of length $n$ is a permutation of $\{1, \ldots, n\}$ describing the lexicographical ordering of the suffixes of $T$. It is considered to be one of the most important data structures for string processing, with dozens of applications in data compression, bioinformatics, and information retrieval. One of the biggest drawbacks of the suffix array is that it is very difficult to maintain under text updates: even a single character substitution can completely change the contents of the suffix array. Thus, the suffix array of a dynamic text is modelled using suffix array queries, which return the value $SA[i]$ given any $i \in [1..n]$. Prior to this work, the fastest dynamic suffix array implementations were by Amir and Boneh, who showed how to answer suffix array queries in $\tilde{O}(k)$ time, where $k \in [1..n]$ is a trade-off parameter, with $\tilde{O}(n/k)$-time text updates [ISAAC 2020]. In a very recent preprint, they also provided a solution with $O(\log^5 n)$-time queries and $\tilde{O}(n^{2/3})$-time updates [arXiv 2021]. We propose the first data structure that supports both suffix array queries and text updates in $O(\mathrm{polylog}\, n)$ time (achieving $O(\log^4 n)$ and $O(\log^{3+o(1)} n)$ time, respectively). Our data structure is deterministic, and the running times of all operations are worst-case. In addition to the standard single-character edits (character insertions, deletions, and substitutions), we support (also in $O(\log^{3+o(1)} n)$ time) the "cut-paste" operation that moves any (arbitrarily long) substring of $T$ to any place in $T$. To achieve our result, we develop a number of new techniques which are of independent interest. These include a new flavor of dynamic locally consistent parsing, as well as a dynamic construction of string synchronizing sets with an extra local sparsity property; this significantly generalizes the sampling technique introduced at STOC 2019.
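To make the definitions concrete, here is a minimal Python sketch (not the authors' data structure) that builds $SA$ naively by sorting suffixes, answers a query $SA[i]$ by rebuilding from scratch, and shows a single substitution changing every entry. The helper names `suffix_array`, `sa_query`, and `substitute` are hypothetical, chosen only for illustration, and positions are 1-based to match $SA[1..n]$.

```python
from typing import List


def suffix_array(text: str) -> List[int]:
    """Naive construction: sort the 1-based start positions of all suffixes
    of `text` by the suffix each one starts. O(n^2 log n) worst case;
    for illustration only."""
    n = len(text)
    return sorted(range(1, n + 1), key=lambda i: text[i - 1:])


def sa_query(text: str, i: int) -> int:
    """Return SA[i] for 1 <= i <= n by rebuilding SA from scratch."""
    if not 1 <= i <= len(text):
        raise IndexError("i must satisfy 1 <= i <= n")
    return suffix_array(text)[i - 1]


def substitute(text: str, j: int, c: str) -> str:
    """Return `text` with the character at 1-based position j replaced by c."""
    if not 1 <= j <= len(text) or len(c) != 1:
        raise ValueError("need 1 <= j <= n and a single character c")
    return text[:j - 1] + c + text[j:]


if __name__ == "__main__":
    print(suffix_array("banana"))                # [6, 4, 2, 1, 5, 3]
    print(sa_query("banana", 1))                 # 6 (the suffix "a")
    t = "aaaaaa"
    print(suffix_array(t))                       # [6, 5, 4, 3, 2, 1]
    print(suffix_array(substitute(t, 6, "b")))   # [1, 2, 3, 4, 5, 6]
```

Substituting the last character of $a^6$ turns $SA = [6, 5, 4, 3, 2, 1]$ into $[1, 2, 3, 4, 5, 6]$, so every entry changes, illustrating the remark that a single substitution can completely change the suffix array.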
We complement our structure with a hardness result: unless the Online Matrix-Vector Multiplication (OMv) Conjecture fails, no data structure with $O(\mathrm{polylog}\, n)$-time suffix array queries can support the "copy-paste" operation in $O(n^{1-\varepsilon})$ time for any $\varepsilon > 0$.
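The two block edits mentioned in the abstract differ in one visible way: cut-paste only rearranges $T$ (its length is unchanged), while copy-paste duplicates a substring, so repeated copy-pastes can grow $T$ quickly. The Python sketch below illustrates only the semantics of these operations on a plain string, not any data structure or the hardness argument; the function names and the 1-based `(i, j, p)` parameterization are hypothetical, since the abstract does not fix an interface.

```python
def cut_paste(text: str, i: int, j: int, p: int) -> str:
    """Move the block T[i..j] (1-based, inclusive) so that it starts at
    position p of the result. Hypothetical interface; |T| is unchanged."""
    if not 1 <= i <= j <= len(text):
        raise ValueError("need 1 <= i <= j <= n")
    block = text[i - 1:j]
    rest = text[:i - 1] + text[j:]  # T with the block removed
    if not 1 <= p <= len(rest) + 1:
        raise ValueError("paste position out of range")
    return rest[:p - 1] + block + rest[p - 1:]


def copy_paste(text: str, i: int, j: int, p: int) -> str:
    """Insert a copy of T[i..j] so that it starts at position p
    (1 <= p <= n + 1); the original occurrence stays in place."""
    if not 1 <= i <= j <= len(text):
        raise ValueError("need 1 <= i <= j <= n")
    if not 1 <= p <= len(text) + 1:
        raise ValueError("paste position out of range")
    return text[:p - 1] + text[i - 1:j] + text[p - 1:]


if __name__ == "__main__":
    print(cut_paste("abcdef", 2, 3, 3))   # adbcef
    print(copy_paste("abcdef", 2, 3, 7))  # abcdefbc
    t = "ab"
    for _ in range(3):
        t = copy_paste(t, 1, len(t), len(t) + 1)  # doubles |t|
    print(len(t))                         # 16
```

Three whole-text copy-pastes take `"ab"` to a string of length 16, whereas any sequence of cut-pastes keeps the length at 2.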
Pages: 1657-1670
Number of pages: 14