Dynamic Suffix Array with Polylogarithmic Queries and Updates

Cited by: 6
Authors
Kempa, Dominik [1]
Kociumaka, Tomasz [2]
Affiliations
[1] SUNY Stony Brook, Stony Brook, NY 11794 USA
[2] Univ Calif Berkeley, Berkeley, CA 94720 USA
Funding
U.S. National Science Foundation
Keywords
Suffix array; text indexing; pattern matching; string synchronizing sets; dynamic data structures
DOI
10.1145/3519935.3520061
Chinese Library Classification
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
The suffix array $\mathrm{SA}[1..n]$ of a text $T$ of length $n$ is a permutation of $\{1, \ldots, n\}$ describing the lexicographical ordering of suffixes of $T$ and is considered to be one of the most important data structures for string processing, with dozens of applications in data compression, bioinformatics, and information retrieval. One of the biggest drawbacks of the suffix array is that it is very difficult to maintain under text updates: even a single character substitution can completely change the contents of the suffix array. Thus, the suffix array of a dynamic text is modelled using suffix array queries, which return the value $\mathrm{SA}[i]$ given any $i \in [1..n]$. Prior to this work, the fastest dynamic suffix array implementations were by Amir and Boneh, who showed how to answer suffix array queries in $\tilde{O}(k)$ time, where $k \in [1..n]$ is a trade-off parameter, with $\tilde{O}(n/k)$-time text updates [ISAAC 2020]. In a very recent preprint, they also provided a solution with $O(\log^5 n)$-time queries and $\tilde{O}(n^{2/3})$-time updates [arXiv 2021]. We propose the first data structure that supports both suffix array queries and text updates in $O(\mathrm{polylog}\, n)$ time (achieving $O(\log^4 n)$ and $O(\log^{3+o(1)} n)$ time, respectively). Our data structure is deterministic and the running times for all operations are worst-case. In addition to the standard single-character edits (character insertions, deletions, and substitutions), we support (also in $O(\log^{3+o(1)} n)$ time) the "cut-paste" operation that moves any (arbitrarily long) substring of $T$ to any place in $T$. To achieve our result, we develop a number of new techniques which are of independent interest. This includes a new flavor of dynamic locally consistent parsing, as well as a dynamic construction of string synchronizing sets with an extra local sparsity property; this significantly generalizes the sampling technique introduced at STOC 2019.
We complement our structure by a hardness result: unless the Online Matrix-Vector Multiplication (OMv) Conjecture fails, no data structure with $O(\mathrm{polylog}\, n)$-time suffix array queries can support the "copy-paste" operation in $O(n^{1-\epsilon})$ time for any $\epsilon > 0$.
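
The abstract's claim that a single substitution can completely change the suffix array can be illustrated with a small sketch. The code below is not the data structure from the paper: it is a naive baseline that rebuilds the whole array by sorting suffixes, and the names `suffix_array` and `sa_query` are chosen here purely for illustration.

```python
def suffix_array(text: str) -> list[int]:
    """Naive construction: sort 1-indexed suffix start positions
    by the suffix that begins there (lexicographic order)."""
    return sorted(range(1, len(text) + 1), key=lambda i: text[i - 1:])


def sa_query(text: str, i: int) -> int:
    """Return SA[i] for 1 <= i <= len(text).

    This rebuilds the array on every call, which is exactly the cost
    a dynamic suffix array is designed to avoid.
    """
    return suffix_array(text)[i - 1]


if __name__ == "__main__":
    before = "aaaaaaaa"
    after = "aaaaaaab"  # a single substitution at the last position

    sa_before = suffix_array(before)
    sa_after = suffix_array(after)
    print(sa_before)  # [8, 7, 6, 5, 4, 3, 2, 1]
    print(sa_after)   # [1, 2, 3, 4, 5, 6, 7, 8]

    # Count how many entries of the array differ after the edit.
    changed = sum(x != y for x, y in zip(sa_before, sa_after))
    print(changed)    # 8
```

In this example every entry of the array changes after one substitution, so any approach that stores SA explicitly would have to rewrite all $n$ entries; this is why the dynamic setting is modelled through queries for individual values $\mathrm{SA}[i]$.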
Pages: 1657-1670
Page count: 14