Optimal Algorithms for Bounded Weighted Edit Distance

Times cited: 2
Authors
Cassis, Alejandro [1, 2]
Kociumaka, Tomasz [2]
Wellnitz, Philip [2]
Affiliations
[1] Saarland Univ, Saarland Informat Campus, Saarbrücken, Germany
[2] Max Planck Inst Informat, Saarland Informat Campus, Saarbrücken, Germany
Funding
European Research Council;
Keywords
edit distance; conditional lower bounds; string algorithms;
DOI
10.1109/FOCS57990.2023.00135
Chinese Library Classification number
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
The edit distance (also known as Levenshtein distance) of two strings is the minimum number of insertions, deletions, and substitutions of characters needed to transform one string into the other. The textbook dynamic-programming algorithm computes the edit distance of two length-$n$ strings in $O(n^2)$ time, which is optimal up to subpolynomial factors assuming the Strong Exponential Time Hypothesis (SETH). An established way of circumventing this hardness is to consider the bounded setting, where the running time is parameterized by the edit distance $k$. A celebrated algorithm by Landau and Vishkin (JCSS'88) achieves a running time of $O(n+k^2)$, which is optimal as a function of $n$ and $k$ (again, up to subpolynomial factors and assuming SETH). While the theory community has thoroughly studied the Levenshtein distance, most practical applications rely on a more general weighted edit distance, where each edit has a weight depending on its type and the involved characters from the alphabet $\Sigma$. This is formalized through a weight function $w : (\Sigma \cup \{\varepsilon\}) \times (\Sigma \cup \{\varepsilon\}) \to \mathbb{R}$ normalized so that $w(a \mapsto a) = 0$ for $a \in \Sigma \cup \{\varepsilon\}$ and $w(a \mapsto b) \ge 1$ for $a, b \in \Sigma \cup \{\varepsilon\}$ with $a \ne b$; the goal is to find an alignment of the two strings minimizing the total weight of edits. The classic $O(n^2)$-time algorithm supports this setting seamlessly, but for many decades just a straightforward $O(nk)$-time solution was known for the bounded version of the weighted edit distance problem. Only very recently, Das, Gilbert, Hajiaghayi, Kociumaka, and Saha (STOC'23) gave the first non-trivial algorithm, achieving a time complexity of $O(n + k^5)$. While this running time is linear for $k \le n^{1/5}$, it is still very far from $O(n + k^2)$, the bound achievable in the unweighted setting.
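To make the "straightforward $O(nk)$-time solution" concrete: because every non-identity edit costs at least 1, an alignment of weight at most $k$ uses at most $k$ insertions and deletions, so its path through the dynamic-programming grid never leaves the diagonal band $|i - j| \le k$. The Python sketch below is an illustration under stated assumptions, not code from the paper. It restricts the textbook recurrence to that band. The function name `bounded_weighted_edit_distance`, the use of `""` for $\varepsilon$, and the integer threshold `k` are hypothetical choices.

```python
from typing import Callable, Optional

EPS = ""  # hypothetical stand-in for the empty symbol epsilon


def bounded_weighted_edit_distance(
    x: str, y: str, k: int, w: Callable[[str, str], float]
) -> Optional[float]:
    """Return the weighted edit distance of x and y if it is at most k, else None.

    Assumes w is normalized: w(a, a) == 0 and w(a, b) >= 1 for a != b, where
    w(a, EPS) is the cost of deleting a and w(EPS, b) the cost of inserting b.
    Runs in O((len(x) + 1) * (k + 1)) time.
    """
    n, m = len(x), len(y)
    if abs(n - m) > k:
        return None  # would need more than k insertions/deletions
    inf = float("inf")
    prev = {}  # previous DP row restricted to the band: j -> D[i-1][j]
    for i in range(n + 1):
        cur = {}
        # An alignment of weight <= k uses <= k insertions and deletions,
        # so its path never leaves the band |i - j| <= k.
        for j in range(max(0, i - k), min(m, i + k) + 1):
            if i == 0 and j == 0:
                best = 0.0
            else:
                best = inf
                if i > 0 and j in prev:  # delete x[i-1]
                    best = min(best, prev[j] + w(x[i - 1], EPS))
                if j > 0 and j - 1 in cur:  # insert y[j-1]
                    best = min(best, cur[j - 1] + w(EPS, y[j - 1]))
                if i > 0 and j > 0 and j - 1 in prev:  # substitute or match
                    best = min(best, prev[j - 1] + w(x[i - 1], y[j - 1]))
            cur[j] = best
        prev = cur
    dist = prev.get(m, inf)
    return dist if dist <= k else None
```

With unit weights (`w = lambda a, b: 0 if a == b else 1`) this is the ordinary edit distance restricted to a band: `bounded_weighted_edit_distance("kitten", "sitting", 3, w)` returns `3.0`, and lowering `k` to 2 returns `None`. Cells outside the band are simply never stored, which is what brings the cost down from $O(n^2)$ to $O(nk)$.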
This is unsatisfactory, especially given the lack of any compelling evidence that the weighted version is inherently harder. In this paper, we essentially close this gap by showing both an improved $\tilde{O}(n+\sqrt{nk^3})$-time algorithm and, more surprisingly, a matching lower bound: Conditioned on the All-Pairs Shortest Paths (APSP) hypothesis, the running time of our solution is optimal for $\sqrt{n} \le k \le n$ (up to subpolynomial factors). In particular, this is the first separation between the complexity of the weighted and unweighted edit distance problems. Just like the Landau-Vishkin algorithm, our algorithm can be adapted to a wide variety of settings, such as when the input is given in a compressed representation. This is because, independently of the string length $n$, our procedure takes $\tilde{O}(k^3)$ time assuming that the equality of any two substrings can be tested in $\tilde{O}(1)$ time. Consistently with previous work, our algorithm relies on the observation that strings with a rich structure of low-weight alignments must contain highly repetitive substrings. Nevertheless, achieving the optimal running time requires multiple new insights. We capture the right notion of repetitiveness using a tailor-made compressibility measure that we call self-edit distance. Our divide-and-conquer algorithm reduces the computation of weighted edit distance to several subproblems involving substrings of small self-edit distance and, at the same time, distributes the budget for edit weights among these subproblems. We then exploit the repetitive structure of the underlying substrings using state-of-the-art results for multiple-source shortest paths in planar graphs (Klein, SODA'05). As a stepping stone for our conditional lower bound, we study a dynamic problem of maintaining two strings subject to updates (substitutions of characters) and weighted edit distance queries.
We significantly extend the construction of Abboud and Dahlgaard (FOCS'16), originally for dynamic shortest paths in planar graphs, to show that a sequence of $n$ updates and $q \le n$ queries cannot be handled much faster than in $O(n^2\sqrt{q})$ time. We then compose the snapshots of the dynamic strings to derive hardness of the static problem in the bounded setting.
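The abstract notes that, like Landau-Vishkin, the procedure needs the strings only through substring-equality tests. The following sketch illustrates that access pattern for the unweighted case only; it is not the paper's weighted algorithm. As an assumption, it realizes the equality oracle with Karp-Rabin fingerprints (randomized, correct with high probability) and plugs it into a Landau-Vishkin-style diagonal-extension loop. Longest common extensions are found by binary search over equality tests, giving $O(n + k^2 \log n)$ time overall. The names `SubstringEquality` and `bounded_edit_distance` are hypothetical.

```python
import random
from typing import Optional


class SubstringEquality:
    """Karp-Rabin fingerprints of s and t: after linear preprocessing,
    equal(i, j, length) tests s[i:i+length] == t[j:j+length] in O(1) time.
    Randomized: a false positive is possible but has negligible probability."""

    MOD = (1 << 61) - 1  # a Mersenne prime

    def __init__(self, s: str, t: str, seed: int = 0):
        self.base = random.Random(seed).randrange(2, self.MOD - 1)
        self.hs = self._prefix_hashes(s)
        self.ht = self._prefix_hashes(t)
        self.pw = [1] * (max(len(s), len(t)) + 1)  # pw[l] = base**l mod MOD
        for l in range(1, len(self.pw)):
            self.pw[l] = self.pw[l - 1] * self.base % self.MOD

    def _prefix_hashes(self, s: str) -> list:
        h = [0] * (len(s) + 1)
        for i, c in enumerate(s):
            h[i + 1] = (h[i] * self.base + ord(c)) % self.MOD
        return h

    def _fingerprint(self, h: list, i: int, length: int) -> int:
        return (h[i + length] - h[i] * self.pw[length]) % self.MOD

    def equal(self, i: int, j: int, length: int) -> bool:
        return self._fingerprint(self.hs, i, length) == self._fingerprint(self.ht, j, length)


def bounded_edit_distance(s: str, t: str, k: int) -> Optional[int]:
    """Unweighted edit distance of s and t if it is at most k, else None."""
    n, m = len(s), len(t)
    if abs(n - m) > k:
        return None
    eq = SubstringEquality(s, t)

    def lce(i: int, j: int) -> int:
        # Longest l with s[i:i+l] == t[j:j+l], by binary search on equality tests.
        lo, hi = 0, min(n - i, m - j)
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if eq.equal(i, j, mid):
                lo = mid
            else:
                hi = mid - 1
        return lo

    far = {}  # far[d]: furthest row i on diagonal d = j - i reachable so far
    for e in range(k + 1):
        new = {}
        for d in range(-e, e + 1):
            if e == 0:
                i = 0
            else:
                cands = []
                if d in far:
                    cands.append(far[d] + 1)      # substitution
                if d - 1 in far:
                    cands.append(far[d - 1])      # insertion of a character of t
                if d + 1 in far:
                    cands.append(far[d + 1] + 1)  # deletion of a character of s
                if not cands:
                    continue
                i = max(cands)
            i = min(i, n, m - d)  # stay inside the DP grid
            if i < 0 or i + d < 0:
                continue  # diagonal d does not meet the grid
            new[d] = i + lce(i, i + d)  # slide along matching characters for free
        far = new
        if far.get(m - n) == n:
            return e
    return None
```

Only `lce` touches the input strings. Any structure that answers substring-equality queries, for example one built over a compressed representation, could replace `SubstringEquality` without changing the diagonal-extension loop.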
Pages: 2177-2187
Number of pages: 11
Related papers (50 in total)
  • [11] Fast cyclic edit distance computation with weighted edit costs in classification
    Peris, G
    Marzal, A
    16TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL IV, PROCEEDINGS, 2002, : 184 - 187
  • [12] Lossless filter for multiple repeats with bounded edit distance
    Peterlongo, Pierre
    Sacomoto, Gustavo Akio Tominaga
    do Lago, Alair Pereira
    Pisanti, Nadia
    Sagot, Marie-France
    ALGORITHMS FOR MOLECULAR BIOLOGY, 2009, 4
  • [14] Decomposition algorithms for the tree edit distance problem
    Dulucq, Serge
    Touzet, Helene
    JOURNAL OF DISCRETE ALGORITHMS, 2005, 3 (2-4) : 448 - 471
  • [15] Exploiting Spatial Architectures for Edit Distance Algorithms
    Tithi, Jesmin Jahan
    Crago, Neal C.
    Emer, Joel S.
    2014 IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE (ISPASS), 2014, : 23 - 34
  • [16] Optimal Approximation Algorithms for Maximum Distance-Bounded Subgraph Problems
    Asahiro, Yuichi
    Doi, Yuya
    Miyano, Eiji
    Samizo, Kazuaki
    Shimizu, Hirotaka
    ALGORITHMICA, 2018, 80 (06) : 1834 - 1856
  • [17] Optimal Approximation Algorithms for Maximum Distance-Bounded Subgraph Problems
    Asahiro, Yuichi
    Doi, Yuya
    Miyano, Eiji
    Shimizu, Hirotaka
    COMBINATORIAL OPTIMIZATION AND APPLICATIONS, (COCOA 2015), 2015, 9486 : 586 - 600
  • [19] Faster algorithms for guided tree edit distance
    Tsur, Dekel
    INFORMATION PROCESSING LETTERS, 2008, 108 (04) : 251 - 254
  • [20] Weighted Edit Distance Computation: Strings, Trees, and Dyck
    Das, Debarati
    Gilbert, Jacob
    Hajiaghayi, Mohammad Taghi
    Kociumaka, Tomasz
    Saha, Barna
    PROCEEDINGS OF THE 55TH ANNUAL ACM SYMPOSIUM ON THEORY OF COMPUTING, STOC 2023, 2023, : 377 - 390