Efficient edit distance with duplications and contractions

Cited by: 5
Authors
Pinhas, Tamar [1]
Zakov, Shay [2]
Tsur, Dekel [1]
Ziv-Ukelson, Michal [1]
Affiliations
[1] Ben Gurion Univ Negev, Dept Comp Sci, IL-84105 Beer Sheva, Israel
[2] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92093 USA
Keywords
Edit distance; Minisatellites; Min-plus matrix multiplication; Four Russians; MINISATELLITE; ALGORITHM; ALIGNMENT; RNA
DOI
10.1186/1748-7188-8-27
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
We propose three algorithms for string edit distance with duplications and contractions. These include an efficient general algorithm and two improvements which apply under certain constraints on the cost function. The new algorithms solve a more general problem variant and obtain better time complexities than previous algorithms. Our general algorithm is based on min-plus multiplication of square matrices and has time and space complexities of $O(|\Sigma| \, \mathrm{MP}(n))$ and $O(|\Sigma| n^2)$, respectively, where $|\Sigma|$ is the alphabet size, $n$ is the length of the strings, and $\mathrm{MP}(n)$ is the time bound for the computation of min-plus matrix multiplication of two $n \times n$ matrices (currently, $\mathrm{MP}(n) = O(n^3 \log^3 \log n / \log^2 n)$ due to an algorithm by Chan). For integer cost functions, the running time is further improved to $O(|\Sigma| n^3 / \log^2 n)$. In addition, this variant of the algorithm is online, in the sense that the input strings may be given letter by letter, and its time complexity bounds the processing time of the first $n$ given letters. This acceleration is based on our efficient matrix-vector min-plus multiplication algorithm, intended for matrices and vectors for which differences between adjacent entries are from a finite integer interval $D$. Choosing a constant $\frac{1}{\log_{|D|} n} < \lambda < 1$, the algorithm preprocesses an $n \times n$ matrix in $O(n^{2+\lambda} / |D|)$ time and $O\!\left(\frac{n^{2+\lambda}}{|D| \, \lambda^2 \log_{|D|}^2 n}\right)$ space. Then, it may multiply the matrix with any given $n$-length vector in $O\!\left(\frac{n^2}{\lambda^2 \log_{|D|}^2 n}\right)$ time.
Under some discreteness assumptions, this matrix-vector min-plus multiplication algorithm applies to several problems from the domains of context-free grammar parsing and RNA folding and, in particular, implies the asymptotically fastest $O(n^3 / \log^2 n)$ time algorithm for single-strand RNA folding with discrete cost functions. Finally, assuming a different constraint on the cost function, we present another version of the algorithm that exploits the run-length encoding of the strings and runs in $O(|\Sigma| \, n \, \mathrm{MP}(\tilde{n}) / \tilde{n})$ time and $O(|\Sigma| \, n \tilde{n})$ space, where $\tilde{n}$ is the length of the run-length encoding of the strings.
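For readers unfamiliar with the primitives named in the abstract, the following is a minimal illustrative sketch, not the authors' method. It shows the textbook definitions of the min-plus matrix product, the min-plus matrix-vector product, and run-length encoding. The function names are hypothetical, and the products use the naive cubic and quadratic loops rather than Chan's algorithm or the Four Russians style preprocessing described above. Inputs are assumed to be non-empty and dimensionally compatible.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

Number = float  # costs may be ints or floats; float("inf") marks "no path"


def min_plus_matmul(a: Sequence[Sequence[Number]],
                    b: Sequence[Sequence[Number]]) -> List[List[Number]]:
    """Naive min-plus product: c[i][j] = min over t of a[i][t] + b[t][j]."""
    inner, cols = len(b), len(b[0])
    return [
        [min(row[t] + b[t][j] for t in range(inner)) for j in range(cols)]
        for row in a
    ]


def min_plus_matvec(a: Sequence[Sequence[Number]],
                    v: Sequence[Number]) -> List[Number]:
    """Naive min-plus matrix-vector product: u[i] = min over t of a[i][t] + v[t]."""
    return [min(x + y for x, y in zip(row, v)) for row in a]


def run_length_encode(s: str) -> List[Tuple[str, int]]:
    """Collapse each maximal run of equal letters into a (letter, count) pair."""
    return [(letter, sum(1 for _ in run)) for letter, run in groupby(s)]


if __name__ == "__main__":
    a = [[0, 3], [2, 0]]
    b = [[0, 1], [4, 0]]
    print(min_plus_matmul(a, b))
    print(min_plus_matvec(a, [5, 1]))
    print(run_length_encode("aaabbc"))
```

In this notation, the length of the list returned by `run_length_encode` corresponds to the quantity $\tilde{n}$ in the abstract, which is at most $n$ and can be much smaller for strings with long runs.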
Pages: 28