Faster Approximate Pattern Matching: A Unified Approach

Cited by: 11
Authors
Charalampopoulos, Panagiotis [1, 2]
Kociumaka, Tomasz [3 ]
Wellnitz, Philip [4 ]
Affiliations
[1] Kings Coll London, Dept Informat, London, England
[2] Univ Warsaw, Inst Informat, Warsaw, Poland
[3] Bar Ilan Univ, Dept Comp Sci, Ramat Gan, Israel
[4] Max Planck Inst Informat, Saarland Informat Campus, Saarbrucken, Germany
Funding
European Research Council; EU Horizon 2020
Keywords
approximate pattern matching; grammar compression; dynamic strings; Hamming distance; edit distance; COMPRESSED STRINGS; SEARCH;
DOI
10.1109/FOCS46700.2020.00095
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In the approximate pattern matching problem, given a text T, a pattern P, and a threshold k, the task is to find (the starting positions of) all substrings of T that are at distance at most k from P. We consider the two most fundamental string metrics: Under the Hamming distance, we search for substrings of T that have at most k mismatches with P, while under the edit distance, we search for substrings of T that can be transformed to P with at most k edits. Exact occurrences of P in T have a very simple structure: If we assume for simplicity that |P| < |T| ≤ 3/2 |P| and that P occurs both as a prefix and as a suffix of T, then both P and T are periodic with a common period. However, an analogous characterization for occurrences with up to k mismatches was proved only recently by Bringmann et al. [SODA'19]: Either there are O(k^2) k-mismatch occurrences of P in T, or both P and T are at Hamming distance O(k) from strings with a common string period of length O(m/k). We tighten this characterization by showing that there are O(k) k-mismatch occurrences in the non-periodic case, and we lift it to the edit distance setting, where we tightly bound the number of k-edit occurrences by O(k^2) in the non-periodic case. Our proofs are constructive and let us obtain a unified framework for approximate pattern matching for both considered distances. In particular, we provide meta-algorithms that only rely on a small set of primitive operations. We showcase the generality of our meta-algorithms with results for the fully compressed setting, the dynamic setting, and the standard setting.
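The two problem variants defined in the abstract can be illustrated with a brute-force sketch. This is not the paper's method: it is a direct, quadratic-time reading of the definitions, and the function names `hamming_occurrences` and `edit_occurrences` are hypothetical, chosen only for this illustration.

```python
def hamming_occurrences(text, pattern, k):
    """Start positions i where text[i:i+m] has at most k mismatches with pattern."""
    m = len(pattern)
    result = []
    for i in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(text[i:i + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > k:
                    break  # this window already exceeds the threshold
        if mismatches <= k:
            result.append(i)
    return result


def edit_occurrences(text, pattern, k):
    """Start positions i such that some substring text[i:j] is within
    edit distance k of pattern (insertions, deletions, substitutions)."""
    n, m = len(text), len(pattern)
    result = []
    for i in range(n):
        # A substring longer than m + k cannot be within distance k.
        window = text[i:i + m + k]
        # prev[c] = edit distance between the empty pattern prefix and window[:c]
        prev = list(range(len(window) + 1))
        for r in range(1, m + 1):
            cur = [r] + [0] * len(window)
            for c in range(1, len(window) + 1):
                cur[c] = min(
                    prev[c] + 1,      # delete pattern[r-1]
                    cur[c - 1] + 1,   # insert window[c-1]
                    prev[c - 1] + (pattern[r - 1] != window[c - 1]),  # match or substitute
                )
            prev = cur
        # prev[c] is now the distance between pattern and text[i:i+c]
        if min(prev) <= k:
            result.append(i)
    return result


if __name__ == "__main__":
    print(hamming_occurrences("abcabdabc", "abc", 1))
    print(edit_occurrences("abxcd", "abcd", 1))
```

Every k-mismatch occurrence is also a k-edit occurrence, since a substitution is one kind of edit; the converse does not hold, which is consistent with the abstract stating a larger bound for k-edit occurrences than for k-mismatch occurrences in the non-periodic case.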
Pages: 978-989
Number of pages: 12