Sublinear approximate string matching and biological applications

Authors
Chang, W.I. [1]
Lawler, E.L. [1]
Affiliation
[1] Cold Spring Harbor Lab, Cold Spring Harbor, United States
Source
Algorithmica (New York) | 1994 / Vol. 12 / Issue 4-5
Keywords
Approximation theory - Biochemistry - Computability and decidability - Nucleic acid sequences - Pattern recognition - Threshold logic - Trees (mathematics)
DOI
Not available
Abstract
Given a text string of length n and a pattern string of length m over a b-letter alphabet, the k-differences approximate string matching problem asks for all locations in the text where the pattern occurs with at most k differences (substitutions, insertions, deletions). We treat k not as a constant but as a fraction of m (not necessarily a constant fraction). Previous algorithms require at least $O(kn)$ time (or exponential space). We give an algorithm that runs in sublinear expected time $O((n/m)\,k \log_b m)$ when the text is random and k is bounded by the threshold $m/(\log_b m + O(1))$. In particular, when $k = o(m/\log_b m)$ the expected running time is $o(n)$. In the worst case our algorithm is $O(kn)$, but it is still an improvement in that it is practical and uses $O(m)$ space, compared with $O(n)$ or $O(m^2)$. We define three problems motivated by molecular biology and describe efficient algorithms based on our techniques: (1) approximate substring matching, (2) approximate-overlap detection, and (3) approximate codon matching. The corresponding applications to biology are, respectively, local similarity search, sequence assembly, and DNA-protein matching.
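To make the problem statement concrete, the following is a minimal sketch of the k-differences problem solved with the classical column-by-column dynamic program. It is not the sublinear algorithm described in the abstract; it is the textbook O(mn)-time, O(m)-space baseline, shown only to illustrate what "occurs with at most k differences" means. The function name and the convention of reporting 1-based end positions are assumptions made for this illustration.

```python
from typing import List


def k_differences_end_positions(pattern: str, text: str, k: int) -> List[int]:
    """Return 1-based positions j such that some substring of `text`
    ending at j is within k differences (substitutions, insertions,
    deletions) of `pattern`.

    Illustrative baseline only (hypothetical helper, not from the paper).
    """
    m = len(pattern)
    # col[i] = fewest differences between pattern[:i] and any substring
    # of the text ending at the current text position.
    col = list(range(m + 1))
    hits = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]  # value for (i-1, j-1); col[0] stays 0 because
                            # an occurrence may start anywhere in the text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            new = min(
                prev_diag + cost,  # match or substitution
                col[i] + 1,        # extra text character
                col[i - 1] + 1,    # missing pattern character
            )
            prev_diag = col[i]
            col[i] = new
        if col[m] <= k:
            hits.append(j)
    return hits


if __name__ == "__main__":
    print(k_differences_end_positions("abc", "xxabcxx", 0))
    print(k_differences_end_positions("abc", "xxabcxx", 1))
```

Only one column of the table is kept, which is why this baseline needs O(m) working space; its running time is proportional to mn regardless of k.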
Pages: 327-344