Improving an algorithm for approximate pattern matching

Cited by: 16
Authors
Navarro, G [1]
Baeza-Yates, R [1]
Affiliations
[1] Univ Chile, Dept Comp Sci, Santiago, Chile
Keywords
string matching allowing errors; bit-parallelism; edit distance; approximate matching probability
DOI
10.1007/s00453-001-0034-6
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
We study a recent algorithm for fast on-line approximate string matching. This is the problem of searching a pattern in a text allowing errors in the pattern or in the text. The algorithm is based on a very fast kernel which is able to search short patterns using a nondeterministic finite automaton, which is simulated using bit-parallelism. A number of techniques to extend this kernel for longer patterns are presented in that work. However, the techniques can be integrated in many ways and the optimal interplay among them is by no means obvious. The solution to this problem starts at a very low level, by obtaining basic probabilistic information about the problem which was not previously known, and ends integrating analytical results with empirical data to obtain the optimal heuristic. The conclusions obtained via analysis are experimentally confirmed. We also improve many of the techniques and obtain a combined heuristic which is faster than the original work. This work shows an excellent example of a complex and theoretical analysis of algorithms used for design and for practical algorithm engineering, instead of the common practice of first designing an algorithm and then analyzing it.
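To make the idea of a bit-parallel NFA simulation for short patterns concrete, the sketch below searches a text for a pattern with at most k edit errors. It is an illustration only: it uses the classical row-wise formulation in the style of Wu and Manber's Shift-And with errors, chosen for brevity, and the kernel analyzed in the paper may arrange the automaton bits differently. The function name `approx_search` and its return convention (0-based end positions of matches) are assumptions made for this example.

```python
def approx_search(pattern, text, k):
    """Return the 0-based end positions in `text` where `pattern`
    matches with at most k errors (insertions, deletions, substitutions).

    Illustrative row-wise bit-parallel NFA simulation; not taken from
    the paper itself.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    mask = (1 << m) - 1        # keep only the m automaton states
    accept = 1 << (m - 1)      # bit of the final state

    # B[ch] has bit j set when pattern[j] == ch.
    B = {}
    for j, ch in enumerate(pattern):
        B[ch] = B.get(ch, 0) | (1 << j)

    # R[d] bit j is set when pattern[:j+1] matches a suffix of the text
    # read so far with at most d errors. Before any text is read, a
    # prefix of length j+1 needs j+1 errors, so row d starts with d bits.
    R = [((1 << d) - 1) & mask for d in range(k + 1)]

    ends = []
    for pos, ch in enumerate(text):
        b = B.get(ch, 0)
        prev_old = R[0]                      # old value of row d-1
        R[0] = ((R[0] << 1) | 1) & b         # exact-match row
        for d in range(1, k + 1):
            old = R[d]
            R[d] = (
                (((old << 1) | 1) & b)           # character matches
                | prev_old                       # extra text character
                | ((prev_old | R[d - 1]) << 1)   # substitution / skipped pattern char
                | 1                              # one error from the start state
            ) & mask
            prev_old = old
        if R[k] & accept:
            ends.append(pos)
    return ends
```

For example, `approx_search("abc", "axc", 1)` reports a match ending at position 2, since "axc" differs from "abc" by one substitution. Python integers are unbounded, so this sketch does not show the machine-word limit that motivates the techniques for longer patterns mentioned in the abstract.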
Pages: 473-502
Number of pages: 30
Related papers
50 items in total
  • [21] AN IMPROVED ALGORITHM FOR APPROXIMATE STRING MATCHING
    GALIL, Z
    PARK, K
    LECTURE NOTES IN COMPUTER SCIENCE, 1989, 372 : 394 - 404
  • [22] A Consensus Algorithm for Approximate String Matching
    Rubio, Miguel
    Alba, Alfonso
    Mendez, Martin
    Arce-Santana, Edgar
    Rodriguez-Kessler, Margarita
    3RD IBEROAMERICAN CONFERENCE ON ELECTRONICS ENGINEERING AND COMPUTER SCIENCE, CIIECC 2013, 2013, 7 : 322 - 327
  • [23] AN IMPROVED ALGORITHM FOR APPROXIMATE STRING MATCHING
    GALIL, Z
    PARK, K
    SIAM JOURNAL ON COMPUTING, 1990, 19 (06) : 989 - 999
  • [24] A Randomized Algorithm for Approximate String Matching
    M. J. Atallah
    F. Chyzak
    P. Dumas
    Algorithmica, 2001, 29 : 468 - 486
  • [25] A randomized algorithm for approximate string matching
    Atallah, MJ
    Chyzak, F
    Dumas, P
    ALGORITHMICA, 2001, 29 (03) : 468 - 486
  • [26] A parallel algorithm for approximate string matching
    Kaplan, K
    Burge, LL
    Garuba, M
    PDPTA'03: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED PROCESSING TECHNIQUES AND APPLICATIONS, VOLS 1-4, 2003, : 1844 - 1848
  • [27] AN APPROXIMATE STRING-MATCHING ALGORITHM
    KIM, JY
    SHAWE-TAYLOR, J
    THEORETICAL COMPUTER SCIENCE, 1992, 92 (01) : 107 - 117
  • [28] Approximate Pattern Matching for DNA Sequence Data
    Patil, Nagamma
    Toshniwal, Durga
    Garg, Kumkum
    COMPUTER NETWORKS AND INFORMATION TECHNOLOGIES, 2011, 142 : 212 - 218
  • [29] Strict approximate pattern matching with general gaps
    Wu, Youxi
    Fu, Shuai
    Jiang, He
    Wu, Xindong
    APPLIED INTELLIGENCE, 2015, 42 (03) : 566 - 580
  • [30] A black box for online approximate pattern matching
    Clifford, Raphael
    Efremenko, Klim
    Porat, Benny
    Porat, Ely
    INFORMATION AND COMPUTATION, 2011, 209 (04) : 731 - 736