Improving an algorithm for approximate pattern matching

Cited by: 16
Authors
Navarro, G [1]
Baeza-Yates, R [1]
Affiliations
[1] Univ Chile, Dept Comp Sci, Santiago, Chile
Keywords
string matching allowing errors; bit-parallelism; edit distance; approximate matching probability;
DOI
10.1007/s00453-001-0034-6
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
We study a recent algorithm for fast on-line approximate string matching. This is the problem of searching a pattern in a text allowing errors in the pattern or in the text. The algorithm is based on a very fast kernel which is able to search short patterns using a nondeterministic finite automaton, which is simulated using bit-parallelism. A number of techniques to extend this kernel for longer patterns are presented in that work. However, the techniques can be integrated in many ways and the optimal interplay among them is by no means obvious. The solution to this problem starts at a very low level, by obtaining basic probabilistic information about the problem which was not previously known, and ends integrating analytical results with empirical data to obtain the optimal heuristic. The conclusions obtained via analysis are experimentally confirmed. We also improve many of the techniques and obtain a combined heuristic which is faster than the original work. This work shows an excellent example of a complex and theoretical analysis of algorithms used for design and for practical algorithm engineering, instead of the common practice of first designing an algorithm and then analyzing it.
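The idea of simulating a nondeterministic finite automaton with bit-parallelism for short patterns can be illustrated with a minimal sketch. The code below is not the kernel studied in the paper; it assumes the simpler, well-known row-wise Shift-And formulation with k errors (in the style of Wu and Manber), and the function name `approx_search` is hypothetical. Each integer in `R` packs one row of automaton states, so a single text character updates all pattern positions at once.

```python
def approx_search(pattern, text, k):
    """Return the text indices where a match of `pattern` with at most
    k edit errors (insertion, deletion, substitution) ends.

    Illustrative row-wise bit-parallel NFA simulation; an assumption for
    demonstration, not the algorithm analyzed in the abstract.
    """
    m = len(pattern)
    if m == 0:
        return []
    # B[c] has bit i set when pattern[i] == c.
    B = {}
    for i, c in enumerate(pattern):
        B[c] = B.get(c, 0) | (1 << i)
    mask = (1 << m) - 1
    accept = 1 << (m - 1)
    # R[d] bit i set: pattern[:i+1] matches a suffix of the text read
    # so far with at most d errors. Initially only deletions are possible.
    R = [((1 << d) - 1) & mask for d in range(k + 1)]
    ends = []
    for j, c in enumerate(text):
        bc = B.get(c, 0)
        old_prev = R[0]                      # previous-column value of R[d-1]
        R[0] = ((R[0] << 1) | 1) & bc        # exact-match row
        for d in range(1, k + 1):
            old = R[d]
            R[d] = ((((old << 1) | 1) & bc)  # match the current character
                    | old_prev               # extra character in the text
                    | (old_prev << 1)        # substituted character
                    | (R[d - 1] << 1)        # character missing from the text
                    | 1) & mask              # first state reachable with 1 error
            old_prev = old
        if R[k] & accept:
            ends.append(j)
    return ends
```

For example, `approx_search("abc", "xxabcxabc", 0)` reports the exact occurrences ending at indices 4 and 8. In a language with fixed-width machine words this approach only works directly when the pattern fits in a word, which is why extensions for longer patterns are needed.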
Pages: 473-502
Number of pages: 30