A bit-parallel approach to suffix automata: Fast extended string matching

Authors
Navarro, G
Raffinot, M
Affiliations
[1] Univ Chile, Dept Comp Sci, Santiago, Chile
[2] Inst Gaspard Monge, F-77454 Marne La Vallee 2, France
Chinese Library Classification
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
We present a new algorithm for string matching. The algorithm, called BNDM, is the bit-parallel simulation of a known (but recent) algorithm called BDM. BDM skips characters using a "suffix automaton" which is made deterministic during preprocessing. BNDM, instead, simulates the nondeterministic version using bit-parallelism. This algorithm is 20%-25% faster than BDM, 2-3 times faster than other bit-parallel algorithms, and 10%-40% faster than every algorithm in the Boyer-Moore family. This makes it the fastest algorithm in all cases except for very short or very long patterns (e.g. on English text it is the fastest for patterns between 5 and 110 characters). Moreover, the algorithm is very simple, which makes it easy to implement other variants of BDM that are extremely complex in their original formulation. We show that, like other bit-parallel algorithms, BNDM can be extended to handle classes of characters in the pattern and in the text, to search for multiple patterns, and to allow errors in the pattern or in the text, combining simplicity, efficiency and flexibility. We also generalize the suffix automaton definition to handle classes of characters. To the best of our knowledge, this extension has not been studied before.
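The idea in the abstract, simulating the nondeterministic suffix automaton with bit operations while scanning each text window backwards, can be illustrated with a minimal sketch. This is an illustration written for this record, not the authors' code: the function name `bndm_search` and the dictionary of bit masks are assumptions, and Python's unbounded integers hide the usual bit-parallel restriction that the pattern must fit in one machine word.

```python
def bndm_search(pattern: str, text: str) -> list[int]:
    """Return the 0-based start positions of every occurrence of pattern in text.

    Illustrative sketch of a BNDM-style search (hypothetical helper name).
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # masks[c] has bit (m - 1 - i) set exactly when pattern[i] == c.
    masks: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << (m - 1 - i))

    full = (1 << m) - 1   # all m automaton states active
    high = 1 << (m - 1)   # set when the text read so far is a pattern prefix

    matches = []
    pos = 0
    while pos <= n - m:
        j = m        # number of window characters not yet read
        last = m     # shift to apply once this window is finished
        state = full
        while state != 0:
            # Read the window right to left, keeping only the states that
            # still correspond to some factor of the pattern.
            state &= masks.get(text[pos + j - 1], 0)
            j -= 1
            if state & high:
                if j > 0:
                    # A pattern prefix starts at window offset j:
                    # remember it as the next safe alignment.
                    last = j
                else:
                    # The whole window was read and it is the pattern.
                    matches.append(pos)
                    break
            state = (state << 1) & full
        pos += last
    return matches


print(bndm_search("abra", "abracadabra"))
```

The window is abandoned as soon as `state` becomes zero, which is where the character skipping comes from: the text just read is not a factor of the pattern, so the search can jump ahead by `last` positions. One plausible way to support character classes in this sketch would be to set the same bit in the mask of every character a pattern position accepts, though that variant is not shown here.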
Pages: 14-33
Page count: 20