A bit-parallel approach to suffix automata: Fast extended string matching

Times cited: 0
Authors
Navarro, G
Raffinot, M
Affiliations
[1] Univ Chile, Dept Comp Sci, Santiago, Chile
[2] Inst Gaspard Monge, F-77454 Marne La Vallee 2, France
DOI
Not available
Chinese Library Classification
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
We present a new algorithm for string matching. The algorithm, called BNDM, is the bit-parallel simulation of a known but recent algorithm called BDM. BDM skips characters using a "suffix automaton", which is made deterministic during preprocessing. BNDM, instead, simulates the nondeterministic version using bit-parallelism. This algorithm is 20%-25% faster than BDM, 2-3 times faster than other bit-parallel algorithms, and 10%-40% faster than all algorithms of the Boyer-Moore family. This makes it the fastest algorithm in all cases except for very short or very long patterns (e.g., on English text it is the fastest for patterns of 5 to 110 characters). Moreover, the algorithm is very simple, which makes it easy to implement other variants of BDM that are extremely complex in their original formulation. We show that, like other bit-parallel algorithms, BNDM can be extended to handle classes of characters in the pattern and in the text, to search for multiple patterns, and to allow errors in the pattern or in the text, combining simplicity, efficiency and flexibility. We also generalize the suffix automaton definition to handle classes of characters. To the best of our knowledge, this extension has not been studied before.
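The abstract describes BNDM only at a high level. As an illustration, here is a minimal Python sketch of the usual BNDM backward scan, written for this summary rather than taken from the paper. Bit `m-1-i` of the mask `B[c]` is set when pattern position `i` accepts character `c`. The state `D` records at which pattern positions the window suffix read so far can start. The window is then shifted to the longest pattern prefix recognized. Passing each pattern position as a string of allowed characters (so `["a", "bc"]` means "a" followed by "b" or "c") is an assumed, hypothetical interface showing the common bit-parallel way to put character classes into the masks; it does not reproduce the paper's generalized suffix automaton. Python integers are unbounded, so the sketch does not enforce the one-machine-word limit (m at most the word size) on which the real algorithm's speed depends.

```python
from collections.abc import Sequence


def bndm_search(text: str, pattern: Sequence[str]) -> list[int]:
    """Return the start positions of all occurrences of `pattern` in `text`.

    Each element of `pattern` is a string of accepted characters, so a plain
    string is an exact pattern and ["a", "bc"] matches "ab" or "ac"
    (a hypothetical class interface chosen for this sketch).
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # B[c]: bit (m - 1 - i) is set iff pattern position i accepts c,
    # i.e. the pattern is stored reversed, since windows are read backwards.
    B: dict[str, int] = {}
    for i, accepted in enumerate(pattern):
        for c in accepted:
            B[c] = B.get(c, 0) | (1 << (m - 1 - i))

    full = (1 << m) - 1        # keep D to m bits, simulating an m-bit word
    prefix_bit = 1 << (m - 1)  # set iff the suffix read is a pattern prefix

    matches: list[int] = []
    pos = 0
    while pos <= n - m:
        j = m      # window characters not yet read (window is text[pos:pos+m])
        last = m   # default shift: no pattern prefix found in the window
        D = full   # before reading, the empty string occurs at every position
        while D:
            j -= 1
            D &= B.get(text[pos + j], 0)  # a character absent from the pattern clears D
            if D & prefix_bit:
                if j > 0:
                    last = j              # longest proper prefix seen so far
                else:
                    matches.append(pos)   # the whole window was read: a match
            D = (D << 1) & full           # extend the read string by one character
        pos += last
    return matches
```

For example, `bndm_search("abracadabra", "abra")` returns `[0, 7]`. Because the shift comes from the longest proper prefix recognized in the window, overlapping occurrences such as those of "aa" in "aaaa" are not skipped.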
Pages: 14-33
Page count: 20
Related papers
  • [1] Bit-parallel (δ, γ)-matching and suffix automata
    Crochemore, Maxime
    Iliopoulos, Costas S.
    Navarro, Gonzalo
    Pinzon, Yoan J.
    Salinger, Alejandro
    JOURNAL OF DISCRETE ALGORITHMS, 2005, 3 (2-4) : 198 - 214
  • [2] A Compact Representation of Nondeterministic (Suffix) Automata for the Bit-Parallel Approach
    Cantone, Domenico
    Faro, Simone
    Giaquinta, Emanuele
    COMBINATORIAL PATTERN MATCHING, PROCEEDINGS, 2010, 6129 : 288 - 298
  • [3] A compact representation of nondeterministic (suffix) automata for the bit-parallel approach
    Cantone, Domenico
    Faro, Simone
    Giaquinta, Emanuele
    INFORMATION AND COMPUTATION, 2012, 213 : 3 - 12
  • [4] A bit-parallel suffix automaton approach for (δ, γ)-matching in music retrieval
    Crochemore, M
    Iliopoulos, CS
    Navarro, G
    Pinzon, YJ
    STRING PROCESSING AND INFORMATION RETRIEVAL, PROCEEDINGS, 2003, 2857 : 211 - 223
  • [6] A fast bit-parallel algorithm for matching extended regular expressions
    Yamamoto, H
    Miyazaki, T
    COMPUTING AND COMBINATORICS, PROCEEDINGS, 2003, 2697 : 222 - 231
  • [7] Bit-parallel approach to approximate string matching in compressed texts
    Matsumoto, T
    Kida, T
    Takeda, M
    Shinohara, A
    Arikawa, S
    SPIRE 2000: SEVENTH INTERNATIONAL SYMPOSIUM ON STRING PROCESSING AND INFORMATION RETRIEVAL - PROCEEDINGS, 2000 : 221 - 228
  • [8] Nested Counters in Bit-Parallel String Matching
    Fredriksson, Kimmo
    Grabowski, Szymon
    LANGUAGE AND AUTOMATA THEORY AND APPLICATIONS, 2009, 5457 : 338 - +
  • [9] Alternative algorithms for bit-parallel string matching
    Peltola, H
    Tarhio, J
    STRING PROCESSING AND INFORMATION RETRIEVAL, PROCEEDINGS, 2003, 2857 : 80 - 94
  • [10] Faster bit-parallel approximate string matching
    Hyyrö, H
    Navarro, G
    COMBINATORIAL PATTERN MATCHING, 2002, 2373 : 203 - 224