A Bit-Parallel Algorithm for Sequential Pattern Matching with Wildcards

Cited by: 12
Authors
Guo, Dan [1]
Hong, Xiao-Li [1]
Hu, Xue-Gang [1]
Gao, Jun [1]
Liu, Ying-Ling [1, 3]
Wu, Gong-Qing [1]
Wu, Xindong [1, 2]
Affiliations
[1] Hefei Univ Technol, Coll Comp Sci & Informat Engn, Hefei 230009, Peoples R China
[2] Univ Vermont, Dept Comp Sci, Burlington, VT USA
[3] Univ Sci & Technol China, Sch Phys, Hefei 230026, Peoples R China
Funding
National Science Foundation (USA); National Natural Science Foundation of China
Keywords
bit-parallelism; length constraints; nondeterministic automatons; one-off condition; pattern matching; wildcards
DOI
10.1080/01969722.2011.600651
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Pattern matching with both gap constraints and the one-off condition is a challenging topic, especially in bioinformatics, information retrieval, and dictionary query. Among the algorithms that solve the problem, the most efficient is SAIL, which is nevertheless time-consuming, especially when the pattern is long. In addition, existing algorithms based on bit-parallelism cannot handle a pattern in which only one pattern character lies between successive wildcards and the minimum local length constraints are zero. We propose an algorithm, BPBM, to handle online sequential pattern matching. In BPBM, an extended bit-parallelism operation is used to accelerate the matching process. An effective transition window mechanism with two nondeterministic finite state automatons (NFAs) is adopted to drop useless scan windows. BPBM identifies gap constraints automatically and scans the sequence only once to report occurrences with exact match positions. Theoretical analysis and experimental results show that BPBM is more competitive than its peers. It has a clear advantage in search time complexity. It is also more stable: its operation cost decreases as the size of the sequence alphabet or the length of the pattern increases. We also study off-line pattern matching. With two pruning passes, left-most and right-most, the matching ratio increases by about 2.08% on average.
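For readers unfamiliar with bit-parallelism, the following is a minimal Python sketch of the classic Shift-And technique, extended so that a wildcard symbol matches exactly one arbitrary character. It is not the BPBM algorithm from the abstract: it does not handle variable-length gap constraints, the one-off condition, or the two-NFA transition window. The function name `shift_and_search` and the `?` wildcard symbol are assumptions made for illustration only.

```python
def shift_and_search(text, pattern, wildcard="?"):
    """Return start indices where `pattern` matches `text`.

    Illustrative Shift-And sketch (not BPBM): `wildcard` matches
    exactly one arbitrary character.
    """
    m = len(pattern)
    if m == 0:
        return []

    # Bit i of masks[c] is set when pattern[i] == c.
    # Bit i of wild_mask is set when pattern[i] is the wildcard.
    masks = {}
    wild_mask = 0
    for i, ch in enumerate(pattern):
        if ch == wildcard:
            wild_mask |= 1 << i
        else:
            masks[ch] = masks.get(ch, 0) | (1 << i)

    accept = 1 << (m - 1)  # bit marking a full match of the pattern
    state = 0              # bit i set: pattern[0..i] matches text ending here
    hits = []
    for j, c in enumerate(text):
        # Advance every active NFA state by one position in a single
        # shift, start a new match attempt (| 1), then keep only the
        # states whose pattern character accepts c.
        state = ((state << 1) | 1) & (masks.get(c, 0) | wild_mask)
        if state & accept:
            hits.append(j - m + 1)
    return hits
```

All active states of the pattern automaton are updated with one shift and one AND per text character, which is the source of the speed-up that bit-parallel methods rely on. For example, `shift_and_search("abcabd", "ab?")` reports matches starting at positions 0 and 3.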
Pages: 382-401
Number of pages: 20
Related papers
50 in total
  • [1] Bit-Parallel Multiple Pattern Matching
    Tuan Tu Tran
    Giraud, Mathieu
    Varre, Jean-Stephane
    [J]. PARALLEL PROCESSING AND APPLIED MATHEMATICS, PT II, 2012, 7204 : 292 - 301
  • [2] A GPU-Based Bit-Parallel Multiple Pattern Matching Algorithm
    Hung, Che-Lun
    Wang, Hsiao-Hsi
    Hsu, Tzu-Hung
    Lin, Chun-Yuan
    [J]. IEEE 20TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS / IEEE 16TH INTERNATIONAL CONFERENCE ON SMART CITY / IEEE 4TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS), 2018, : 1219 - 1222
  • [3] An Alternative Bit-Parallel Algorithm for Parameterized String Matching
    Prasad, Rajesh
    Agarwal, Suneeta
    [J]. INTERNATIONAL SYMPOSIUM OF INFORMATION TECHNOLOGY 2008, VOLS 1-4, PROCEEDINGS: COGNITIVE INFORMATICS: BRIDGING NATURAL AND ARTIFICIAL KNOWLEDGE, 2008, : 2148 - 2155
  • [4] A Bit-Parallel Exact String Matching Algorithm for Small Alphabet
    Zhang, Guomin
    Zhu, En
    Mao, Ling
    Yin, Ming
    [J]. FRONTIERS IN ALGORITHMICS, PROCEEDINGS, 2009, 5598 : 336 - +
  • [5] BLIM: A New Bit-Parallel Pattern Matching Algorithm Overcoming Computer Word Size Limitation
    Kulekci, M. Oguzhan
    [J]. MATHEMATICS IN COMPUTER SCIENCE, 2010, 3 (04) : 407 - 420
  • [6] A fast bit-parallel algorithm for matching extended regular expressions
    Yamamoto, H
    Miyazaki, T
    [J]. COMPUTING AND COMBINATORICS, PROCEEDINGS, 2003, 2697 : 222 - 231
  • [7] Bit-Parallel Tree Pattern Matching Algorithms for Unordered Labeled Trees
    Yamamoto, Hiroaki
    Takenouchi, Daichi
    [J]. ALGORITHMS AND DATA STRUCTURES, 2009, 5664 : 554 - +
  • [8] Efficient bit-parallel algorithms for (δ, α)-matching
    Fredriksson, Kimmo
    Grabowski, Szymon
    [J]. EXPERIMENTAL ALGORITHMS, PROCEEDINGS, 2006, 4007 : 170 - 181
  • [9] A bit-parallel tree matching algorithm for patterns with horizontal VLDC's
    Tsuji, Hisashi
    Ishino, Akira
    Takeda, Masayuki
    [J]. STRING PROCESSING AND INFORMATION RETRIEVAL, PROCEEDINGS, 2005, 3772 : 388 - 398
  • [10] Multi-Pattern Matching Algorithm with Wildcards Based on Bit-Parallelism
    Saif, Ahmed A. F.
    Hu, Liang
    Chu, Jianfeng
    [J]. Wuhan University Journal of Natural Sciences, 2017, 22 (02) : 178 - 184