High performance parallelization of Boyer-Moore algorithm on many-core accelerators

Cited by: 2
Authors
Jeong, Yosang [1 ]
Lee, Myungho [1 ]
Nam, Dukyun [2 ]
Kim, Jik-Soo [2 ]
Hwang, Soonwook [2 ]
Affiliations
[1] Myongji Univ, Dept Comp Sci & Engn, Yongin, Kyungki Do, South Korea
[2] Korea Inst Sci & Technol Informat, Supercomp R&D Ctr, Taejon, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Boyer-Moore algorithm; Many-core accelerator; Parallelization; Dynamic scheduling; Multithreading; Algorithmic cascading; STANDARD;
DOI
10.1007/s10586-015-0466-4
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
The Boyer-Moore (BM) algorithm is a single-pattern string matching algorithm. It is considered the most efficient string matching algorithm and is used in many applications. In the preprocessing phase, the algorithm calculates two string shift rules from the given pattern string. In the second phase, it uses these two shift rules to match the pattern against the target input string. The shift rules let the second phase skip parts of the target input string where no match can be found. The second phase is time-consuming and needs to be parallelized to achieve high-performance string matching. In this paper, we parallelize the BM algorithm on the latest many-core accelerators, such as the Intel Xeon Phi and the Nvidia Tesla K20 GPU, along with general-purpose multi-core microprocessors. For parallel string matching, the target input data is partitioned among multiple threads. Data lying on the threads' boundaries is searched redundantly so that a pattern string straddling the boundary between two neighboring threads is not missed. The redundant data search overhead increases significantly for a large number of threads. For a fixed target input length, the number of possible matches decreases as the pattern length increases. Furthermore, the positions of the pattern string are spread randomly over the target data. This leads to an unbalanced workload distribution among threads. We employ dynamic scheduling and multithreading techniques to deal with the load balancing issue. We also use the algorithmic cascading technique to maximize the benefit of multithreading and to reduce the overhead associated with the redundant data search between neighboring threads.
Our parallel implementation achieves a 17-times speedup on the Xeon Phi and a 47-times speedup on the Nvidia Tesla K20 GPU compared with a serial implementation on the host Intel Xeon processor.
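The partitioning scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses the simpler Horspool bad-character shift in place of the two full BM shift rules, and the names `bad_char_table`, `search_range`, `parallel_search`, `chunk_size`, and `workers` are hypothetical. Each chunk reads up to m - 1 characters past its end (the redundant boundary search), and splitting the input into many more chunks than workers lets idle workers pick up the next chunk, which roughly mimics dynamic scheduling.

```python
from concurrent.futures import ThreadPoolExecutor


def bad_char_table(pattern):
    # Horspool-style shift table (a simplification, not the full BM rules):
    # for each character except the last, distance from its last
    # occurrence to the end of the pattern.
    m = len(pattern)
    return {c: m - 1 - i for i, c in enumerate(pattern[:-1])}


def search_range(text, pattern, start, end, shift):
    """Return match start positions p with start <= p < end."""
    m = len(pattern)
    hits = []
    # Redundant boundary search: read up to m - 1 characters past `end`
    # so a match that starts in this chunk but crosses into the next
    # chunk is still found. Matches are reported only by the chunk in
    # which they start, so no position is reported twice.
    limit = min(end + m - 1, len(text))
    pos = start
    while pos + m <= limit:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Skip ahead based on the text character aligned with the
        # last pattern position; unseen characters shift by m.
        pos += shift.get(text[pos + m - 1], m)
    return hits


def parallel_search(text, pattern, chunk_size=1 << 16, workers=4):
    if not pattern:
        raise ValueError("pattern must be non-empty")
    shift = bad_char_table(pattern)
    starts = range(0, len(text), chunk_size)

    def work(s):
        return search_range(text, pattern, s,
                            min(s + chunk_size, len(text)), shift)

    # Many small chunks handed to a fixed pool: a worker that finishes
    # early takes the next pending chunk.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [p for part in pool.map(work, starts) for p in part]
```

For example, `parallel_search("abracadabra" * 3, "abra", chunk_size=5)` finds matches that straddle chunk boundaries. In CPython the global interpreter lock prevents a real speedup from these threads, so the sketch shows only the decomposition, not the performance behavior reported in the paper.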
Pages: 1087-1098
Page count: 12
Related papers
50 in total
  • [31] Fast-search: A new efficient variant of the Boyer-Moore string matching algorithm
    Cantone, D
    Faro, S
    EXPERIMENTAL AND EFFICIENT ALGORITHMS, PROCEEDINGS, 2003, 2647 : 47 - 58
  • [32] Boyer-Moore Horspool Algorithm Used in Content Management System of Data Fast Searching
    Hoong, Chan Chung
    Ameedeen, Mohamed Ariff
    ADVANCED SCIENCE LETTERS, 2017, 23 (11) : 11387 - 11390
  • [33] Programmable SoC platform for Deep Packet Inspection using enhanced Boyer-Moore algorithm
    Dominguez, Adrian
    Carballo, Pedro P.
    Nunez, Antonio
    2017 12TH INTERNATIONAL SYMPOSIUM ON RECONFIGURABLE COMMUNICATION-CENTRIC SYSTEMS-ON-CHIP (RECOSOC), 2017,
  • [34] A Method for Web Application Vulnerabilities Detection by Using Boyer-Moore String Matching Algorithm
    Saleh, Ain Zubaidah Mohd
    Rozali, Nur Amizah
    Buja, Alya Geogiana
    Jalil, Kamarularifin Abd.
    Ali, Fakariah Hani Mohd
    Rahman, Teh Faradilla Abdul
    THIRD INFORMATION SYSTEMS INTERNATIONAL CONFERENCE 2015, 2015, 72 : 112 - 121
  • [35] Correctness of sub string-preprocessing in Boyer-Moore's pattern matching algorithm
    Stomp, F
    THEORETICAL COMPUTER SCIENCE, 2003, 290 (01) : 59 - 78
  • [36] Mechanization of a proof of string-preprocessing in Boyer-Moore's pattern matching algorithm
    Besta, M
    Stomp, F
    EIGHTH IEEE INTERNATIONAL CONFERENCE ON ENGINEERING OF COMPLEX COMPUTER SYSTEMS, PROCEEDINGS, 2002, : 68 - 77
  • [37] Parallel DC3 Algorithm for Suffix Array Construction on Many-core Accelerators
    Liao, Gang
    Ma, Longfei
    Zang, Guangming
    Tang, Lin
    2015 15TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING, 2015, : 1155 - 1158
  • [38] Auto-Tuning Dedispersion for Many-Core Accelerators
    Sclocco, Alessio
    Bal, Henri E.
    Hessels, Jason
    van Leeuwen, Joeri
    van Nieuwpoort, Rob V.
    2014 IEEE 28TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, 2014,
  • [39] On the Parallelization of Subproduct Tree Techniques Targeting Many-Core Architectures
    Haque, Sardar Anisul
    Mansouri, Farnam
    Maza, Marc Moreno
    COMPUTER ALGEBRA IN SCIENTIFIC COMPUTING, CASC 2014, 2014, 8660 : 171 - 185
  • [40] Efficient Parallelization of a Genetic Algorithm Solution on the Traveling Salesman Problem with Multi-core and Many-core Systems
    Abbasi, M.
    Rafiee, M.
    INTERNATIONAL JOURNAL OF ENGINEERING, 2020, 33 (07): : 1257 - 1265