High performance parallelization of Boyer-Moore algorithm on many-core accelerators

Cited by: 2
Authors
Jeong, Yosang [1 ]
Lee, Myungho [1 ]
Nam, Dukyun [2 ]
Kim, Jik-Soo [2 ]
Hwang, Soonwook [2 ]
Affiliations
[1] Myongji Univ, Dept Comp Sci & Engn, Yongin, Kyungki Do, South Korea
[2] Korea Inst Sci & Technol Informat, Supercomp R&D Ctr, Taejon, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Boyer-Moore algorithm; Many-core accelerator; Parallelization; Dynamic scheduling; Multithreading; Algorithmic cascading; STANDARD;
DOI
10.1007/s10586-015-0466-4
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
The Boyer-Moore (BM) algorithm is a single-pattern string matching algorithm. It is considered one of the most efficient string matching algorithms and is used in many applications. In its preprocessing phase, the algorithm computes two shift rules from the given pattern string. Using these shift rules, pattern matching operations are performed against the target input string in the second phase. The shift rules computed in the first phase allow the second phase to skip parts of the target input string where no matches can occur. The second phase is time consuming and needs to be parallelized in order to realize high-performance string matching. In this paper, we parallelize the BM algorithm on the latest many-core accelerators, the Intel Xeon Phi and the Nvidia Tesla K20 GPU, as well as on general-purpose multi-core microprocessors. For parallel string matching, the target input data is partitioned among multiple threads. Data lying on thread boundaries is searched redundantly so that a pattern occurrence straddling the boundary between two neighboring threads is not missed. This redundant-search overhead increases significantly with the number of threads. For a fixed target input length, the number of possible matches decreases as the pattern length increases. Furthermore, the positions of the pattern string are spread randomly over the target data, which leads to an unbalanced workload distribution among the threads. We employ dynamic scheduling and multithreading techniques to deal with this load-balancing issue. We also use an algorithmic cascading technique to maximize the benefit of multithreading and to reduce the overhead of the redundant data search between neighboring threads. Our parallel implementation achieves a 17-fold speedup on the Xeon Phi and a 47-fold speedup on the Nvidia Tesla K20 GPU compared with a serial implementation on the host Intel Xeon processor.
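The sketch below is not the authors' implementation; it is a minimal CPU-side illustration, assuming OpenMP is available, of the partitioning scheme described in the abstract: the text is split into chunks, each chunk is extended by pattern-length-minus-one bytes so a match straddling a chunk boundary is not missed, and chunks are handed to threads with dynamic scheduling to even out the load. For brevity it uses only a Horspool-style bad-character shift rather than both BM shift rules; the function names, chunk size, and counting convention are illustrative assumptions, and the Xeon Phi/GPU versions and the algorithmic cascading technique are not shown.

/*
 * Minimal sketch (illustrative, not the paper's code): parallel single-pattern
 * search using a Horspool-style bad-character shift, OpenMP dynamic scheduling,
 * and an (m-1)-byte overlap between neighboring chunks so that a match
 * straddling a chunk boundary is not missed.
 * Compile with: cc -O2 -fopenmp bm_sketch.c
 */
#include <stdio.h>
#include <string.h>

#define ALPHABET 256

/* Bad-character table: how far to shift the window for each possible byte. */
static void build_bad_char(const char *pat, int m, int shift[ALPHABET]) {
    for (int c = 0; c < ALPHABET; c++) shift[c] = m;          /* byte not in pattern */
    for (int i = 0; i < m - 1; i++)
        shift[(unsigned char)pat[i]] = m - 1 - i;             /* distance from the end */
}

/* Search text[lo, hi) for pat; count only matches starting in [lo, stop)
 * so that the overlap region is not double-counted by two chunks. */
static long bm_search_range(const char *text, long lo, long hi, long stop,
                            const char *pat, int m, const int shift[ALPHABET]) {
    long count = 0, i = lo;
    while (i + m <= hi) {
        int j = m - 1;
        while (j >= 0 && pat[j] == text[i + j]) j--;          /* compare right to left */
        if (j < 0) {                                          /* full match at position i */
            if (i < stop) count++;
            i += 1;                                           /* allow overlapping matches */
        } else {
            i += shift[(unsigned char)text[i + m - 1]];       /* bad-character shift */
        }
    }
    return count;
}

/* Partition the text into chunks; each thread searches its chunk plus an
 * (m-1)-byte extension. schedule(dynamic) helps when match positions, and
 * hence the per-chunk work, are spread unevenly over the text. */
long parallel_bm_count(const char *text, long n, const char *pat, int m, long chunk) {
    int shift[ALPHABET];
    build_bad_char(pat, m, shift);

    long nchunks = (n + chunk - 1) / chunk;
    long total = 0;

    #pragma omp parallel for schedule(dynamic) reduction(+:total)
    for (long c = 0; c < nchunks; c++) {
        long lo   = c * chunk;
        long stop = (lo + chunk < n) ? lo + chunk : n;        /* matches owned by this chunk */
        long hi   = (stop + m - 1 < n) ? stop + m - 1 : n;    /* redundant boundary region */
        total += bm_search_range(text, lo, hi, stop, pat, m, shift);
    }
    return total;
}

int main(void) {
    const char *text = "abracadabra abracadabra";
    const char *pat  = "abra";
    long matches = parallel_bm_count(text, (long)strlen(text),
                                     pat, (int)strlen(pat), 8);
    printf("matches: %ld\n", matches);                        /* expected: 4 */
    return 0;
}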
Pages: 1087-1098
Number of pages: 12
Related Papers
50 records in total
  • [41] Parallelization and sustainability of distributed genetic algorithms on many-core processors
    Sato, Yuji
    Sato, Mikiko
    INTERNATIONAL JOURNAL OF INTELLIGENT COMPUTING AND CYBERNETICS, 2014, 7 (01) : 2 - 23
  • [42] Speedup and Parallelization Models for Energy-Efficient Many-Core Systems Using Performance Counters
    Al-hayanni, Mohammed A. N.
    Shafik, Rishad
    Rafiev, Ashur
    Xia, Fei
    Yakovlev, Alex
    2017 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS), 2017, : 410 - 417
  • [43] Design of Direct Communication Facility for Many-Core based Accelerators
    Si, Min
    Ishikawa, Yutaka
    2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS & PHD FORUM (IPDPSW), 2012, : 924 - 929
  • [44] A New Method to Obtain the shift-table in Boyer-Moore's String Matching Algorithm
    Wang, Yang
    19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6, 2008, : 3282 - 3285
  • [45] Application Deployment Strategies for Spatial Isolation on Many-Core Accelerators
    Real, Maria Mendez
    Wehner, Philipp
    Lapotre, Vianney
    Goehringer, Diana
    Gogniat, Guy
    ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2018, 17 (02)
  • [46] Performance/energy aware task migration algorithm for many-core chips
    Afsharpour, Sima
    Patooghy, Ahmad
    Fazeli, Mahdi
    IET COMPUTERS AND DIGITAL TECHNIQUES, 2016, 10 (04): : 165 - 173
  • [47] Parallelization and fault-tolerance of evolutionary computation on many-core processors
    Sato, Yuji
    Sato, Mikiko
    2013 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2013, : 2602 - 2609
  • [48] A Nanophotonic Interconnect for High-Performance Many-Core Computation
    Beausoleil, R. G.
    Fiorentino, M.
    Ahn, J.
    Binkert, N.
    Davis, A.
    Fattal, D.
    Jouppi, N. P.
    McLaren, M.
    Santori, C. M.
    Schreiber, R. S.
    Spillane, S. M.
    Vantrease, D.
    Xu, Q.
    2008 5TH IEEE INTERNATIONAL CONFERENCE ON GROUP IV PHOTONICS, 2008, : 365 - 367
  • [49] A nanophotonic interconnect for high-performance many-core computation
    Beausoleil, R. G.
    Ahn, J.
    Binkert, N.
    Davis, A.
    Fattal, D.
    Fiorentino, M.
    Jouppi, N. P.
    McLaren, M.
    Santori, C. M.
    Schreiber, R. S.
    Spillane, S. M.
    Vantrease, D.
    Xu, Q.
    16TH ANNUAL IEEE SYMPOSIUM ON HIGH-PERFORMANCE INTERCONNECTS, PROCEEDINGS, 2008, : 182 - 189
  • [50] A Compressive Sensing Algorithm for Many-Core Architectures
    Borghi, A.
    Darbon, J.
    Peyronnet, S.
    Chan, T. F.
    Osher, S.
    ADVANCES IN VISUAL COMPUTING, PT II, 2010, 6454 : 678 - 686