PHAST Library - Enabling Single-source and High Performance Code for GPUs and Multi-cores

Cited by: 3

Authors:

Peccerillo, Biagio [1]

Bartolini, Sandro [1]

Affiliations:

[1] Univ Siena, Dept Informat Engn & Math Sci, Via Roma 56, I-53100 Siena, Italy

Source:

2017 International Conference on High Performance Computing & Simulation (HPCS) | 2017

DOI:

10.1109/HPCS.2017.109

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

The simulation of parallel heterogeneous architectures such as multi-cores and GPUs poses new challenges in the programming language and framework domain. Applications for simulators need to be expressed in a way that can be easily adapted to specific architectures and effectively tuned on each of them, while avoiding biases introduced by non-uniform hand-made optimizations. The most common heterogeneous programming frameworks are too low-level, so we propose PHAST, a high-level heterogeneous C++ library that targets multi-cores and Nvidia GPUs. It allows code to be written at a high level of abstraction and to reach good performance, while allowing fine parameter tuning and not shielding code from low-level optimizations. We evaluate PHAST on DCT8x8 on both supported architectures. On multi-cores, we found that the PHAST implementation is around ten times faster than the OpenCL (AMD vendor) implementation, but up to about 4x slower than the OpenCL (Intel vendor) one, which effectively leverages auto-vectorization. On Nvidia GPUs, PHAST code performs up to 55.14% better than the CUDA SDK reference version.

Pages: 715-718

Page count: 4
