PHAST Library - Enabling Single-source and High Performance Code for GPUs and Multi-cores

Cited by: 3
Authors
Peccerillo, Biagio [1]
Bartolini, Sandro [1]
Affiliations
[1] Univ Siena, Dept Informat Engn & Math Sci, Via Roma 56, I-53100 Siena, Italy
DOI
10.1109/HPCS.2017.109
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The simulation of parallel heterogeneous architectures such as multi-cores and GPUs poses new challenges in the programming language/framework domain. Applications for simulators need to be expressed in a way that can be easily adapted to the specific architectures and effectively tuned for each of them, while avoiding biases introduced by non-uniform hand-made optimizations. The most common heterogeneous programming frameworks are too low-level, so we propose PHAST, a high-level heterogeneous C++ library that can target multi-cores and Nvidia GPUs. It allows code to be written at a high level of abstraction and to reach good performance, while still allowing fine parameter tuning and not shielding code from low-level optimizations. We evaluate PHAST with DCT8x8 on both supported architectures. On multi-cores, we found that the PHAST implementation is around ten times faster than the OpenCL (AMD vendor) implementation, but up to about 4x slower than the OpenCL (Intel vendor) one, which effectively leverages auto-vectorization. On Nvidia GPUs, PHAST code performs up to 55.14% better than the CUDA SDK reference version.
Pages: 715-718
Page count: 4