Simultaneous CPU-GPU Execution of Data Parallel Algorithmic Skeletons

Cited by: 5
Authors
Wrede, Fabian [1]
Ernsting, Steffen [1]
Affiliation
[1] Leonardo Campus 3, D-48149 Munster, Germany
Keywords
High-level parallel programming; Data parallel algorithmic skeletons; Simultaneous CPU-GPU execution;
DOI
10.1007/s10766-016-0483-9
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Parallel programming has become ubiquitous; however, it is still a low-level and error-prone task, especially when accelerators such as GPUs are used. Thus, algorithmic skeletons have been proposed to provide well-defined programming patterns in order to assist programmers and shield them from low-level aspects. As the complexity of problems, and consequently the need for computing capacity, grows, we have directed our research toward simultaneous CPU-GPU execution of data parallel skeletons to achieve a performance gain. GPUs are optimized with respect to throughput and designed for massively parallel computations. Nevertheless, we analyze whether the additional utilization of the CPU for data parallel skeletons in the Muenster Skeleton Library leads to speedups or causes a reduced performance, because of the smaller computational capacity of CPUs compared to GPUs. We present a C++ implementation based on a static distribution approach. In order to evaluate the implementation, four different benchmarks, including matrix multiplication, N-body simulation, Frobenius norm, and ray tracing, have been conducted. The ratio of CPU and GPU execution has been varied manually to observe the effects of different distributions. The results show that a speedup can be achieved by distributing the execution among CPUs and GPUs. However, both the results and the optimal distribution highly depend on the available hardware and the specific algorithm.
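The static distribution described in the abstract can be illustrated with a minimal sketch. It is not the Muenster Skeleton Library API: the names `cpu_element_count` and `map_in_place_split` and the `cpu_fraction` parameter are hypothetical, and the GPU partition is only stood in for by a second host loop so that the example stays self-contained. It shows just the idea of fixing the CPU/GPU ratio up front and splitting the index range of a map skeleton accordingly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Muesli): number of leading elements
// assigned to the CPU for a manually chosen CPU share in [0, 1].
std::size_t cpu_element_count(std::size_t n, double cpu_fraction) {
    assert(cpu_fraction >= 0.0 && cpu_fraction <= 1.0);
    return static_cast<std::size_t>(static_cast<double>(n) * cpu_fraction);
}

// Map skeleton with a static split: indices [0, split) form the CPU
// partition, indices [split, n) form the GPU partition.
// Assumption: the GPU side is simulated on the host here. A real
// implementation would copy that sub-range to the device, launch a kernel,
// and process both partitions at the same time.
template <typename T, typename F>
void map_in_place_split(std::vector<T>& data, F f, double cpu_fraction) {
    const std::size_t n = data.size();
    const std::size_t split = cpu_element_count(n, cpu_fraction);

    // CPU partition.
    for (std::size_t i = 0; i < split; ++i) {
        data[i] = f(data[i]);
    }
    // Stand-in for the GPU partition.
    for (std::size_t i = split; i < n; ++i) {
        data[i] = f(data[i]);
    }
}
```

Calling `map_in_place_split(v, [](int x) { return 2 * x; }, 0.25)` on a vector of eight elements would assign the first two elements to the CPU partition and the remaining six to the GPU stand-in. Because the split is fixed before execution, changing the ratio means changing `cpu_fraction` by hand, which mirrors the manual variation described in the abstract.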
Pages: 42-61
Number of pages: 20
Related papers
50 in total
  • [1] Simultaneous CPU–GPU Execution of Data Parallel Algorithmic Skeletons
    Fabian Wrede
    Steffen Ernsting
    [J]. International Journal of Parallel Programming, 2018, 46 : 42 - 61
  • [2] Heterogeneous CPU-GPU Execution of Stencil Applications
    Siklosi, Balint
    Reguly, Istvan Z.
    Mudalige, Gihan R.
    [J]. PROCEEDINGS OF 2018 IEEE/ACM INTERNATIONAL WORKSHOP ON PERFORMANCE, PORTABILITY AND PRODUCTIVITY IN HPC (P3HPC 2018), 2018, : 71 - 80
  • [3] Orchestrating Data Placement and Query Execution in Heterogeneous CPU-GPU DBMS
    Yogatama, Bobbi W.
    Gong, Weiwei
    Yu, Xiangyao
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2022, 15 (11): : 2491 - 2503
  • [4] Hybrid CPU-GPU scheduling and execution of tree traversals
    Liu, Jianqiao
    Hegde, Nikhil
    Kulkarni, Milind
    [J]. ACM SIGPLAN NOTICES, 2016, 51 (08) : 385 - 386
  • [5] Simultaneous parallel power flow calculations using hybrid CPU-GPU approach
    Araujo, Igor
    Tadaiesky, Vincent
    Cardoso, Diego
    Fukuyama, Yoshikazu
    Santana, Adamo
    [J]. INTERNATIONAL JOURNAL OF ELECTRICAL POWER & ENERGY SYSTEMS, 2019, 105 : 229 - 236
  • [6] Parallel Graph Partitioning on a CPU-GPU Architecture
    Goodarzi, Bahareh
    Burtscher, Martin
    Goswami, Dhrubajyoti
    [J]. 2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2016, : 58 - 66
  • [7] Optimizing tensor contraction expressions for hybrid CPU-GPU execution
    Wenjing Ma
    Sriram Krishnamoorthy
    Oreste Villa
    Karol Kowalski
    Gagan Agrawal
    [J]. Cluster Computing, 2013, 16 : 131 - 155
  • [8] Optimizing tensor contraction expressions for hybrid CPU-GPU execution
    Ma, Wenjing
    Krishnamoorthy, Sriram
    Villa, Oreste
    Kowalski, Karol
    Agrawal, Gagan
    [J]. CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2013, 16 (01): : 131 - 155
  • [9] Transparent CPU-GPU Collaboration for Data-Parallel Kernels on Heterogeneous Systems
    Lee, Janghaeng
    Samadi, Mehrzad
    Park, Yongjun
    Mahlke, Scott
    [J]. 2013 22ND INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES (PACT), 2013, : 245 - 255
  • [10] CPU-GPU hybrid parallel strategy for cosmological simulations
    Wang, Yueqing
    Dou, Yong
    Guo, Song
    Lei, Yuanwu
    Zou, Dan
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2014, 26 (03): : 748 - 765