Energy-efficient Stencil Computations on Distributed GPUs using Dynamic Parallelism and GPU-controlled Communication

Cited by: 1
Authors
Oden, Lena [1]
Klenk, Benjamin [2]
Froening, Holger [2]
Affiliations
[1] Fraunhofer Institute for Industrial Mathematics, Competence Center High Performance Computing, Kaiserslautern, Germany
[2] Heidelberg University, Institute of Computer Engineering, Heidelberg, Germany
Keywords
GPUs; Energy Efficient; Dynamic Parallelism; Communication; Data Transfer; InfiniBand
DOI
10.1109/E2SC.2014.14
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline code
081202
Abstract
GPUs are widely used in high performance computing due to their high computational power and high performance per Watt. Still, one of the main bottlenecks of GPU-accelerated cluster computing is the data transfer between distributed GPUs, which affects not only performance but also power consumption. The most common way to utilize a GPU cluster is a hybrid model, in which the GPU accelerates the computation while the CPU is responsible for the communication. This approach always requires a dedicated CPU thread, which consumes additional CPU cycles and therefore increases the power consumption of the complete application. In recent work we have shown that the GPU is able to control the communication independently of the CPU. However, GPU-controlled communication still poses several problems. The main problem is intra-GPU synchronization, since GPU blocks are non-preemptive; therefore, issuing communication requests from within a GPU kernel can easily result in a deadlock. In this work we show how Dynamic Parallelism solves this problem. GPU-controlled communication combined with Dynamic Parallelism keeps the control flow of multi-GPU applications on the GPU and bypasses the CPU completely. Although applications using GPU-controlled communication still perform slightly worse than hybrid applications, we show that performance per Watt increases by up to 10% while still using commodity hardware.
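The closing claim, that performance per Watt can rise even though raw performance drops slightly, follows from treating energy efficiency as a ratio. A minimal sketch, assuming both variants process the same workload $W_0$, with $T$ denoting runtime and $\bar{P}$ average power draw of the whole application (this notation is ours, not the paper's), so that energy per run is $E = \bar{P}\,T$:

$$\text{perf/W} = \frac{W_0 / T}{\bar{P}} = \frac{W_0}{E}, \qquad \frac{(\text{perf/W})_{\text{GPU}}}{(\text{perf/W})_{\text{hyb}}} = \frac{\bar{P}_{\text{hyb}}\,T_{\text{hyb}}}{\bar{P}_{\text{GPU}}\,T_{\text{GPU}}} > 1 \iff \frac{\bar{P}_{\text{GPU}}}{\bar{P}_{\text{hyb}}} < \frac{T_{\text{hyb}}}{T_{\text{GPU}}}$$

Because the GPU-controlled variant is slightly slower ($T_{\text{GPU}} > T_{\text{hyb}}$), the right-hand side is below 1, so an efficiency gain requires the power saved by freeing the dedicated CPU thread to outweigh the relative slowdown.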
Pages: 31-40
Page count: 10