Efficient Intranode Communication in GPU-Accelerated Systems

Cited by: 1

Authors:

Ji, Feng [1]

Aji, Ashwin M. [2]

Dinan, James [3]

Buntinas, Darius [3]

Balaji, Pavan [3]

Feng, Wu-chun [2]

Ma, Xiaosong [1,4]

Affiliations:

[1] North Carolina State Univ, Dept Comp Sci, Raleigh, NC 27695 USA

[2] Virginia Tech, Dept Comp Sci, Blacksburg, VA USA

[3] Argonne Natl Lab, Math & Comp Sci Div, Argonne, IL 60439 USA

[4] Oak Ridge Natl Lab, Div Math & Comp Sci, Oak Ridge, TN USA

Source:

2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS & PHD FORUM (IPDPSW) | 2012

Funding:

U.S. National Science Foundation;

Keywords:

IMPLEMENTATION;

DOI:

10.1109/IPDPSW.2012.227

Chinese Library Classification (CLC) number:

TP301 [Theory and methods];

Discipline classification code:

081202;

Abstract:

Current implementations of MPI are unaware of accelerator memory (i.e., GPU device memory) and require programmers to explicitly move data between memory spaces. This approach is inefficient, especially for intranode communication where it can result in several extra copy operations. In this work, we integrate GPU-awareness into a popular MPI runtime system and develop techniques to significantly reduce the cost of intranode communication involving one or more GPUs. Experiment results show an up to 2x increase in bandwidth, resulting in an average of 4.3% improvement to the total execution time of a halo exchange benchmark.

Pages: 1838-1847

Page count: 10
