Accelerating communication for parallel programming models on GPU systems

Cited by: 2
Authors
Choi, Jaemin [1]
Fink, Zane [1]
White, Sam [1]
Bhat, Nitin [2]
Richards, David F. [3]
Kale, Laxmikant V. [1, 2]
Affiliations
[1] Univ Illinois, Dept Comp Sci, Champaign, IL 61820 USA
[2] Charmworks Inc, Urbana, IL USA
[3] Lawrence Livermore Natl Lab, Ctr Appl Sci Comp, Livermore, CA USA
Keywords
GPU-aware communication; UCX; Charm++; AMPI; CUDA-aware MPI; Python; Charm4py
DOI
10.1016/j.parco.2022.102969
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
As an increasing number of leadership-class systems embrace GPU accelerators in the race towards exascale, efficient communication of GPU data is becoming one of the most critical components of high-performance computing. For developers of parallel programming models, implementing support for GPU-aware communication using native APIs for GPUs such as CUDA can be a daunting task as it requires considerable effort with little guarantee of performance. In this work, we demonstrate the capability of the Unified Communication X (UCX) framework to compose a GPU-aware communication layer that serves multiple parallel programming models of the Charm++ ecosystem: Charm++, Adaptive MPI (AMPI), and Charm4py. We demonstrate the performance impact of our designs with microbenchmarks adapted from the OSU benchmark suite, obtaining improvements in latency of up to 10.1x in Charm++, 11.7x in AMPI, and 17.4x in Charm4py. We also observe increases in bandwidth of up to 10.1x in Charm++, 10x in AMPI, and 10.5x in Charm4py. We show the potential impact of our designs on real-world applications by evaluating a proxy application for the Jacobi iterative method, improving the communication performance by up to 12.4x in Charm++, 12.8x in AMPI, and 19.7x in Charm4py.
Pages: 10