Multi-GPU systems and Unified Virtual Memory for scientific applications: The case of the NAS multi-zone parallel benchmarks

Cited by: 4
Authors
Gonzalez, Marc [1]
Morancho, Enric [1]
Affiliations
[1] Univ Politecn Catalunya BarcelonaTECH, Dept Comp Architecture, Barcelona, Spain
Keywords
Multi-GPU; Unified Virtual Memory; Single address space; NAS parallel benchmarks
DOI
10.1016/j.jpdc.2021.08.001
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
GPU-based computing systems have become a widely accepted solution for the high-performance-computing (HPC) domain. GPUs have shown highly competitive performance-per-watt ratios and can exploit an astonishing level of parallelism. However, exploiting the peak performance of such devices is a challenge, mainly due to the combination of two essential aspects of multi-GPU execution: memory allocation and work distribution. Memory allocation determines the data mapping to GPUs, and therefore conditions all work distribution schemes and communication phases in the application. Unified Virtual Memory (UM) simplifies the coding of memory allocations, but its effects on performance depend on how data is used by the devices and how the devices' driver orchestrates the data transfers across the system. In this paper we present a multi-GPU and UM implementation of the NAS Multi-Zone Parallel Benchmarks, which alternate communication and computation phases, offering opportunities to overlap these phases. We analyse the programmability and performance effects of introducing UM support. Our experience shows that the programming effort for introducing UM is similar to that of having a memory allocation per GPU. On an evaluation environment composed of 2 x IBM Power9 8335-GTH and 4 x NVIDIA V100 (Volta) GPUs, our UM-based parallelization outperforms the manual memory allocation versions by 1.10x to 1.85x. However, these improvements are highly sensitive to the information forwarded to the devices' driver describing the most convenient location for specific memory regions. We analyse these improvements in terms of the relationship between the computational and communication phases of the applications. © 2021 The Author(s). Published by Elsevier Inc.
Pages: 138 - 150
Number of pages: 13