Beyond Explicit Transfers: Shared and Managed Memory in OpenMP

Cited by: 0
Authors
Neth, Brandon [1 ]
Scogland, Thomas R. W. [2 ]
Duran, Alejandro [3 ]
de Supinski, Bronis R. [2 ]
Affiliations
[1] Univ Arizona, Tucson, AZ 85721 USA
[2] Lawrence Livermore Natl Lab, Livermore, CA 94550 USA
[3] Intel Corp, Iberia, Spain
DOI
10.1007/978-3-030-85262-7_13
Chinese Library Classification: TP3 [Computing Technology, Computer Technology]
Discipline Code: 0812
Abstract
OpenMP began supporting offloading in version 4.0, almost 10 years ago. It introduced the programming model for offload to GPUs and other accelerators that was common at the time, requiring users to explicitly transfer data between the host and devices. But advances in heterogeneous computing and programming systems have created a new environment. Programmers are no longer required to track and move their data on their own. Now, for those who want it, inter-device address mapping and other runtime systems push these data management tasks behind a veil of abstraction. In the context of this progress, OpenMP's offloading support shows signs of its age. However, because of its ubiquity as a standard for portable, parallel code, OpenMP is well positioned to provide a similar standard for heterogeneous programming. Towards this goal, we review the features available in other programming systems and argue that OpenMP should expand its offloading support to better meet the expectations of modern programmers. The first step, detailed here, augments OpenMP's existing memory space abstraction with device awareness and a concept of shared and managed memory. Thus, users can allocate memory accessible to different combinations of devices without requiring explicit memory transfers. We show the potential performance impact of this feature and discuss the possible downsides.
Pages: 183-194 (12 pages)
Related Papers (50 records)
  • [21] The Omni OpenMP compiler on the distributed shared memory of Cenju-4
    Kusano, K
    Sato, M
    Hosomi, T
    Seo, Y
    OPENMP SHARED MEMORY PARALLEL PROGRAMMING, PROCEEDINGS, 2001, 2104 : 20 - 30
  • [22] Optimization of OpenMP Offload Shared Memory Access for Domestic Heterogeneous Platforms
    Wang, Xin
    Li, Jianan
    Han, Lin
    Zhao, Rongcai
    Zhou, Qiangwei
    Computer Engineering and Applications, 2023, 59 (10) : 75 - 85
  • [23] Evaluation of SMP Shared Memory Machines for Use With In-Memory and OpenMP Big Data Applications
    Younge, Andrew J.
    Reidy, Christopher
    Henschel, Robert
    Fox, Geoffrey C.
    2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2016, : 1597 - 1606
  • [24] Explicit TCP window adaptation in a shared memory architecture
    Aweya, J
    Ouellette, M
    Montuno, DY
    INTERNATIONAL JOURNAL OF COMMUNICATION SYSTEMS, 2003, 16 (04) : 337 - 356
  • [25] OpenMP: Parallel programming API for shared memory multiprocessors and on-chip multiprocessors
    Sato, M
    ISSS'02: 15TH INTERNATIONAL SYMPOSIUM ON SYSTEM SYNTHESIS, 2002, : 109 - 111
  • [26] Achieving performance under OpenMP on ccNUMA and software distributed shared memory systems
    Chapman, B
    Bregier, F
    Patil, A
    Prabhakar, A
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2002, 14 (8-9): : 713 - 739
  • [27] The Design of MPI Based Distributed Shared Memory Systems to Support OpenMP on Clusters
    Wong, H'sien J.
    Rendell, A. P.
    2007 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING, 2007, : 231 - 240
  • [28] UPMLIB: A runtime system for tuning the memory performance of OpenMP programs on scalable shared-memory multiprocessors
    Nikolopoulos, DS
    Papatheodorou, TS
    Polychronopoulos, CD
    Labarta, J
    Ayguadé, E
    LANGUAGES, COMPILERS, AND RUN-TIME SYSTEMS FOR SCALABLE COMPUTERS, 2000, 1915 : 85 - 99
  • [29] Characterization of OpenMP applications on the InfiniBand-based distributed virtual shared memory system
    Park, I
    Kim, SW
    Park, K
    HIGH PERFORMANCE COMPUTING - HIPC 2004, 2004, 3296 : 430 - 439
  • [30] Automatic Performance Analysis of OpenMP Codes on a Scalable Shared Memory System Using Periscope
    Benedict, Shajulin
    Gerndt, Michael
    APPLIED PARALLEL AND SCIENTIFIC COMPUTING, PT II, 2012, 7134 : 452 - 462