dOCAL: high-level distributed programming with OpenCL and CUDA

Citations: 6
Authors
Rasch, Ari [1 ]
Bigge, Julian [1 ]
Wrodarczyk, Martin [1 ]
Schulze, Richard [1 ]
Gorlatch, Sergei [1 ]
Affiliations
[1] Univ Munster, Dept Math & Comp Sci, Munster, Germany
Source
JOURNAL OF SUPERCOMPUTING | 2020, Vol. 76, No. 7
Keywords
OpenCL; CUDA; Host code; Distributed system; Heterogeneous system; Interoperability; Data transfer optimization
DOI
10.1007/s11227-019-02829-2
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline Classification Code
0812
Abstract
In the state-of-the-art parallel programming approaches OpenCL and CUDA, so-called host code is required to execute programs. Efficiently implementing host code is often a cumbersome task, especially when executing OpenCL and CUDA programs on systems with multiple nodes, each comprising different devices, e.g., multi-core CPUs and Graphics Processing Units (GPUs): the programmer is responsible for explicitly managing the memory of nodes and devices, for synchronizing computations with data transfers between devices of potentially different nodes, and for optimizing data transfers between device memories and the nodes' main memories, e.g., by using pinned main memory to accelerate data transfers and by overlapping the transfers with computations. We develop the distributed OpenCL/CUDA abstraction layer (dOCAL), a novel high-level C++ library that simplifies the development of host code. dOCAL combines major advantages over the state-of-the-art high-level approaches: (1) it simplifies implementing both OpenCL and CUDA host code by providing a simple-to-use, high-level abstraction API; (2) it supports executing arbitrary OpenCL and CUDA programs; (3) it allows conveniently targeting the devices of different nodes by automatically managing node-to-node communications; (4) it simplifies implementing data transfer optimizations by providing different, specially allocated memory regions, e.g., pinned main memory for overlapping data transfers with computations; (5) it optimizes memory management by automatically avoiding unnecessary data transfers; (6) it enables interoperability between OpenCL and CUDA host code for systems with devices from different vendors. Our experiments show that dOCAL significantly simplifies the development of host code for heterogeneous and distributed systems, with a low runtime overhead.
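The host-code burden the abstract describes, e.g., allocating pinned main memory and overlapping asynchronous transfers with computations via streams, can be illustrated with a minimal plain-CUDA sketch. This is standard CUDA runtime API usage, not dOCAL's API; the kernel `scale` and all sizes are illustrative assumptions:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Illustrative kernel: doubles each element in place.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main(void) {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Pinned (page-locked) main memory accelerates host<->device
    // transfers and is required for truly asynchronous cudaMemcpyAsync.
    float *h_data;
    cudaMallocHost(&h_data, bytes);
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    float *d_data;
    cudaMalloc(&d_data, bytes);

    // A separate stream allows the transfers and kernel to overlap
    // with other work issued on the default stream.
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    cudaMemcpyAsync(d_data, h_data, bytes, cudaMemcpyHostToDevice, stream);
    scale<<<(n + 255) / 256, 256, 0, stream>>>(d_data, n);
    cudaMemcpyAsync(h_data, d_data, bytes, cudaMemcpyDeviceToHost, stream);

    // The programmer must synchronize explicitly before reading results.
    cudaStreamSynchronize(stream);
    printf("h_data[0] = %f\n", h_data[0]);

    cudaStreamDestroy(stream);
    cudaFree(d_data);
    cudaFreeHost(h_data);
    return 0;
}
```

Even this single-node, single-device version requires explicit allocation, stream management, synchronization, and cleanup calls; the point of dOCAL's high-level API is to hide exactly this kind of boilerplate and to extend it transparently to devices on different nodes.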
Pages: 5117-5138 (22 pages)
Related Papers (50 total)
  • [31] High-Level Synthesis of Multiple Dependent CUDA Kernels on FPGA
    Gurumani, Swathi T.
    Cholakkal, Hisham
    Liang, Yun
    Rupnow, Kyle
    Chen, Deming
    2013 18TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC), 2013: 305 - 312
  • [32] A HIGH-LEVEL APPROACH TO PROGRAMMING A ROBOT
    WANG, NS
    DAVIES, BJ
    INTERNATIONAL JOURNAL OF MACHINE TOOLS & MANUFACTURE, 1987, 27 (01): 57 - 63
  • [33] hiCUDA: High-Level GPGPU Programming
    Han, Tianyi David
    Abdelrahman, Tarek S.
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2011, 22 (01) : 78 - 90
  • [34] WHAT ARE HIGH-LEVEL PROGRAMMING LANGUAGES
    FORBES, JM
    INDUSTRIAL ELECTRONICS, 1967, 5 (07): 312 - &
  • [35] High-level mathematical modeling and programming
    Linkoping Univ
    IEEE SOFTWARE, 4: 77 - 87
  • [36] VERY HIGH-LEVEL CONCURRENT PROGRAMMING
    SHI, Y
    PRYWES, N
    SZYMANSKI, B
    PNUELI, A
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 1987, 13 (09) : 1038 - 1046
  • [37] A high-level programming paradigm for SystemC
    Thompson, M
    Pimentel, AD
    COMPUTER SYSTEMS: ARCHITECTURES, MODELING, AND SIMULATION, 2004, 3133 : 530 - 539
  • [38] High-level database programming in curry
    Brassel, Bernd
    Hanus, Michael
    Mueller, Marion
    PRACTICAL ASPECTS OF DECLARATIVE LANGUAGES, PROCEEDINGS, 2008, 4902 : 316 - 332
  • [39] Efficient high-level parallel programming
    Botorog, GH
    Kuchen, H
    THEORETICAL COMPUTER SCIENCE, 1998, 196 (1-2) : 71 - 107
  • [40] SkelCL: a high-level extension of OpenCL for multi-GPU systems
    Steuwer, Michel
    Gorlatch, Sergei
    JOURNAL OF SUPERCOMPUTING, 2014, 69 (01): 25 - 33