GPU-Chariot: A Programming Framework for Stream Applications Running on Multi-GPU Systems

Cited by: 4

Authors:

Ino, Fumihiko [1]

Nakagawa, Shinta [2]

Hagihara, Kenichi [1]

Affiliations:

[1] Osaka Univ, Grad Sch Informat Sci & Technol, Suita, Osaka 5650871, Japan

[2] NEC Corp Ltd, Storage Div, Fuchu, Tokyo 1838501, Japan

Source:

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2013 / Vol. E96D / No. 12

Keywords:

stream processing; GPGPU; CUDA; task scheduling; GRAPHICS

DOI:

10.1587/transinf.E96.D.2604

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

This paper presents a stream programming framework, named GPU-chariot, for accelerating stream applications running on graphics processing units (GPUs). The main contribution of our framework is that it realizes efficient software pipelines on multi-GPU systems by enabling out-of-order execution of CPU functions, kernels, and data transfers. To achieve this out-of-order execution, we apply a runtime scheduler that not only maximizes the utilization of system resources but also encapsulates the number of GPUs available in the system. In addition, we implement a load-balancing capability to flow data efficiently through multiple GPUs. Furthermore, a callback interface enables overlapping execution of functions in third-party libraries. By using kernels with different performance bottlenecks, we show that our out-of-order execution is up to 20% faster than in-order execution. Finally, we conduct several case studies on a 4-GPU system and demonstrate the advantages of GPU-chariot over a manually pipelined code. We conclude that GPU-chariot can be useful when developing stream applications with software pipelines on multiple GPUs and CPUs.


Pages: 2604-2616

Page count: 13
