GPU-Chariot: A Programming Framework for Stream Applications Running on Multi-GPU Systems

Cited by: 4
Authors
Ino, Fumihiko [1]
Nakagawa, Shinta [2]
Hagihara, Kenichi [1]
Affiliations
[1] Osaka Univ, Grad Sch Informat Sci & Technol, Suita, Osaka 5650871, Japan
[2] NEC Corp Ltd, Storage Div, Fuchu, Tokyo 1838501, Japan
Source
Keywords
stream processing; GPGPU; CUDA; task scheduling; graphics
DOI
10.1587/transinf.E96.D.2604
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper presents a stream programming framework, named GPU-chariot, for accelerating stream applications running on graphics processing units (GPUs). The main contribution of our framework is that it realizes efficient software pipelines on multi-GPU systems by enabling out-of-order execution of CPU functions, kernels, and data transfers. To achieve this out-of-order execution, we apply a runtime scheduler that not only maximizes the utilization of system resources but also encapsulates the number of GPUs available in the system. In addition, we implement a load-balancing capability to flow data efficiently through multiple GPUs. Furthermore, a callback interface enables overlapping execution of functions in third-party libraries. By using kernels with different performance bottlenecks, we show that our out-of-order execution is up to 20% faster than in-order execution. Finally, we conduct several case studies on a 4-GPU system and demonstrate the advantages of GPU-chariot over a manually pipelined code. We conclude that GPU-chariot can be useful when developing stream applications with software pipelines on multiple GPUs and CPUs.
Pages: 2604-2616
Page count: 13