Transparent CPU-GPU Collaboration for Data-Parallel Kernels on Heterogeneous Systems

Authors
Lee, Janghaeng [1]
Samadi, Mehrzad [1]
Park, Yongjun [1]
Mahlke, Scott [1]
Affiliations
[1] Univ Michigan, Adv Comp Architecture Lab, Ann Arbor, MI 48109 USA
Keywords
GPGPU; OpenCL; Collaboration; Data parallel; EFFICIENT;
DOI
Not available
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Heterogeneous computing on CPUs and GPUs has traditionally used fixed roles for each device: the GPU handles data-parallel work by taking advantage of its massive number of cores, while the CPU handles non-data-parallel work, such as sequential code or data transfer management. Unfortunately, this work distribution can be a poor solution because it underutilizes the CPU, has difficulty generalizing beyond a single CPU-GPU combination, and may waste a large fraction of time transferring data. Further, CPUs are performance-competitive with GPUs on many workloads, so simply partitioning work based on fixed roles may be a poor choice. In this paper, we present the single kernel multiple devices (SKMD) system, a framework that transparently orchestrates collaborative execution of a single data-parallel kernel across multiple asymmetric CPUs and GPUs. The programmer is responsible for developing a single data-parallel kernel in OpenCL, while the system automatically partitions the workload across an arbitrary set of devices, generates kernels to execute the partial workloads, and efficiently merges the partial outputs together. The goal is to improve performance by maximally utilizing all available resources to execute the kernel. SKMD handles the difficult challenges of exposed data transfer costs and the performance variations GPUs exhibit with respect to input size. On real hardware, SKMD achieves an average speedup of 29% on a system with one multicore CPU and two asymmetric GPUs, compared to a fastest-device execution strategy, for a set of popular OpenCL kernels.
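The partition, execute, and merge flow described in the abstract can be illustrated with a minimal Python sketch. This is not SKMD's partitioning algorithm: the abstract notes that SKMD accounts for data transfer costs and input-size-dependent GPU performance, whereas this sketch assumes a simple split proportional to a fixed per-device throughput estimate. The function names (`partition_work_groups`, `run_partial`, `merge`), the device labels, and the throughput numbers are all hypothetical, and the "devices" are simulated sequentially in plain Python rather than through OpenCL.

```python
from typing import Callable, Dict, List, Sequence, Tuple

GroupRange = Tuple[int, int]  # half-open range [start, end) of work-group IDs


def partition_work_groups(total_groups: int,
                          throughput: Dict[str, float]) -> Dict[str, GroupRange]:
    """Split [0, total_groups) into one contiguous range per device.

    Range sizes are proportional to the (assumed, hypothetical) relative
    throughput of each device. The last device absorbs any rounding slack.
    """
    total = sum(throughput.values())
    devices = list(throughput)
    ranges: Dict[str, GroupRange] = {}
    start = 0
    cumulative = 0.0
    for i, dev in enumerate(devices):
        cumulative += throughput[dev]
        if i == len(devices) - 1:
            end = total_groups
        else:
            end = round(total_groups * cumulative / total)
        ranges[dev] = (start, end)
        start = end
    return ranges


def run_partial(kernel: Callable[[int, Sequence], float],
                inputs: Sequence,
                group_range: GroupRange,
                group_size: int) -> Tuple[int, List[float]]:
    """Run the kernel only for work-items inside the given work-group range.

    Returns the offset of the first work-item and the partial output.
    """
    start, end = group_range
    lo, hi = start * group_size, end * group_size
    return lo, [kernel(i, inputs) for i in range(lo, hi)]


def merge(partials: List[Tuple[int, List[float]]], n: int) -> List[float]:
    """Copy each device's partial output into its slot of the full output."""
    out: List[float] = [0.0] * n
    for lo, chunk in partials:
        out[lo:lo + len(chunk)] = chunk
    return out


if __name__ == "__main__":
    # Element-wise vector addition as the single data-parallel kernel.
    def vadd(i: int, inputs: Sequence) -> float:
        a, b = inputs
        return a[i] + b[i]

    a = [float(x) for x in range(64)]
    b = [2.0 * x for x in a]
    # Hypothetical relative throughputs for one CPU and two asymmetric GPUs.
    ranges = partition_work_groups(16, {"cpu": 1.0, "gpu0": 4.0, "gpu1": 3.0})
    partials = [run_partial(vadd, (a, b), r, group_size=4)
                for r in ranges.values()]
    result = merge(partials, len(a))
    print(ranges)
    print(result == [x + y for x, y in zip(a, b)])
```

Contiguous work-group ranges keep the merge step a plain range copy in this example, because each work-item writes only its own output element; kernels with irregular write patterns would need a different merge strategy.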
Pages: 245-255
Page count: 11