Offload Annotations: Bringing Heterogeneous Computing to Existing Libraries and Workloads

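Authors
Yuan, Gina [1]
Palkar, Shoumik [1]
Narayanan, Deepak [1]
Zaharia, Matei [1]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract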
As specialized hardware accelerators such as GPUs become increasingly popular, developers are looking for ways to target these platforms with high-level APIs. One promising approach is kernel libraries such as PyTorch or cuML, which provide interfaces that mirror CPU-only counterparts such as NumPy or Scikit-Learn. Unfortunately, these libraries are hard to develop and to adopt incrementally: they only support a subset of their CPU equivalents, only work with datasets that fit in device memory, and require developers to reason about data placement and transfers manually. To address these shortcomings, we present a new approach called offload annotations (OAs) that enables heterogeneous GPU computing in existing workloads with few or no code modifications. An annotator annotates the types and functions in a CPU library with equivalent kernel library functions and provides an offloading API to specify how the inputs and outputs of the function can be partitioned into chunks that fit in device memory and transferred between devices. A runtime then maps existing CPU functions to equivalent GPU kernels and schedules execution, data transfers and paging. In data science workloads using CPU libraries such as NumPy and Pandas, OAs enable speedups of up to 1200x and a median speedup of 6.3x by transparently offloading functions to a GPU using existing kernel libraries. In many cases, OAs match the performance of handwritten heterogeneous implementations. Finally, OAs can automatically page data in these workloads to scale to datasets larger than GPU memory, which would need to be done manually with most current GPU libraries.
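The annotate, split, transfer, and merge flow described in the abstract can be illustrated with a toy sketch. This is not the paper's API: the names `offload_annotation`, `to_device`, `to_host`, `device_scale`, and the `chunk_size` parameter are all hypothetical, and the "device" is simulated with plain Python tuples so the example runs without a GPU. It only shows the general idea of pairing an existing CPU function with an equivalent device function and running it over chunks small enough to fit in device memory.

```python
from typing import Callable, List, Sequence, Tuple

transfers: List[int] = []  # sizes of simulated host-to-device copies


def to_device(chunk: Sequence[float]) -> Tuple[float, ...]:
    """Simulated host-to-device copy (hypothetical stand-in for a real transfer)."""
    transfers.append(len(chunk))
    return tuple(chunk)


def to_host(buf: Tuple[float, ...]) -> List[float]:
    """Simulated device-to-host copy."""
    return list(buf)


def device_scale(buf: Tuple[float, ...]) -> Tuple[float, ...]:
    """Stand-in for a GPU kernel equivalent to the CPU function `scale`."""
    return tuple(2.0 * x for x in buf)


def offload_annotation(device_fn: Callable, chunk_size: int,
                       to_device: Callable, to_host: Callable) -> Callable:
    """Hypothetical annotation: attach a device equivalent and chunking rules."""
    def wrap(cpu_fn: Callable) -> Callable:
        def runner(data: Sequence[float], use_device: bool = True) -> List[float]:
            if not use_device:
                return cpu_fn(data)  # fall back to the unmodified CPU function
            out: List[float] = []
            # Split the input into chunks, process each on the "device",
            # then merge the partial results back in order.
            for start in range(0, len(data), chunk_size):
                buf = to_device(data[start:start + chunk_size])
                out.extend(to_host(device_fn(buf)))
            return out
        return runner
    return wrap


@offload_annotation(device_fn=device_scale, chunk_size=4,
                    to_device=to_device, to_host=to_host)
def scale(values: Sequence[float]) -> List[float]:
    """Existing CPU function; its body is left unchanged by the annotation."""
    return [2.0 * x for x in values]


data = [float(i) for i in range(10)]
result = scale(data)                         # chunked, simulated offload
cpu_result = scale(data, use_device=False)   # original CPU path
```

The calling code keeps using `scale` as before; only the annotation decides whether and how the work is chunked and moved. A real runtime would also have to choose between CPU and GPU execution and schedule transfers and paging, which this sketch does not attempt.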
Pages: 293-306
Number of pages: 14