Ocelot: A Dynamic Optimization Framework for Bulk-Synchronous Applications in Heterogeneous Systems

Cited by: 87
Authors
Diamos, Gregory [1]
Kerr, Andrew [1]
Yalamanchili, Sudhakar [1]
Clark, Nathan
Affiliations
[1] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
Funding
National Science Foundation (USA);
Keywords
DOI
10.1145/1854273.1854318
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Ocelot is a dynamic compilation framework designed to map the explicitly data-parallel execution model used by NVIDIA CUDA applications onto diverse multithreaded platforms. Ocelot includes a dynamic binary translator from Parallel Thread eXecution ISA (PTX) to many-core processors that leverages the Low Level Virtual Machine (LLVM) code generator to target x86 and other ISAs. The dynamic compiler is able to execute existing CUDA binaries without recompilation from source and supports switching between execution on an NVIDIA GPU and a many-core CPU at runtime. It has been validated against over 130 applications taken from the CUDA SDK, the UIUC Parboil benchmarks [1], the Virginia Rodinia benchmarks [2], the GPU-VSIPL signal and image processing library [3], the Thrust library [4], and several domain-specific applications. This paper presents a high-level overview of the implementation of the Ocelot dynamic compiler, highlighting design decisions and trade-offs and showcasing their effect on application performance. Several novel code transformations are explored that are applicable only when compiling explicitly parallel applications, and traditional dynamic compiler optimizations are revisited for this new class of applications. This study is expected to inform the design of compilation tools for explicitly parallel programming models (such as OpenCL) as well as future CPU and GPU architectures.
Pages: 353-364
Page count: 12
Related papers
50 in total
  • [1] Billiards and related systems on the bulk-synchronous parallel model
    Marin, M
    [J]. 11TH WORKSHOP ON PARALLEL AND DISTRIBUTED SIMULATION, PROCEEDINGS, 1997, : 164 - 171
  • [2] EReinit: Scalable and efficient fault-tolerance for bulk-synchronous MPI applications
    Chakraborty, Sourav
    Laguna, Ignacio
    Emani, Murali
    Mohror, Kathryn
    Panda, Dhabaleswar K.
    Schulz, Martin
    Subramoni, Hari
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2020, 32 (03):
  • [3] MigBSP++: Improving Process Rescheduling on Bulk-Synchronous Parallel Applications
    Righi, Rodrigo da Rosa
    Gomes, Roberto de Quadros
    Rodrigues, Vinicius Facco
    da Costa, Cristiano Andre
    Alberti, Antonio Marcos
    [J]. 2015 IEEE/ACS 12TH INTERNATIONAL CONFERENCE OF COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 2015,
  • [4] MigPF: Towards on self-organizing process rescheduling of Bulk-Synchronous Parallel applications
    Righi, Rodrigo da Rosa
    Gomes, Roberto de Quadros
    Rodrigues, Vinicius Facco
    da Costa, Cristiano Andre
    Alberti, Antonio Marcos
    Pilla, Laercio Lima
    Alexandre Navaux, Philippe Olivier
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2018, 78 : 272 - 286
  • [5] Slack-conscious Lightweight Loop Scheduling for Improving Scalability of Bulk-synchronous MPI Applications
    Kale, Vivek
    Gamblin, Todd
    Hoefler, Torsten
    de Supinski, Bronis R.
    Gropp, William D.
    [J]. 2012 SC COMPANION: HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SCC), 2012, : 1392 - 1392
  • [6] Optimal period of workload redistribution for dynamic bulk synchronous computations in heterogeneous computing systems
    Li, KQ
    [J]. JOURNAL OF SUPERCOMPUTING, 2006, 35 (03): : 205 - 226
  • [7] Optimal Period of Workload Redistribution for Dynamic Bulk Synchronous Computations in Heterogeneous Computing Systems
    Keqin Li
    [J]. The Journal of Supercomputing, 2006, 35 : 205 - 226
  • [8] A Hierarchical, bulk-synchronous stochastic gradient descent algorithm for deep-learning applications on GPU clusters
    Cong, Guojing
    Bhardwaj, Onkar
    [J]. 2017 16TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2017, : 818 - 821
  • [9] A framework designed for synchronous groupware applications in heterogeneous environments
    Guicking, Axel
    Grasse, Thomas
    [J]. GROUPWARE: DESIGN, IMPLEMENTATION, AND USE, 2006, 4154 : 203 - 218
  • [10] A Dynamic Reliability Management Framework for Heterogeneous Multicore Systems
    Baldassari, Alessandro
    Bolchini, Cristiana
    Miele, Antonio
    [J]. 2017 IEEE INTERNATIONAL SYMPOSIUM ON DEFECT AND FAULT TOLERANCE IN VLSI AND NANOTECHNOLOGY SYSTEMS (DFT), 2017, : 68 - 73