Automatic translation of data parallel programs for heterogeneous parallelism through OpenMP offloading

Cited by: 0
Authors
Farui Wang
Weizhe Zhang
Haonan Guo
Meng Hao
Gangzhao Lu
Zheng Wang
Affiliations
[1] Harbin Institute of Technology,School of Computer Science and Technology
[2] University of Leeds,School of Computing
Source
Journal of Supercomputing, 2021, 77 (05)
Keywords
Heterogeneous computing; Source-to-source translation; OpenMP offloading; Compilation optimization; GPUs
DOI
Not available
Abstract
Heterogeneous multicores like GPGPUs are now commonplace in modern computing systems. Although heterogeneous multicores offer the potential for high performance, programmers are struggling to program such systems. This paper presents OAO, a compiler-based approach to automatically translate shared-memory OpenMP data-parallel programs to run on heterogeneous multicores through OpenMP offloading directives. Given the large user base of shared-memory OpenMP programs, our approach allows programmers to continue using a single-source-based programming language that they are familiar with while benefiting from the heterogeneous performance. OAO introduces a novel runtime optimization scheme to automatically eliminate unnecessary host–device communication to minimize the communication overhead between the host and the accelerator device. We evaluate OAO by applying it to 23 benchmarks from the PolyBench and Rodinia suites on two distinct GPU platforms. Experimental results show that OAO achieves up to 32× speedup over the original OpenMP version, and can reduce the host–device communication overhead by up to 99% over the hand-translated version.
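To make the kind of translation described in the abstract concrete, the sketch below contrasts a shared-memory OpenMP loop with a hand-written offloading equivalent. It is only an illustration built from standard OpenMP 4.5 directives, not OAO's actual output or algorithm; the function names (saxpy_host, saxpy_offload, saxpy_twice_offload) are hypothetical. The third function uses a static `target data` region to keep arrays resident on the device across two kernels, which is one simple way to avoid redundant host–device copies; the abstract describes OAO's scheme as a runtime optimization, which this sketch does not reproduce.

```c
/* Original shared-memory form: a data-parallel loop run by host threads. */
void saxpy_host(int n, float a, const float *x, float *y) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Offloaded form (hypothetical hand translation): the same loop is sent
 * to the default device. x is copied in; y is copied in and back out. */
void saxpy_offload(int n, float a, const float *x, float *y) {
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Two kernels inside one target data region. The arrays are transferred
 * once on entry and y is copied back once on exit. The map clauses on the
 * inner constructs find the data already present on the device, so they
 * trigger no additional transfers between the two kernels. */
void saxpy_twice_offload(int n, float a, const float *x, float *y) {
    #pragma omp target data map(to: x[0:n]) map(tofrom: y[0:n])
    {
        #pragma omp target teams distribute parallel for \
            map(to: x[0:n]) map(tofrom: y[0:n])
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];

        #pragma omp target teams distribute parallel for \
            map(to: x[0:n]) map(tofrom: y[0:n])
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }
}
```

Built without OpenMP support, the directives are ignored and the loops run serially; built with OpenMP but without an available offload device, the target regions normally fall back to the host. The results are the same in all three cases, only the placement of the computation differs.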
Pages: 4957 - 4987
Number of pages: 30
Related papers
34 in total
  • [1] Automatic translation of data parallel programs for heterogeneous parallelism through OpenMP offloading
    Wang, Farui
    Zhang, Weizhe
    Guo, Haonan
    Hao, Meng
    Lu, Gangzhao
    Wang, Zheng
    [J]. JOURNAL OF SUPERCOMPUTING, 2021, 77 (05) : 4957 - 4987
  • [2] Adaptive parallelism for OpenMP task parallel programs
    Scherer, A
    Gross, T
    Zwaenepoel, W
    [J]. LANGUAGES, COMPILERS, AND RUN-TIME SYSTEMS FOR SCALABLE COMPUTERS, 2000, 1915 : 113 - 127
  • [3] DawnCC: Automatic Annotation for Data Parallelism and Offloading
    Mendonca, Gleison
    Guimaraes, Breno
    Alves, Pericles
    Pereira, Marcio
    Araujo, Guido
    Pereira, Fernando Magno Quintao
    [J]. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2017, 14 (02)
  • [4] Parallel Data Flow analysis for OpenMP programs
    Huang, Lei
    Sethuraman, Girija
    Chapman, Barbara
    [J]. PRACTICAL PROGRAMMING MODEL FOR THE MULTI-CORE ERA, PROCEEDINGS, 2008, 4935 : 138 - 142
  • [5] Mixed-Data-Model Heterogeneous Compilation and OpenMP Offloading
    Kurth, Andreas
    Wolters, Koen
    Forsberg, Bjoern
    Capotondi, Alessandro
    Marongiu, Andrea
    Grosser, Tobias
    Benini, Luca
    [J]. PROCEEDINGS OF THE 29TH INTERNATIONAL CONFERENCE ON COMPILER CONSTRUCTION (CC '20), 2020 : 119 - 131
  • [6] Automatic Selection of Parallel Data for Machine Translation
    Mouratidis, Despoina
    Kermanidis, Katia Lida
    [J]. ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS, AIAI 2018, 2018, 520 : 146 - 156
  • [7] Automatic and Portable Mapping of Data Parallel Programs to OpenCL for GPU-Based Heterogeneous Systems
    Wang, Zheng
    Grewe, Dominik
    O'Boyle, Michael F. P.
    [J]. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2014, 11 (04)
  • [8] Portable Mapping of Data Parallel Programs to OpenCL for Heterogeneous Systems
    Grewe, Dominik
    Wang, Zheng
    O'Boyle, Michael F. P.
    [J]. PROCEEDINGS OF THE 2013 IEEE/ACM INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION (CGO), 2013 : 161 - 170
  • [9] AUTOMATIC PARALLEL CODE GENERATION FOR NUFFT DATA TRANSLATION ON MULTICORES
    Zhang, Yuanrui
    Liu, Jun
    Kultursay, Emre
    Kandemir, Mahmut
    Pitsianis, Nikos
    Sun, Xiaobai
    [J]. JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2012, 21 (02)
  • [10] Exploiting Vector and Multicore Parallelism for Recursive, Data- and Task-Parallel Programs
    Ren, Bin
    Krishnamoorthy, Sriram
    Agrawal, Kunal
    Kulkarni, Milind
    [J]. ACM SIGPLAN NOTICES, 2017, 52 (08) : 117 - 130