TOAST: Automatic tiling for iterative stencil computations on GPUs

Cited by: 8
Authors
Rocha, Rodrigo C. O. [1]
Pereira, Alyson D. [3]
Ramos, Luiz [2]
Goes, Luis F. W. [1]
Affiliations
[1] Pontificia Univ Catolica Minas Gerais PUC Minas, BR-30535901 Belo Horizonte, MG, Brazil
[2] Univ Estadual Campinas, UNICAMP, BR-13083852 Campinas, SP, Brazil
[3] Univ Fed Santa Catarina, BR-88040900 Florianopolis, SC, Brazil
Source
Keywords
autotuning; GPU; optimization model; parallel skeletons; stencil computation; tiling
DOI
10.1002/cpe.4053
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
The stencil pattern is important in many scientific and engineering domains, spurring great interest from researchers and industry. In recent years, various optimizations have been proposed for parallel stencil applications running on graphics processing units (GPUs). In particular, tiling is a technique that can significantly enhance application performance by improving data locality and by reducing the volume of communication between host memory and the GPU. In addition, tiling enables stencil applications to process inputs that are larger than the physical GPU memory. However, implementing tiling efficiently is complex, time-consuming, and error-prone. In this paper, we propose transparently optimized automatic stencil tiling (TOAST), an automatic tiling mechanism for iterative stencil computations running on GPUs. TOAST has three main benefits: (1) it incorporates an optimization model that seeks to maximize data reuse within tiles while respecting the amount of dynamically available GPU memory; (2) it offers a virtualized GPU memory for stencil computations, allowing for large input data; and (3) it performs optimal tiling transparently to the developer of the parallel stencil application. The current implementation of TOAST augments the PSkel framework with an internal solver based on genetic algorithms. Our experimental results show that TOAST improves the performance of iterative stencil applications by up to 13× compared with their optimized multithreaded (CPU-based) versions and by up to 48× compared with a naive tiling approach on the GPU. TOAST automatically keeps the data-management overhead low relative to the actual stencil computation.
Pages: 13