Effective Automatic Parallelization of Stencil Computations

Times cited: 61
Authors
Krishnamoorthy, Sriram [1]
Baskaran, Muthu [1]
Bondhugula, Uday [1]
Ramanujam, J.
Rountev, Atanas [1]
Sadayappan, P. [1]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Keywords
Stencil computations; Tiling; Automatic parallelization; Load balance
DOI
10.1145/1250734.1250761
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
Performance optimization of stencil computations has been widely studied in the literature, since they occur in many computationally intensive scientific and engineering applications. Compiler frameworks have also been developed that can transform sequential stencil codes for optimization of data locality and parallelism. However, loop skewing is typically required in order to tile stencil codes along the time dimension, resulting in load imbalance in pipelined parallel execution of the tiles. In this paper, we develop an approach for automatic parallelization of stencil codes that explicitly addresses the issue of load-balanced execution of tiles. Experimental results are provided that demonstrate the effectiveness of the approach.
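The abstract's point that skewing enables time tiling but leads to load imbalance in pipelined tile execution can be illustrated with a small sketch. This is not the authors' framework or their load-balancing method: it assumes a 1D three-point Jacobi stencil with fixed boundary values, and the function names (`jacobi_reference`, `skewed_tiles`, `wavefronts`, `jacobi_tiled`) and tile sizes are hypothetical. It skews the iteration space with j = i + t so that rectangular time tiles become legal, executes the tiles wavefront by wavefront, and checks the result against the untiled loop.

```python
from collections import defaultdict


def jacobi_reference(a0, steps):
    """Untiled 1D 3-point Jacobi stencil; the two boundary values stay fixed."""
    cur = list(a0)
    for _ in range(steps):
        nxt = list(cur)
        for i in range(1, len(cur) - 1):
            nxt[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3.0
        cur = nxt
    return cur


def skewed_tiles(n, steps, bt, bi):
    """Group points (t, i) into bt x bi rectangular tiles of the skewed space (t, j).

    Point (t, i) reads time level t and writes level t + 1 at i, so dependence
    distances are (1, -1), (1, 0), (1, 1). Skewing with j = i + t turns them into
    (1, 0), (1, 1), (1, 2), all non-negative, which makes rectangular tiles legal.
    """
    tiles = defaultdict(list)
    for t in range(steps):
        for i in range(1, n - 1):
            j = i + t
            tiles[(t // bt, j // bi)].append((t, i))  # appended in (t, i) order
    return tiles


def wavefronts(tiles):
    """Every dependence goes from tile (a, b) to a tile (a', b') with a' >= a and
    b' >= b, so tiles sharing the same a + b are mutually independent."""
    fronts = defaultdict(list)
    for a, b in tiles:
        fronts[a + b].append((a, b))
    return [(w, sorted(fronts[w])) for w in sorted(fronts)]


def jacobi_tiled(a0, steps, bt, bi):
    """Execute skewed tiles front by front; stores every time level for clarity."""
    n = len(a0)
    grid = [list(a0) for _ in range(steps + 1)]  # boundary values in every level
    tiles = skewed_tiles(n, steps, bt, bi)
    for _, front in wavefronts(tiles):
        for key in front:  # tiles in one front could run on different threads
            for t, i in tiles[key]:  # (t, i) order is legal inside a tile
                grid[t + 1][i] = (grid[t][i - 1] + grid[t][i] + grid[t][i + 1]) / 3.0
    return grid[steps]


if __name__ == "__main__":
    a0 = [float(k % 7) for k in range(20)]
    assert jacobi_tiled(a0, 8, 4, 4) == jacobi_reference(a0, 8)
    tiles = skewed_tiles(len(a0), 8, 4, 4)
    for w, front in wavefronts(tiles):
        work = sum(len(tiles[k]) for k in front)
        print(f"wavefront {w}: {len(front)} tiles, {work} points")
```

With 20 grid points, 8 time steps and 4 x 4 tiles, the eight wavefronts hold 1, 1, 2, 2, 2, 2, 1, 1 tiles, and a corner tile of the skewed parallelogram such as (0, 0) contains only 6 points versus 16 for a full interior tile. The pipeline start-up and drain, together with the uneven tile sizes, illustrate the kind of load imbalance the abstract describes.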
Pages: 235-244
Number of pages: 10
Related papers
(50 in total)
  • [31] Parameterized Diamond Tiling for Parallelizing Stencil Computations
    Wijesinghe, T.
    Senevirathne, K.
    Siriwardhana, C.
    Visitha, W.
    Jayasena, S.
    Rusira, T.
    Hall, M.
    [J]. 2017 3RD INTERNATIONAL MORATUWA ENGINEERING RESEARCH CONFERENCE (MERCON), 2017, : 99 - 104
  • [32] Autotuning divide-and-conquer stencil computations
    Natarajan, Ekanathan Palamadai
    Dehnavi, Maryam Mehri
    Leiserson, Charles
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2017, 29 (17):
  • [33] Modeling Stencil Computations on Modern HPC Architectures
    de la Cruz, Raul
    Araya-Polo, Mauricio
    [J]. HIGH PERFORMANCE COMPUTING SYSTEMS: PERFORMANCE MODELING, BENCHMARKING, AND SIMULATION, 2015, 8966 : 149 - 171
  • [34] Autotuning Stencil-Based Computations on GPUs
    Mametjanov, Azamat
    Lowell, Daniel
    Ma, Ching-Chen
    Norris, Boyana
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2012, : 266 - 274
  • [35] The memory behavior of cache oblivious stencil computations
    Frigo, Matteo
    Strumpen, Volker
    [J]. JOURNAL OF SUPERCOMPUTING, 2007, 39 (02): 93 - 112
  • [36] Effective Automatic Computation Placement and Data Allocation for Parallelization of Regular Programs
    Reddy, Chandan
    Bondhugula, Uday
    [J]. PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, (ICS'14), 2014, : 13 - 22
  • [37] Speeding Up Stencil Computations with Kernel Convolution
    Januario, Guilherme C.
    Rosenburg, Bryan S.
    Park, Yoonho
    Perrone, Michael
    Moreira, Jose
    Carvalho, Tereza C. M. B.
    [J]. PROCEEDINGS OF 28TH IEEE INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING, (SBAC-PAD 2016), 2016, : 76 - 83
  • [38] Framework for Automatic Parallelization
    Anala, M. R.
    Dash, Deepika
    [J]. 2018 IEEE 25TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING WORKSHOPS (HIPCW), 2018, : 112 - 118
  • [39] Automatic Parallelization Tools
    Qian, Ying
    [J]. WORLD CONGRESS ON ENGINEERING AND COMPUTER SCIENCE, WCECS 2012, VOL I, 2012, : 97 - 101
  • [40] Automatic parallelization with pMapper
    Travinin, Nadya
    Hoffmann, Henry
    Bond, Robert
    Chan, Hector
    Kepner, Jeremy
    Wong, Edmund
    [J]. 2005 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2006, : 483 - +