DHTS: A Dynamic Hybrid Tiling Strategy for Optimizing Stencil Computation on GPUs

Cited by: 0

Authors:

Liu, Song [1]

Zhang, Zengyuan [1]

Wu, Weiguo [1]

Affiliations:

[1] Xi'an Jiaotong University, School of Computer Science and Technology, Xi'an 710049, Shaanxi, People's Republic of China

Source:

IEEE TRANSACTIONS ON COMPUTERS | 2023 / Volume 72 / Issue 10

Funding:

National Natural Science Foundation of China;

Keywords:

Stencil computation; dynamic hybrid tiling; performance;

DOI:

10.1109/TC.2023.3271060

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Stencil computation is an important class of computational patterns in scientific computing applications. Loop tiling techniques have been widely studied to accelerate stencil computations on different architectures by exploiting parallelism and data locality. Recent advanced tiling methods enable tile-wise concurrent start-up to improve execution performance. However, such methods statically partition all dimensions of the iteration space into tiles with predetermined complex shapes and sizes, which leads to low thread utilization and poor memory access efficiency on GPUs. In this paper, we present DHTS, a novel dynamic hybrid tiling strategy for stencil computations. DHTS employs static tiling on the outer dimensions to achieve concurrent start-up parallelism, and applies a dynamic rectangular tiling method on the inner dimensions to improve thread utilization and memory access efficiency. By deriving tile size constraints, DHTS adaptively equalizes the workload across tiles, thereby reducing idle threads and increasing coalesced memory accesses within tiles. We implement the proposed strategy with different complex tile shapes. Experimental results on Titan V and Tesla V100 GPUs show that DHTS effectively improves the execution performance of 2D/3D stencils compared to state-of-the-art tiling methods, with a best-case improvement of 28x.


Pages: 2795-2807

Number of pages: 13
