An enhanced GPU reduction at the warp-level

Cited by: 0
Authors
Hou Neng [1]
He Fazhi [1]
Zhou Yi [1]
Affiliations
[1] School of Computer Science and Technology, Wuhan University
Keywords
reduction; graphics processing unit; compute unified device architecture; warp-level reduction
DOI
10.19583/j.1003-4951.2016.02.007
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, graphics processing unit (GPU)-accelerated intelligent algorithms have been widely applied to combinatorial optimization problems, which are NP-hard. These algorithms share a common operation, reduction, in which the most suitable candidate solution in the neighborhood is selected. Since reduction is one of the main procedures, it is necessary to optimize it on the GPU. In this paper, we propose an enhanced warp-based reduction on the GPU. Compared with existing block-based reduction methods, our method efficiently exploits the potential of a warp-level implementation, which better matches the characteristics of current GPU architectures. Firstly, vectorized memory access is used to improve global memory performance. Secondly, at the thread-block level, an enhanced warp-based reduction in shared memory is presented to form partial results. Thirdly, the number of thread blocks is determined by maximizing the thread-block size against the maximum number of threads per streaming multiprocessor on the GPU. Finally, the proposed method is evaluated on three generations of NVIDIA GPUs and outperforms previous methods.
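The paper's kernels are not reproduced in this record, but the three steps named in the abstract (vectorized loads, warp-level reduction through shared memory, occupancy-driven grid sizing) map onto a well-known CUDA pattern. The following is a minimal sketch of that pattern for a float sum, assuming compute capability 3.0+ and CUDA 9+ for the __shfl_down_sync intrinsic; the names warpReduceSum, blockReduceSum, and reduceKernel are illustrative choices, not identifiers from the paper.

#include <cstdio>
#include <cuda_runtime.h>

// Warp-level sum: after log2(32) shuffle steps, lane 0 holds the warp total.
__inline__ __device__ float warpReduceSum(float val) {
    for (int offset = warpSize / 2; offset > 0; offset /= 2)
        val += __shfl_down_sync(0xffffffffu, val, offset);
    return val;
}

// Block-level reduction built from warp reductions: each warp deposits one
// partial in shared memory, then the first warp reduces those partials.
__device__ float blockReduceSum(float val) {
    __shared__ float partials[32];            // one slot per warp (<= 32 warps/block)
    int lane = threadIdx.x % warpSize;
    int wid  = threadIdx.x / warpSize;

    val = warpReduceSum(val);                 // reduce within each warp
    if (lane == 0) partials[wid] = val;       // one partial per warp
    __syncthreads();

    int nWarps = (blockDim.x + warpSize - 1) / warpSize;
    val = (threadIdx.x < nWarps) ? partials[lane] : 0.0f;
    if (wid == 0) val = warpReduceSum(val);   // first warp reduces the partials
    return val;                               // valid in thread 0 of the block
}

// Grid-stride kernel over float4 elements: each load is one 128-bit
// transaction, the "vectorized access" step from the abstract. For brevity,
// the element count is assumed to be a multiple of 4 (no scalar tail).
__global__ void reduceKernel(const float4* in, float* out, int n4) {
    float sum = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n4;
         i += blockDim.x * gridDim.x) {
        float4 v = in[i];
        sum += v.x + v.y + v.z + v.w;
    }
    sum = blockReduceSum(sum);
    if (threadIdx.x == 0) atomicAdd(out, sum);   // combine per-block partials
}

int main() {
    const int n = 1 << 22;                    // element count, multiple of 4
    float* h_in = new float[n];
    for (int i = 0; i < n; ++i) h_in[i] = 1.0f;   // known answer: sum == n

    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, sizeof(float));

    // Grid sizing in the spirit of the abstract: take the largest block the
    // device allows, then fill every SM up to its per-SM thread limit.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    int blockSize = prop.maxThreadsPerBlock;
    int gridSize  = prop.multiProcessorCount *
                    (prop.maxThreadsPerMultiProcessor / blockSize);

    reduceKernel<<<gridSize, blockSize>>>(
        reinterpret_cast<const float4*>(d_in), d_out, n / 4);

    float result = 0.0f;
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum = %.0f (expected %d)\n", result, n);

    delete[] h_in;
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

The design point worth noting: because each warp reduces itself through registers via shuffles, shared memory holds only one partial per warp and the block needs a single __syncthreads(), whereas classic block-based tree reductions synchronize at every halving step.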
Pages: 43-52
Page count: 10