Scalar Waving: Improving the Efficiency of SIMD Execution on GPUs

Cited by: 8

Authors:

Yilmazer, Ayse [1]

Chen, Zhongliang [1]

Kaeli, David [1]

Affiliation:

[1] Northeastern Univ, Dept Elect & Comp Engn, Boston, MA 02115 USA

Source:

2014 IEEE 28TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM | 2014

Keywords:

GPU; SIMD Efficiency; Redundant Computation; Scalar Waving;

DOI:

10.1109/IPDPS.2014.22

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

GPUs take advantage of uniformity in program control flow and utilize SIMD execution to obtain execution efficiency. In SIMD execution, threads are batched into SIMD groups that share a common program counter and execute identical instructions on SIMD pipelines. Previous research [1] has shown that a significant number of scalar instructions (instructions where different threads in a SIMD group execute using the same input operands and generate the exact same output) are present in a range of applications. GPUs eliminate redundant fetches and decodes by utilizing a shared common pipeline front-end. However, most GPUs do not handle scalar instructions efficiently, allowing these instructions to be redundantly executed by the threads in a SIMD group. In this paper, we propose to use scalar execution to eliminate redundant execution of scalar instructions. We introduce scalar waving as a mechanism to batch scalar operations possessing the same PC and execute them as a group on SIMD lanes for efficiency. We also propose simultaneous execution of dynamically-formed scalar waves with SIMD groups to overcome the under-utilization of SIMD lanes when encountering divergence. We evaluate our work using 22 different GPU benchmarks taken from 4 different benchmark suites. We evaluate a range of configurations using timing simulation. Our results show that scalar waving can obtain up to a 25% improvement in performance on average. Our experiments also provide insight into the amount of performance gain that we can expect with scalar waving as a function of the scalar content, occupancy, and memory characteristics of the target application.
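The two ideas in the abstract, detecting a scalar instruction and batching scalar operations that share a PC, can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware mechanism the paper evaluates: the tuple layout, the names `is_scalar` and `form_scalar_waves`, and the 32-lane width are all hypothetical choices made for illustration.

```python
from collections import defaultdict

WARP_SIZE = 32  # assumed SIMD width; the abstract does not state one


def is_scalar(operands_per_lane):
    """Return True when every active lane reads identical input operands.

    `operands_per_lane` holds one operand tuple per active lane
    (hypothetical representation).
    """
    if not operands_per_lane:
        return False
    first = operands_per_lane[0]
    return all(ops == first for ops in operands_per_lane)


def form_scalar_waves(pending, lanes=WARP_SIZE):
    """Batch scalar instructions with the same PC into waves.

    `pending` is a list of (group_id, pc, operands_per_lane) tuples.
    Each scalar instruction needs only one lane, so up to `lanes`
    of them with the same PC fit into a single wave.
    Returns (waves, vector): waves is a list of (pc, [group_id, ...]),
    vector lists the (group_id, pc) pairs that still need full SIMD execution.
    """
    by_pc = defaultdict(list)
    vector = []
    for group_id, pc, operands in pending:
        if is_scalar(operands):
            by_pc[pc].append(group_id)
        else:
            vector.append((group_id, pc))

    waves = []
    for pc, groups in by_pc.items():
        # Split into chunks of at most `lanes` scalar operations.
        for start in range(0, len(groups), lanes):
            waves.append((pc, groups[start:start + lanes]))
    return waves, vector


# Example: three SIMD groups at PC 0x10 (two scalar, one not), one scalar at 0x20.
pending = [
    (0, 0x10, [(3, 4)] * WARP_SIZE),
    (1, 0x10, [(7, 1)] * WARP_SIZE),
    (2, 0x10, [(lane, 1) for lane in range(WARP_SIZE)]),
    (3, 0x20, [(5, 5)] * WARP_SIZE),
]
waves, vector = form_scalar_waves(pending)
```

In this model, groups 0 and 1 share one wave at PC 0x10 and each occupies a single lane, while group 2 has per-lane operands and stays on the ordinary SIMD path.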


Pages: 10
