Scalar Waving: Improving the Efficiency of SIMD Execution on GPUs

Times cited: 8

Authors:

Yilmazer, Ayse [1]

Chen, Zhongliang [1]

Kaeli, David [1]

Affiliation:

[1] Northeastern Univ, Dept Elect & Comp Engn, Boston, MA 02115 USA

Source:

2014 IEEE 28TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM | 2014

Keywords:

GPU; SIMD Efficiency; Redundant Computation; Scalar Waving

DOI:

10.1109/IPDPS.2014.22

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline code:

0812

Abstract:

GPUs exploit uniform program control flow and use SIMD execution to achieve high execution efficiency. In SIMD execution, threads are batched into SIMD groups that share a common program counter and execute identical instructions on SIMD pipelines. Previous research [1] has shown that a range of applications contain a significant number of scalar instructions: instructions for which the threads in a SIMD group use the same input operands and generate exactly the same output. GPUs eliminate redundant fetches and decodes by using a shared pipeline front-end. However, most GPUs do not handle scalar instructions efficiently, so these instructions are executed redundantly by the threads in a SIMD group. In this paper, we propose using scalar execution to eliminate the redundant execution of scalar instructions. We introduce scalar waving, a mechanism that batches scalar operations sharing the same PC and executes them as a group on SIMD lanes for efficiency. We also propose executing dynamically formed scalar waves simultaneously with SIMD groups, to overcome the under-utilization of SIMD lanes when divergence occurs. We evaluate our work using 22 GPU benchmarks taken from 4 benchmark suites, and we evaluate a range of configurations using timing simulation. Our results show that scalar waving can obtain up to a 25% improvement in performance on average. Our experiments also provide insight into the performance gain that can be expected from scalar waving as a function of the scalar content, occupancy, and memory characteristics of the target application.
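The three ideas in the abstract (detecting scalar instructions, batching scalar operations that share a PC into a scalar wave, and running a wave on lanes left idle by divergence) can be illustrated with a minimal Python sketch. It assumes a toy model, not the paper's hardware: the names `WarpInstr`, `is_scalar`, `form_scalar_waves`, `fill_idle_lanes` and the 4-lane `SIMD_WIDTH` are hypothetical. The sketch ignores timing, register dependences, and whether a real datapath can issue a scalar wave and a divergent SIMD group in the same cycle.

```python
from collections import defaultdict
from dataclasses import dataclass

SIMD_WIDTH = 4  # hypothetical lane count, kept small for readability


@dataclass
class WarpInstr:
    """One dynamic instruction of one SIMD group (warp) in a toy model."""
    warp_id: int
    pc: int
    active_mask: list[bool]  # True if the lane's thread is active
    operands: list[tuple]    # source operand values read by each lane


def is_scalar(instr: WarpInstr) -> bool:
    """True if all active lanes read identical source operands, so every
    active lane would compute the same result (a scalar instruction)."""
    vals = [ops for ops, on in zip(instr.operands, instr.active_mask) if on]
    return bool(vals) and all(v == vals[0] for v in vals)


def form_scalar_waves(instrs, width=SIMD_WIDTH):
    """Batch scalar instructions from different warps that share a PC into
    waves of at most `width` entries; each entry needs only one lane."""
    by_pc = defaultdict(list)
    for ins in instrs:
        if is_scalar(ins):
            by_pc[ins.pc].append(ins.warp_id)
    waves = []
    for pc in sorted(by_pc):
        ids = by_pc[pc]
        for i in range(0, len(ids), width):
            waves.append((pc, ids[i:i + width]))
    return waves


def fill_idle_lanes(active_mask, wave_warps):
    """Map members of a scalar wave onto lanes left idle by a divergent warp.
    Returns {lane: warp_id}; members that do not fit wait for a later issue."""
    idle = [lane for lane, on in enumerate(active_mask) if not on]
    return dict(zip(idle, wave_warps))


# Usage: warps 0 and 1 are scalar at PC 0x10, warp 2 is a true vector
# instruction, and warp 3 is divergent but scalar across its active lanes.
warps = [
    WarpInstr(0, 0x10, [True] * 4, [(3, 5)] * 4),
    WarpInstr(1, 0x10, [True] * 4, [(3, 5)] * 4),
    WarpInstr(2, 0x10, [True] * 4, [(1, 2), (2, 3), (3, 4), (4, 5)]),
    WarpInstr(3, 0x18, [True, False, True, False], [(7,), (0,), (7,), (9,)]),
]
print(form_scalar_waves(warps))  # [(16, [0, 1]), (24, [3])]
print(fill_idle_lanes(warps[3].active_mask, [0, 1]))  # {1: 0, 3: 1}
```

Grouping by PC follows the abstract's description of batching scalar operations that possess the same PC: two warps' worth of redundant lane work collapses into one wave occupying two lanes. The lane-filling function only models which lanes a divergent group leaves free, not the issue or scheduling logic the paper evaluates.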


Pages: 10

50 entries in total

[1] Improving Energy Efficiency of GPUs through Data Compression and Compressed Execution
Lee, Sangpil
Kim, Keunsoo
Koo, Gunjae
Jeon, Hyeran
Annavaram, Murali
Ro, Won Woo
IEEE TRANSACTIONS ON COMPUTERS, 2017, 66 (05): 834 - 847
[2] Graph-Waving architecture: Efficient execution of graph applications on GPUs
Yilmazer-Metin, Ayse
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2021, 148: 69 - 82
[3] Improving energy efficiency by transparently sharing SIMD Execution Units in Assymetric Multicores
Vieira, Caio
Schneider Beck, Antonio Carlos
34TH SBC/SBMICRO/IEEE/ACM SYMPOSIUM ON INTEGRATED CIRCUITS AND SYSTEMS DESIGN (SBCCI 2021), 2021
[4] Improving Execution Efficiency of Just-in-time Compilation based Query Processing on GPUs
Paul, Johns
He, Bingsheng
Lu, Shengliang
Lau, Chiew Tong
PROCEEDINGS OF THE VLDB ENDOWMENT, 2020, 14 (02): 202 - 214
[5] Improving Software Productivity and Performance through a Transparent SIMD Execution
Jordan, Michael Guilherme
Knorst, Tiago
Rutzig, Mateus Beck
2018 31ST SYMPOSIUM ON INTEGRATED CIRCUITS AND SYSTEMS DESIGN (SBCCI), 2018
[6] Improving MLP execution efficiency
Albesano, D.
Gemello, R.
Mana, F.
CSELT Technical Reports, 1997, 25 (06): 1103 - 1110
[7] G-Scalar: Cost-Effective Generalized Scalar Execution Architecture for Power-Efficient GPUs
Liu, Zhenhong
Gilani, Syed
Annavaram, Murali
Kim, Nam Sung
2017 23RD IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA), 2017: 601 - 612
[8] Formal specification of SIMD execution
Farrell, CA
Kieronska, DH
1996 IEEE SECOND INTERNATIONAL CONFERENCE ON ALGORITHMS & ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP'96, PROCEEDINGS OF, 1996: 319 - 325
[9] Improving Predication Efficiency through Compaction/Restoration of SIMD Instructions
Barredo, Adrian
Cebrian, Juan M.
Moreto, Miguel
Casas, Marc
Valero, Mateo
2020 IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2020), 2020: 717 - 728
[10] Improving the Efficiency of Program Analysis with Symbolic Execution
Fedorov, Alexey
Kokin, Vitaliy
Andrianov, Andrey
Vysochkin, Alexey
PROCEEDINGS OF THE 2017 IEEE RUSSIA SECTION YOUNG RESEARCHERS IN ELECTRICAL AND ELECTRONIC ENGINEERING CONFERENCE (2017 ELCONRUS), 2017: 390 - 393
