Scalar Waving: Improving the Efficiency of SIMD Execution on GPUs

Cited by: 8
Authors
Yilmazer, Ayse [1]
Chen, Zhongliang [1]
Kaeli, David [1]
Affiliations
[1] Northeastern Univ, Dept Elect & Comp Engn, Boston, MA 02115 USA
Keywords
GPU; SIMD Efficiency; Redundant Computation; Scalar Waving
DOI
10.1109/IPDPS.2014.22
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
GPUs take advantage of uniformity in program control flow and use SIMD execution to obtain execution efficiency. In SIMD execution, threads are batched into SIMD groups that share a common program counter and execute identical instructions on SIMD pipelines. Previous research [1] has shown that a range of applications contain a significant number of scalar instructions, that is, instructions where different threads in a SIMD group execute using the same input operands and generate the exact same output. GPUs eliminate redundant fetches and decodes by using a shared pipeline front-end. However, most GPUs do not handle scalar instructions efficiently, so these instructions are executed redundantly by the threads in a SIMD group. In this paper, we propose to use scalar execution to eliminate redundant execution of scalar instructions. We introduce scalar waving as a mechanism to batch scalar operations that have the same PC and execute them as a group on SIMD lanes for efficiency. We also propose simultaneous execution of dynamically formed scalar waves with SIMD groups to overcome the under-utilization of SIMD lanes when encountering divergence. We evaluate our work using 22 different GPU benchmarks taken from 4 different benchmark suites, across a range of configurations, using timing simulation. Our results show that scalar waving can obtain up to a 25% improvement in performance on average. Our experiments also provide insight into the amount of performance gain that we can expect with scalar waving as a function of the scalar content, occupancy, and memory characteristics of the target application.
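The two ideas named in the abstract, detecting a scalar instruction and batching scalar operations that share a PC, can be illustrated with a small software model. This is only a sketch under stated assumptions: the function names `is_scalar` and `form_scalar_waves`, the tuple layout of `pending`, and the grouping policy are hypothetical and are not taken from the paper, which describes a hardware mechanism.

```python
from collections import defaultdict


def is_scalar(operands_per_lane):
    """Return True if every lane in a SIMD group reads identical input operands.

    operands_per_lane: non-empty list with one operand tuple per active lane.
    """
    first = operands_per_lane[0]
    return all(ops == first for ops in operands_per_lane)


def form_scalar_waves(pending, simd_width):
    """Batch scalar operations with the same PC into waves (hypothetical policy).

    pending: list of (pc, group_id, operands) tuples, one per SIMD group whose
             current instruction was found to be scalar.
    simd_width: number of SIMD lanes, i.e. the maximum operations per wave.
    Returns a list of (pc, [(group_id, operands), ...]) waves, ordered by PC.
    """
    by_pc = defaultdict(list)
    for pc, group_id, operands in pending:
        by_pc[pc].append((group_id, operands))

    waves = []
    for pc in sorted(by_pc):
        ops = by_pc[pc]
        # Each wave occupies at most simd_width lanes, one lane per SIMD group.
        for start in range(0, len(ops), simd_width):
            waves.append((pc, ops[start:start + simd_width]))
    return waves
```

In this model, a scalar operation from one SIMD group occupies a single lane of a wave instead of every lane of its own group, which is the source of the saved redundant work.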
Pages: 10
Related papers
50 in total
  • [1] Improving Energy Efficiency of GPUs through Data Compression and Compressed Execution
    Lee, Sangpil
    Kim, Keunsoo
    Koo, Gunjae
    Jeon, Hyeran
    Annavaram, Murali
    Ro, Won Woo
    IEEE TRANSACTIONS ON COMPUTERS, 2017, 66 (05) : 834 - 847
  • [2] Graph-Waving architecture: Efficient execution of graph applications on GPUs
    Yilmazer-Metin, Ayse
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2021, 148 : 69 - 82
  • [3] Improving energy efficiency by transparently sharing SIMD Execution Units in Assymetric Multicores
    Vieira, Caio
    Schneider Beck, Antonio Carlos
    34TH SBC/SBMICRO/IEEE/ACM SYMPOSIUM ON INTEGRATED CIRCUITS AND SYSTEMS DESIGN (SBCCI 2021), 2021,
  • [4] Improving Execution Efficiency of Just-in-time Compilation based Query Processing on GPUs
    Paul, Johns
    He, Bingsheng
    Lu, Shengliang
    Lau, Chiew Tong
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2020, 14 (02): : 202 - 214
  • [5] Improving Software Productivity and Performance through a Transparent SIMD Execution
    Jordan, Michael Guilherme
    Knorst, Tiago
    Rutzig, Mateus Beck
    2018 31ST SYMPOSIUM ON INTEGRATED CIRCUITS AND SYSTEMS DESIGN (SBCCI), 2018,
  • [6] Improving MLP execution efficiency
    Albesano, D.
    Gemello, R.
    Mana, F.
    CSELT Technical Reports, 1997, 25 (06): : 1103 - 1110
  • [7] G-Scalar: Cost-Effective Generalized Scalar Execution Architecture for Power-Efficient GPUs
    Liu, Zhenhong
    Gilani, Syed
    Annavaram, Murali
    Kim, Nam Sung
    2017 23RD IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA), 2017, : 601 - 612
  • [8] Formal specification of SIMD execution
    Farrell, CA
    Kieronska, DH
    1996 IEEE SECOND INTERNATIONAL CONFERENCE ON ALGORITHMS & ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP'96, PROCEEDINGS OF, 1996, : 319 - 325
  • [9] Improving Predication Efficiency through Compaction/Restoration of SIMD Instructions
    Barredo, Adrian
    Cebrian, Juan M.
    Moreto, Miguel
    Casas, Marc
    Valero, Mateo
    2020 IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2020), 2020, : 717 - 728
  • [10] Improving the Efficiency of Program Analysis with Symbolic Execution
    Fedorov, Alexey
    Kokin, Vitaliy
    Andrianov, Andrey
    Vysochkin, Alexey
    PROCEEDINGS OF THE 2017 IEEE RUSSIA SECTION YOUNG RESEARCHERS IN ELECTRICAL AND ELECTRONIC ENGINEERING CONFERENCE (2017 ELCONRUS), 2017, : 390 - 393