Scalar Waving: Improving the Efficiency of SIMD Execution on GPUs

Cited by: 8
Authors
Yilmazer, Ayse [1]
Chen, Zhongliang [1]
Kaeli, David [1]
Affiliation
[1] Northeastern Univ, Dept Elect & Comp Engn, Boston, MA 02115 USA
Keywords
GPU; SIMD Efficiency; Redundant Computation; Scalar Waving
DOI
10.1109/IPDPS.2014.22
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
GPUs take advantage of uniformity in program control flow and use SIMD execution to achieve execution efficiency. In SIMD execution, threads are batched into SIMD groups that share a common program counter and execute identical instructions on SIMD pipelines. Previous research [1] has shown that a range of applications contain a significant number of scalar instructions: instructions for which all threads in a SIMD group use the same input operands and therefore generate exactly the same output. GPUs eliminate redundant fetches and decodes by using a shared pipeline front-end. However, most GPUs do not handle scalar instructions efficiently, so these instructions are redundantly executed by every thread in a SIMD group. In this paper, we propose using scalar execution to eliminate the redundant execution of scalar instructions. We introduce scalar waving, a mechanism that batches scalar operations sharing the same PC and executes them as a group on the SIMD lanes. We also propose executing dynamically formed scalar waves simultaneously with SIMD groups, to overcome the under-utilization of SIMD lanes when divergence occurs. We evaluate our work using 22 GPU benchmarks taken from 4 benchmark suites, and we evaluate a range of configurations using timing simulation. Our results show that scalar waving can obtain up to a 25% improvement in performance on average. Our experiments also provide insight into the performance gain to expect from scalar waving as a function of the scalar content, occupancy, and memory characteristics of the target application.
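The abstract describes two steps: recognizing a scalar instruction (every thread in a SIMD group reads the same operands, so every thread produces the same result) and batching scalar instructions that share a PC into a "scalar wave" that runs on the SIMD lanes, one scalar operation per lane. The Python sketch below is a toy model of that idea under stated assumptions, not the authors' hardware design. The names WarpInstr, is_scalar, form_scalar_waves and issue_slots, the 4-lane width, and the choice to measure only SIMD issue slots are all hypothetical. It treats an instruction as scalar when all active lanes read identical operands, and it does not model how the paper actually detects scalar instructions, register-file access, timing, or the simultaneous execution of scalar waves alongside divergent SIMD groups.

```python
from collections import defaultdict
from dataclasses import dataclass

SIMD_WIDTH = 4  # hypothetical lane count for the toy model; real GPUs use e.g. 32 or 64


@dataclass
class WarpInstr:
    """One dynamic instruction issued by a SIMD group (hypothetical model)."""
    warp_id: int
    pc: int
    active_mask: list  # list of bool: which lanes are active (models divergence)
    operands: list     # per-lane tuple of input operand values


def is_scalar(instr):
    """Assumed test: scalar if every active lane reads identical input operands,
    so every active lane would compute the same output."""
    active = [ops for ops, on in zip(instr.operands, instr.active_mask) if on]
    return bool(active) and all(ops == active[0] for ops in active)


def form_scalar_waves(instrs, width=SIMD_WIDTH):
    """Batch scalar instructions from different SIMD groups that share a PC.
    Each wave holds at most `width` of them (one per lane) and is returned
    as (pc, [warp_id, ...])."""
    by_pc = defaultdict(list)
    for ins in instrs:
        if is_scalar(ins):
            by_pc[ins.pc].append(ins.warp_id)
    waves = []
    for pc in sorted(by_pc):
        ids = by_pc[pc]
        for i in range(0, len(ids), width):
            waves.append((pc, ids[i:i + width]))
    return waves


def issue_slots(instrs, width=SIMD_WIDTH):
    """Return (baseline, with_waving): SIMD issue slots needed without waving
    (one per instruction) and with waving (non-scalar ones plus one per wave)."""
    n_scalar = sum(1 for ins in instrs if is_scalar(ins))
    waves = form_scalar_waves(instrs, width)
    return len(instrs), (len(instrs) - n_scalar) + len(waves)


if __name__ == "__main__":
    all_on = [True] * SIMD_WIDTH
    instrs = [
        WarpInstr(0, 0x10, all_on, [(5,)] * SIMD_WIDTH),       # scalar
        WarpInstr(1, 0x10, all_on, [(7,)] * SIMD_WIDTH),       # scalar
        WarpInstr(2, 0x10, all_on, [(1,), (2,), (3,), (4,)]),  # vector
        WarpInstr(3, 0x20, [True, False, True, False],
                  [(9,), (0,), (9,), (1,)]),                    # scalar over its active lanes
    ]
    print(form_scalar_waves(instrs))  # [(16, [0, 1]), (32, [3])]
    print(issue_slots(instrs))        # (4, 3)
```

In this sample, warps 0 and 1 execute the same PC with warp-uniform operands, so they are packed into one wave; warp 3 is divergent but uniform across its active lanes, so it forms a second wave; warp 2 issues normally. Four issue slots shrink to three. Grouping strictly by PC keeps every lane in a wave executing the same instruction, which is what lets the wave reuse the ordinary SIMD datapath.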
Pages: 10