Optimizing GPU Memory Transactions for Convolution Operations

Cited by: 5
Authors
Lu, Gangzhao [1 ]
Zhang, Weizhe [1 ]
Wang, Zheng [2 ]
Affiliations
[1] Harbin Inst Technol, Comp Sci & Technol, Harbin, Peoples R China
[2] Univ Leeds, Sch Comp, Leeds, W Yorkshire, England
Keywords
Performance Optimization; Convolution; Memory Optimization; GPUs
DOI
10.1109/CLUSTER49012.2020.00050
CLC Number
TP3 [Computing technology; computer technology]
Subject Classification Code
0812
Abstract
Convolution is a common operation in deep neural networks (DNNs) and is often responsible for performance bottlenecks during training and inference. Existing approaches for accelerating convolution operations aim to reduce computational complexity. However, these strategies often increase the memory footprint with extra memory accesses, thereby leaving much room for performance improvement. This paper presents a novel approach to optimize memory access for convolution operations, specifically targeting GPU execution. Our approach leverages two optimization techniques to reduce the number of memory operations for convolution operations performed on the width and height dimensions. For convolution computations on the width dimension, we exploit shuffle instructions to exchange the overlapped columns of the input, reducing the number of memory transactions. For convolution operations on the height dimension, we multiply each overlapped row of the input with multiple rows of a filter to compute multiple output elements, improving the data locality of row elements. We apply our approach to 2D and multi-channel 2D convolutions on an NVIDIA 2080Ti GPU. For 2D convolution, our approach delivers over 2x faster performance than state-of-the-art image processing libraries. For multi-channel 2D convolutions, we obtain up to 1.3x speedups over the fastest algorithm of cuDNN.
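The height-dimension optimization described in the abstract can be illustrated with a small CPU analogue (a sketch in plain Python with a hypothetical function name `conv2d_row_reuse`; the paper's actual implementation is a GPU kernel). Instead of re-reading the K overlapped input rows for every output row, each input row is loaded once and multiplied against every filter row it overlaps, accumulating partial sums into several output rows at once:

```python
def conv2d_row_reuse(inp, filt):
    """Valid 2D convolution (cross-correlation) with row-level data reuse.

    inp:  H x W input as a list of lists.
    filt: K x Kw filter as a list of lists.
    Returns the (H-K+1) x (W-Kw+1) output.
    """
    H, W = len(inp), len(inp[0])
    K, Kw = len(filt), len(filt[0])
    OH, OW = H - K + 1, W - Kw + 1
    out = [[0.0] * OW for _ in range(OH)]
    for i in range(H):          # each input row is visited exactly once
        for ki in range(K):     # every filter row this input row overlaps
            oi = i - ki         # output row receiving this contribution
            if 0 <= oi < OH:
                row, frow, orow = inp[i], filt[ki], out[oi]
                for oj in range(OW):
                    s = 0.0
                    for kj in range(Kw):
                        s += row[oj + kj] * frow[kj]
                    orow[oj] += s
    return out
```

In this scheme a row of the input contributes to up to K output rows while resident, which is the locality improvement the abstract describes; the width-dimension counterpart replaces the inner re-reads of overlapped columns with warp shuffle instructions on the GPU.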
Pages: 399-403
Page count: 5
Related Papers
50 records in total
  • [1] Optimizing memory transactions
    Harris, Tim
    Plesko, Mark
    Shinnar, Avraham
    Tarditi, David
    ACM SIGPLAN NOTICES, 2006, 41 (06) : 14 - 25
  • [2] Optimizing Persistent Memory Transactions
    Zardoshti, Pantea
    Zhou, Tingzhe
    Liu, Yujie
    Spear, Michael
    2019 28TH INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES (PACT 2019), 2019, : 219 - +
  • [3] Optimizing Transactions for Captured Memory
    Dragojevic, Aleksandar
    Ni, Yang
    Adl-Tabatabai, Ali-Reza
    SPAA'09: PROCEEDINGS OF THE TWENTY-FIRST ANNUAL SYMPOSIUM ON PARALLELISM IN ALGORITHMS AND ARCHITECTURES, 2009, : 214 - 222
  • [4] Optimizing Depthwise Separable Convolution Operations on GPUs
    Lu, Gangzhao
    Zhang, Weizhe
    Wang, Zheng
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (01) : 70 - 87
  • [5] Hardware Support for Scratchpad Memory Transactions on GPU Architectures
    Villegas, Alejandro
    Asenjo, Rafael
    Navarro, Angeles
    Plata, Oscar
    Ubal, Rafael
    Kaeli, David
    EURO-PAR 2017: PARALLEL PROCESSING, 2017, 10417 : 273 - 286
  • [6] Optimizing convolution operations on GPUs using adaptive tiling
    van Werkhoven, Ben
    Maassen, Jason
    Bal, Henri E.
    Seinstra, Frank J.
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2014, 30 : 14 - 26
  • [7] Optimizing memory transactions for large-scale programs
    Carvalho, Fernando Miguel
    Cachopo, Joao
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2016, 89 : 13 - 24
  • [8] Remote Invalidation: Optimizing the Critical Path of Memory Transactions
    Hassan, Ahmed
    Palmieri, Roberto
    Ravindran, Binoy
    2014 IEEE 28TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, 2014,
  • [9] Tetris: Accelerating Sparse Convolution by Exploiting Memory Reuse on GPU
    Liu, Xiaoyan
    Zheng, Xuegui
    Yang, Hailong
    Luan, Zhongzhi
    Qian, Depei
    PROCEEDINGS OF THE 29TH ACM SIGPLAN ANNUAL SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING, PPOPP 2024, 2024, : 229 - 242
  • [10] FAST CONVOLUTION KERNELS ON PASCAL GPU WITH HIGH MEMORY EFFICIENCY
    Chang, Qiong
    Onishi, Masaki
    Maruyama, Tsutomu
    HIGH PERFORMANCE COMPUTING SYMPOSIUM (HPC 2018), 2018, 50 (04):