Optimizing GPU Memory Transactions for Convolution Operations

Cited by: 5
Authors
Lu, Gangzhao [1 ]
Zhang, Weizhe [1 ]
Wang, Zheng [2 ]
Affiliations
[1] Harbin Inst Technol, Comp Sci & Technol, Harbin, Peoples R China
[2] Univ Leeds, Sch Comp, Leeds, W Yorkshire, England
Keywords
Performance Optimization; Convolution; Memory Optimization; GPUs;
DOI
10.1109/CLUSTER49012.2020.00050
CLC Number
TP3 [computing technology, computer technology];
Subject Classification Code
0812 ;
Abstract
Convolution is a common operation in deep neural networks (DNNs) and is often responsible for performance bottlenecks during training and inference. Existing approaches for accelerating convolution aim to reduce computational complexity; however, they often increase the memory footprint through extra memory accesses, leaving much room for performance improvement. This paper presents a novel approach for optimizing the memory accesses of convolution operations, specifically targeting GPU execution. Our approach leverages two optimization techniques to reduce the number of memory operations performed along the width and height dimensions of the convolution. Along the width dimension, we exploit shuffle instructions to exchange the overlapped columns of the input, reducing the number of memory transactions. Along the height dimension, we multiply each overlapped row of the input with multiple rows of the filter to compute multiple output elements, improving the data locality of row elements. We apply our approach to 2D and multi-channel 2D convolutions on an NVIDIA 2080Ti GPU. For 2D convolution, our approach delivers over 2x faster performance than state-of-the-art image processing libraries. For multi-channel 2D convolutions, we obtain up to 1.3x speedups over the fastest cuDNN algorithm.
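The height-dimension technique described in the abstract can be sketched in plain Python. This is an illustration of the data-reuse idea as stated above, not the authors' GPU kernel; the function name `conv2d_row_reuse` and the list-based layout are assumptions for the sketch. Each input row is read once and multiplied against every filter row it overlaps, accumulating partial sums into all the output rows that row contributes to.

```python
def conv2d_row_reuse(inp, filt):
    """2D valid cross-correlation that reads each input row only once.

    Sketch of the height-dimension reuse idea (not the paper's CUDA
    implementation): instead of fetching K input rows per output row,
    each input row is multiplied with every filter row it overlaps,
    accumulating partial sums into all K output rows it feeds.
    """
    H, W = len(inp), len(inp[0])
    K, S = len(filt), len(filt[0])
    out_h, out_w = H - K + 1, W - S + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(H):              # single pass: each row is "loaded" once
        row = inp[i]
        for k in range(K):          # filter rows overlapping input row i
            o = i - k               # output row receiving this partial sum
            if 0 <= o < out_h:
                f = filt[k]
                for j in range(out_w):
                    out[o][j] += sum(row[j + s] * f[s] for s in range(S))
    return out
```

With a K-row filter, a naive loop fetches every input row K times (once for each output row that overlaps it); the structure above fetches it once, which is the locality improvement the abstract claims for the height dimension.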
Pages: 399-403 (5 pages)