Optimizing GPU Memory Transactions for Convolution Operations

Cited by: 5
Authors
Lu, Gangzhao [1 ]
Zhang, Weizhe [1 ]
Wang, Zheng [2 ]
Affiliations
[1] Harbin Inst Technol, Comp Sci & Technol, Harbin, Peoples R China
[2] Univ Leeds, Sch Comp, Leeds, W Yorkshire, England
Keywords
Performance Optimization; Convolution; Memory Optimization; GPUs;
DOI
10.1109/CLUSTER49012.2020.00050
CLC Number (Chinese Library Classification)
TP3 [Computing technology, computer technology];
Subject Classification Code
0812 ;
Abstract
Convolution is a common operation in deep neural networks (DNNs) and is often responsible for performance bottlenecks during training and inference. Existing approaches for accelerating convolution aim to reduce computational complexity; however, these strategies often increase the memory footprint and introduce extra memory accesses, leaving much room for performance improvement. This paper presents a novel approach to optimizing memory accesses for convolution operations, specifically targeting GPU execution. Our approach leverages two optimization techniques to reduce the number of memory operations issued for convolutions along the width and height dimensions. For convolution along the width dimension, we exploit shuffle instructions to exchange the overlapped columns of the input, reducing the number of memory transactions. For convolution along the height dimension, we multiply each overlapped row of the input with multiple rows of a filter to compute multiple output elements, improving the data locality of row elements. We apply our approach to 2D and multi-channel 2D convolutions on an NVIDIA 2080Ti GPU. For 2D convolution, our approach delivers over 2x the performance of state-of-the-art image processing libraries. For multi-channel 2D convolution, we obtain up to 1.3x speedups over the fastest cuDNN algorithm.
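To make the width-dimension idea concrete, the following is a minimal CUDA sketch, not taken from the paper: one warp computes 32 consecutive output columns of a row, and the overlapped input columns are exchanged between lanes with warp shuffle instructions instead of being re-read from global memory. The kernel name conv_row_shuffle, the filter width K = 3, and the clamped boundary handling are illustrative assumptions.

```cuda
// Sketch of shuffle-based column reuse for convolution along the width
// dimension. Each lane loads ONE input element; neighbouring columns are
// fetched from other lanes' registers via __shfl_down_sync.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define K    3    // filter width (assumed for this sketch)
#define WARP 32

__global__ void conv_row_shuffle(const float* __restrict__ in,
                                 float* __restrict__ out,
                                 const float* __restrict__ filt,
                                 int width)
{
    int lane = threadIdx.x & (WARP - 1);
    int col  = blockIdx.x * WARP + lane;   // output column of this thread

    // Every lane stays active so the full-warp shuffle mask is valid;
    // out-of-range lanes load a clamped element and skip the final store.
    float x   = in[min(col, width - 1)];
    float acc = 0.0f;

    for (int k = 0; k < K; ++k) {
        // Register value held by lane (lane + k): the overlapped column is
        // exchanged inside the warp rather than re-read from global memory.
        float v = __shfl_down_sync(0xffffffffu, x, k);
        int src = col + k;
        if (lane + k >= WARP || src >= width)   // warp boundary / row edge:
            v = in[min(src, width - 1)];        // fall back to a direct load
        acc += v * filt[k];
    }
    if (col < width) out[col] = acc;
}

int main() {
    const int width = 1024;
    std::vector<float> h_in(width, 1.0f), h_out(width), h_filt(K, 1.0f / K);
    float *d_in, *d_out, *d_filt;
    cudaMalloc(&d_in,   width * sizeof(float));
    cudaMalloc(&d_out,  width * sizeof(float));
    cudaMalloc(&d_filt, K * sizeof(float));
    cudaMemcpy(d_in,   h_in.data(),   width * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_filt, h_filt.data(), K * sizeof(float),     cudaMemcpyHostToDevice);
    conv_row_shuffle<<<(width + WARP - 1) / WARP, WARP>>>(d_in, d_out, d_filt, width);
    cudaMemcpy(h_out.data(), d_out, width * sizeof(float), cudaMemcpyDeviceToHost);
    printf("out[0] = %f\n", h_out[0]);
    return 0;
}
```

The height-dimension technique described in the abstract (multiplying one loaded input row against several filter rows to produce multiple output rows) would extend this pattern by keeping the row in registers and accumulating into several partial sums; that part is omitted from the sketch.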
Pages: 399-403
Page count: 5