Optimizing GPU Memory Transactions for Convolution Operations

Cited by: 5
Authors
Lu, Gangzhao [1]
Zhang, Weizhe [1]
Wang, Zheng [2]
Affiliations
[1] Harbin Inst Technol, Comp Sci & Technol, Harbin, Peoples R China
[2] Univ Leeds, Sch Comp, Leeds, W Yorkshire, England
Keywords
Performance Optimization; Convolution; Memory Optimization; GPUs
DOI
10.1109/CLUSTER49012.2020.00050
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Convolution is a common operation in deep neural networks (DNNs) and is often responsible for performance bottlenecks during training and inference. Existing approaches for accelerating convolution aim to reduce computational complexity. However, these strategies often increase the memory footprint and add extra memory accesses, leaving much room for performance improvement. This paper presents a novel approach to optimizing memory access for convolution operations, specifically targeting GPU execution. Our approach uses two optimization techniques to reduce the number of memory operations for convolutions performed along the width and height dimensions. Along the width dimension, we exploit shuffle instructions to exchange the overlapped columns of the input between threads, reducing the number of memory transactions. Along the height dimension, we multiply each overlapped row of the input with multiple rows of a filter to compute multiple output elements, improving the data locality of row elements. We apply our approach to 2D and multi-channel 2D convolutions on an NVIDIA 2080Ti GPU. For 2D convolution, our approach delivers over 2x faster performance than state-of-the-art image processing libraries. For multi-channel 2D convolutions, we obtain up to 1.3x speedups over the fastest algorithm in cuDNN.
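The two reuse ideas in the abstract can be illustrated with a small CPU-side model. The sketch below is not the authors' GPU kernel: the function names `conv2d_naive` and `conv2d_reuse` and the `loads` counter are hypothetical, and the list `regs` only stands in for per-thread registers whose values a real kernel would exchange with warp shuffle instructions. It ignores warp boundaries and counts one modeled memory read per input element fetched.

```python
def conv2d_naive(image, filt):
    """Direct valid 2D convolution (cross-correlation, as used in DNNs).

    Every output element re-reads its whole input window, so overlapping
    columns and rows are loaded repeatedly.
    """
    H, W = len(image), len(image[0])
    FH, FW = len(filt), len(filt[0])
    OH, OW = H - FH + 1, W - FW + 1
    out = [[0] * OW for _ in range(OH)]
    loads = 0
    for y in range(OH):
        for x in range(OW):
            for i in range(FH):
                for j in range(FW):
                    out[y][x] += image[y + i][x + j] * filt[i][j]
                    loads += 1  # one modeled memory read per input element
    return out, loads


def conv2d_reuse(image, filt):
    """Same result, but each input element is read from memory only once."""
    H, W = len(image), len(image[0])
    FH, FW = len(filt), len(filt[0])
    OH, OW = H - FH + 1, W - FW + 1
    out = [[0] * OW for _ in range(OH)]
    loads = 0
    for r in range(H):
        # Width dimension: each modeled lane loads one column of this row
        # into a register; neighbours are then read from other lanes.
        regs = list(image[r])
        loads += W
        # Height dimension: pair this row with every filter row it overlaps,
        # so one loaded row contributes to several output rows.
        for i in range(FH):
            y = r - i  # output row that combines input row r with filter row i
            if 0 <= y < OH:
                for x in range(OW):
                    for j in range(FW):
                        # regs[x + j] models a value obtained from lane x + j
                        out[y][x] += regs[x + j] * filt[i][j]
    return out, loads


# Small example: a 4x5 input and a 3x3 filter.
image = [[r * 5 + c for c in range(5)] for r in range(4)]
filt = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
ref, naive_loads = conv2d_naive(image, filt)
out, reuse_loads = conv2d_reuse(image, filt)
```

In this toy case both functions produce the same output, with 54 modeled reads for the naive version and 20 for the reuse version. The counts describe only this model and say nothing about the speedups reported in the paper.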
Pages: 399-403
Number of pages: 5
Related papers
50 in total
  • [21] Hardware Transactional Memory Supporting I/O Operations within Transactions
    Liu, Yi
    Zhang, Xin
    Li, He
    Li, Mingxiu
    Qian, Depei
    HPCC 2008: 10TH IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, PROCEEDINGS, 2008, : 85 - +
  • [22] GPU Fast Convolution via the Overlap-and-Save Method in Shared Memory
    Adamek, Karel
    Dimoudi, Sofia
    Giles, Mike
    Armour, Wesley
    ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2020, 17 (03)
  • [23] Real-time massive convolution for audio applications on GPU Massive convolution on GPU
    Belloch, Jose A.
    Gonzalez, Alberto
    Martinez-Zaldivar, F. J.
    Vidal, Antonio M.
    JOURNAL OF SUPERCOMPUTING, 2011, 58 (03): : 449 - 457
  • [24] Optimizing Memory-Bound SYMV Kernel on GPU Hardware Accelerators
    Abdelfattah, Ahmad
    Dongarra, Jack
    Keyes, David
    Ltaief, Hatem
    HIGH PERFORMANCE COMPUTING FOR COMPUTATIONAL SCIENCE - VECPAR 2012, 2013, 7851 : 72 - 79
  • [25] Systematic Approach in Optimizing Numerical Memory-Bound Kernels on GPU
    Abdelfattah, Ahmad
    Keyes, David
    Ltaief, Hatem
    EURO-PAR 2012: PARALLEL PROCESSING WORKSHOPS, 2013, 7640 : 207 - 216
  • [26] Efficient convolution pooling on the GPU
    Suita, Shunsuke
    Nishimura, Takahiro
    Tokura, Hiroki
    Nakano, Koji
    Ito, Yasuaki
    Kasagi, Akihiko
    Tabaru, Tsuguchika
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2020, 138 : 222 - 229
  • [27] Optimizing Hyperplane Sweep Operations Using Asynchronous Multi-grain GPU Tasks
    Kaushik, Anirudh Mohan
    Aji, Ashwin M.
    Hassaan, Muhammad Amber
    Chalmers, Noel
    Wolfe, Noah
    Moe, Scott
    Puthoor, Sooraj
    Beckmann, Bradford M.
    PROCEEDINGS OF THE 2019 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC 2019), 2019, : 59 - 69
  • [28] Optimizing non-coalesced memory access for irregular applications with GPU computing
    Zheng, Ran
    Liu, Yuan-dong
    Jin, Hai
    FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2020, 21 (09) : 1285 - 1301
  • [29] Optimizing non-coalesced memory access for irregular applications with GPU computing
    Ran Zheng
    Yuan-dong Liu
    Hai Jin
    Frontiers of Information Technology & Electronic Engineering, 2020, 21 : 1285 - 1301
  • [30] THRESHOLDED CONVOLUTION OPERATIONS
    SKLANSKY, J
    JOURNAL OF THE ACM, 1970, 17 (01) : 161 - &