Optimizing GPU Memory Transactions for Convolution Operations

Cited by: 5
Authors
Lu, Gangzhao [1 ]
Zhang, Weizhe [1 ]
Wang, Zheng [2 ]
Affiliations
[1] Harbin Inst Technol, Comp Sci & Technol, Harbin, Peoples R China
[2] Univ Leeds, Sch Comp, Leeds, W Yorkshire, England
Keywords
Performance Optimization; Convolution; Memory Optimization; GPUs;
DOI
10.1109/CLUSTER49012.2020.00050
CLC Number
TP3 [computing technology, computer technology];
Subject Classification Code
0812 ;
Abstract
Convolution is a common operation in deep neural networks (DNNs) and is often responsible for performance bottlenecks during training and inference. Existing approaches for accelerating convolution aim to reduce computational complexity; however, they often increase the memory footprint through extra memory accesses, leaving much room for performance improvement. This paper presents a novel approach for optimizing the memory accesses of convolution operations, specifically targeting GPU execution. Our approach leverages two optimization techniques to reduce the number of memory operations performed along the width and height dimensions of the convolution. Along the width dimension, we exploit shuffle instructions to exchange the overlapped columns of the input, reducing the number of memory transactions. Along the height dimension, we multiply each overlapped row of the input with multiple rows of the filter to compute multiple output elements, improving the data locality of row elements. We apply our approach to 2D and multi-channel 2D convolutions on an NVIDIA 2080Ti GPU. For 2D convolution, our approach delivers over 2x faster performance than state-of-the-art image processing libraries. For multi-channel 2D convolutions, we obtain up to 1.3x speedups over the fastest cuDNN algorithm.
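The height-dimension technique described in the abstract can be sketched in plain Python. This is an illustration of the data-reuse idea as stated above, not the authors' GPU kernel; the function name `conv2d_row_reuse` and the list-based layout are assumptions for the sketch. Each input row is read once and multiplied against every filter row it overlaps, accumulating partial sums into all the output rows that row contributes to.

```python
def conv2d_row_reuse(inp, filt):
    """2D valid cross-correlation that reads each input row only once.

    Sketch of the height-dimension reuse idea (not the paper's CUDA
    implementation): instead of fetching K input rows per output row,
    each input row is multiplied with every filter row it overlaps,
    accumulating partial sums into all K output rows it feeds.
    """
    H, W = len(inp), len(inp[0])
    K, S = len(filt), len(filt[0])
    out_h, out_w = H - K + 1, W - S + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(H):              # single pass: each row is "loaded" once
        row = inp[i]
        for k in range(K):          # filter rows overlapping input row i
            o = i - k               # output row receiving this partial sum
            if 0 <= o < out_h:
                f = filt[k]
                for j in range(out_w):
                    out[o][j] += sum(row[j + s] * f[s] for s in range(S))
    return out
```

With a K-row filter, a naive loop fetches every input row K times (once for each output row that overlaps it); the structure above fetches it once, which is the locality improvement the abstract claims for the height dimension.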
Pages: 399-403 (5 pages)