Optimizing Depthwise Separable Convolution Operations on GPUs

Cited by: 21
Authors
Lu, Gangzhao [1 ]
Zhang, Weizhe [1 ]
Wang, Zheng [2 ]
Affiliations
[1] Harbin Inst Technol, Sch Cyberspace Sci, Harbin 150000, Peoples R China
[2] Univ Leeds, Sch Comp, Leeds LS2 9JT, W Yorkshire, England
Funding
National Natural Science Foundation of China;
Keywords
Convolution; Graphics processing units; Instruction sets; Kernel; Standards; Training; Registers; Performance optimization; convolution; depthwise; pointwise; memory optimization; GPU utilization;
DOI
10.1109/TPDS.2021.3084813
Chinese Library Classification
TP301 [Theory, Methods];
Discipline Code
081202;
Abstract
The depthwise separable convolution is commonly seen in convolutional neural networks (CNNs), and is widely used to reduce the computation overhead of a standard multi-channel 2D convolution. Existing implementations of depthwise separable convolutions target model training with large batch sizes, where a large number of samples is processed at once. Such approaches are inadequate for small-batch-size model training and for the typical model-inference scenario, where the model takes in only a few samples at a time. This article aims to bridge this gap by optimizing depthwise separable convolutions for the GPU architecture. We achieve this by designing two novel algorithms that improve the column and row reuse of the convolution operation, reducing the number of memory operations performed along the width and height dimensions of the 2D convolution. Our approach employs a dynamic tile size scheme to adaptively distribute the computational data across GPU threads, improving GPU utilization and hiding memory access latency. We apply our approach on two GPU platforms, an NVIDIA RTX 2080Ti GPU and an embedded NVIDIA Jetson AGX Xavier GPU, and two data types, 32-bit floating point (FP32) and 8-bit integer (INT8). We compare our approach against cuDNN, which is heavily tuned for the NVIDIA GPU architecture. Experimental results show that our approach delivers over 2x (up to 3x) performance improvement over cuDNN. We show that, when using a moderate batch size, our approach reduces the end-to-end training time of MobileNet and EfficientNet by 9.7 and 7.3 percent on average, respectively, and reduces the end-to-end inference time of MobileNet and EfficientNet by 12.2 and 11.6 percent, respectively.
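To make the operation the abstract optimizes concrete, the following is a minimal NumPy sketch of a depthwise separable convolution: a per-channel (depthwise) 2D convolution followed by a 1x1 (pointwise) convolution across channels. This is an illustrative CPU reference, not the paper's GPU implementation; the shapes, the 'valid' padding, and the stride of 1 are assumptions made for brevity.

```python
import numpy as np

def depthwise_separable_conv(x, dw_filters, pw_filters):
    """Depthwise separable convolution, CNN-style (cross-correlation).

    x:          (C, H, W) input feature map
    dw_filters: (C, K, K) one KxK filter per input channel
    pw_filters: (M, C)    1x1 filters mixing C channels into M outputs
    """
    C, H, W = x.shape
    _, K, _ = dw_filters.shape
    Ho, Wo = H - K + 1, W - K + 1  # 'valid' padding, stride 1

    # Depthwise stage: each channel is filtered independently,
    # so there is no summation across channels here.
    dw_out = np.empty((C, Ho, Wo))
    for c in range(C):
        for i in range(Ho):
            for j in range(Wo):
                dw_out[c, i, j] = np.sum(x[c, i:i+K, j:j+K] * dw_filters[c])

    # Pointwise stage: a 1x1 convolution is just a matrix product
    # over the channel dimension at every spatial position.
    pw_out = np.einsum('mc,chw->mhw', pw_filters, dw_out)
    return pw_out

x = np.random.rand(8, 16, 16)   # 8 input channels, 16x16 feature map
dw = np.random.rand(8, 3, 3)    # one 3x3 filter per channel
pw = np.random.rand(32, 8)      # 32 output channels
y = depthwise_separable_conv(x, dw, pw)
print(y.shape)  # (32, 14, 14)
```

The split is what saves arithmetic: a standard convolution costs K*K*C*M multiply-accumulates per output pixel, while the two stages above cost K*K*C + C*M, which is far smaller for typical K, C, and M. The paper's contribution is making the memory access pattern of these two stages efficient on GPUs, where this sketch's nested loops would be replaced by tiled, register-blocked kernels.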
Pages: 70-87 (18 pages)