Fast Convolution Operations on Many-Core Architectures

Cited by: 6
Authors
Li, Shigang [1]
Zhang, Yunquan [1]
Xiang, Chunyang [2]
Shi, Lei [2]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing, Peoples R China
[2] Zhengzhou Univ, Sch Informat Engn, Zhengzhou, Peoples R China
Keywords
Convolution; GPU; Intel MIC; OpenCL; Deep learning; Computer vision;
DOI
10.1109/HPCC-CSS-ICESS.2015.94
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Convolution operations are widely used in important application domains such as deep learning and computer vision, in which convolution is typically the most time-consuming part. High computational throughput and memory bandwidth make many-core architectures promising targets for accelerating these applications. In this paper, we implement and optimize several convolution operations, including 1D convolution, 2D convolution, and multi-channel 2D convolution executed in mini-batch mode, on both GPU and Intel MIC many-core architectures. We find that the performance bottleneck of 1D and 2D convolutions lies in registers rather than in local memory or the L1/L2 cache; therefore, register tiling is used to improve performance. In addition, we present a novel solution for multi-channel 2D convolution in which convolution is performed on images directly instead of being translated into matrix multiplication, and the data reuse of the algorithm is fully exploited. We further summarize the autotuning parameters for multi-channel 2D convolution and prune the search space using heuristics. The experimental results show that, for large filter sizes, our solution achieves up to 33% performance improvement over cuDNN-v2 on GTX TITAN and up to 28% over a clBLAS-based implementation on AMD W8000. On Intel MIC, our solution achieves up to 25% of the theoretical peak performance.
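As an illustration of the register-tiling idea mentioned in the abstract, the following is a minimal sketch in plain Python, not the authors' GPU or MIC kernels. The function names `conv1d_naive` and `conv1d_tiled` and the tile size of 4 are hypothetical choices made for this example, and the convolution is written in the correlation form common in deep learning. Python lists stand in for registers here, so the sketch shows only the access pattern (each input in a window is read once and reused by several outputs), not any real speedup.

```python
def conv1d_naive(x, w):
    """Valid-mode 1D convolution (correlation form): y[i] = sum_k x[i+k] * w[k]."""
    n_out = len(x) - len(w) + 1
    return [sum(x[i + k] * w[k] for k in range(len(w))) for i in range(n_out)]


def conv1d_tiled(x, w, tile=4):
    """Same result, but each pass produces up to `tile` outputs from one input window.

    `window` and `acc` play the role of registers in this sketch: the window is
    fetched once per tile and shared by all outputs of that tile.
    """
    k_len = len(w)
    n_out = len(x) - k_len + 1
    y = [0.0] * n_out
    for base in range(0, n_out, tile):
        t = min(tile, n_out - base)             # the last tile may be partial
        window = x[base:base + t + k_len - 1]   # inputs needed by this tile
        acc = [0.0] * t                         # one accumulator per output
        for k in range(k_len):
            wk = w[k]                           # filter tap read once per tile
            for j in range(t):
                acc[j] += window[j + k] * wk
        y[base:base + t] = acc
    return y
```

Both functions should agree on any input, for example `conv1d_tiled([1, 2, 3, 4, 5, 6, 7], [1, 0, -1])` and `conv1d_naive` on the same arguments. In an actual OpenCL or CUDA kernel the tile size would be one of the parameters explored by autotuning, which is an assumption of this sketch rather than a detail stated in the record.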
Pages: 316-323
Number of pages: 8
Related papers
50 in total
  • [1] A performance model of dense matrix operations on many-core architectures
    Long, Guoping
    Fan, Dongrui
    Zhang, Junchao
    Song, Fenglong
    Yuan, Nan
    Lin, Wei
    [J]. EURO-PAR 2008 PARALLEL PROCESSING, PROCEEDINGS, 2008, 5168 : 120 - 129
  • [2] A polyphase filter for many-core architectures
    Adamek, K.
    Novotny, J.
    Armour, W.
    [J]. ASTRONOMY AND COMPUTING, 2016, 16 : 1 - 16
  • [3] A Dynamic Schema to increase performance in Many-core Architectures through Percolation operations
    Garcia, Elkin
    Orozco, Daniel
    Khant, Rishi
    Venetis, Ioannis E.
    Livingston, Kelly
    Gao, Guang R.
    [J]. 2013 20TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC), 2013, : 276 - 285
  • [4] A Fast and Scalable Graph Coloring Algorithm for Multi-core and Many-core Architectures
    Rokos, Georgios
    Gorman, Gerard
    Kelly, Paul H. J.
    [J]. EURO-PAR 2015: PARALLEL PROCESSING, 2015, 9233 : 414 - 425
  • [5] MANY-TASK COMPUTING ON MANY-CORE ARCHITECTURES
    Valero-Lara, Pedro
    Nookala, Poornima
    Pelayo, Fernando L.
    Jansson, Johan
    Dimitropoulos, Serapheim
    Raicu, Ioan
    [J]. SCALABLE COMPUTING-PRACTICE AND EXPERIENCE, 2016, 17 (01): : 33 - 46
  • [6] Performance Evaluation of OpenFOAM on Many-Core Architectures
    Brzobohaty, Tomas
    Riha, Lubomir
    Karasek, Tomas
    Kozubek, Tomas
    [J]. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE OF NUMERICAL ANALYSIS AND APPLIED MATHEMATICS 2014 (ICNAAM-2014), 2015, 1648
  • [7] Graph Reachability on Parallel Many-Core Architectures
    Quer, Stefano
    Calabrese, Andrea
    [J]. COMPUTATION, 2020, 8 (04) : 1 - 26
  • [8] Fast parallel beam propagation method based on multi-core and many-core architectures
    Shaaban, Adel
    Sayed, M.
    Hameed, Mohamed Farhat O.
    Saleh, Hassan, I
    Gomaa, L. R.
    Du, Yi-Chun
    Obayya, S. S. A.
    [J]. OPTIK, 2019, 180 : 484 - 491
  • [9] A Compressive Sensing Algorithm for Many-Core Architectures
    Borghi, A.
    Darbon, J.
    Peyronnet, S.
    Chan, T. F.
    Osher, S.
    [J]. ADVANCES IN VISUAL COMPUTING, PT II, 2010, 6454 : 678 - 686
  • [10] Power Gating Clustered Many-Core Architectures
    Musoll, Enric
    [J]. JOURNAL OF LOW POWER ELECTRONICS, 2008, 4 (03) : 290 - 300