Analysis and Optimization of Direct Convolution Execution on Multi-Core Processors

Cited by: 0
Authors
Mannino, Mirco [1]
Peccerillo, Biagio [1]
Mondelli, Andrea [2]
Bartolini, Sandro [1]
Affiliations
[1] Univ Siena, Dept Informat Engn & Math, I-53100 Siena, Italy
[2] Huawei Technol Co Ltd, Cambridge CB4 0WG, England
Keywords
Convolutional neural networks; direct convolution; multi-core; multi-threading; performance evaluation
DOI
10.1109/ACCESS.2023.3283312
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Convolutional neural networks are among the most widely used types of deep learning networks thanks to their usefulness in many application domains, and many efforts aim to increase their training and inference performance and efficiency. One of the most widely used techniques to implement convolution consists of flattening tensors into 2D matrices and carrying out the operation through a matrix-matrix multiplication routine, which has highly optimized implementations in high-performance libraries. However, this approach spends extra time and memory to transform and store the tensors involved. For this reason, direct convolution is becoming increasingly popular. Direct convolution can be implemented as a series of nested loops iterating over the tensor dimensions, and it does not require extra memory. In this work, we evaluate on various multi-core CPUs the performance and scalability effects of different parallelization strategies, loop organizations, and SIMD-vectorization approaches with different compilers, in relation to architectural aspects. We discuss each parameter thoroughly and distill our findings into a set of heuristics that can be used to quickly achieve a high-performance implementation in accordance with the underlying hardware and the characteristics of the convolutional layer at hand. By adopting a per-layer approach, we increase performance by up to 60-70% compared to a static implementation used for all layers. Moreover, our results are comparable to, or even better than (up to 1.67x speedup), matrix-matrix multiplication-based convolution on a multi-core system.
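The two implementation styles contrasted in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the `[channel][row][column]` nested-list layout, unit stride, the absence of padding, and the loop order are all assumptions chosen for readability, and the convolution follows the usual CNN convention (no kernel flip). `direct_conv2d` writes results straight into the output tensor through nested loops, while `im2col_conv2d` first materializes a flattened copy of every receptive field (the extra memory the abstract refers to) and then performs a matrix-matrix product.

```python
def _shapes(inp, weights):
    """Return (c_in, c_out, kh, kw, oh, ow) for stride 1 and no padding."""
    c_in, h, w = len(inp), len(inp[0]), len(inp[0][0])
    c_out, kh, kw = len(weights), len(weights[0][0]), len(weights[0][0][0])
    return c_in, c_out, kh, kw, h - kh + 1, w - kw + 1


def direct_conv2d(inp, weights):
    """Direct convolution: nested loops over the tensor dimensions.

    inp: [c_in][h][w], weights: [c_out][c_in][kh][kw] (assumed layouts).
    No intermediate buffer is allocated besides the output itself.
    """
    c_in, c_out, kh, kw, oh, ow = _shapes(inp, weights)
    out = [[[0.0] * ow for _ in range(oh)] for _ in range(c_out)]
    for co in range(c_out):              # output channels
        for y in range(oh):              # output rows
            for x in range(ow):          # output columns
                acc = 0.0
                for ci in range(c_in):   # input channels
                    for ky in range(kh):
                        for kx in range(kw):
                            acc += inp[ci][y + ky][x + kx] * weights[co][ci][ky][kx]
                out[co][y][x] = acc
    return out


def im2col_conv2d(inp, weights):
    """Matrix-multiplication-based convolution via an explicit im2col copy."""
    c_in, c_out, kh, kw, oh, ow = _shapes(inp, weights)
    # Extra memory: one flattened receptive field per output position.
    cols = [
        [inp[ci][y + ky][x + kx]
         for ci in range(c_in) for ky in range(kh) for kx in range(kw)]
        for y in range(oh) for x in range(ow)
    ]
    # Each filter flattened into one row, in the same element order.
    rows = [
        [weights[co][ci][ky][kx]
         for ci in range(c_in) for ky in range(kh) for kx in range(kw)]
        for co in range(c_out)
    ]
    # Plain matrix-matrix product, then reshape back to [c_out][oh][ow].
    flat = [[sum(a * b for a, b in zip(r, c)) for c in cols] for r in rows]
    return [[flat[co][y * ow:(y + 1) * ow] for y in range(oh)]
            for co in range(c_out)]


if __name__ == "__main__":
    image = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]   # 1 channel, 3x3
    kernel = [[[[1, 0], [0, 1]]]]                 # 1 filter, 1 channel, 2x2
    print(direct_conv2d(image, kernel))           # [[[6.0, 8.0], [12.0, 14.0]]]
```

In a compiled implementation, the parallelization strategies and loop organizations that the abstract mentions would correspond to choosing which of these loops to distribute across threads and in which order to nest them; the sketch fixes one arbitrary order only for clarity.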
Pages: 57514-57528
Number of pages: 15
Related papers
50 in total
  • [41] A new direct acyclic graph task scheduling method for heterogeneous Multi-Core processors
    Xiao, Feng
    Chen, Shushan
    Han, Xingxing
    Huang, Shujuan
    Zhang, Wenjuan
    COMPUTERS & ELECTRICAL ENGINEERING, 2022, 104
  • [42] A WCET Analysis Method for Multi-Core Processors with Multi-Tier Coherence Protocol
    Zhu Y.
    Shi X.
    Yao Y.
    Li L.
    Ren P.
    Dong W.
    Li J.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2023, 60 (01): : 30 - 42
  • [43] Analysis and design of computerized numerical controls for execution on multi-core systems
    Vivanco, Jose Maria
    Keinert, Matthias
    Lechler, Armin
    Verl, Alexander
    RESEARCH AND INNOVATION IN MANUFACTURING: KEY ENABLING TECHNOLOGIES FOR THE FACTORIES OF THE FUTURE - PROCEEDINGS OF THE 48TH CIRP CONFERENCE ON MANUFACTURING SYSTEMS, 2016, 41 : 864 - 869
  • [44] Securing Multi-core Multi-threaded Packet Processors
    Chasaki, Danai
    PROCEEDINGS OF THE EIGHTH ACM/IEEE SYMPOSIUM ON ARCHITECTURES FOR NETWORKING AND COMMUNICATIONS SYSTEMS (ANCS'12), 2012, : 149 - 150
  • [45] Optimizing one by one direct convolution on ARMv8 multi-core CPUs
    Wang, Qinglin
    Li, Dongsheng
    Mei, Songzhu
    Shen, Siqi
    Huang, Xiandong
    2020 IEEE INTERNATIONAL CONFERENCE ON JOINT CLOUD COMPUTING (JCC 2020), 2020, : 43 - 47
  • [46] Workload-Aware Voltage Regulator Optimization for Power Efficient Multi-Core Processors
    Sinkar, Abhishek A.
    Wang, Hao
    Kim, Nam Sung
    DESIGN, AUTOMATION & TEST IN EUROPE (DATE 2012), 2012, : 1134 - 1137
  • [47] A Framework for the Derivation of WCET Analyses for Multi-Core Processors
    Jacobs, Michael
    Hahn, Sebastian
    Hack, Sebastian
    PROCEEDINGS OF THE 28TH EUROMICRO CONFERENCE ON REAL-TIME SYSTEMS ECRTS 2016, 2016, : 141 - 151
  • [48] Comprehensive scheduling algorithm for asymmetric multi-core processors
    Chen, Rui-Zhong
    Qi, De-Yu
    Lin, Wei-Wei
    Li, Jian
    Ruan Jian Xue Bao/Journal of Software, 2013, 24 (02): : 343 - 357
  • [49] Novel parallel hough transform on multi-core processors
    Chen, Yen-Kuang
    Li, Wenlong
    Li, Jianguo
    Wang, Tao
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 1457 - 1460
  • [50] Evaluating and Modeling Power Consumption of Multi-Core Processors
    Basmadjian, Robert
    de Meer, Hermann
    2012 THIRD INTERNATIONAL CONFERENCE ON FUTURE ENERGY SYSTEMS: WHERE ENERGY, COMPUTING AND COMMUNICATION MEET (E-ENERGY), 2012,