Optimizing Pointwise Convolutions on Multi-core DSPs

Cited by: 0
Authors
Wang, Yang [1, 2, 3]
Wang, Qinglin [1, 2]
Pei, Xiangdong [1, 2]
Mei, Songzhu [1]
Liu, Jie [1, 2]
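Affiliations
[1] National University of Defense Technology, National Key Laboratory of Parallel and Distributed Computing, Changsha 410073, People's Republic of China
[2] National University of Defense Technology, Laboratory of Digitizing Software for Frontier Equipment, Changsha 410073, People's Republic of China
[3] Beijing Institute of Astronautical Systems Engineering, Beijing 100076, People's Republic of China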
Funding
National Natural Science Foundation of China
Keywords
CNNs; Pointwise Convolution; Direct Convolution; DSPs; Parallel algorithm
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Pointwise convolutions are widely used in convolutional neural networks because of their low computational complexity and small parameter count. However, like regular convolutions, they remain time-consuming. As power consumption grows, low-power embedded processors, such as multi-core digital signal processors (DSPs), have been brought into the high-performance computing field. In this paper, we propose a high-performance, multi-level parallel direct implementation of pointwise convolutions on the multi-core DSPs of FT-M7032, a CPU-DSP heterogeneous prototype processor. The main optimizations are on-chip memory blocking, loop ordering, vectorization, register blocking, and multi-core parallelization. Experimental results show that the proposed direct implementation performs much better than GEMM-based implementations on FT-M7032, achieving a speedup of up to 79.26 times.
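A pointwise (1×1) convolution with batch size 1 reduces to a matrix product: the C_out × C_in weight matrix times the C_in × (H·W) input, which is why GEMM-based implementations are the usual baseline. The minimal C sketch below, assuming a row-major CHW layout, contrasts a naive loop nest with a direct version that tiles the channel and spatial loops (a stand-in for on-chip memory blocking), fixes a loop order, and accumulates an MR × NR register block. All function names and the block sizes CO_BLK, P_BLK, CI_BLK, MR and NR are hypothetical illustrations, not values or code from the paper; DSP vector intrinsics and multi-core parallelization are omitted, so this is not the authors' FT-M7032 implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical block sizes chosen for illustration only. */
enum { CO_BLK = 8, P_BLK = 16, CI_BLK = 32, MR = 4, NR = 4 };

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Naive pointwise convolution: out[co][p] = sum_ci w[co][ci] * in[ci][p].
   in: c_in x p_len, w: c_out x c_in, out: c_out x p_len (p_len = H*W). */
void pointwise_naive(const float *in, const float *w, float *out,
                     size_t c_in, size_t c_out, size_t p_len) {
    for (size_t co = 0; co < c_out; ++co)
        for (size_t p = 0; p < p_len; ++p) {
            float acc = 0.0f;
            for (size_t ci = 0; ci < c_in; ++ci)
                acc += w[co * c_in + ci] * in[ci * p_len + p];
            out[co * p_len + p] = acc;
        }
}

/* Blocked direct version: the outer three loops tile C_in, C_out and the
   spatial dimension; the inner loops keep an MR x NR tile of partial sums
   in a small local array (meant to map onto registers). */
void pointwise_blocked(const float *in, const float *w, float *out,
                       size_t c_in, size_t c_out, size_t p_len) {
    memset(out, 0, c_out * p_len * sizeof(float));
    for (size_t ci0 = 0; ci0 < c_in; ci0 += CI_BLK) {
        size_t ci1 = min_sz(ci0 + CI_BLK, c_in);
        for (size_t co0 = 0; co0 < c_out; co0 += CO_BLK) {
            size_t co1 = min_sz(co0 + CO_BLK, c_out);
            for (size_t p0 = 0; p0 < p_len; p0 += P_BLK) {
                size_t p1 = min_sz(p0 + P_BLK, p_len);
                for (size_t co = co0; co < co1; co += MR)
                    for (size_t p = p0; p < p1; p += NR) {
                        size_t mr = min_sz(MR, co1 - co);
                        size_t nr = min_sz(NR, p1 - p);
                        float acc[MR][NR] = {{0}};
                        for (size_t ci = ci0; ci < ci1; ++ci)
                            for (size_t i = 0; i < mr; ++i) {
                                float wv = w[(co + i) * c_in + ci];
                                for (size_t j = 0; j < nr; ++j)
                                    acc[i][j] += wv * in[ci * p_len + p + j];
                            }
                        /* Add this C_in block's partial sums to the output. */
                        for (size_t i = 0; i < mr; ++i)
                            for (size_t j = 0; j < nr; ++j)
                                out[(co + i) * p_len + p + j] += acc[i][j];
                    }
            }
        }
    }
}

/* Largest absolute element-wise difference, used to compare the two versions. */
float max_abs_diff(const float *a, const float *b, size_t n) {
    float m = 0.0f;
    for (size_t k = 0; k < n; ++k) {
        float d = a[k] - b[k];
        if (d < 0.0f) d = -d;
        if (d > m) m = d;
    }
    return m;
}
```

Placing the C_in block loop outermost means that, in this sketch, one CI_BLK slice of the input and weights is reused across every output tile before moving on, which is the kind of reuse on-chip memory blocking aims for. On a multi-core DSP, the co0 or p0 loops would be natural candidates to split across cores, and the innermost j loop is the one a vector unit would cover; both are assumptions about how such a design could map, not details from the paper.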
Pages: 209-223
Number of pages: 15