Optimizing Pointwise Convolutions on Multi-core DSPs

Cited by: 0

Authors:

Wang, Yang ^{[1,2,3]}

Wang, Qinglin ^{[1,2]}

Pei, Xiangdong ^{[1,2]}

Mei, Songzhu ^{[1]}

Liu, Jie ^{[1,2]}

Affiliations:

[1] Natl Univ Def Technol, Natl Key Lab Parallel & Distributed Comp, Changsha 410073, Peoples R China

[2] Natl Univ Def Technol, Lab Digitizing Software Frontier Equipment, Changsha 410073, Peoples R China

[3] Beijing Inst Astronaut Syst Engn, Beijing 100076, Peoples R China

Source:

ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2023, PT VII | 2024 / Vol. 14493

Funding:

National Natural Science Foundation of China;

Keywords:

CNNs; Pointwise Convolution; Direct Convolution; DSPs; Parallel algorithm;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Pointwise convolutions are widely used in convolutional neural networks because of their low computational complexity and small parameter counts. Nevertheless, like regular convolutions, they remain time-consuming. As power consumption rises, low-power embedded processors such as multi-core digital signal processors (DSPs) have been introduced into the high-performance computing field. In this paper, we propose a high-performance, multi-level parallel direct implementation of pointwise convolutions on the multi-core DSPs of FT-M7032, a CPU-DSP heterogeneous prototype processor. The main optimizations include on-chip memory blocking, loop ordering, vectorization, register blocking, and multi-core parallelization. Experimental results show that the proposed direct implementation substantially outperforms GEMM-based implementations on FT-M7032, achieving a speedup of up to 79.26 times.
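For orientation, a pointwise (1x1) convolution computes each output channel at each spatial position as a weighted sum over the input channels at that position. The sketch below is not the paper's implementation: it is a minimal pure-Python illustration, and all names (`pointwise_conv`, `pointwise_conv_blocked`, `k_blk`, `p_blk`) and tile sizes are hypothetical. It shows only the idea of tiling the output and accumulating one tile at a time, which loosely mirrors blocking. It does not model FT-M7032 on-chip memory, vector units, or multiple cores.

```python
def pointwise_conv(x, w):
    """Reference 1x1 convolution (illustrative only).

    x: input,  indexed x[c_in][p], where p is a flattened spatial position
    w: filter, indexed w[c_out][c_in]
    returns y with y[c_out][p] = sum over c_in of w[c_out][c_in] * x[c_in][p]
    """
    c_in, pixels = len(x), len(x[0])
    c_out = len(w)
    y = [[0.0] * pixels for _ in range(c_out)]
    for k in range(c_out):
        for c in range(c_in):
            for p in range(pixels):
                y[k][p] += w[k][c] * x[c][p]
    return y


def pointwise_conv_blocked(x, w, k_blk=2, p_blk=4):
    """Same computation, tiled over output channels and pixels.

    k_blk and p_blk are hypothetical tile sizes chosen for the example,
    not values taken from the paper.
    """
    c_in, pixels = len(x), len(x[0])
    c_out = len(w)
    y = [[0.0] * pixels for _ in range(c_out)]
    for k0 in range(0, c_out, k_blk):
        k1 = min(k0 + k_blk, c_out)
        for p0 in range(0, pixels, p_blk):
            p1 = min(p0 + p_blk, pixels)
            # Small accumulator tile: a stand-in for keeping partial sums
            # in registers while looping over the input channels.
            acc = [[0.0] * (p1 - p0) for _ in range(k1 - k0)]
            for c in range(c_in):
                row = x[c][p0:p1]  # input slice reused by every k in the tile
                for k in range(k0, k1):
                    wkc = w[k][c]
                    a = acc[k - k0]
                    for j in range(p1 - p0):
                        a[j] += wkc * row[j]
            # Write the finished tile back to the output once.
            for k in range(k0, k1):
                y[k][p0:p1] = acc[k - k0]
    return y


# Tiny example: 3 input channels, 6 pixels, 5 output channels.
x = [[float(c + p) for p in range(6)] for c in range(3)]
w = [
    [1.0, 0.0, -1.0],
    [0.5, 0.5, 0.5],
    [2.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]
reference = pointwise_conv(x, w)
blocked = pointwise_conv_blocked(x, w)
```

Both functions accumulate over input channels in the same order, so `blocked` equals `reference` exactly on this input. The tile sizes deliberately do not divide the problem sizes, so the edge tiles are exercised as well.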

Citation

Pages: 209-223

Page count: 15
