An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSP

Cited by: 3
Authors
Safonov, Ilia [1]
Kornilov, Anton [1]
Makienko, Daria [1]
Affiliations
[1] Natl Res Nucl Univ MEPhI, Comp Sci & Control Syst Dept, Kashirskoye Highway, 31, Moscow 115409, Russia
Keywords
GEMM; SIMD instructions; outer product; fixed point; DSP; parallel processing
DOI
10.3390/electronics12010078
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Matrix multiplication is an important operation in many engineering applications. Sometimes new features that involve matrix multiplication must be added to existing, and even out-of-date, embedded platforms. In this paper, an unusual problem is considered: how to implement matrix multiplication of 32-bit signed integers and fixed-point numbers on a DSP that has SIMD instructions for 16-bit integers only. For the tasks examined, the matrix size varies from several tens to two hundred. The proposed mathematical approach for dense rectangular matrix multiplication of 32-bit numbers comprises decomposition of the 32-bit matrices into matrices of 16-bit numbers, four matrix multiplications of 16-bit unsigned integers via the outer product, and correction of the result for signed integers and fixed-point numbers. Several tricks for performance optimization are analyzed. In addition, ways to implement the method block-wise and in parallel are described. An implementation of the proposed method by means of 16-bit vector instructions is faster than matrix multiplication using 32-bit scalar instructions and demonstrates performance close to the theoretically achievable limit. The described technique can be generalized to matrix multiplication of n-bit integers and fixed-point numbers by operating on matrices of n/2-bit integers. In conclusion, recommendations are presented for practitioners who implement matrix multiplication on various DSPs.
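The decomposition outlined in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation: the function names (split16, matmul_outer, matmul32_via16, matmul_q) are hypothetical, the sign correction uses a standard two's complement identity that may differ from the correction in the paper, and Python's unbounded integers stand in for the accumulator widths that a real DSP kernel would have to manage explicitly. Rounding and saturation for fixed-point results are omitted.

```python
MASK32 = 0xFFFFFFFF

def split16(m):
    """Split signed 32-bit entries into unsigned 16-bit high and low halves."""
    hi = [[(x & MASK32) >> 16 for x in row] for row in m]
    lo = [[x & 0xFFFF for x in row] for row in m]
    return hi, lo

def matmul_outer(a, b):
    """Matrix product accumulated as a sum of outer products:
    column k of a times row k of b, for each k."""
    n, inner, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for k in range(inner):
        row_b = b[k]
        for i in range(n):
            aik = a[i][k]
            row_c = c[i]
            for j in range(m):
                row_c[j] += aik * row_b[j]
    return c

def matmul32_via16(a, b):
    """Multiply matrices of signed 32-bit integers using four products
    of unsigned 16-bit matrices plus a sign correction (assumed form)."""
    a_hi, a_lo = split16(a)
    b_hi, b_lo = split16(b)
    hh = matmul_outer(a_hi, b_hi)
    hl = matmul_outer(a_hi, b_lo)
    lh = matmul_outer(a_lo, b_hi)
    ll = matmul_outer(a_lo, b_lo)

    # Sign indicators and unsigned views: x = x_u - 2**32 * s_x.
    a_neg = [[1 if x < 0 else 0 for x in row] for row in a]
    b_neg = [[1 if x < 0 else 0 for x in row] for row in b]
    a_u = [[x & MASK32 for x in row] for row in a]
    b_u = [[x & MASK32 for x in row] for row in b]
    c1 = matmul_outer(a_neg, b_u)    # sum of s_a * b_u
    c2 = matmul_outer(a_u, b_neg)    # sum of a_u * s_b
    c3 = matmul_outer(a_neg, b_neg)  # sum of s_a * s_b

    n, m = len(a), len(b[0])
    out = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            unsigned = (hh[i][j] << 32) + ((hl[i][j] + lh[i][j]) << 16) + ll[i][j]
            out[i][j] = unsigned - ((c1[i][j] + c2[i][j]) << 32) + (c3[i][j] << 64)
    return out

def matmul_q(a, b, frac_bits):
    """Fixed-point variant: rescale each accumulated sum by frac_bits
    (arithmetic shift, i.e. rounding toward negative infinity)."""
    return [[v >> frac_bits for v in row] for row in matmul32_via16(a, b)]
```

For example, matmul32_via16([[1, -2], [3, 4]], [[5, 6], [-7, 8]]) returns [[19, -10], [-13, 50]], the same result as a direct signed multiplication. In an actual SIMD kernel the inner loop over j is the part that maps onto 16-bit vector multiply instructions, since each step scales one row of b by a single element of a.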
Pages: 16