An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSP

Cited by: 3

Authors:

Safonov, Ilia [1]

Kornilov, Anton [1]

Makienko, Daria [1]

Affiliation:

[1] Natl Res Nucl Univ MEPhI, Comp Sci & Control Syst Dept, Kashirskoye Highway, 31, Moscow 115409, Russia

Source:

ELECTRONICS | 2023 / Vol. 12 / Issue 01

Keywords:

GEMM; SIMD instructions; outer product; fixed point; DSP; parallel processing;

DOI:

10.3390/electronics12010078

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Matrix multiplication is an important operation in many engineering applications. Sometimes new features that rely on matrix multiplication must be added to existing, and even outdated, embedded platforms. This paper considers an unusual problem: how to implement matrix multiplication of 32-bit signed integers and fixed-point numbers on a DSP whose SIMD instructions support 16-bit integers only. For the tasks examined, the matrix size varies from several tens to two hundred. The proposed mathematical approach for dense rectangular matrix multiplication of 32-bit numbers comprises decomposition of the 32-bit matrices into matrices of 16-bit numbers, four matrix multiplications of 16-bit unsigned integers via the outer product, and correction of the result for signed integers and fixed-point numbers. Several performance optimization techniques are analyzed. In addition, block-wise and parallel implementations are described. An implementation of the proposed method using 16-bit vector instructions is faster than matrix multiplication using 32-bit scalar instructions and demonstrates performance close to the theoretically achievable limit. The described technique can be generalized to matrix multiplication of n-bit integers and fixed-point numbers by operating on matrices of n/2-bit integers. In conclusion, recommendations are presented for practitioners implementing matrix multiplication on various DSPs.
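The three steps named in the abstract (decomposition into 16-bit halves, four unsigned 16-bit matrix products via the outer product, and a correction for signed values) can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the function names (`split16`, `matmul_outer`, `matmul32_via16`), the particular sign-correction identity, the 64-bit wrap-around accumulator, and the `frac_bits` parameter are all assumptions chosen for illustration, and plain Python integers stand in for DSP vector registers. Writing each entry as a_u = a_hi * 2^16 + a_lo (unsigned view), the unsigned product is HH * 2^32 + (HL + LH) * 2^16 + LL, and the signed result is recovered modulo 2^64 by subtracting 2^32 * (S_A * B_u + A_u * S_B), where S_A and S_B mark negative entries.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def split16(M):
    """View signed 32-bit entries as unsigned and split them into 16-bit halves."""
    hi = [[(x & MASK32) >> 16 for x in row] for row in M]
    lo = [[x & MASK16 for x in row] for row in M]
    return hi, lo

def matmul_outer(A, B):
    """C = A @ B accumulated as a sum of outer products (column p of A times row p of B)."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0] * n for _ in range(m)]
    for p in range(k):
        for i in range(m):
            a = A[i][p]
            for j in range(n):
                C[i][j] += a * B[p][j]
    return C

def to_signed64(x):
    """Wrap an integer to the signed 64-bit range (two's complement)."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

def matmul32_via16(A, B, frac_bits=0):
    """Multiply signed 32-bit matrices using only 16-bit unsigned partial products.

    frac_bits (hypothetical parameter) is the number of fractional bits of a
    fixed-point format; 0 gives the plain integer product.
    """
    A_hi, A_lo = split16(A)
    B_hi, B_lo = split16(B)
    # Four matrix products of 16-bit unsigned integers.
    HH = matmul_outer(A_hi, B_hi)
    HL = matmul_outer(A_hi, B_lo)
    LH = matmul_outer(A_lo, B_hi)
    LL = matmul_outer(A_lo, B_lo)
    # Assumed sign correction: indicator matrices of negative entries.
    SA = [[1 if x < 0 else 0 for x in row] for row in A]
    SB = [[1 if x < 0 else 0 for x in row] for row in B]
    A_u = [[x & MASK32 for x in row] for row in A]
    B_u = [[x & MASK32 for x in row] for row in B]
    corr_a = matmul_outer(SA, B_u)
    corr_b = matmul_outer(A_u, SB)
    C = []
    for i in range(len(A)):
        row = []
        for j in range(len(B[0])):
            unsigned = (HH[i][j] << 32) + ((HL[i][j] + LH[i][j]) << 16) + LL[i][j]
            signed = unsigned - ((corr_a[i][j] + corr_b[i][j]) << 32)
            # Arithmetic shift drops the fractional bits of a fixed-point product.
            row.append(to_signed64(signed) >> frac_bits)
        C.append(row)
    return C
```

For example, under an assumed Q16.16 format, `matmul32_via16([[98304]], [[131072]], frac_bits=16)` multiplies 1.5 by 2.0 and returns `[[196608]]`, which encodes 3.0. The sketch shows only the arithmetic; a real DSP version would map the inner loop of `matmul_outer` onto 16-bit vector multiply-accumulate instructions and would need to manage accumulator width explicitly.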


Pages: 16
