An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSP

Cited by: 3

Authors:

Safonov, Ilia [1]

Kornilov, Anton [1]

Makienko, Daria [1]

Affiliations:

[1] Natl Res Nucl Univ MEPhI, Comp Sci & Control Syst Dept, Kashirskoye Highway, 31, Moscow 115409, Russia

Source:

ELECTRONICS | 2023 / Vol. 12 / Issue 01

Keywords:

GEMM; SIMD instructions; outer product; fixed point; DSP; parallel processing;

DOI:

10.3390/electronics12010078

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Matrix multiplication is an important operation for many engineering applications. Sometimes new features that include matrix multiplication should be added to existing and even out-of-date embedded platforms. In this paper, an unusual problem is considered: how to implement matrix multiplication of 32-bit signed integers and fixed-point numbers on DSP having SIMD instructions for 16-bit integers only. For examined tasks, matrix size may vary from several tens to two hundred. The proposed mathematical approach for dense rectangular matrix multiplication of 32-bit numbers comprises decomposition of 32-bit matrices to matrices of 16-bit numbers, four matrix multiplications of 16-bit unsigned integers via outer product, and correction of outcome for signed integers and fixed point numbers. Several tricks for performance optimization are analyzed. In addition, ways for block-wise and parallel implementations are described. An implementation of the proposed method by means of 16-bit vector instructions is faster than matrix multiplication using 32-bit scalar instructions and demonstrates performance close to a theoretically achievable limit. The described technique can be generalized for matrix multiplication of n-bit integers and fixed point numbers via handling with matrices of n/2-bit integers. In conclusion, recommendations for practitioners who work on implementation of matrix multiplication for various DSP are presented.
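
The decomposition outlined in the abstract can be illustrated with a minimal scalar sketch in C. It is not the authors' implementation: the function name matmul32_via16, the row-major layout, the 64-bit wrapping accumulators, and the particular form of the signed correction are all assumptions made for illustration, and the four 16-bit partial products are fused into one loop here rather than computed as four separate matrix multiplications with SIMD instructions. Each 32-bit operand is read as an unsigned value and split into 16-bit halves, the four unsigned 16x16 products are recombined, and a correction term is subtracted when an operand was negative. The loop order (k outermost) mirrors an outer-product formulation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the paper): C (M x N) = A (M x K) * B (K x N).
 * Inputs are signed 32-bit integers in row-major order; results are 64-bit
 * sums computed modulo 2^64. Only 16-bit x 16-bit unsigned multiplications
 * are used for the products themselves. */
void matmul32_via16(const int32_t *A, const int32_t *B, int64_t *C,
                    size_t M, size_t K, size_t N)
{
    for (size_t idx = 0; idx < M * N; ++idx) {
        C[idx] = 0;
    }

    /* k outermost: each step adds the outer product of column k of A
     * and row k of B to the accumulator matrix. */
    for (size_t k = 0; k < K; ++k) {
        for (size_t i = 0; i < M; ++i) {
            int32_t a = A[i * K + k];
            uint32_t ua = (uint32_t)a;               /* unsigned view of a */
            uint16_t a_lo = (uint16_t)(ua & 0xFFFFu);
            uint16_t a_hi = (uint16_t)(ua >> 16);

            for (size_t j = 0; j < N; ++j) {
                int32_t b = B[k * N + j];
                uint32_t ub = (uint32_t)b;           /* unsigned view of b */
                uint16_t b_lo = (uint16_t)(ub & 0xFFFFu);
                uint16_t b_hi = (uint16_t)(ub >> 16);

                /* Four unsigned 16x16 -> 32-bit partial products. */
                uint64_t ll = (uint32_t)a_lo * b_lo;
                uint64_t lh = (uint32_t)a_lo * b_hi;
                uint64_t hl = (uint32_t)a_hi * b_lo;
                uint64_t hh = (uint32_t)a_hi * b_hi;

                /* ua * ub = hh * 2^32 + (lh + hl) * 2^16 + ll */
                uint64_t p = (hh << 32) + ((lh + hl) << 16) + ll;

                /* Signed correction (assumed form): with a = ua - 2^32 [a < 0]
                 * and b = ub - 2^32 [b < 0], a * b equals
                 * ua * ub - 2^32 ([a < 0] ub + [b < 0] ua) modulo 2^64. */
                if (a < 0) {
                    p -= (uint64_t)ub << 32;
                }
                if (b < 0) {
                    p -= (uint64_t)ua << 32;
                }

                /* Accumulate with unsigned wrap-around, then store as signed
                 * (assumes a two's-complement target). */
                C[i * N + j] = (int64_t)((uint64_t)C[i * N + j] + p);
            }
        }
    }
}
```

For fixed-point operands, one possible follow-up step (again an assumption, not taken from the abstract) is to shift each 64-bit sum right by the number of fractional bits before narrowing it back to 32 bits.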


Pages: 16

