An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSP

Cited by: 3
Authors
Safonov, Ilia [1]
Kornilov, Anton [1]
Makienko, Daria [1]
Affiliations
[1] Natl Res Nucl Univ MEPhI, Comp Sci & Control Syst Dept, Kashirskoye Highway, 31, Moscow 115409, Russia
Keywords
GEMM; SIMD instructions; outer product; fixed point; DSP; parallel processing
DOI
10.3390/electronics12010078
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Matrix multiplication is an important operation in many engineering applications. Sometimes new features that rely on matrix multiplication must be added to existing, and even out-of-date, embedded platforms. This paper considers an unusual problem: how to implement matrix multiplication of 32-bit signed integers and fixed-point numbers on a DSP whose SIMD instructions support 16-bit integers only. For the tasks examined, the matrix size varies from several tens to two hundred. The proposed mathematical approach for dense rectangular matrix multiplication of 32-bit numbers comprises decomposition of the 32-bit matrices into matrices of 16-bit numbers, four matrix multiplications of 16-bit unsigned integers via outer product, and correction of the outcome for signed integers and fixed-point numbers. Several tricks for performance optimization are analyzed. In addition, ways to build block-wise and parallel implementations are described. An implementation of the proposed method by means of 16-bit vector instructions is faster than matrix multiplication using 32-bit scalar instructions and demonstrates performance close to the theoretically achievable limit. The described technique can be generalized to matrix multiplication of n-bit integers and fixed-point numbers by operating on matrices of n/2-bit integers. In conclusion, recommendations are presented for practitioners who implement matrix multiplication on various DSPs.
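The arithmetic idea summarized in the abstract can be illustrated with a small scalar sketch in C. It is not the authors' implementation and uses no SIMD intrinsics: the function name matmul32_via16, the row-major layout, the 64-bit output type, and the particular form of the sign correction are all assumptions made here for illustration. Each 32-bit operand is reinterpreted as unsigned and split into 16-bit halves, the four 16-bit by 16-bit partial products are combined with shifts, and the result is then corrected modulo 2^64 for operands that were negative. The outermost loop runs over the inner dimension, so each iteration performs one rank-1 (outer product) update of C.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the paper.
 * Computes C (m x n) = A (m x k) * B (k x n), row-major, with exact
 * 64-bit results, using only 16-bit x 16-bit unsigned partial products. */
void matmul32_via16(const int32_t *A, const int32_t *B, int64_t *C,
                    size_t m, size_t k, size_t n)
{
    for (size_t i = 0; i < m * n; ++i)
        C[i] = 0;

    /* Outer-product order: C += A[:, p] * B[p, :] for each p. */
    for (size_t p = 0; p < k; ++p) {
        for (size_t i = 0; i < m; ++i) {
            const int32_t a = A[i * k + p];
            const uint32_t ua = (uint32_t)a;      /* a + 2^32 if a < 0 */
            const uint32_t a_lo = ua & 0xFFFFu;   /* low 16 bits  */
            const uint32_t a_hi = ua >> 16;       /* high 16 bits */

            for (size_t j = 0; j < n; ++j) {
                const int32_t b = B[p * n + j];
                const uint32_t ub = (uint32_t)b;
                const uint32_t b_lo = ub & 0xFFFFu;
                const uint32_t b_hi = ub >> 16;

                /* Four unsigned 16x16 -> 32-bit partial products. */
                const uint64_t ll = (uint64_t)(a_lo * b_lo);
                const uint64_t lh = (uint64_t)(a_lo * b_hi);
                const uint64_t hl = (uint64_t)(a_hi * b_lo);
                const uint64_t hh = (uint64_t)(a_hi * b_hi);

                /* ua * ub, which always fits in 64 bits. */
                const uint64_t prod = ll + ((lh + hl) << 16) + (hh << 32);

                /* Sign correction modulo 2^64:
                 * a*b == ua*ub - 2^32 * ([a<0]*ub + [b<0]*ua). */
                const uint64_t corr =
                    ((uint64_t)(a < 0 ? ub : 0u) +
                     (uint64_t)(b < 0 ? ua : 0u)) << 32;

                /* Unsigned wrap-around accumulation; the final cast assumes
                 * the usual two's-complement conversion to int64_t. */
                C[i * n + j] =
                    (int64_t)((uint64_t)C[i * n + j] + prod - corr);
            }
        }
    }
}
```

In a vectorized version, the four partial products would presumably be formed as four separate multiplications of 16-bit matrices, as the abstract describes, rather than element by element. For a fixed-point format with f fractional bits, the 64-bit sum would typically be shifted right by f bits afterwards; that step is omitted here.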
Pages: 16
Related papers
50 in total
  • [31] 16-bit floating point instructions for embedded multimedia applications
    Lacassagne, L
    Etiemble, D
    Kablia, SAO
    [J]. CAMP 2005: Seventh International Workshop on Computer Architecture for Machine Perception, Proceedings, 2005 : 198 - 203
  • [32] Accuracy and performance of the lattice Boltzmann method with 64-bit, 32-bit, and customized 16-bit number formats
    Lehmann, Moritz
    Krause, Mathias J.
    Amati, Giorgio
    Sega, Marcello
    Harting, Jens
    Gekle, Stephan
    [J]. PHYSICAL REVIEW E, 2022, 106 (01)
  • [33] Empowering edge devices: FPGA-based 16-bit fixed-point accelerator with SVD for CNN on 32-bit memory-limited systems
    Yanamala, Rama Muni Reddy
    Pullakandam, Muralidhar
    [J]. INTERNATIONAL JOURNAL OF CIRCUIT THEORY AND APPLICATIONS, 2024, 52 (09) : 4755 - 4782
  • [34] MOTOROLA DSP96002 32-BIT FLOATING-POINT DSP
    Unknown
    [J]. EDN, 1995, 40 (10) : 68 - 68
  • [35] $5 buys a 20-MHz, 16-bit fixed-point DSP controller
    Unknown
    [J]. COMPUTER DESIGN, 1996, 35 (01) : 118 - 118
  • [36] ADA COMPILER YIELDS SOURCE CODE IN C, OPERATES ON BOTH 16-BIT AND 32-BIT COMPUTERS
    WALLER, L
    [J]. ELECTRONICS-US, 1983, 56 (14) : 49 - 50
  • [37] JBCore32 32-bit/16-bit Embedded Microprocessor with its system software and program development environment
    Cheng, X
    Tong, D
    Cui, GZ
    Wang, KY
    [J]. CHINESE JOURNAL OF ELECTRONICS, 2001, 10 (02) : 188 - +
  • [38] ANALOG DEVICES ADSP-2100 FAMILY 16-BIT FIXED-POINT DSP
    Unknown
    [J]. EDN, 1995, 40 (10) : 47 - 47
  • [39] ZILOG Z893XX FAMILY 16-BIT FIXED-POINT DSP
    Unknown
    [J]. EDN, 1995, 40 (10) : 92 - 92
  • [40] Acoustic echo canceller system materialized with a 16-bit fixed point processing type DSP
    Sakaguchi, J
    Hoshino, T
    Fujii, K
    Ohga, J
    [J]. IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 1999, E82A (12) : 2818 - 2821