A High-Performance Accelerator for Floating-Point Matrix Multiplication

Cited by: 1
Authors
Jia, Xun [1 ]
Wu, Gunning [1 ]
Xie, Xianghui [1 ]
Affiliations
[1] State Key Lab Math Engn & Adv Comp, Wuxi 214125, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
matrix multiplication; linear array; accelerator; high-performance; architecture;
DOI
10.1109/ISPA/IUCC.2017.00063
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Matrix multiplication is a widely used routine in science and engineering applications. Accelerating it is important because applications with large-scale matrix multiplication are increasingly common, especially in high-performance computing (HPC). However, existing computing platforms, including CPUs, GPGPUs and FPGAs, deliver unsatisfactory performance or efficiency for this routine. In this paper, we propose a high-performance accelerator for double-precision floating-point matrix multiplication and build a performance model for design space exploration based on memory access scheduling. The impact of architecture parameters on accelerator performance and efficiency is evaluated and analyzed. Experimental results show that the proposed accelerator with 256 processing elements (PEs) can achieve a maximum performance of 767.99 GFLOPS and an efficiency of 99.99% for large-scale matrix multiplication, making it well suited to the requirements of HPC applications.
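The figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below assumes that "efficiency" means achieved throughput divided by theoretical peak throughput; the abstract does not define the term, so this reading and the variable names are illustrative only. Under that assumption, it derives the peak throughput implied by the reported numbers, both in total and per PE.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
achieved_gflops = 767.99   # maximum performance reported in the abstract
efficiency = 0.9999        # 99.99% efficiency reported in the abstract
num_pes = 256              # number of processing elements reported in the abstract

# Assumption (not stated in the abstract): efficiency = achieved / peak.
implied_peak_gflops = achieved_gflops / efficiency
implied_peak_per_pe = implied_peak_gflops / num_pes

print(f"implied peak throughput: {implied_peak_gflops:.2f} GFLOPS")
print(f"implied peak per PE:     {implied_peak_per_pe:.3f} GFLOPS")
```

If that reading of efficiency is right, the reported numbers correspond to a peak of roughly 768 GFLOPS, or about 3 GFLOPS per PE. How that per-PE figure splits into clock frequency and operations per cycle is not given in the abstract.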
Pages: 396 - 402
Page count: 7
Related papers
50 items in total
  • [31] Masking FALCON’s Floating-Point Multiplication in Hardware
    Karabulut, Emre
    Aysu, Aydin
    IACR Transactions on Cryptographic Hardware and Embedded Systems, 2024, 2024 (04) : 483 - 508
  • [32] ERROR BOUNDS ON COMPLEX FLOATING-POINT MULTIPLICATION WITH AN FMA
    Jeannerod, Claude-Pierre
    Kornerup, Peter
    Louvet, Nicolas
    Muller, Jean-Michel
    MATHEMATICS OF COMPUTATION, 2017, 86 (304) : 881 - 898
  • [33] HIGH-PERFORMANCE 22-BIT FLOATING-POINT DIGITAL SIGNAL PROCESSOR.
    Jufuku, Toshio
    Ichiura, Noboru
    Nakaya, Shigehisa
    1600,
  • [34] High-Performance Floating-Point VLSI Architecture of a Lifting-Based Wavelet Processor
    Guntoro, Andre
    Momeni, Massoud
    Keil, Hans-Peter
    Glesner, Manfred
    ICSES 2008 INTERNATIONAL CONFERENCE ON SIGNALS AND ELECTRONIC SYSTEMS, CONFERENCE PROCEEDINGS, 2008 : 35 - 38
  • [35] Design and exploitation of a high-performance SIMD floating-point unit for Blue Gene/L
    Chatterjee, S
    Bachega, LR
    Bergner, P
    Dockser, KA
    Gunnels, JA
    Gupta, M
    Gustavson, FG
    Lapkowski, CA
    Liu, GK
    Mendell, M
    Nair, R
    Wait, CD
    Ward, TJC
    Wu, P
    IBM JOURNAL OF RESEARCH AND DEVELOPMENT, 2005, 49 (2-3) : 377 - 391
  • [36] Design and exploitation of a high-performance SIMD floating-point unit for Blue Gene/L
    Chatterjee, S. (sc@us.ibm.com), 1600, IBM Corporation (49) : 2 - 3
  • [37] 32-BIT FLOATING-POINT IC HERALDS APPEARANCE OF HIGH-PERFORMANCE FAMILY
    Unknown
    ELECTRONIC PRODUCTS MAGAZINE, 1985, 27 (17) : 31 - 32
  • [38] Anatomy of high-performance matrix multiplication
    Goto, Kazushige
    Van De Geijn, Robert A.
    ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2008, 34 (03)
  • [39] High-performance floating point divide
    Liddicoat, AA
    Flynn, MJ
    EUROMICRO SYMPOSIUM ON DIGITAL SYSTEMS DESIGN, PROCEEDINGS, 2001 : 354 - 361
  • [40] A HIGH-PERFORMANCE FLOATING POINT COPROCESSOR
    WOLRICH, G
    MCLELLAN, E
    HARADA, L
    MONTANARO, J
    YODLOWSKI, RAJ
    IEEE JOURNAL OF SOLID-STATE CIRCUITS, 1984, 19 (05) : 690 - 696