A High-Performance Accelerator for Floating-Point Matrix Multiplication

Cited by: 1
Authors
Jia, Xun [1]
Wu, Gunning [1]
Xie, Xianghui [1]
Affiliations
[1] State Key Lab Math Engn & Adv Comp, Wuxi 214125, Peoples R China
Funding
National Science Foundation (USA);
Keywords
matrix multiplication; linear array; accelerator; high-performance; architecture;
DOI
10.1109/ISPA/IUCC.2017.00063
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Matrix multiplication is a widely used routine in science and engineering applications. Accelerating it is important because applications with large-scale matrix multiplication are increasingly common, especially in high-performance computing (HPC). However, existing computing platforms, including CPUs, GPGPUs and FPGAs, deliver unsatisfactory performance or efficiency for this routine. In this paper, we propose a high-performance accelerator for double-precision floating-point matrix multiplication and build a performance model for design space exploration based on memory access scheduling. The impact of architecture parameters on accelerator performance and efficiency is evaluated and analyzed. Experimental results show that the proposed accelerator with 256 processing elements (PEs) achieves a maximum performance of 767.99 GFLOPS and an efficiency of 99.99% for large-scale matrix multiplication, which makes it well suited to the requirements of HPC applications.
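The figures reported in the abstract can be related with a short calculation. The sketch below assumes that "efficiency" means achieved throughput divided by peak throughput; the abstract does not define the term, so this is an assumption, and the function name is hypothetical. Because both reported values are rounded, the implied peak is approximate.

```python
def implied_peak_gflops(achieved_gflops: float, efficiency: float) -> float:
    """Peak throughput implied by an achieved throughput and an efficiency in (0, 1].

    Assumes efficiency = achieved / peak (not confirmed by the abstract).
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the interval (0, 1]")
    return achieved_gflops / efficiency


# Values taken from the abstract.
ACHIEVED_GFLOPS = 767.99
EFFICIENCY = 0.9999
NUM_PES = 256

peak = implied_peak_gflops(ACHIEVED_GFLOPS, EFFICIENCY)
per_pe = peak / NUM_PES  # implied peak throughput per processing element

print(f"implied peak: {peak:.2f} GFLOPS")
print(f"implied peak per PE: {per_pe:.3f} GFLOPS")
```

Under this assumption the implied peak is roughly 768 GFLOPS, or about 3 GFLOPS per PE.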
Pages: 396-402
Page count: 7