A High-Performance Accelerator for Floating-Point Matrix Multiplication

Cited by: 1
Authors
Jia, Xun [1]
Wu, Gunning [1]
Xie, Xianghui [1]
Affiliations
[1] State Key Lab Math Engn & Adv Comp, Wuxi 214125, Peoples R China
Funding
National Science Foundation (United States);
Keywords
matrix multiplication; linear array; accelerator; high-performance; architecture;
DOI
10.1109/ISPA/IUCC.2017.00063
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Matrix multiplication is a widely used routine in science and engineering applications. Accelerating this routine is important because applications with large-scale matrix multiplication are increasingly common, especially in high-performance computing (HPC). However, existing computing platforms, including CPUs, GPGPUs and FPGAs, suffer from unsatisfactory performance or efficiency for this routine. In this paper, we propose a high-performance accelerator for double-precision floating-point matrix multiplication and build a performance model for design space exploration based on memory access scheduling. The impact of architecture parameters on accelerator performance and efficiency is evaluated and analyzed. Experimental results show that our proposed accelerator with 256 processing elements (PEs) can achieve a maximum performance of 767.99 GFLOPS and an efficiency of 99.99% for large-scale matrix multiplication, making it well suited to the requirements of HPC applications.
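The two headline figures in the abstract can be related with a little arithmetic. The sketch below is not taken from the paper: it assumes that "efficiency" means achieved throughput divided by peak throughput, and the function names are hypothetical. The abstract does not state the clock frequency or the number of floating-point operations per PE per cycle, so the per-PE figure is left as a single number. Because the efficiency is quoted to only four digits, the results are approximate.

```python
# Figures quoted in the abstract.
NUM_PES = 256             # processing elements
ACHIEVED_GFLOPS = 767.99  # maximum reported performance
EFFICIENCY = 0.9999       # 99.99%, as quoted (rounded)


def implied_peak_gflops(achieved: float, efficiency: float) -> float:
    """Peak throughput implied by the ASSUMED definition efficiency = achieved / peak."""
    return achieved / efficiency


def per_pe_gflops(total: float, num_pes: int) -> float:
    """Even split of a total throughput across identical PEs."""
    return total / num_pes


peak = implied_peak_gflops(ACHIEVED_GFLOPS, EFFICIENCY)
per_pe = per_pe_gflops(peak, NUM_PES)

print(f"implied peak:        {peak:.2f} GFLOPS")
print(f"implied peak per PE: {per_pe:.2f} GFLOPS")
```

Under that assumed definition, the quoted numbers are consistent with a peak of roughly 768 GFLOPS, or about 3 GFLOPS per PE.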
Pages: 396 - 402
Page count: 7
Related papers
50 results in total
  • [41] A comparison of three rounding algorithms for IEEE floating-point multiplication
    Even, G
    Seidel, PM
    14TH IEEE SYMPOSIUM ON COMPUTER ARITHMETIC, PROCEEDINGS, 1999, : 225 - 232
  • [42] Binary Integer Decimal-Based Floating-Point Multiplication
    Gonzalez-Navarro, Sonia
    Tsen, Charles
    Schulte, Michael J.
    IEEE TRANSACTIONS ON COMPUTERS, 2013, 62 (07) : 1460 - 1466
  • [43] Floating-point accumulation circuit for matrix applications
    Bodnar, Michael R.
    Humphrey, John R.
    Curt, Petersen F.
    Prather, Dennis W.
    FCCM 2006: 14TH ANNUAL IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, PROCEEDINGS, 2006, : 303 - +
  • [44] LV: Latency-Versatile Floating-Point Engine for High-Performance Deep Neural Networks
    Lo, Yun-Chen
    Tsai, Yu-Chih
    Liu, Ren-Shuo
    IEEE COMPUTER ARCHITECTURE LETTERS, 2023, 22 (02) : 125 - 128
  • [45] Minimally Biased Multipliers for Approximate Integer and Floating-Point Multiplication
    Saadat, Hassaan
    Bokhari, Haseeb
    Parameswaran, Sri
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2018, 37 (11) : 2623 - 2635
  • [46] A Reconfigurable Multiple-Precision Floating-Point Dot Product Unit for High-Performance Computing
    Mao, Wei
    Li, Kai
    Xie, Xinang
    Zhao, Shirui
    Li, He
    Yu, Hao
    PROCEEDINGS OF THE 2021 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE 2021), 2021, : 1793 - 1798
  • [47] How to half the latency of IEEE compliant floating-point multiplication
    Seidel, PM
    24TH EUROMICRO CONFERENCE - PROCEEDING, VOLS 1 AND 2, 1998, : 329 - 332
  • [48] FORMALIZATION AND IMPLEMENTATION OF FLOATING-POINT MATRIX OPERATIONS
    KULISCH, U
    BOHLENDER, G
    COMPUTING, 1976, 16 (03) : 239 - 261
  • [49] A comparison of three rounding algorithms for IEEE floating-point multiplication
    Even, G
    Seidel, PM
    IEEE TRANSACTIONS ON COMPUTERS, 2000, 49 (07) : 638 - 650
  • [50] ADVANCING THE STANDARD IN FLOATING-POINT PERFORMANCE
    BRIGHTMAN, T
    HIGH PERFORMANCE SYSTEMS-THE MAGAZINE FOR TECHNOLOGY CHAMPIONS, 1989, 10 (11) : 59 - &