A High-Performance Accelerator for Floating-Point Matrix Multiplication

Cited by: 1

Authors:

Jia, Xun [1]

Wu, Gunning [1]

Xie, Xianghui [1]

Affiliations:

[1] State Key Lab Math Engn & Adv Comp, Wuxi 214125, Peoples R China

Source:

2017 15TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS AND 2017 16TH IEEE INTERNATIONAL CONFERENCE ON UBIQUITOUS COMPUTING AND COMMUNICATIONS (ISPA/IUCC 2017) | 2017

Funding:

US National Science Foundation;

Keywords:

matrix multiplication; linear array; accelerator; high-performance; architecture;

DOI:

10.1109/ISPA/IUCC.2017.00063

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Matrix multiplication is a widely used routine in science and engineering applications. Accelerating this routine is important because applications with large-scale matrix multiplication are increasingly common, especially in the area of high-performance computing (HPC). However, existing computing platforms, including CPU, GPGPU and FPGA, suffer from unsatisfactory performance or efficiency for this routine. In this paper, we propose a high-performance accelerator for double-precision floating-point matrix multiplication, and build a performance model for design space exploration based on memory access scheduling. The impact of architecture parameters on accelerator performance and efficiency is evaluated and analyzed. Experimental results show that our proposed accelerator with 256 processing elements (PEs) can achieve a maximum performance of 767.99 GFLOPS and an efficiency of 99.99% for large-scale matrix multiplication, which makes it well suited to the requirements of HPC applications.
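
The performance and efficiency figures in the abstract can be related with a little arithmetic. The sketch below is illustrative only and rests on two assumptions that the abstract does not state: that efficiency means achieved throughput divided by peak throughput, and that each PE completes one multiply and one add (2 FLOPs) per clock cycle. The function names are hypothetical.

```python
def implied_peak_gflops(achieved_gflops: float, efficiency: float) -> float:
    """Peak throughput implied by an achieved throughput and an efficiency in (0, 1].

    Assumption: efficiency = achieved / peak.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return achieved_gflops / efficiency


def implied_clock_ghz(peak_gflops: float, num_pes: int,
                      flops_per_pe_per_cycle: int = 2) -> float:
    """Clock frequency implied by a peak throughput.

    Assumption: every PE retires `flops_per_pe_per_cycle` FLOPs each cycle
    (2 corresponds to one multiply plus one add).
    """
    return peak_gflops / (num_pes * flops_per_pe_per_cycle)


# Figures quoted in the abstract: 256 PEs, 767.99 GFLOPS, 99.99% efficiency.
peak = implied_peak_gflops(767.99, 0.9999)
clock = implied_clock_ghz(peak, 256)
print(f"implied peak: {peak:.2f} GFLOPS, implied clock: {clock:.3f} GHz")
```

Under these assumptions the quoted numbers are consistent with a peak of roughly 768 GFLOPS, which would correspond to a clock of about 1.5 GHz; the abstract itself does not give the clock frequency.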

Citation

Pages: 396-402

Number of pages: 7
