A High-Performance Accelerator for Floating-Point Matrix Multiplication

Cited by: 1

Authors:

Jia, Xun [1]

Wu, Gunning [1]

Xie, Xianghui [1]

Affiliations:

[1] State Key Lab Math Engn & Adv Comp, Wuxi 214125, Peoples R China

Source:

2017 15TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS AND 2017 16TH IEEE INTERNATIONAL CONFERENCE ON UBIQUITOUS COMPUTING AND COMMUNICATIONS (ISPA/IUCC 2017) | 2017

Funding:

US National Science Foundation;

Keywords:

matrix multiplication; linear array; accelerator; high-performance; architecture;

DOI:

10.1109/ISPA/IUCC.2017.00063

Chinese Library Classification number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Matrix multiplication is a widely used routine in science and engineering applications. Accelerating this routine is important because applications with large-scale matrix multiplication are increasingly common, especially in high-performance computing (HPC). However, existing computing platforms, including CPU, GPGPU and FPGA, suffer from unsatisfactory performance or efficiency for this routine. In this paper, we propose a high-performance accelerator for double-precision floating-point matrix multiplication and build a performance model for design space exploration based on memory access scheduling. The impact of architecture parameters on accelerator performance and efficiency is evaluated and analyzed. Experimental results show that our proposed accelerator with 256 processing elements (PEs) can achieve a maximum performance of 767.99 GFLOPS and an efficiency of 99.99% for large-scale matrix multiplication, which is well suited to the requirements of HPC applications.


Pages: 396-402

Number of pages: 7

50 items in total

[21] Analysis of Blocking and Scheduling for FPGA-Based Floating-Point Matrix Multiplication
Khayyat, Ahmad
Manjikian, Naraig
CANADIAN JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING-REVUE CANADIENNE DE GENIE ELECTRIQUE ET INFORMATIQUE, 2014, 37 (02) : 65 - 75
[22] Accurate Complex Multiplication in Floating-Point Arithmetic
Lefevre, Vincent
Muller, Jean-Michel
2019 IEEE 26TH SYMPOSIUM ON COMPUTER ARITHMETIC (ARITH), 2019, : 23 - 29
[23] High-Performance Computation in Residue Number System Using Floating-Point Arithmetic
Isupov, Konstantin
COMPUTATION, 2021, 9 (02) : 1 - 15
[24] HIGH-PERFORMANCE FPGA-BASED FLOATING-POINT ADDER WITH THREE INPUTS
Guntoro, Andre
Glesner, Manfred
2008 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE AND LOGIC APPLICATIONS, VOLS 1 AND 2, 2008, : 626 - 629
[25] Floating-point division on programmable high-performance signal-processing hardware
Pilz, NA
Adamson, K
6TH WORLD MULTICONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL IX, PROCEEDINGS: IMAGE, ACOUSTIC, SPEECH AND SIGNAL PROCESSING II, 2002, : 519 - 524
[26] SWM: A High-Performance Sparse-Winograd Matrix Multiplication CNN Accelerator
Wu, Di
Fan, Xitian
Cao, Wei
Wang, Lingli
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2021, 29 (05) : 936 - 949
[27] SpWMM: A High-Performance Sparse-Winograd Matrix-Matrix Multiplication Accelerator for CNNs
Wu, Di
Cao, Wei
Wang, Lingli
2019 INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE TECHNOLOGY (ICFPT 2019), 2019, : 255 - 258
[28] Anytime Floating-Point Addition and Multiplication - Concepts and Implementations
Brand, Marcel
Witterauf, Michael
Bosio, Alberto
Teich, Juergen
2020 IEEE 31ST INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP 2020), 2020, : 157 - 164
[29] Floating-point matrix product on FPGA
Bensaali, Faycal
Amira, Abbes
Sotudeh, Reza
2007 IEEE/ACS INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS, VOLS 1 AND 2, 2007, : 466 - +
[30] Open source high performance floating-point modules
Hemmert, K. Scott
Underwood, Keith D.
FCCM 2006: 14TH ANNUAL IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, PROCEEDINGS, 2006, : 349 - +
