A High-Performance Accelerator for Floating-Point Matrix Multiplication

Cited by: 1

Authors:

Jia, Xun [1]

Wu, Gunning [1]

Xie, Xianghui [1]

Affiliations:

[1] State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi 214125, People's Republic of China

Source:

2017 15TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS AND 2017 16TH IEEE INTERNATIONAL CONFERENCE ON UBIQUITOUS COMPUTING AND COMMUNICATIONS (ISPA/IUCC 2017) | 2017

Funding:

National Science Foundation (United States);

Keywords:

matrix multiplication; linear array; accelerator; high-performance; architecture;

DOI:

10.1109/ISPA/IUCC.2017.00063

Chinese Library Classification (CLC) number:

TP3 [Computing Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Matrix multiplication is a widely used routine in science and engineering applications. Accelerating this routine is important because applications with large-scale matrix multiplication are increasingly common, especially in high-performance computing (HPC). However, existing computing platforms, including CPUs, GPGPUs, and FPGAs, suffer from unsatisfactory performance or efficiency for this routine. In this paper, we propose a high-performance accelerator for double-precision floating-point matrix multiplication and build a performance model for design space exploration based on memory access scheduling. The impact of architecture parameters on accelerator performance and efficiency is evaluated and analyzed. Experimental results show that the proposed accelerator with 256 processing elements (PEs) achieves a maximum performance of 767.99 GFLOPS and an efficiency of 99.99% for large-scale matrix multiplication, which suits the requirements of HPC applications.
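
The reported figures can be related with a short calculation. The sketch below assumes that "efficiency" means achieved throughput divided by theoretical peak throughput; the abstract does not state this definition, so it is an assumption here. The function name `implied_peak_gflops` is hypothetical and used only for illustration. Only the three constants are taken from the abstract; the derived peak and per-PE values are implications of that assumption, not numbers reported by the authors.

```python
def implied_peak_gflops(achieved_gflops: float, efficiency: float) -> float:
    """Return the peak throughput implied by an achieved rate and an efficiency.

    Assumes efficiency = achieved / peak (a definition not confirmed by the abstract).
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the interval (0, 1]")
    return achieved_gflops / efficiency


# Values quoted in the abstract.
ACHIEVED_GFLOPS = 767.99  # maximum reported performance
EFFICIENCY = 0.9999       # 99.99%
NUM_PES = 256             # number of processing elements

# Derived values (illustrative only, under the assumed definition).
peak = implied_peak_gflops(ACHIEVED_GFLOPS, EFFICIENCY)
per_pe = peak / NUM_PES

print(f"implied peak: {peak:.2f} GFLOPS")
print(f"implied peak per PE: {per_pe:.3f} GFLOPS")
```

Under this assumed definition, the implied peak is roughly 768 GFLOPS, or about 3 GFLOPS per PE. The abstract alone does not say how that per-PE rate splits between clock frequency and operations per cycle.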


Pages: 396-402

Page count: 7
