A Reconfigurable Processing Element for Multiple-Precision Floating/Fixed-Point HPC

Cited: 1
Authors
Li, Boyu [1 ]
Li, Kai [2 ]
Zhou, Jiajun [1 ]
Ren, Yuan [1 ]
Mao, Wei [2 ]
Yu, Hao [2 ]
Wong, Ngai [1 ]
Affiliations
[1] Univ Hong Kong, Dept Elect & Elect Engn, Hong Kong, Peoples R China
[2] Southern Univ Sci & Technol, Sch Microelect, Shenzhen 518055, Peoples R China
Keywords
Multiple-precision; floating-point; fixed-point; PE; MAC; HPC; UNIT; ARCHITECTURE; ADD
DOI
10.1109/TCSII.2023.3322259
CLC Classification
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline Codes
0808; 0809
Abstract
High-performance computing (HPC) can accelerate deep neural network (DNN) training and inference. Previous works have proposed multiple-precision floating-point and fixed-point designs, but most can handle only one of the two. This brief proposes a novel reconfigurable processing element (PE) supporting both energy-efficient floating-point and fixed-point multiply-accumulate (MAC) operations. In one clock cycle, the PE supports 9× BFloat16 (BF16), 4× half-precision (FP16), 4× TensorFloat-32 (TF32), and 1× single-precision (FP32) MAC operations with 100% multiplication hardware utilization. It also supports 72× INT2, 36× INT4, and 9× INT8 dot products plus one 32-bit addend. The design is realized in a 28 nm process at a 1.471 GHz slow-corner clock frequency. Compared with state-of-the-art (SOTA) multiple-precision PEs, the proposed work achieves the best energy efficiency, 834.35 GFLOPS/W at TF32 and 1761.41 GFLOPS/W at BF16, an improvement of at least 10× and 4×, respectively, for deep-learning training. Meanwhile, the design supports energy-efficient fixed-point computing with a small hardware overhead for deep-learning inference.
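The 100% multiplication-hardware-utilization figure rests on the standard sub-word multiplier-partitioning principle: a wide multiplier array is split into narrow sub-multipliers whose partial products either combine into one wide product or serve independent narrow MACs. For example, a 24-bit FP32 mantissa multiply decomposes into a 3×3 grid of 8×8 sub-multiplies, which is why the same array that performs 1× FP32 can instead perform 9× BF16 (8-bit significand) multiplies. The sketch below illustrates only this generic composition principle with a 16×16 example and hypothetical helper names; it is not the paper's actual partitioning scheme.

```python
# Illustration (assumed, not the paper's design): one 16x16 multiply
# assembled from four 8x8 partial products, so the same 8x8
# sub-multiplier array serves either one 16-bit MAC or four
# independent 8-bit MACs per cycle.

def mul8(a, b):
    """Stand-in for one 8x8 hardware sub-multiplier."""
    assert 0 <= a < 256 and 0 <= b < 256
    return a * b

def mul16_from_mul8(a, b):
    """Compose a 16x16 product from four 8x8 partial products:
    (a_hi*2^8 + a_lo)(b_hi*2^8 + b_lo)."""
    a_hi, a_lo = a >> 8, a & 0xFF
    b_hi, b_lo = b >> 8, b & 0xFF
    return (mul8(a_hi, b_hi) << 16) \
         + ((mul8(a_hi, b_lo) + mul8(a_lo, b_hi)) << 8) \
         + mul8(a_lo, b_lo)

# The composed product matches a native wide multiply.
assert mul16_from_mul8(51234, 60001) == 51234 * 60001
```

By the same counting, a 24-bit operand covers three 8-bit slices per input, giving 3 × 3 = 9 sub-products per wide multiply, consistent with the 9× BF16 versus 1× FP32 ratio stated in the abstract.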
Pages: 1401-1405
Page count: 5