A Reconfigurable Processing Element for Multiple-Precision Floating/Fixed-Point HPC

Cited by: 1
Authors
Li, Boyu [1 ]
Li, Kai [2 ]
Zhou, Jiajun [1 ]
Ren, Yuan [1 ]
Mao, Wei [2 ]
Yu, Hao [2 ]
Wong, Ngai [1 ]
Affiliations
[1] Univ Hong Kong, Dept Elect & Elect Engn, Hong Kong, Peoples R China
[2] Southern Univ Sci & Technol, Sch Microelect, Shenzhen 518055, Peoples R China
Keywords
Multiple-precision; floating-point; fixed-point; PE; MAC; HPC; UNIT; ARCHITECTURE; ADD;
DOI
10.1109/TCSII.2023.3322259
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Discipline Classification Codes
0808; 0809
Abstract
High-performance computing (HPC) can facilitate deep neural network (DNN) training and inference. Previous works have proposed multiple-precision floating- and fixed-point designs, but most can handle only one of the two independently. This brief proposes a novel reconfigurable processing element (PE) supporting both energy-efficient floating-point and fixed-point multiply-accumulate (MAC) operations. The PE can perform 9× BFloat16 (BF16), 4× half-precision (FP16), 4× TensorFloat-32 (TF32), or 1× single-precision (FP32) MAC operations with 100% multiplication hardware utilization in one clock cycle. In addition, it supports a 72× INT2, 36× INT4, or 9× INT8 dot product plus one 32-bit addend. The design is realized in a 28 nm process at a 1.471 GHz slow-corner clock frequency. Compared with state-of-the-art (SOTA) multiple-precision PEs, the proposed work exhibits the best energy efficiency of 834.35 GFLOPS/W at TF32 and 1761.41 GFLOPS/W at BF16, at least 10× and 4× improvements respectively, for deep learning training. Meanwhile, the design supports energy-efficient fixed-point computing with a small hardware overhead for deep learning inference.
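The Python sketch below is not taken from the paper; it is a minimal software reference model, under stated assumptions, of the fixed-point mode the abstract describes (a 9-element INT8 dot product plus one 32-bit addend), and it also lists the standard bit layouts of the supported floating-point formats. One common rationale, consistent with the mode counts reported above, is that the 24-bit FP32 significand multiplier can be partitioned into nine 8-bit sub-multipliers for BF16 or four roughly 12-bit sub-multipliers for FP16/TF32, though the paper's actual datapath is not reproduced here.

# Minimal software sketch (not the paper's hardware design) illustrating two
# points from the abstract of the reconfigurable PE:
#   1) the standard bit layouts of the supported floating-point formats, and
#   2) a reference model of the 9x INT8 dot product plus one 32-bit addend.
# Overflow handling (wrap to 32 bits) is an illustrative assumption only.

# (sign, exponent, mantissa) bit widths of the supported formats.
FP_FORMATS = {
    "BF16": (1, 8, 7),    # 8-bit significand incl. hidden bit
    "FP16": (1, 5, 10),   # 11-bit significand incl. hidden bit
    "TF32": (1, 8, 10),   # 11-bit significand incl. hidden bit
    "FP32": (1, 8, 23),   # 24-bit significand incl. hidden bit
}

def int8_dot9_plus_addend(a, b, addend):
    """9-element signed INT8 dot product plus a signed 32-bit addend.

    Mirrors the fixed-point mode stated in the abstract. The wrap-around
    behaviour on overflow is an assumption, not taken from the paper.
    """
    assert len(a) == len(b) == 9, "this mode takes 9 operand pairs"
    assert all(-128 <= x <= 127 for x in a + b), "operands must be INT8"
    assert -(1 << 31) <= addend < (1 << 31), "addend must be 32-bit signed"
    acc = addend + sum(x * y for x, y in zip(a, b))
    # Wrap the result to signed 32 bits (illustrative assumption).
    return (acc + (1 << 31)) % (1 << 32) - (1 << 31)

if __name__ == "__main__":
    a = [100, -50, 3, 7, -128, 127, 64, -1, 5]
    b = [2, 4, -6, 8, 1, -1, 3, 127, -5]
    print(int8_dot9_plus_addend(a, b, addend=1000))  # -> 823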
Pages: 1401 - 1405
Number of pages: 5
Related Papers
50 records in total
  • [31] SMURF: Scalar Multiple-precision Unum Risc-V Floating-point Accelerator for Scientific Computing
    Bocco, Andrea
    Durand, Yves
    De Dinechin, Florent
    CONFERENCE FOR NEXT GENERATION ARITHMETIC 2019 (CONGA), 2019,
  • [32] A FIXED-POINT MULTIPLE SHOOTING METHOD
    Meyer, PW
    COMPUTING, 1988, 40 (01) : 75 - 83
  • [33] Dual fixed-point: An efficient alternative to floating-point computation
    Ewe, CT
    Cheung, PYK
    Constantinides, GA
    FIELD-PROGRAMMABLE LOGIC AND APPLICATIONS, PROCEEDINGS, 2004, 3203 : 200 - 208
  • [34] Automated floating-point to fixed-point conversion with the fixify environment
    Belanovic, P
    Rupp, M
    16TH INTERNATIONAL WORKSHOP ON RAPID SYSTEM PROTOTYPING, PROCEEDINGS: SHORTENING THE PATH FROM SPECIFICATION TO PROTOTYPE, 2005, : 172 - 178
  • [35] VLSI design of low-cost and high-precision fixed-point reconfigurable FFT processors
    Xiao, Hao
    Yin, Xiang
    Wu, Ning
    Chen, Xin
    Li, Jun
    Chen, Xiaoxing
    IET COMPUTERS AND DIGITAL TECHNIQUES, 2018, 12 (03): : 105 - 110
  • [36] $10 floating-point DSP approaches fixed-point price
    Levy, M
    EDN, 1998, 43 (08) : 11 - 11
  • [37] Fixed-Point Computing Element Design for Transcendental Functions and Primary Operations in Speech Processing
    Chang, Chung-Hsien
    Chen, Shi-Huang
    Chen, Bo-Wei
    Ji, Wen
    Bharanitharan, K.
    Wang, Jhing-Fa
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2016, 24 (05) : 1993 - 1997
  • [38] Floating Point Multiple-Precision Fused Multiply Add Architecture for Deep Learning Computation on Artix 7 FPGA Board
    Vinotheni, Malar Shanmugam
    Kumar, Veerabadran Jawahar Senthil
    ADVANCES IN ELECTRICAL AND COMPUTER ENGINEERING, 2024, 24 (04) : 93 - 102
  • [39] A Low-power Carry Cut-Back Approximate Adder with Fixed-point Implementation and Floating-point Precision
    Camus, Vincent
    Schlachter, Jeremy
    Enz, Christian
    2016 ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2016,
  • [40] Enhancing the implementation of mathematical formulas for fixed-point and floating-point arithmetics
    Martel, Matthieu
    Formal Methods in System Design, 2009, 35 : 265 - 278