A Reconfigurable Processing Element for Multiple-Precision Floating/Fixed-Point HPC

Cited by: 1
Authors
Li, Boyu [1 ]
Li, Kai [2 ]
Zhou, Jiajun [1 ]
Ren, Yuan [1 ]
Mao, Wei [2 ]
Yu, Hao [2 ]
Wong, Ngai [1 ]
Affiliations
[1] Univ Hong Kong, Dept Elect & Elect Engn, Hong Kong, Peoples R China
[2] Southern Univ Sci & Technol, Sch Microelect, Shenzhen 518055, Peoples R China
Keywords
Multiple-precision; floating-point; fixed-point; PE; MAC; HPC; UNIT; ARCHITECTURE; ADD;
DOI
10.1109/TCSII.2023.3322259
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Discipline Classification Codes
0808; 0809
Abstract
High-performance computing (HPC) can facilitate deep neural network (DNN) training and inference. Previous works have proposed multiple-precision floating- and fixed-point designs, but most can handle only one of the two independently. This brief proposes a novel reconfigurable processing element (PE) supporting both energy-efficient floating-point and fixed-point multiply-accumulate (MAC) operations. The PE can perform 9× BFloat16 (BF16), 4× half-precision (FP16), 4× TensorFloat-32 (TF32), or 1× single-precision (FP32) MAC operations with 100% multiplication hardware utilization in one clock cycle. In addition, it supports a 72× INT2, 36× INT4, or 9× INT8 dot product plus one 32-bit addend. The design is realized in a 28 nm process at a 1.471 GHz slow-corner clock frequency. Compared with state-of-the-art (SOTA) multiple-precision PEs, the proposed work exhibits the best energy efficiency of 834.35 GFLOPS/W at TF32 and 1761.41 GFLOPS/W at BF16, at least 10× and 4× improvements respectively, for deep learning training. Meanwhile, the design supports energy-efficient fixed-point computing with a small hardware overhead for deep learning inference.
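The Python sketch below is not taken from the paper; it is a minimal software reference model, under stated assumptions, of the fixed-point mode the abstract describes (a 9-element INT8 dot product plus one 32-bit addend), and it also lists the standard bit layouts of the supported floating-point formats. One common rationale, consistent with the mode counts reported above, is that the 24-bit FP32 significand multiplier can be partitioned into nine 8-bit sub-multipliers for BF16 or four roughly 12-bit sub-multipliers for FP16/TF32, though the paper's actual datapath is not reproduced here.

# Minimal software sketch (not the paper's hardware design) illustrating two
# points from the abstract of the reconfigurable PE:
#   1) the standard bit layouts of the supported floating-point formats, and
#   2) a reference model of the 9x INT8 dot product plus one 32-bit addend.
# Overflow handling (wrap to 32 bits) is an illustrative assumption only.

# (sign, exponent, mantissa) bit widths of the supported formats.
FP_FORMATS = {
    "BF16": (1, 8, 7),    # 8-bit significand incl. hidden bit
    "FP16": (1, 5, 10),   # 11-bit significand incl. hidden bit
    "TF32": (1, 8, 10),   # 11-bit significand incl. hidden bit
    "FP32": (1, 8, 23),   # 24-bit significand incl. hidden bit
}

def int8_dot9_plus_addend(a, b, addend):
    """9-element signed INT8 dot product plus a signed 32-bit addend.

    Mirrors the fixed-point mode stated in the abstract. The wrap-around
    behaviour on overflow is an assumption, not taken from the paper.
    """
    assert len(a) == len(b) == 9, "this mode takes 9 operand pairs"
    assert all(-128 <= x <= 127 for x in a + b), "operands must be INT8"
    assert -(1 << 31) <= addend < (1 << 31), "addend must be 32-bit signed"
    acc = addend + sum(x * y for x, y in zip(a, b))
    # Wrap the result to signed 32 bits (illustrative assumption).
    return (acc + (1 << 31)) % (1 << 32) - (1 << 31)

if __name__ == "__main__":
    a = [100, -50, 3, 7, -128, 127, 64, -1, 5]
    b = [2, 4, -6, 8, 1, -1, 3, 127, -5]
    print(int8_dot9_plus_addend(a, b, addend=1000))  # -> 823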
Pages: 1401 - 1405
Number of pages: 5
Related Papers
50 records in total
  • [31] SMURF: Scalar Multiple-precision Unum Risc-V Floating-point Accelerator for Scientific Computing
    Bocco, Andrea
    Durand, Yves
    De Dinechin, Florent
    CONFERENCE FOR NEXT GENERATION ARITHMETIC 2019 (CONGA), 2019,
  • [32] A FIXED-POINT MULTIPLE SHOOTING METHOD
    Meyer, PW
    COMPUTING, 1988, 40 (01) : 75 - 83
  • [33] Dual fixed-point: An efficient alternative to floating-point computation
    Ewe, CT
    Cheung, PYK
    Constantinides, GA
    FIELD-PROGRAMMABLE LOGIC AND APPLICATIONS, PROCEEDINGS, 2004, 3203 : 200 - 208
  • [34] Automated floating-point to fixed-point conversion with the fixify environment
    Belanovic, P
    Rupp, M
    16TH INTERNATIONAL WORKSHOP ON RAPID SYSTEM PROTOTYPING, PROCEEDINGS: SHORTENING THE PATH FROM SPECIFICATION TO PROTOTYPE, 2005, : 172 - 178
  • [35] VLSI design of low-cost and high-precision fixed-point reconfigurable FFT processors
    Xiao, Hao
    Yin, Xiang
    Wu, Ning
    Chen, Xin
    Li, Jun
    Chen, Xiaoxing
    IET COMPUTERS AND DIGITAL TECHNIQUES, 2018, 12 (03): : 105 - 110
  • [36] $10 floating-point DSP approaches fixed-point price
    Levy, M
    EDN, 1998, 43 (08) : 11 - 11
  • [37] Fixed-Point Computing Element Design for Transcendental Functions and Primary Operations in Speech Processing
    Chang, Chung-Hsien
    Chen, Shi-Huang
    Chen, Bo-Wei
    Ji, Wen
    Bharanitharan, K.
    Wang, Jhing-Fa
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2016, 24 (05) : 1993 - 1997
  • [38] Floating Point Multiple-Precision Fused Multiply Add Architecture for Deep Learning Computation on Artix 7 FPGA Board
    Vinotheni, Malar Shanmugam
    Kumar, Veerabadran Jawahar Senthil
    ADVANCES IN ELECTRICAL AND COMPUTER ENGINEERING, 2024, 24 (04) : 93 - 102
  • [39] A Low-power Carry Cut-Back Approximate Adder with Fixed-point Implementation and Floating-point Precision
    Camus, Vincent
    Schlachter, Jeremy
    Enz, Christian
    2016 ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2016,
  • [40] Enhancing the implementation of mathematical formulas for fixed-point and floating-point arithmetics
    Martel, Matthieu
    Formal Methods in System Design, 2009, 35 : 265 - 278