A Reconfigurable Processing Element for Multiple-Precision Floating/Fixed-Point HPC

Cited: 1
Authors
Li, Boyu [1 ]
Li, Kai [2 ]
Zhou, Jiajun [1 ]
Ren, Yuan [1 ]
Mao, Wei [2 ]
Yu, Hao [2 ]
Wong, Ngai [1 ]
Affiliations
[1] Univ Hong Kong, Dept Elect & Elect Engn, Hong Kong, Peoples R China
[2] Southern Univ Sci & Technol, Sch Microelect, Shenzhen 518055, Peoples R China
Keywords
Multiple-precision; floating-point; fixed-point; PE; MAC; HPC; UNIT; ARCHITECTURE; ADD
DOI
10.1109/TCSII.2023.3322259
CLC Classification
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline Codes
0808; 0809
Abstract
High-performance computing (HPC) can accelerate deep neural network (DNN) training and inference. Previous works have proposed multiple-precision floating-point and fixed-point designs, but most can handle only one of the two. This brief proposes a novel reconfigurable processing element (PE) supporting both energy-efficient floating-point and fixed-point multiply-accumulate (MAC) operations. In one clock cycle, the PE supports 9× BFloat16 (BF16), 4× half-precision (FP16), 4× TensorFloat-32 (TF32), and 1× single-precision (FP32) MAC operations with 100% multiplication hardware utilization. It also supports 72× INT2, 36× INT4, and 9× INT8 dot products plus one 32-bit addend. The design is realized in a 28 nm process at a 1.471 GHz slow-corner clock frequency. Compared with state-of-the-art (SOTA) multiple-precision PEs, the proposed work achieves the best energy efficiency, 834.35 GFLOPS/W at TF32 and 1761.41 GFLOPS/W at BF16, an improvement of at least 10× and 4×, respectively, for deep-learning training. Meanwhile, the design supports energy-efficient fixed-point computing with a small hardware overhead for deep-learning inference.
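The 100% multiplication-hardware-utilization figure rests on the standard sub-word multiplier-partitioning principle: a wide multiplier array is split into narrow sub-multipliers whose partial products either combine into one wide product or serve independent narrow MACs. For example, a 24-bit FP32 mantissa multiply decomposes into a 3×3 grid of 8×8 sub-multiplies, which is why the same array that performs 1× FP32 can instead perform 9× BF16 (8-bit significand) multiplies. The sketch below illustrates only this generic composition principle with a 16×16 example and hypothetical helper names; it is not the paper's actual partitioning scheme.

```python
# Illustration (assumed, not the paper's design): one 16x16 multiply
# assembled from four 8x8 partial products, so the same 8x8
# sub-multiplier array serves either one 16-bit MAC or four
# independent 8-bit MACs per cycle.

def mul8(a, b):
    """Stand-in for one 8x8 hardware sub-multiplier."""
    assert 0 <= a < 256 and 0 <= b < 256
    return a * b

def mul16_from_mul8(a, b):
    """Compose a 16x16 product from four 8x8 partial products:
    (a_hi*2^8 + a_lo)(b_hi*2^8 + b_lo)."""
    a_hi, a_lo = a >> 8, a & 0xFF
    b_hi, b_lo = b >> 8, b & 0xFF
    return (mul8(a_hi, b_hi) << 16) \
         + ((mul8(a_hi, b_lo) + mul8(a_lo, b_hi)) << 8) \
         + mul8(a_lo, b_lo)

# The composed product matches a native wide multiply.
assert mul16_from_mul8(51234, 60001) == 51234 * 60001
```

By the same counting, a 24-bit operand covers three 8-bit slices per input, giving 3 × 3 = 9 sub-products per wide multiply, consistent with the 9× BF16 versus 1× FP32 ratio stated in the abstract.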
Pages: 1401-1405
Page count: 5