New Flexible Multiple-Precision Multiply-Accumulate Unit for Deep Neural Network Training and Inference

Cited by: 25
Authors
Zhang, Hao [1 ]
Chen, Dongdong [2 ]
Ko, Seok-Bum [1 ]
Affiliations
[1] Univ Saskatchewan, Dept Elect & Comp Engn, Saskatoon, SK S7N 5A2, Canada
[2] Intel Corp, San Jose, CA 95134 USA
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Neural networks; Standards; Deep learning; Training; Hardware; Adders; Pipelines; Multiply-accumulate unit; multiple-precision arithmetic; flexible precision arithmetic; deep neural network computing; computer arithmetic; ADD;
DOI
10.1109/TC.2019.2936192
CLC Classification
TP3 [Computing Technology, Computer Technology];
Subject Classification
0812;
Abstract
In this paper, a new flexible multiple-precision multiply-accumulate (MAC) unit is proposed for deep neural network training and inference. The proposed MAC unit supports both fixed-point and floating-point operations. In floating-point format, the proposed unit supports one 16-bit MAC operation or the sum of two 8-bit multiplications plus a 16-bit addend. To make the proposed MAC unit more versatile, the bit-widths of the exponent and mantissa can be flexibly exchanged, and by setting the exponent bit-width to zero, the proposed MAC unit also supports fixed-point operations. In fixed-point format, the proposed unit supports one 16-bit MAC or the sum of two 8-bit multiplications plus a 16-bit addend. Moreover, the unit can be further subdivided to support the sum of four 4-bit multiplications plus a 16-bit addend. At the lowest precision, the proposed MAC unit supports the accumulation of eight 1-bit logical AND operations to enable support for binary neural networks. Compared to a standard 16-bit half-precision MAC unit, the proposed MAC unit provides far more flexibility at only 21.8 percent area overhead. Compared to a standard 32-bit single-precision MAC unit, the proposed MAC unit requires much less hardware while still providing an 8-bit exponent in its numerical format, maintaining the large dynamic range needed for deep learning computing.
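The fixed-point modes described in the abstract map naturally onto a lane-splitting scheme, sketched below as a behavioral Python model. This is our own illustration, not the paper's datapath: the lane packing, the two's-complement signedness, and the names flexible_mac, to_signed, and LANES are assumptions, and the floating-point and flexible exponent/mantissa modes are omitted.

```python
# Behavioral sketch of the fixed-point modes of a multiple-precision MAC:
# one 16-bit product, two 8-bit products, four 4-bit products, or eight
# 1-bit logical ANDs, each summed with a 16-bit addend. Illustrative only.

LANES = {16: 1, 8: 2, 4: 4, 1: 8}  # lane width -> number of lanes

def to_signed(value, bits):
    """Interpret the low `bits` of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def flexible_mac(a, b, addend, lane_bits):
    """Sum of per-lane products of `a` and `b`, plus a 16-bit addend."""
    n = LANES[lane_bits]
    mask = (1 << lane_bits) - 1
    a_lanes = [(a >> (i * lane_bits)) & mask for i in range(n)]
    b_lanes = [(b >> (i * lane_bits)) & mask for i in range(n)]
    if lane_bits == 1:
        # Binary-network mode: accumulate eight 1-bit logical ANDs.
        products = [x & y for x, y in zip(a_lanes, b_lanes)]
    else:
        products = [to_signed(x, lane_bits) * to_signed(y, lane_bits)
                    for x, y in zip(a_lanes, b_lanes)]
    return sum(products) + to_signed(addend, 16)

# 8-bit mode: lanes (3, -2) times lanes (5, 7), plus addend 10.
a = ((3 & 0xFF) << 8) | (-2 & 0xFF)
b = ((5 & 0xFF) << 8) | (7 & 0xFF)
print(flexible_mac(a, b, addend=10, lane_bits=8))  # 3*5 + (-2)*7 + 10 = 11
```

In the actual design, all of these modes would share a single hardware datapath rather than separate multipliers; the model above reproduces only the arithmetic result of each mode.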
Pages: 26 - 38
Page count: 13
Related Papers
50 records in total
  • [1] Low-Complexity Precision-Scalable Multiply-Accumulate Unit Architectures for Deep Neural Network Accelerators
    Li, Wenjie
    Hu, Aokun
    Wang, Gang
    Xu, Ningyi
    He, Guanghui
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2023, 70 (04) : 1610 - 1614
  • [2] A Reconfigurable Fused Multiply-Accumulate For Miscellaneous Operators in Deep Neural Network
    Lei, Lei
    Chen, Zhiming
2024 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS 2024, 2024
  • [3] Review and Benchmarking of Precision-Scalable Multiply-Accumulate Unit Architectures for Embedded Neural-Network Processing
Camus, Vincent
    Mei, Linyan
    Enz, Christian
    Verhelst, Marian
    IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, 2019, 9 (04) : 697 - 711
  • [4] Survey of Precision-Scalable Multiply-Accumulate Units for Neural-Network Processing
    Camus, Vincent
    Enz, Christian
    Verhelst, Marian
    2019 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2019), 2019, : 57 - 61
  • [5] A Posit Based Multiply-accumulate Unit with Small Quire Size for Deep Neural Networks
Nakahara, Y.
    Masuda, Y.
    Kiyama, M.
    Amagasaki, M.
    Iida, M.
    IPSJ Transactions on System LSI Design Methodology, 2022, 15 : 16 - 19
  • [6] New design of an RSFQ parallel multiply-accumulate unit
    Kataeva, Irina
    Engseth, Henrik
    Kidiyarova-Shevchenko, Anna
SUPERCONDUCTOR SCIENCE & TECHNOLOGY, 2006, 19 (05) : S381 - S386
  • [7] Efficient Posit Multiply-Accumulate Unit Generator for Deep Learning Applications
    Zhang, Hao
    He, Jiongrui
    Ko, Seok-Bum
2019 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2019
  • [8] Mixing Low-Precision Formats in Multiply-Accumulate Units for DNN Training
    Tatsumi, Mariko
    Filip, Silviu-Ioan
    White, Caroline
    Sentieys, Olivier
    Lemieux, Guy
    2022 21ST INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE TECHNOLOGY (ICFPT 2022), 2022, : 28 - 36
  • [9] Overflow Aware Quantization: Accelerating Neural Network Inference by Low-bit Multiply-Accumulate Operations
    Xie, Hongwei
    Song, Yafei
    Cai, Ling
    Li, Mingyang
    PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 868 - 875
  • [10] Sensitivity-Based Error Resilient Techniques With Heterogeneous Multiply-Accumulate Unit for Voltage Scalable Deep Neural Network Accelerators
    Shin, Dongyeob
    Choi, Wonseok
    Park, Jongsun
    Ghosh, Swaroop
    IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, 2019, 9 (03) : 520 - 531