New Flexible Multiple-Precision Multiply-Accumulate Unit for Deep Neural Network Training and Inference

Cited by: 25
Authors
Zhang, Hao [1 ]
Chen, Dongdong [2 ]
Ko, Seok-Bum [1 ]
Affiliations
[1] Univ Saskatchewan, Dept Elect & Comp Engn, Saskatoon, SK S7N 5A2, Canada
[2] Intel Corp, San Jose, CA 95134 USA
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Neural networks; Standards; Deep learning; Training; Hardware; Adders; Pipelines; Multiply-accumulate unit; multiple-precision arithmetic; flexible precision arithmetic; deep neural network computing; computer arithmetic; ADD;
DOI
10.1109/TC.2019.2936192
CLC Classification
TP3 [Computing Technology, Computer Technology];
Subject Classification
0812;
Abstract
In this paper, a new flexible multiple-precision multiply-accumulate (MAC) unit is proposed for deep neural network training and inference. The proposed MAC unit supports both fixed-point and floating-point operations. In floating-point format, the proposed unit supports one 16-bit MAC operation or the sum of two 8-bit multiplications plus a 16-bit addend. To make the proposed MAC unit more versatile, the bit-widths of the exponent and mantissa can be flexibly exchanged, and by setting the exponent bit-width to zero, the proposed MAC unit also supports fixed-point operations. In fixed-point format, the proposed unit supports one 16-bit MAC or the sum of two 8-bit multiplications plus a 16-bit addend. Moreover, the unit can be further subdivided to support the sum of four 4-bit multiplications plus a 16-bit addend. At the lowest precision, the proposed MAC unit supports the accumulation of eight 1-bit logical AND operations to enable support for binary neural networks. Compared to a standard 16-bit half-precision MAC unit, the proposed MAC unit provides far more flexibility at only 21.8 percent area overhead. Compared to a standard 32-bit single-precision MAC unit, the proposed MAC unit requires much less hardware while still providing an 8-bit exponent in its numerical format, maintaining the large dynamic range needed for deep learning computing.
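The fixed-point modes described in the abstract map naturally onto a lane-splitting scheme, sketched below as a behavioral Python model. This is our own illustration, not the paper's datapath: the lane packing, the two's-complement signedness, and the names flexible_mac, to_signed, and LANES are assumptions, and the floating-point and flexible exponent/mantissa modes are omitted.

```python
# Behavioral sketch of the fixed-point modes of a multiple-precision MAC:
# one 16-bit product, two 8-bit products, four 4-bit products, or eight
# 1-bit logical ANDs, each summed with a 16-bit addend. Illustrative only.

LANES = {16: 1, 8: 2, 4: 4, 1: 8}  # lane width -> number of lanes

def to_signed(value, bits):
    """Interpret the low `bits` of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def flexible_mac(a, b, addend, lane_bits):
    """Sum of per-lane products of `a` and `b`, plus a 16-bit addend."""
    n = LANES[lane_bits]
    mask = (1 << lane_bits) - 1
    a_lanes = [(a >> (i * lane_bits)) & mask for i in range(n)]
    b_lanes = [(b >> (i * lane_bits)) & mask for i in range(n)]
    if lane_bits == 1:
        # Binary-network mode: accumulate eight 1-bit logical ANDs.
        products = [x & y for x, y in zip(a_lanes, b_lanes)]
    else:
        products = [to_signed(x, lane_bits) * to_signed(y, lane_bits)
                    for x, y in zip(a_lanes, b_lanes)]
    return sum(products) + to_signed(addend, 16)

# 8-bit mode: lanes (3, -2) times lanes (5, 7), plus addend 10.
a = ((3 & 0xFF) << 8) | (-2 & 0xFF)
b = ((5 & 0xFF) << 8) | (7 & 0xFF)
print(flexible_mac(a, b, addend=10, lane_bits=8))  # 3*5 + (-2)*7 + 10 = 11
```

In the actual design, all of these modes would share a single hardware datapath rather than separate multipliers; the model above reproduces only the arithmetic result of each mode.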
Pages: 26 - 38
Page count: 13
Related Papers
50 records in total
  • [1] Low-Complexity Precision-Scalable Multiply-Accumulate Unit Architectures for Deep Neural Network Accelerators
    Li, Wenjie
    Hu, Aokun
    Wang, Gang
    Xu, Ningyi
    He, Guanghui
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2023, 70 (04) : 1610 - 1614
  • [2] A Reconfigurable Fused Multiply-Accumulate For Miscellaneous Operators in Deep Neural Network
    Lei, Lei
    Chen, Zhiming
2024 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS 2024, 2024
  • [3] Review and Benchmarking of Precision-Scalable Multiply-Accumulate Unit Architectures for Embedded Neural-Network Processing
Camus, Vincent
    Mei, Linyan
    Enz, Christian
    Verhelst, Marian
    IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, 2019, 9 (04) : 697 - 711
  • [4] Survey of Precision-Scalable Multiply-Accumulate Units for Neural-Network Processing
    Camus, Vincent
    Enz, Christian
    Verhelst, Marian
    2019 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2019), 2019, : 57 - 61
  • [5] A Posit Based Multiply-accumulate Unit with Small Quire Size for Deep Neural Networks
Nakahara, Y.
    Masuda, Y.
    Kiyama, M.
    Amagasaki, M.
    Iida, M.
    IPSJ Transactions on System LSI Design Methodology, 2022, 15 : 16 - 19
  • [6] New design of an RSFQ parallel multiply-accumulate unit
    Kataeva, Irina
    Engseth, Henrik
    Kidiyarova-Shevchenko, Anna
SUPERCONDUCTOR SCIENCE & TECHNOLOGY, 2006, 19 (05) : S381 - S386
  • [7] Efficient Posit Multiply-Accumulate Unit Generator for Deep Learning Applications
    Zhang, Hao
    He, Jiongrui
    Ko, Seok-Bum
2019 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2019
  • [8] Mixing Low-Precision Formats in Multiply-Accumulate Units for DNN Training
    Tatsumi, Mariko
    Filip, Silviu-Ioan
    White, Caroline
    Sentieys, Olivier
    Lemieux, Guy
    2022 21ST INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE TECHNOLOGY (ICFPT 2022), 2022, : 28 - 36
  • [9] Overflow Aware Quantization: Accelerating Neural Network Inference by Low-bit Multiply-Accumulate Operations
    Xie, Hongwei
    Song, Yafei
    Cai, Ling
    Li, Mingyang
    PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 868 - 875
  • [10] Sensitivity-Based Error Resilient Techniques With Heterogeneous Multiply-Accumulate Unit for Voltage Scalable Deep Neural Network Accelerators
    Shin, Dongyeob
    Choi, Wonseok
    Park, Jongsun
    Ghosh, Swaroop
    IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, 2019, 9 (03) : 520 - 531