Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks

Cited by: 308
Authors
Sharma, Hardik [1 ]
Park, Jongse [1 ]
Suda, Naveen [2 ]
Lai, Liangzhen [2 ]
Chau, Benson [1 ]
Chandra, Vikas [2 ]
Esmaeilzadeh, Hadi [3 ]
Affiliations
[1] Georgia Inst Technol, Alternat Comp Technol ACT Lab, Atlanta, GA 30332 USA
[2] Arm Inc, Cambridge, England
[3] Univ Calif San Diego, La Jolla, CA 92093 USA
Funding
US National Science Foundation;
Keywords
Bit-Level Composability; Dynamic Composability; Deep Neural Networks; Accelerators; DNN; Convolutional Neural Networks; CNN; Long Short-Term Memory; LSTM; Recurrent Neural Networks; RNN; Quantization; Bit Fusion; Bit Brick;
DOI
10.1109/ISCA.2018.00069
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology];
Subject Classification Code
0812;
Abstract
Hardware acceleration of Deep Neural Networks (DNNs) aims to tame their enormous compute intensity. Fully realizing the potential of acceleration in this domain requires understanding and leveraging algorithmic properties of DNNs. This paper builds upon the algorithmic insight that the bitwidth of operations in DNNs can be reduced without compromising their classification accuracy. However, to prevent loss of accuracy, the bitwidth varies significantly across DNNs and may even be adjusted for each layer individually. Thus, a fixed-bitwidth accelerator would either offer limited benefits, since it must accommodate the worst-case bitwidth requirement, or inevitably degrade final accuracy. To alleviate these deficiencies, this work introduces dynamic bit-level fusion/decomposition as a new dimension in the design of DNN accelerators. We explore this dimension by designing Bit Fusion, a bit-flexible accelerator that comprises an array of bit-level processing elements that dynamically fuse to match the bitwidth of individual DNN layers. This flexibility in the architecture enables minimizing computation and communication at the finest granularity possible with no loss in accuracy. We evaluate the benefits of Bit Fusion using eight real-world feed-forward and recurrent DNNs. The proposed microarchitecture is implemented in Verilog and synthesized in 45 nm technology. Using the synthesis results and cycle-accurate simulation, we compare the benefits of Bit Fusion to two state-of-the-art DNN accelerators, Eyeriss [1] and Stripes [2]. In the same area, frequency, and process technology, Bit Fusion offers 3.9x speedup and 5.1x energy savings over Eyeriss. Compared to Stripes, Bit Fusion provides 2.6x speedup and 3.9x energy reduction at the 45 nm node when Bit Fusion's area and frequency are set to those of Stripes. Scaling to the 16 nm GPU technology node, Bit Fusion almost matches the performance of a 250-Watt Titan Xp, which uses 8-bit vector instructions, while consuming merely 895 milliwatts of power.
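To illustrate the bit-level decomposition the abstract describes, the following is a minimal Python sketch (not the authors' hardware or RTL) of how an N-bit by M-bit multiplication can be broken into 2-bit by 2-bit partial products, one per bit-level processing element, each shifted by its combined slice offset and accumulated. The names to_bricks and fused_multiply are hypothetical, and only unsigned operands are handled here.

def to_bricks(value, bitwidth, brick_bits=2):
    # Split an unsigned integer into little-endian slices of brick_bits bits.
    assert bitwidth % brick_bits == 0
    mask = (1 << brick_bits) - 1
    return [(value >> (i * brick_bits)) & mask
            for i in range(bitwidth // brick_bits)]

def fused_multiply(a, b, a_bits, b_bits):
    # Multiply two unsigned operands brick by brick: every 2-bit x 2-bit
    # partial product is shifted by the combined slice offset and accumulated,
    # mimicking how fused bit-level processing elements would cooperate.
    acc = 0
    for i, a_brick in enumerate(to_bricks(a, a_bits)):
        for j, b_brick in enumerate(to_bricks(b, b_bits)):
            acc += (a_brick * b_brick) << (2 * (i + j))
    return acc

# Sanity check: an 8-bit x 4-bit multiply decomposed into 4 x 2 = 8
# two-bit partial products matches the direct product.
assert fused_multiply(0xB7, 0x9, 8, 4) == 0xB7 * 0x9

Under this view, narrowing a layer's bitwidth shrinks the number of 2-bit partial products quadratically, which is the source of the speedup and energy savings the abstract reports.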
Pages: 764-775
Number of pages: 12