Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks

Cited by: 308
Authors
Sharma, Hardik [1 ]
Park, Jongse [1 ]
Suda, Naveen [2 ]
Lai, Liangzhen [2 ]
Chau, Benson [1 ]
Chandra, Vikas [2 ]
Esmaeilzadeh, Hadi [3 ]
Affiliations
[1] Georgia Inst Technol, Alternat Comp Technol ACT Lab, Atlanta, GA 30332 USA
[2] Arm Inc, Cambridge, England
[3] Univ Calif San Diego, La Jolla, CA 92093 USA
Funding
U.S. National Science Foundation
Keywords
Bit-Level Composability; Dynamic Composability; Deep Neural Networks; Accelerators; DNN; Convolutional Neural Networks; CNN; Long Short-Term Memory; LSTM; Recurrent Neural Networks; RNN; Quantization; Bit Fusion; Bit Brick;
DOI
10.1109/ISCA.2018.00069
Chinese Library Classification (CLC)
TP3 [Computing technology; computer technology]
Discipline code
0812
Abstract
Hardware acceleration of Deep Neural Networks (DNNs) aims to tame their enormous compute intensity. Fully realizing the potential of acceleration in this domain requires understanding and leveraging algorithmic properties of DNNs. This paper builds upon the algorithmic insight that the bitwidth of operations in DNNs can be reduced without compromising their classification accuracy. However, to preserve accuracy, the required bitwidth varies significantly across DNNs and may even be adjusted for each layer individually. Thus, a fixed-bitwidth accelerator would either offer limited benefits, by accommodating the worst-case bitwidth requirements, or inevitably degrade final accuracy. To alleviate these deficiencies, this work introduces dynamic bit-level fusion/decomposition as a new dimension in the design of DNN accelerators. We explore this dimension by designing Bit Fusion, a bit-flexible accelerator that comprises an array of bit-level processing elements that dynamically fuse to match the bitwidth of individual DNN layers. This flexibility in the architecture minimizes computation and communication at the finest granularity possible with no loss in accuracy. We evaluate the benefits of Bit Fusion using eight real-world feed-forward and recurrent DNNs. The proposed microarchitecture is implemented in Verilog and synthesized in 45 nm technology. Using the synthesis results and cycle-accurate simulation, we compare Bit Fusion to two state-of-the-art DNN accelerators, Eyeriss [1] and Stripes [2]. In the same area, frequency, and process technology, Bit Fusion offers 3.9x speedup and 5.1x energy savings over Eyeriss. Compared to Stripes, Bit Fusion provides 2.6x speedup and 3.9x energy reduction at the 45 nm node when Bit Fusion's area and frequency are set to those of Stripes. Scaling to the 16 nm GPU technology node, Bit Fusion almost matches the performance of a 250-Watt Titan Xp that uses 8-bit vector instructions, while consuming merely 895 milliwatts of power.
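To make the fusion idea concrete, the following is a minimal sketch (in C, not the authors' Verilog) of the arithmetic principle behind bit-level composition: an 8-bit multiplication is assembled from 2-bit partial products, mirroring how 2-bit BitBrick elements could fuse into a wider multiplier. The function names and the unsigned-only handling are illustrative assumptions; the actual microarchitecture also supports signed operands and other bitwidth combinations.

```c
/*
 * Illustrative sketch of bit-level fusion: an 8-bit x 8-bit unsigned
 * multiply computed from 2-bit x 2-bit partial products, analogous to
 * fusing 2-bit "BitBrick" elements into a wider multiplier.
 * Names and unsigned-only handling are assumptions for illustration.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* One 2-bit x 2-bit multiply: the work of a single bit-level element. */
static uint32_t bitbrick_mul(uint32_t a2, uint32_t b2) {
    return (a2 & 0x3u) * (b2 & 0x3u);
}

/* Fuse 2-bit partial products into an 8-bit x 8-bit product:
 * x = sum_i x_i * 4^i, y = sum_j y_j * 4^j, so
 * x*y = sum_{i,j} x_i * y_j * 4^(i+j), i.e. shift-and-add of brick outputs. */
static uint32_t fused_mul8(uint8_t x, uint8_t y) {
    uint32_t acc = 0;
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            uint32_t xi = (x >> (2 * i)) & 0x3u;  /* i-th 2-bit slice of x */
            uint32_t yj = (y >> (2 * j)) & 0x3u;  /* j-th 2-bit slice of y */
            acc += bitbrick_mul(xi, yj) << (2 * (i + j));
        }
    }
    return acc;
}

int main(void) {
    /* Exhaustively check the fused result against a native multiply. */
    for (uint32_t x = 0; x < 256; x++)
        for (uint32_t y = 0; y < 256; y++)
            assert(fused_mul8((uint8_t)x, (uint8_t)y) == x * y);
    printf("all 8-bit products match\n");
    return 0;
}
```

The shift-and-add structure also shows where the savings come from: a layer quantized to 2 bits needs a single brick-level product per multiply, whereas an 8-bit layer fuses 16 of them, so matching the per-layer bitwidth directly reduces both computation and operand movement.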
Pages: 764-775
Page count: 12