Laius: An 8-bit Fixed-point CNN Hardware Inference Engine

Cited by: 34
Authors
Li, Zhisheng [1]
Wang, Lei [1]
Guo, Shasha [1]
Deng, Yu [1]
Dou, Qiang [1]
Zhou, Haifang [1]
Lu, Wenyuan [2]
Affiliations
[1] National University of Defense Technology, School of Computer Science, Changsha, Hunan, People's Republic of China
[2] Xi'an Satellite Monitoring and Control Center, Xi'an, Shaanxi, People's Republic of China
Keywords
CNN accelerator; FPGA; LeNet; Inference; Implementation
DOI
10.1109/ISPA/IUCC.2017.00030
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Convolutional neural networks (CNNs) are among the most effective neural network models for many classification tasks, such as voice recognition, computer vision, and biological information processing. Unfortunately, CNN computation is both memory-intensive and computation-intensive, which poses a major challenge for the design of hardware accelerators. Industry and academia have designed a large number of hardware accelerators for CNN inference. Most of these engines are based on 32-bit floating-point matrix multiplication, where the data precision is over-provisioned for inference and the hardware cost is too high. In this paper, an 8-bit fixed-point LeNet inference engine (Laius) is designed and implemented on an FPGA. To reduce FPGA resource consumption, we propose a methodology for finding the optimal bit length for the weights and biases in LeNet, which results in 8-bit fixed point being used for most of the computation and 16-bit fixed point for the rest. A processing element (PE) design is proposed, and pipelining and PE tiling techniques are used to improve the performance of the inference engine. Through theoretical analysis, we conclude that DSP resources are the most critical resource in the FPGA and must be used carefully during the design process. We implement the inference engine on a Xilinx 485t FPGA. Experimental results show that the LeNet inference engine achieves a throughput of 44.9 GOPS with 8-bit fixed-point operations after pipelining. Moreover, with only a 1% loss of accuracy, the 8-bit fixed-point engine reduces latency by 31.43%, LUT consumption by 87.01%, BRAM consumption by 66.50%, DSP consumption by 65.11%, and power by 47.95% compared with a 32-bit fixed-point inference engine of the same structure.
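The abstract does not describe how the optimal bit length is found. As a rough illustration only, the Python sketch below assumes a simple criterion: for each candidate word length, place the binary point so that the largest weight magnitude fits, round every weight to that signed fixed-point format, and accept the narrowest width whose worst-case rounding error stays under a chosen bound. The helper names (`quantize`, `frac_bits_for`, `min_bit_width`) and the error-bound criterion are hypothetical; because the abstract reports results as accuracy loss, a search matching the paper would more plausibly score each width by end-to-end LeNet accuracy rather than by this proxy.

```python
import math
from typing import Iterable, Sequence, Tuple


def quantize(x: float, total_bits: int, frac_bits: int) -> float:
    """Round x to the nearest signed fixed-point value with `total_bits` bits,
    `frac_bits` of which are fractional; out-of-range values saturate."""
    scale = 2.0 ** frac_bits
    q_min = -(1 << (total_bits - 1))
    q_max = (1 << (total_bits - 1)) - 1
    q = max(q_min, min(q_max, round(x * scale)))
    return q / scale


def frac_bits_for(values: Sequence[float], total_bits: int) -> int:
    """Split `total_bits` into sign + integer + fraction so that the largest
    magnitude in `values` fits (to within one LSB) without overflow."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    # Integer bits needed; negative for small weights, which frees extra
    # fractional bits for precision.
    int_bits = math.floor(math.log2(max_abs)) + 1
    return total_bits - 1 - int_bits


def min_bit_width(values: Sequence[float], max_error: float,
                  candidates: Iterable[int] = range(4, 33)) -> Tuple[int, int]:
    """Return (total_bits, frac_bits) for the narrowest candidate width whose
    worst-case quantization error on `values` is at most `max_error`.

    Hypothetical criterion: worst-case rounding error is used here as a
    stand-in for the accuracy loss a real search would measure."""
    for n in candidates:
        f = frac_bits_for(values, n)
        err = max(abs(v - quantize(v, n, f)) for v in values)
        if err <= max_error:
            return n, f
    raise ValueError("no candidate bit width meets the error bound")


if __name__ == "__main__":
    weights = [0.75, -0.5, 0.125, 0.3]  # toy stand-in for one layer's weights
    print(min_bit_width(weights, max_error=0.01))  # (7, 6)
```

In this toy case 0.75, -0.5, and 0.125 are exact at any width, so the choice is driven by 0.3, which first rounds to within 0.01 (to 19/64) at 7 bits with 6 fractional bits. Replacing the error bound with a measured accuracy drop would turn the same loop into an accuracy-driven search.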
Pages: 143-150
Page count: 8
Related papers
50 in total
  • [31] AN 8-BIT SINGLE-CHIP MICROCOMPUTER FOR AUTOMOTIVE ENGINE CONTROL
    KATORI, S
    IWASAKI, J
    MAEHASHI, Y
    MICROPROCESSORS AND MICROSYSTEMS, 1982, 6 (07) : 347 - 353
  • [32] Efficient Neural Image Decoding via Fixed-Point Inference
    Hong, Weixin
    Chen, Tong
    Lu, Ming
    Pu, Shiliang
    Ma, Zhan
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (09) : 3618 - 3630
  • [33] Leveraging the VTA-TVM Hardware-Software Stack for FPGA Acceleration of 8-bit ResNet-18 Inference
    Moreau, Thierry
    Chen, Tianqi
    Ceze, Luis
    1ST ACM REQUEST WORKSHOP/TOURNAMENT ON REPRODUCIBLE SOFTWARE/HARDWARE CO-DESIGN OF PARETO-EFFICIENT DEEP LEARNING, 2018
  • [34] Hardware-software partition of fixed-point hardware accelerator from statistical perspective
    Zhou, F
    Yang, J
    Shi, LX
    Zhang, Y
    2005 6th International Conference on ASIC Proceedings, Books 1 and 2, 2005, : 148 - 151
  • [35] Toward secured IoT devices: a shuffled 8-bit AES hardware implementation
    Harcha, Ghita
    Lapotre, Vianney
    Chavet, Cyrille
    Coussy, Philippe
    2020 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2020
  • [36] Analytical Optimization of Bit-Widths in Fixed-Point LTI Systems
    Sarbishei, Omid
    Radecka, Katarzyna
    Zilic, Zeljko
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2012, 31 (03) : 343 - 355
  • [37] Unifying bit-width optimisation for fixed-point and floating-point designs
    Gaffar, AA
    Mencer, O
    Luk, W
    Cheung, PYK
    12TH ANNUAL IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, PROCEEDINGS, 2004, : 79 - 88
  • [38] A 64-bit orthorectification algorithm using fixed-point arithmetic
    French, Joseph C.
    Balster, Eric J.
    Turri, William F.
    HIGH-PERFORMANCE COMPUTING IN REMOTE SENSING III, 2013, 8895
  • [39] Bit Accurate Roundoff Noise Analysis of Fixed-Point Linear Controllers
    Helaire, Thibault
    Menard, Daniel
    Sentieys, Olivier
    2008 IEEE INTERNATIONAL SYMPOSIUM ON COMPUTER-AIDED CONTROL SYSTEM DESIGN, 2008, : 183 - 188
  • [40] Optimization of the 24-Bit Fixed-Point Format for the Laplacian Source
    Peric, Zoran H.
    Dincic, Milan R.
    MATHEMATICS, 2023, 11 (03)