Laius: An 8-bit Fixed-point CNN Hardware Inference Engine

Cited by: 34
Authors
Li, Zhisheng [1 ]
Wang, Lei [1 ]
Guo, Shasha [1 ]
Deng, Yu [1 ]
Dou, Qiang [1 ]
Zhou, Haifang [1 ]
Lu, Wenyuan [2 ]
Affiliations
[1] Natl Univ Def Technol, Sch Comp Sci, Changsha, Hunan, Peoples R China
[2] Xian Satellite Monitoring & Control Ctr, Xian, Shaanxi, Peoples R China
Keywords
CNN accelerator; FPGA; LeNet; Inference; Implementation;
DOI
10.1109/ISPA/IUCC.2017.00030
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology]
Subject Classification Code
0812
Abstract
The Convolutional Neural Network (CNN) is one of the most effective neural network models for many classification tasks, such as voice recognition, computer vision, and biological information processing. Unfortunately, CNN computation is both memory-intensive and compute-intensive, which poses a major challenge for hardware accelerator design. A large number of hardware accelerators for CNN inference have been designed in industry and academia. Most of these engines are based on 32-bit floating-point matrix multiplication, where the data precision is over-provisioned for the inference job and the hardware cost is too high. In this paper, an 8-bit fixed-point LeNet inference engine (Laius) is designed and implemented on an FPGA. To reduce FPGA resource consumption, we propose a methodology for finding the optimal bit-length for the weights and biases in LeNet, which results in using 8-bit fixed point for most of the computation and 16-bit fixed point for the rest. A PE (Processing Element) design is proposed, and pipelining and PE tiling techniques are used to improve the performance of the inference engine. Theoretical analysis shows that DSP blocks are the most critical FPGA resource and should be used carefully during the design process. We implement the inference engine on a Xilinx 485t FPGA. Experimental results show that the LeNet inference engine achieves 44.9 Gops of throughput with 8-bit fixed-point operations after pipelining. Moreover, with only a 1% loss of accuracy, the 8-bit fixed-point engine reduces latency by 31.43%, LUT consumption by 87.01%, BRAM consumption by 66.50%, DSP consumption by 65.11%, and power by 47.95% compared to a 32-bit fixed-point inference engine with the same structure.
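Below is a minimal C sketch of the mixed 8-/16-bit fixed-point arithmetic the abstract describes. The Q1.7 format (1 sign bit, 7 fractional bits), the helper names quantize_q7 and dot_q7, and the test values are illustrative assumptions, not the paper's method; the paper's bit-length search selects the actual integer/fraction split per weight and bias.

#include <stdint.h>
#include <stdio.h>

/* Assumed Q1.7 fixed-point format: 1 sign bit, 7 fractional bits.
 * The paper's bit-length search may choose a different split. */
#define FRAC_BITS 7

/* Quantize a float in roughly [-1, 1) to 8-bit fixed point,
 * rounding to nearest and saturating to the int8 range. */
static int8_t quantize_q7(float x) {
    int32_t v = (int32_t)(x * (1 << FRAC_BITS) + (x >= 0.0f ? 0.5f : -0.5f));
    if (v > 127)  v = 127;
    if (v < -128) v = -128;
    return (int8_t)v;
}

/* Dot product of 8-bit weights and activations. Each 8x8-bit product
 * fits in 16 bits, mirroring the abstract's use of 16-bit fixed point
 * for the remaining computation; the running sum is kept wider so that
 * accumulation over a convolution window cannot overflow. */
static int32_t dot_q7(const int8_t *w, const int8_t *a, int n) {
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int16_t)((int16_t)w[i] * (int16_t)a[i]);
    return acc; /* scaled by 2^(2*FRAC_BITS) */
}

int main(void) {
    int8_t w[3] = { quantize_q7(0.50f), quantize_q7(-0.25f), quantize_q7(0.75f) };
    int8_t a[3] = { quantize_q7(0.10f), quantize_q7(0.20f),  quantize_q7(0.30f) };
    int32_t acc = dot_q7(w, a, 3);
    /* Rescale for inspection; the exact float dot product is 0.225. */
    printf("dot ~= %f\n", acc / (float)(1 << (2 * FRAC_BITS)));
    return 0;
}

Keeping the accumulator wider than the 8-bit operands is a common reason mixed 8-/16-bit precision appears in designs like the one described: the narrow operands keep multiplier (DSP) cost low, while the wider intermediate preserves accuracy across the summation.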
Pages: 143-150
Number of pages: 8