Laius: An 8-bit Fixed-point CNN Hardware Inference Engine

Cited by: 34
Authors
Li, Zhisheng [1 ]
Wang, Lei [1 ]
Guo, Shasha [1 ]
Deng, Yu [1 ]
Dou, Qiang [1 ]
Zhou, Haifang [1 ]
Lu, Wenyuan [2 ]
Affiliations
[1] Natl Univ Def Technol, Sch Comp Sci, Changsha, Hunan, Peoples R China
[2] Xian Satellite Monitoring & Control Ctr, Xian, Shaanxi, Peoples R China
Keywords
CNN accelerator; FPGA; LeNet; Inference; Implementation;
DOI
10.1109/ISPA/IUCC.2017.00030
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology];
Discipline Code
0812;
Abstract
Convolutional Neural Networks (CNNs) are among the most effective neural network models for many classification tasks, such as voice recognition, computer vision, and biological information processing. Unfortunately, CNN computation is both memory-intensive and compute-intensive, which poses a huge challenge for the design of hardware accelerators. A large number of hardware accelerators for CNN inference have been designed by industry and academia. Most of these engines are based on 32-bit floating-point matrix multiplication, where the data precision is over-provisioned for the inference job and the hardware cost is too high. In this paper, an 8-bit fixed-point LeNet inference engine (Laius) is designed and implemented on an FPGA. To reduce FPGA resource consumption, we propose a methodology to find the optimal bit-length for the weights and biases in LeNet, which results in using 8-bit fixed point for most of the computation and 16-bit fixed point for the rest. A PE (Processing Element) design is proposed, and pipelining and PE tiling are used to improve the performance of the inference engine. Through theoretical analysis, we conclude that the DSP blocks are the most critical FPGA resource and should be used carefully during the design process. We implement the inference engine on a Xilinx 485t FPGA. Experimental results show that the designed LeNet inference engine achieves 44.9 Gops throughput with 8-bit fixed-point operation after pipelining. Moreover, with only a 1% loss of accuracy, the 8-bit fixed-point engine reduces latency by 31.43%, LUT consumption by 87.01%, BRAM consumption by 66.50%, DSP consumption by 65.11%, and power by 47.95% compared to a 32-bit fixed-point inference engine with the same structure.
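The bit-length selection the abstract alludes to can be illustrated with a minimal sketch: quantize a weight tensor to signed fixed point at each candidate fractional bit-length and pick the one with the lowest quantization error. This is an assumption-laden illustration, not the authors' actual methodology; the function names and the mean-squared-error criterion are invented for this example.

```python
import numpy as np

def quantize_fixed_point(x, total_bits=8, frac_bits=6):
    """Round a float array to signed fixed point with `total_bits`
    width and `frac_bits` fractional bits, then return the
    dequantized (float) values for error measurement."""
    scale = 1 << frac_bits                    # e.g. 2^6 = 64
    qmin = -(1 << (total_bits - 1))           # -128 for 8-bit
    qmax = (1 << (total_bits - 1)) - 1        # +127 for 8-bit
    q = np.clip(np.round(x * scale), qmin, qmax)
    return q / scale

def best_frac_bits(weights, total_bits=8):
    """Exhaustively search the fractional bit-length that minimizes
    mean squared quantization error over a weight tensor
    (hypothetical criterion; the paper's metric may differ)."""
    errors = {f: np.mean((weights - quantize_fixed_point(weights, total_bits, f)) ** 2)
              for f in range(total_bits)}
    return min(errors, key=errors.get)

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=1000)   # small LeNet-like weights
f = best_frac_bits(w)
```

For small-magnitude weight distributions the search favors more fractional bits (finer resolution), while layers with large activations would push the split the other way; the paper's finding that 8 bits suffice for most computation and 16 bits for the rest follows the same trade-off.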
Pages: 143-150
Number of pages: 8
Related Papers
50 records in total
  • [1] CNNET: A Configurable Hardware Accelerator for Efficient Inference of 8-bit Fixed-Point CNNs
    Agbalessi, Christie
    Indovina, Mark A.
    2023 IEEE 36TH INTERNATIONAL SYSTEM-ON-CHIP CONFERENCE, SOCC, 2023, : 196 - 201
  • [2] Laius: an energy-efficient FPGA CNN accelerator with the support of a fixed-point training framework
    Nie, Zikai
    Li, Zhisheng
    Wang, Lei
    Guo, Shasha
    Deng, Yu
    Deng, Rangyu
    Dou, Qiang
    INTERNATIONAL JOURNAL OF COMPUTATIONAL SCIENCE AND ENGINEERING, 2020, 21 (03) : 418 - 428
  • [3] Low-drift fixed-point 8x8 IDCT approximation with 8-bit transform factors
    Reznik, Yuriy A.
    Hsu, De
    Panda, Prasanjit
    Pillai, Br Esh
    2007 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1-7, 2007, : 2877 - 2880
  • [4] Running 8-bit Dynamic Fixed-point Convolutional Neural Network on Low-cost ARM Platforms
    Peng Peng
    You Mingyu
    Xu Weisheng
    2017 CHINESE AUTOMATION CONGRESS (CAC), 2017, : 4564 - 4568
  • [5] Design of 16-bit fixed-point CNN coprocessor based on FPGA
    Liang, Feng
    Yang, Yichen
    Zhang, Guohe
    Zhang, Xueliang
    Wu, Bin
    2018 IEEE 23RD INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 2018,
  • [6] Design and Comparison of 8-Bit Hybrid and Fixed Point Arithmetic Unit
    Sugumaran, Premganesh
    Naziri, Siti Zarina Md
    Ismail, Rizalafande Che
    2ND INTERNATIONAL CONFERENCE ON APPLIED PHOTONICS AND ELECTRONICS 2019 (INCAPE 2019), 2020, 2203
  • [7] Efficient Dynamic Fixed-Point Quantization of CNN Inference Accelerators for Edge Devices
    Wu, Yueh-Chi
    Huang, Chih-Tsun
    2019 INTERNATIONAL SYMPOSIUM ON VLSI DESIGN, AUTOMATION AND TEST (VLSI-DAT), 2019,
  • [8] Fixed-Point Configurable Hardware Components
    Rocher, Romuald
    Menard, Daniel
    Herve, Nicolas
    Sentieys, Olivier
    EURASIP JOURNAL ON EMBEDDED SYSTEMS, 2006, (01) : 1 - 13
  • [9] Quantizaiton for Deep Neural Network Training with 8-bit Dynamic Fixed Point
    Sakai, Yasufumi
    2020 7TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING & MACHINE INTELLIGENCE (ISCMI 2020), 2020, : 126 - 130
  • [10] A Winograd-Based Highly-Parallel Convolution Engine for 8-bit CNN Acceleration
    Chen, Yong-Tai
    Ou, Yu-Feng
    Huang, Chao-Tsung
    2022 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2022): INTELLIGENT TECHNOLOGY IN THE POST-PANDEMIC ERA, 2022, : 395 - 398