Design Flow of Accelerating Hybrid Extremely Low Bit-width Neural Network in Embedded FPGA

Cited by: 53
Authors
Wang, Junsong [1]
Lou, Qiuwen [2]
Zhang, Xiaofan [3]
Zhu, Chao [1]
Lin, Yonghua [1]
Chen, Deming [3]
Affiliations
[1] IBM Research China, Beijing, People's Republic of China
[2] University of Notre Dame, Notre Dame, IN 46556, USA
[3] University of Illinois, Champaign, IL, USA
DOI
10.1109/FPL.2018.00035
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Neural network accelerators with low latency and low energy consumption are desirable for edge computing. To create such accelerators, we propose a design flow for accelerating extremely low bit-width neural networks (ELB-NNs) in embedded FPGAs using hybrid quantization schemes. The flow covers both network training and FPGA-based network deployment, which facilitates design space exploration and simplifies the tradeoff between network accuracy and computational efficiency. The flow helps hardware designers deliver network accelerators for edge devices under strict resource and power constraints. We demonstrate the flow by supporting hybrid ELB settings within a single neural network. Results show that our design delivers performance peaking at 10.3 TOPS and classifies up to 325.3 images/s/W while running large-scale neural networks at less than 5 W on an embedded FPGA. To the best of our knowledge, this is the most energy-efficient solution compared with the GPU and other FPGA implementations reported in the literature so far.
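The abstract does not spell out the quantizers it combines. As a hedged illustration of what a "hybrid" extremely low bit-width setting can mean, the Python sketch below assigns a different weight quantizer to each layer and counts the resulting weight storage. It assumes XNOR-Net-style binarization (scale alpha = mean |w|) and Ternary Weight Networks-style ternarization (threshold delta = 0.7 * mean |w|); these schemes, the layer names, and every function name (`binarize`, `ternarize`, `quantize_network`) are illustrative assumptions, not the paper's actual method.

```python
"""Hypothetical sketch of per-layer (hybrid) extremely low bit-width weight quantization."""
from typing import Dict, List, Tuple


def binarize(weights: List[float]) -> List[float]:
    # Assumed XNOR-Net-style binarization: alpha * sign(w), with alpha = mean |w|.
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [alpha if w >= 0 else -alpha for w in weights]


def ternarize(weights: List[float], delta_factor: float = 0.7) -> List[float]:
    # Assumed TWN-style ternarization: weights with |w| <= delta become 0,
    # the rest become +/- alpha, where alpha is the mean |w| above the threshold.
    delta = delta_factor * sum(abs(w) for w in weights) / len(weights)
    kept = [abs(w) for w in weights if abs(w) > delta]
    alpha = sum(kept) / len(kept) if kept else 0.0
    return [0.0 if abs(w) <= delta else (alpha if w > 0 else -alpha) for w in weights]


# Bits needed to store one quantized weight under each (hypothetical) scheme.
BITS_PER_WEIGHT = {"binary": 1, "ternary": 2}
QUANTIZERS = {"binary": binarize, "ternary": ternarize}


def quantize_network(
    layers: Dict[str, List[float]], scheme: Dict[str, str]
) -> Tuple[Dict[str, List[float]], int]:
    """Quantize each layer with its own scheme; return the weights and total storage bits."""
    quantized: Dict[str, List[float]] = {}
    total_bits = 0
    for name, weights in layers.items():
        kind = scheme[name]
        quantized[name] = QUANTIZERS[kind](weights)
        total_bits += BITS_PER_WEIGHT[kind] * len(weights)
    return quantized, total_bits


if __name__ == "__main__":
    # Toy two-layer network: the first layer is ternarized, the second binarized.
    layers = {"conv1": [0.5, -0.3, 0.1, -0.9], "conv2": [0.2, -0.05, 0.6, -0.4]}
    scheme = {"conv1": "ternary", "conv2": "binary"}
    q, bits = quantize_network(layers, scheme)
    print(q, bits)
```

Mixing schemes trades accuracy against storage and compute: in this toy example the ternary layer costs 2 bits per weight and the binary layer 1 bit, so the 8 weights occupy 12 bits instead of 256 bits at 32-bit floating point.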
Pages: 163-169
Number of pages: 7