Efficient Hardware Optimization Strategies for Deep Neural Networks Acceleration Chip

Cited by: 0
Authors
Zhang Meng [1]
Zhang Jingwei [1]
Li Guoqing [1]
Wu Ruixia [1]
Zeng Xiaoyang [2]
Affiliations
[1] Southeast Univ, Sch Elect Sci & Engn, Natl ASIC Engn Ctr, Nanjing 210096, Peoples R China
[2] Fudan Univ, Natl ASIC Key Lab, Shanghai 200433, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
Deep Neural Networks (DNN); Object detection; Neural network accelerator; Low power consumption; Hardware optimization
DOI
10.11999/JEIT210002
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification codes
0808; 0809
Abstract
Lightweight neural networks deployed on low-power platforms have proven to be effective solutions in Artificial Intelligence (AI) and Internet of Things (IoT) applications such as Unmanned Aerial Vehicle (UAV) detection and autonomous driving. However, with limited resources, it is very challenging to build a Deep Neural Network (DNN) accelerator that offers both high precision and low latency. This paper proposes a series of efficient hardware optimization strategies. A stackable shared Processing Engine (PE) balances the inconsistent data-reuse and memory-access patterns of different convolutions. Regulable loop parallelism and channel augmentation effectively increase the access bandwidth between the accelerator and external memory, and also improve the computing efficiency of the shallow DNN layers. Pre-Workflow is applied to improve the overall parallelism of the heterogeneous system. Verified on a Xilinx Ultra96 V2 board, these strategies effectively improve the designs of DNN acceleration chips such as iSmart3-SkyNet and SkrSkr-SkyNet. The results show that the optimized accelerator processes 78.576 frames per second with an energy consumption of 0.068 J per image.
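The abstract states that regulable loop parallelism improves the efficiency of shallow layers, and it reports both throughput and energy per image. The Python sketch below is a generic loop-unrolling cost model, not the paper's PE design: it assumes a hypothetical engine that unrolls `p_in` input channels and `p_out` output channels (so `p_in * p_out` PEs, each performing one multiply-accumulate per cycle). The names `ConvLayer`, `conv_cycles`, `pe_utilization` and `average_power_w`, as well as the layer shapes, are illustrative assumptions. It also converts the reported figures into an average power, assuming that energy per image equals average power divided by frames per second.

```python
import math
from dataclasses import dataclass


@dataclass
class ConvLayer:
    """Hypothetical description of one convolution layer (illustrative only)."""
    c_in: int   # input channels
    c_out: int  # output channels
    h_out: int  # output feature-map height
    w_out: int  # output feature-map width
    k: int      # square kernel size


def conv_cycles(layer: ConvLayer, p_in: int, p_out: int) -> int:
    """Cycles for an assumed engine unrolling p_in input x p_out output channels.

    Each of the p_in * p_out PEs does one MAC per cycle; a partially filled
    channel tile still costs a full cycle, hence math.ceil().
    """
    tiles = math.ceil(layer.c_in / p_in) * math.ceil(layer.c_out / p_out)
    return tiles * layer.h_out * layer.w_out * layer.k * layer.k


def pe_utilization(layer: ConvLayer, p_in: int, p_out: int) -> float:
    """Fraction of PE-cycles that perform a useful MAC under this cost model."""
    macs = layer.c_in * layer.c_out * layer.h_out * layer.w_out * layer.k * layer.k
    return macs / (conv_cycles(layer, p_in, p_out) * p_in * p_out)


def average_power_w(energy_per_image_j: float, fps: float) -> float:
    """Average power, assuming energy per image = average power / throughput."""
    return energy_per_image_j * fps


if __name__ == "__main__":
    # Hypothetical shallow layer: RGB input (3 channels) -> 64 channels, 3x3 kernel.
    first = ConvLayer(c_in=3, c_out=64, h_out=160, w_out=320, k=3)
    # The same 256 PEs, arranged in two different unrolling configurations.
    for p_in, p_out in [(16, 16), (4, 64)]:
        print(f"p_in={p_in:2d} p_out={p_out:2d} "
              f"cycles={conv_cycles(first, p_in, p_out)} "
              f"utilization={pe_utilization(first, p_in, p_out):.4f}")
    # Throughput and energy figures reported in the abstract.
    print(f"average power ~ {average_power_w(0.068, 78.576):.2f} W")
```

Under these assumptions, the 3-input-channel layer reaches a PE utilization of 0.1875 with 16 x 16 unrolling but 0.75 with 4 x 64, whereas a deep 64-in/64-out layer is fully utilized by 16 x 16. This is one way to see why a single fixed unrolling suits deep layers but wastes PEs in shallow ones. The reported 78.576 fps and 0.068 J per image correspond to roughly 5.34 W of average power, if both figures describe the same steady-state run.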
Pages: 1510-1517
Page count: 8
Related papers (17 in total)
  • [1] Dong Zhen, 2020, arXiv:2006.08357
  • [2] Fan, Hongxiang; Liu, Shuanglong; Ferianc, Martin; Ng, Ho-Cheung; Que, Zhiqiang; Liu, Shen; Niu, Xinyu; Luk, Wayne. A Real-Time Object Detection Accelerator with Compressed SSDLite on FPGA. 2018 International Conference on Field-Programmable Technology (FPT 2018), 2018: 17-24.
  • [3] Hao Chan Jia, 2019, The Interpreter, P1
  • [4] Jiang W, 2020, SKRSKR DACSDC 2020 2
  • [5] Li, Fanrong; Mo, Zitao; Wang, Peisong; Liu, Zejian; Zhang, Jiayun; Li, Gang; Hu, Qinghao; He, Xiangyu; Leng, Cong; Zhang, Yang; Cheng, Jian. A System-Level Solution for Low-Power Object Detection. 2019 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2019: 2461-2468.
  • [6] Li, Huimin; Fan, Xitian; Jiao, Li; Cao, Wei; Zhou, Xuegong; Wang, Lingli. A High Performance FPGA-based Accelerator for Large-Scale Convolutional Neural Networks. 2016 26th International Conference on Field Programmable Logic and Applications (FPL), 2016.
  • [7] Motamedi M, 2016, ASIA S PACIF DES AUT, P575, DOI 10.1109/ASPDAC.2016.7428073
  • [8] Redmon, Joseph; Divvala, Santosh; Girshick, Ross; Farhadi, Ali. You Only Look Once: Unified, Real-Time Object Detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 779-788.
  • [9] Ren, Shaoqing; He, Kaiming; Girshick, Ross; Sun, Jian. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
  • [10] Tan M., 2020, P IEEE CVF C COMP VI, P10781, DOI 10.48550/ARXIV.1911.09070