Efficient Hardware Optimization Strategies for Deep Neural Networks Acceleration Chip

Cited by: 0

Authors:

Zhang Meng [1]

Zhang Jingwei [1]

Li Guoqing [1]

Wu Ruixia [1]

Zeng Xiaoyang [2]

Affiliations:

[1] Southeast Univ, Sch Elect Sci & Engn, Natl ASIC Engn Ctr, Nanjing 210096, Peoples R China

[2] Fudan Univ, Natl ASIC Key Lab, Shanghai 200433, Peoples R China

Source:

JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY | 2021 / Vol. 43 / No. 06

Funding:

National Key Research and Development Program;

Keywords:

Deep Neural Networks (DNN); Object detection; Neural network accelerator; Low power consumption; Hardware optimization;

DOI:

10.11999/JEIT210002

Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Lightweight neural networks deployed on low-power platforms have proven to be effective solutions for Artificial Intelligence (AI) and Internet of Things (IoT) applications such as Unmanned Aerial Vehicle (UAV) detection and autonomous driving. However, when resources are limited, it is very challenging to build a Deep Neural Network (DNN) accelerator with both high precision and low latency. This paper proposes a series of efficient hardware optimization strategies. A stackable shared Processing Engine (PE) balances the inconsistency of data reuse and memory access patterns across different convolutions. Regulable loop parallelism and channel augmentation effectively increase the access bandwidth between the accelerator and external memory, and also improve the computing efficiency of the shallow layers of the DNN. Pre-Workflow is applied to improve the overall parallelism of the heterogeneous system. Verified on the Xilinx Ultra96 V2 board, the proposed strategies effectively improve DNN acceleration chip designs such as iSmart3-SkyNet and SkrSkr-SkyNet. The results show that the optimized accelerator processes 78.576 frames per second with an energy consumption of 0.068 joules per picture.
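
The two reported figures can be related by simple arithmetic. The sketch below assumes that the 0.068 joules figure is the energy spent per processed frame, and that throughput is steady at 78.576 frames per second; under those assumptions the average power is their product. The function names are hypothetical, and the derived values are implied by the reported figures rather than stated in the record.

```python
def average_power_watts(frames_per_second: float, joules_per_frame: float) -> float:
    """Average power (W) = energy per frame (J) * frames per second (1/s)."""
    return frames_per_second * joules_per_frame


def frame_period_ms(frames_per_second: float) -> float:
    """Average time between completed frames, in milliseconds."""
    return 1000.0 / frames_per_second


if __name__ == "__main__":
    fps = 78.576     # throughput reported in the abstract
    energy = 0.068   # energy per picture reported in the abstract, in joules
    # Assumption: steady-state operation, so power is energy per frame times rate.
    print(f"implied average power: {average_power_watts(fps, energy):.3f} W")
    print(f"average frame period: {frame_period_ms(fps):.2f} ms")
```

The frame period is a throughput-derived average; it is not necessarily the end-to-end latency of a single frame if the system is pipelined.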


Pages: 1510-1517

Page count: 8
