Design and Implementation of Convolutional Neural Networks Accelerator Based on Multidie

Cited by: 4
Authors
Song, Qingzeng [1]
Zhang, Jiabing [1]
Sun, Liankun [1]
Jin, Guanghao [2]
Affiliations
[1] Tiangong Univ, Sch Comp Sci & Technol, Tianjin 300387, Peoples R China
[2] Beijing Polytech, Sch Telecommun Engn, Beijing 100176, Peoples R China
Source
IEEE ACCESS | 2022 / Vol. 10
Funding
National Natural Science Foundation of China
Keywords
Convolutional neural networks; Quantization (signal); Object detection; Field programmable gate arrays; Mathematical models; Hardware acceleration; Hardware accelerator; multi-die; YOLOv4-tiny
DOI
10.1109/ACCESS.2022.3199441
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
To achieve real-time object detection with high throughput and low latency, this paper proposes a multi-die hardware accelerator architecture. It implements three accelerators on the VU9P chip, each bound to an independent super logic region (SLR). To reduce off-chip memory access and power consumption, the design uses three on-chip buffers to store the weights and intermediate results, and it minimizes data access and movement while maximizing data reuse. It applies an 8-bit quantization strategy to both weights and feature maps, which doubles the throughput and computational efficiency obtained from a single digital signal processor (DSP). In addition, the accelerator provides many operators, all fully parameterized, so the network is easy to extend, and the accelerator is controlled by configuring the instruction group. When accelerating the YOLOv4-tiny algorithm, the architecture achieves a frame rate of 148.14 frames per second (FPS) and a peak throughput of 2.76 tera operations per second (TOPS) at 200 MHz, with an energy efficiency ratio of 93.15 GOPS/W. The code can be found at https://github.com/19801201/Verilog_CNN_Accelerator.
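The abstract does not say how 8-bit quantization doubles the throughput of a single DSP. One widely used technique on FPGA DSP slices is to pack two 8-bit operands into one wide multiplier input, so that a single multiplication yields two products sharing a common operand. The Python sketch below models only that arithmetic. The function name `packed_int8_multiply`, the 18-bit offset `SHIFT`, and the premise that the paper uses this scheme are assumptions for illustration, not details taken from the source.

```python
# Illustrative model of packing two signed 8-bit multiplications into one
# wide multiplication. SHIFT = 18 is an assumed offset, chosen so the two
# 16-bit partial products cannot overlap.
SHIFT = 18


def packed_int8_multiply(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (a * c, b * c) using a single wide multiplication."""
    for value in (a, b, c):
        if not -128 <= value <= 127:
            raise ValueError("operands must fit in signed 8 bits")

    # Place `a` in the upper field and `b` in the lower field.
    packed = (a << SHIFT) + b

    # One multiplication produces (a * c) << SHIFT plus (b * c).
    product = packed * c

    # Recover b * c from the low SHIFT bits and sign-extend it.
    low = product & ((1 << SHIFT) - 1)
    if low >= 1 << (SHIFT - 1):
        low -= 1 << SHIFT

    # Remove the low field (this also undoes any borrow) to recover a * c.
    high = (product - low) >> SHIFT
    return high, low


if __name__ == "__main__":
    print(packed_int8_multiply(-3, 5, -7))
```

The subtraction before the final shift matters: when `b * c` is negative it borrows from the upper field, so shifting `product` directly would leave `a * c` off by one.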
Pages: 91497-91508
Page count: 12