An Uninterrupted Processing Technique-Based High-Throughput and Energy-Efficient Hardware Accelerator for Convolutional Neural Networks

Cited by: 5

Authors:

Islam, Md Najrul [1]

Shrestha, Rahul [1]

Chowdhury, Shubhajit Roy [1]

Affiliations:

[1] Indian Inst Technol IIT Mandi, Sch Comp & Elect Engn, Mandi 175075, Himachal Prades, India

Source:

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS | 2022 / Vol. 30 / No. 12

Keywords:

Convolutional neural network (CNN); digital VLSI-architecture design; field-programmable gate array (FPGA); VGG-16 and GoogLeNet neural networks; VLSI; CNN;

DOI:

10.1109/TVLSI.2022.3210963

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Subject classification code:

0812;

Abstract:

This article proposes an uninterrupted processing technique for the convolutional neural network (CNN) accelerator. It primarily allows the CNN accelerator to simultaneously perform both processing element (PE) operation and data fetching that reduces its latency and enhances the achievable throughput. Corresponding to the suggested technique, this work also presents a low latency VLSI-architecture of the CNN accelerator using the new random access line-buffer (RALB)-based design of PE array. Subsequently, the proposed CNN-accelerator architecture has been further optimized by reusing the local data in PE array, incurring better energy conservation. Our CNN accelerator has been hardware implemented on Zynq-UltraScale+ MPSoC-ZCU102 FPGA board, and it operates at a maximum clock frequency of 340 MHz, consuming 4.11 W of total power. In addition, the suggested CNN accelerator with 864 PEs delivers a peak throughput of 587.52 GOPs and an adequate energy efficiency of 142.95 GOPs/W. Comparison of aforementioned implementation results with the literature has shown that our CNN accelerator delivers 33.42% higher throughput and 6.24x better energy efficiency than the state-of-the-art work. Eventually, the field-programmable gate array (FPGA) prototype of the proposed CNN accelerator has been functionally validated using the real-world test setup for the detection of object from input image, using the GoogLeNet neural network.
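
The throughput and energy-efficiency figures in the abstract can be cross-checked with simple arithmetic. The sketch below assumes that each PE completes one multiply-accumulate per clock cycle and that a multiply-accumulate counts as two operations; the abstract does not state this, but the assumption is consistent with the reported numbers. The function and constant names are illustrative only.

```python
# Figures quoted in the abstract.
NUM_PES = 864          # processing elements
CLOCK_HZ = 340e6       # maximum clock frequency, 340 MHz
TOTAL_POWER_W = 4.11   # total power consumption in watts

# Assumption (not stated in the abstract): one multiply-accumulate
# per PE per cycle, counted as 2 operations (multiply + add).
OPS_PER_PE_PER_CYCLE = 2


def peak_throughput_gops(num_pes, clock_hz, ops_per_pe_per_cycle):
    """Peak throughput in giga-operations per second."""
    return num_pes * ops_per_pe_per_cycle * clock_hz / 1e9


def energy_efficiency_gops_per_w(throughput_gops, power_w):
    """Energy efficiency in giga-operations per second per watt."""
    return throughput_gops / power_w


peak = peak_throughput_gops(NUM_PES, CLOCK_HZ, OPS_PER_PE_PER_CYCLE)
efficiency = energy_efficiency_gops_per_w(peak, TOTAL_POWER_W)

print(f"Peak throughput:   {peak:.2f} GOPS")        # 587.52
print(f"Energy efficiency: {efficiency:.2f} GOPS/W")  # 142.95
```

Under that assumption, 864 x 2 x 340 MHz reproduces the reported 587.52 GOPs peak, and dividing by 4.11 W gives the reported 142.95 GOPs/W.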

Pages: 1891-1901

Page count: 11

