Accelerating CNN Algorithm with Fine-grained Dataflow Architectures

Times cited: 3

Authors:

Xiang, Taoran [1,2]
Feng, Yujing [1]
Ye, Xiaochun [1]
Tan, Xu [1,2]
Li, Wenming [1]
Zhu, Yatao [1]
Wu, Meng [1]
Zhang, Hao [1]
Fan, Dongrui [1,2]

Affiliations:

[1] Chinese Acad Sci, ICT, State Key Lab Comp Architecture, Beijing, Peoples R China

[2] UCAS, Sch Comp & Control Engn, Beijing, Peoples R China

Source:

IEEE 20TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS / IEEE 16TH INTERNATIONAL CONFERENCE ON SMART CITY / IEEE 4TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS) | 2018

Funding:

National Natural Science Foundation of China;

Keywords:

fine-grained dataflow; Convolutional Neural Network; general accelerator; data reuse; high parallel;

DOI:

10.1109/HPCC/SmartCity/DSS.2018.00063

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

The Convolutional Neural Network (CNN) is a popular, state-of-the-art algorithm widely used in applications such as face recognition, intelligent monitoring, image recognition, and text recognition. Because of its high computational complexity, many efficient hardware accelerators have been proposed to exploit the high degree of parallelism in CNNs. However, accelerators implemented on FPGAs and ASICs usually sacrifice generality for higher performance and lower power consumption. Other accelerators, such as GPUs, are general enough but consume more power. Fine-grained dataflow architectures, which depart from the conventional von Neumann model, have natural advantages in processing CNN-like algorithms with high computational efficiency and low power consumption, while remaining broadly applicable and adaptable. In this paper, we propose a scheme for implementing and optimizing CNNs on accelerators based on fine-grained dataflow architectures. Experimental results show that with our scheme, the performance of AlexNet running on the dataflow accelerator is 3.11x higher than on an NVIDIA Tesla K80, and the power consumption of our hardware is 8.52x lower than that of the K80.
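The abstract contrasts fine-grained dataflow execution with the von Neumann model. As a generic illustration only (this record does not describe the authors' actual scheme), the Python sketch below models the classic dataflow firing rule: an instruction node fires as soon as all of its operand tokens have arrived, with no program counter. `Node`, `run`, and `conv1d_dataflow` are hypothetical names. The graph is a textbook mapping of a small 1-D CNN-style convolution (cross-correlation) in which each input and weight is loaded once and forwarded to every multiply that needs it (data reuse), and nodes are grouped into waves whose members are mutually independent and could, in principle, fire in parallel.

```python
from collections import deque

# Hypothetical illustration of the generic dataflow firing rule; not the paper's implementation.


class Node:
    """A fine-grained dataflow instruction that fires once every operand slot holds a token."""

    def __init__(self, name, op, n_inputs):
        self.name = name
        self.op = op
        self.slots = [None] * n_inputs
        self.consumers = []  # (destination node, destination slot) pairs

    def ready(self):
        return all(s is not None for s in self.slots)


def run(initial_tokens):
    """Execute the graph wave by wave; nodes within one wave do not depend on each other."""
    ready = deque()
    results = {}

    def deliver(node, slot, value):
        node.slots[slot] = value
        if node.ready():  # firing rule: all operands present
            ready.append(node)

    for node, slot, value in initial_tokens:
        deliver(node, slot, value)

    waves = 0
    while ready:
        batch = list(ready)  # everything ready now could fire simultaneously
        ready.clear()
        waves += 1
        for node in batch:
            value = node.op(*node.slots)
            results[node.name] = value
            for consumer, slot in node.consumers:
                deliver(consumer, slot, value)
    return results, waves


def conv1d_dataflow(x, w):
    """Valid 1-D cross-correlation (CNN-style convolution) expressed as a dataflow graph."""
    n_out = len(x) - len(w) + 1
    # Source nodes: each input and weight is loaded once and fanned out to all its consumers.
    x_src = [Node(f"x{j}", lambda v: v, 1) for j in range(len(x))]
    w_src = [Node(f"w{k}", lambda v: v, 1) for k in range(len(w))]
    for i in range(n_out):
        acc = Node(f"y{i}", lambda *p: sum(p), len(w))  # sums the len(w) products for output i
        for k in range(len(w)):
            mul = Node(f"m{i}_{k}", lambda a, b: a * b, 2)
            x_src[i + k].consumers.append((mul, 0))
            w_src[k].consumers.append((mul, 1))
            mul.consumers.append((acc, k))
    tokens = [(n, 0, v) for n, v in zip(x_src, x)] + [(n, 0, v) for n, v in zip(w_src, w)]
    results, waves = run(tokens)
    return [results[f"y{i}"] for i in range(n_out)], waves


y, waves = conv1d_dataflow([1, 2, 3, 4], [1, 0, -1])
print(y, waves)  # [-2, -2] 3
```

With unlimited processing elements this graph completes in three waves (loads, multiplies, sums) regardless of input length. A real accelerator has a finite number of processing elements, so nodes must be mapped and scheduled onto them, which is where an implementation scheme such as the one the paper proposes comes in.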


Pages: 243-251

Number of pages: 9
