Accelerating CNN Algorithm with Fine-grained Dataflow Architectures

Cited by: 3

Authors:

Xiang, Taoran ^{[1,2]}

Feng, Yujing ^{[1]}

Ye, Xiaochun ^{[1]}

Tan, Xu ^{[1,2]}

Li, Wenming ^{[1]}

Zhu, Yatao ^{[1]}

Wu, Meng ^{[1]}

Zhang, Hao ^{[1]}

Fan, Dongrui ^{[1,2]}

Affiliations:

[1] Chinese Acad Sci, ICT, State Key Lab Comp Architecture, Beijing, Peoples R China

[2] UCAS, Sch Comp & Control Engn, Beijing, Peoples R China

Source:

IEEE 20TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS / IEEE 16TH INTERNATIONAL CONFERENCE ON SMART CITY / IEEE 4TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS) | 2018

Funding:

National Natural Science Foundation of China;

Keywords:

fine-grained dataflow; Convolutional Neural Network; general accelerator; data reuse; high parallel;

DOI:

10.1109/HPCC/SmartCity/DSS.2018.00063

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Convolutional Neural Networks (CNNs) are state-of-the-art algorithms widely used in applications such as face recognition, intelligent monitoring, image recognition, and text recognition. Because of their high computational complexity, many efficient hardware accelerators have been proposed to exploit a high degree of parallelism in CNN processing. However, accelerators implemented on FPGAs and ASICs usually sacrifice generality for higher performance and lower power consumption. Other accelerators, such as GPUs, are general enough but consume more power. Fine-grained dataflow architectures, which break from the conventional Von Neumann architecture, show natural advantages in processing CNN-like algorithms with high computational efficiency and low power consumption, while remaining broadly applicable and adaptable. In this paper, we propose a scheme for implementing and optimizing CNNs on accelerators based on a fine-grained dataflow architecture. The experimental results show that, with our scheme, the performance of AlexNet on the dataflow accelerator is 3.11x higher than on an NVIDIA Tesla K80, and the power consumption of our hardware is 8.52x lower than that of the K80.


Pages: 243-251

Page count: 9
