A Winograd-based CNN Accelerator with a Fine-grained Regular Sparsity Pattern

Cited by: 15
Authors
Yang, Tao [1 ]
Liao, Yunkun [1 ]
Shi, Jianping [3 ]
Liang, Yun [4 ]
Jing, Naifeng [1 ]
Jiang, Li [1 ,2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai, Peoples R China
[2] Shanghai Jiao Tong Univ, AI Inst, MoE Key Lab Artificial Intelligence, Shanghai, Peoples R China
[3] SenseTime Grp Ltd, Shanghai, Peoples R China
[4] Peking Univ, Sch EECS, Beijing, Peoples R China
Source
2020 30TH INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE LOGIC AND APPLICATIONS (FPL) | 2020
Funding
National Natural Science Foundation of China;
Keywords
DOI
10.1109/FPL50879.2020.00050
CLC number
TP3 [computing technology, computer technology];
Discipline code
0812;
Abstract
Field-Programmable Gate Array (FPGA) is a high-performance computing platform for Convolutional Neural Network (CNN) inference. Winograd transformation and weight pruning are widely adopted to reduce the storage and arithmetic overhead of matrix multiplication in CNNs on FPGAs. Recent studies strive to prune the weights in the Winograd domain; however, this results in irregular sparse patterns, leading to low parallelism and reduced resource utilization. In this paper, we propose a regular sparse pruning pattern for Winograd-based CNNs, namely the Sub-Row-Balanced Sparsity (SRBS) pattern, to overcome the above challenge. We then develop a 2-step hardware co-optimization approach to improve model accuracy under the SRBS pattern. Finally, we design an FPGA accelerator that exploits the SRBS pattern to eliminate low-parallelism computation and irregular memory accesses. Experimental results on VGG16 and ResNet-18 with CIFAR-10 and ImageNet show up to 4.4x and 3.06x speedup over the state-of-the-art dense Winograd accelerator, and a 52% performance enhancement (against a theoretical upper bound of 72%) over the state-of-the-art sparse Winograd accelerator. The resulting sparsity ratios are 80% and 75%, with negligible loss of model accuracy.
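The abstract does not spell out how a sub-row-balanced mask is formed. The following is a minimal, illustrative sketch under one plausible reading: each row of a Winograd-domain weight matrix is split into fixed-length sub-rows, and only the largest-magnitude weights in every sub-row survive, so each sub-row carries the same nonzero count (the regularity the hardware exploits). The function name and parameters are assumptions, not taken from the paper.

```python
def srbs_prune(weights, sub_row_len=4, keep=1):
    """Illustrative sub-row-balanced pruning (assumed scheme, not the paper's exact algorithm).

    weights: list of rows (lists of floats); each row length must divide evenly
    into sub-rows of `sub_row_len`. Exactly `keep` largest-magnitude weights
    survive per sub-row, giving a uniform sparsity of 1 - keep/sub_row_len.
    """
    pruned = []
    for row in weights:
        assert len(row) % sub_row_len == 0, "row length must be a multiple of sub_row_len"
        new_row = [0.0] * len(row)
        for c0 in range(0, len(row), sub_row_len):
            seg = row[c0:c0 + sub_row_len]
            # indices of the `keep` largest-magnitude weights in this sub-row
            top = sorted(range(sub_row_len), key=lambda i: abs(seg[i]))[-keep:]
            for i in top:
                new_row[c0 + i] = seg[i]
        pruned.append(new_row)
    return pruned
```

With `sub_row_len=4` and `keep=1` this yields a 75% sparsity ratio, matching one of the ratios reported in the abstract; the balanced per-sub-row nonzero count is what allows regular memory access and full PE utilization in hardware.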
Pages: 254-261
Page count: 8
Related papers
50 records
  • [1] BISWSRBS: A Winograd-based CNN Accelerator with a Fine-grained Regular Sparsity Pattern and Mixed Precision Quantization
    Yang, Tao
    He, Zhezhi
    Kou, Tengchuan
    Li, Qingzheng
    Han, Qi
    Yu, Haibao
    Liu, Fangxin
    Liang, Yun
    Jiang, Li
    ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS, 2021, 14 (04)
  • [2] PCNN: Pattern-based Fine-Grained Regular Pruning Towards Optimizing CNN Accelerators
    Tan, Zhanhong
    Song, Jiebo
    Ma, Xiaolong
    Tan, Sia-Huat
    Chen, Hongyang
    Miao, Yuanqing
    Wu, Yifu
    Ye, Shaokai
    Wang, Yanzhi
    Li, Dehui
    Ma, Kaisheng
    PROCEEDINGS OF THE 2020 57TH ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2020,
  • [3] A Winograd-Based Integrated Photonics Accelerator for Convolutional Neural Networks
    Mehrabian, Armin
    Miscuglio, Mario
    Alkabani, Yousra
    Sorger, Volker J.
    El-Ghazawi, Tarek
    IEEE JOURNAL OF SELECTED TOPICS IN QUANTUM ELECTRONICS, 2020, 26 (01)
  • [4] Leveraging Fine-grained Structured Sparsity for CNN Inference on Systolic Array Architectures
    Liu, Linqiao
    Brown, Stephen
    2021 31ST INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE LOGIC AND APPLICATIONS (FPL 2021), 2021, : 301 - 305
  • [5] A Fine-grained Optimization to Winograd Convolution Based on Micro-architectural Features of CPU
    Chen, Xiaofeng
    Chen, Zhiguang
    Lu, Yutong
    Huang, Dan
    19TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2021), 2021, : 1199 - 1208
  • [6] Attentive Fine-Grained Structured Sparsity for Image Restoration
    Oh, Junghun
    Kim, Heewon
    Nah, Seungjun
    Hong, Cheeun
    Choi, Jonghyun
    Lee, Kyoung Mu
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 17652 - 17661
  • [7] Edge-Side Fine-Grained Sparse CNN Accelerator With Efficient Dynamic Pruning Scheme
    Wu, Bi
    Yu, Tianyang
    Chen, Ke
    Liu, Weiqiang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2024, 71 (03) : 1285 - 1298
  • [8] Data Stream Oriented Fine-grained Sparse CNN Accelerator with Efficient Unstructured Pruning Strategy
    Yu, Tianyang
    Wu, Bi
    Chen, Ke
    Yan, Chenggang
    Liu, Weiqiang
    PROCEEDINGS OF THE 32ND GREAT LAKES SYMPOSIUM ON VLSI 2022, GLSVLSI 2022, 2022, : 243 - 248
  • [9] An Operation-Minimized FPGA Accelerator Design by Dynamically Exploiting Sparsity in CNN Winograd Transform
    Di, Xinkai
    Yang, Haigang
    Huang, Zhihong
    Mao, Ning
    32ND IEEE INTERNATIONAL SYSTEM ON CHIP CONFERENCE (IEEE SOCC 2019), 2019, : 50 - 55
  • [10] FINE-GRAINED COMPLEXITY OF REGULAR PATH QUERIES
    Casel, Katrin
    Schmid, Markus L.
    LOGICAL METHODS IN COMPUTER SCIENCE, 2023, 19 (04)