A Winograd-Based Highly-Parallel Convolution Engine for 8-bit CNN Acceleration

Cited by: 2

Authors:

Chen, Yong-Tai [1]

Ou, Yu-Feng [1]

Huang, Chao-Tsung [1]

Affiliations:

[1] Natl Tsing Hua Univ, Dept Elect Engn, Hsinchu, Taiwan

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2022): INTELLIGENT TECHNOLOGY IN THE POST-PANDEMIC ERA | 2022

Keywords:

Winograd convolution; highly-parallel; computational imaging; CNN; quantization

DOI:

10.1109/AICAS54282.2022.9869911

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Convolutional neural network (CNN) accelerators for computational imaging typically use 8-bit fixed-point models for efficient computation, but the convolution engine still dominates the chip area. Quantizing models to lower bitwidths can cut resource demand effectively, but it results in a significant loss of output quality. Another approach to reducing computational complexity is Winograd convolution, which lessens the demand for logic gates without diminishing model quality. Nevertheless, the resource reduction ratio of Winograd convolution declines with input bitwidths, and it needs even more gates than direct convolution at 8-bit. In this paper, we realize an area-efficient convolution engine for 8-bit computational imaging models by considering Winograd convolution and quantization jointly. First, we elaborate hardware sharing techniques for highly-parallel Winograd convolution. Then we propose an uneven scheme for Winograd-domain quantization that yields only up to 0.16 dB of PSNR drop on computational imaging models. Finally, we implement a highly-parallel Winograd convolution engine for 8-bit CNN inference. Synthesized with TSMC 40nm technology, the engine uses 2.17M logic gates to deliver 5.12 TOPS of inference capability, saving 29.5% and 41.1% of logic gates compared to a direct convolution engine and a naive Winograd implementation, respectively. On modified FFDNet and EDSR baselines, it achieves up to Full HD 20 fps with merely 0.09 dB of PSNR drop on average.
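
To make the trade-off in the abstract concrete, the following is a minimal sketch of the textbook one-dimensional Winograd algorithm F(2,3), which computes two outputs of a 3-tap filter with 4 multiplications instead of 6. The abstract does not state which tile size or transform matrices the engine uses, so F(2,3), the function names, and the sample values are illustrative assumptions and not the authors' design. The last lines show why the saving shrinks at 8-bit: the input transform takes differences of inputs, so the multiplier operands become wider than the original 8 bits.

```python
from fractions import Fraction

def direct_conv_1d(d, g):
    """Direct 3-tap filtering: 2 outputs from 4 inputs, 6 multiplications."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]

def winograd_f23(d, g):
    """Textbook Winograd F(2,3): same 2 outputs with only 4 multiplications.

    Illustrative assumption: the paper's actual tile size is not given
    in the abstract.
    """
    # Filter transform (can be precomputed offline); the halves are kept
    # exact with Fraction so the result matches direct filtering exactly.
    u = [
        Fraction(g[0]),
        Fraction(g[0] + g[1] + g[2], 2),
        Fraction(g[0] - g[1] + g[2], 2),
        Fraction(g[2]),
    ]
    # Input transform: additions and subtractions only.
    v = [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]
    # Element-wise products in the Winograd domain: 4 multiplications.
    m = [ui * vi for ui, vi in zip(u, v)]
    # Output transform: additions and subtractions only.
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]

def transformed_input_bits(bits):
    """Signed bitwidth needed for a difference of two signed `bits`-bit inputs."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    worst = hi - lo  # largest magnitude of d[i] - d[j]
    return worst.bit_length() + 1  # +1 for the sign bit

d = [1, 2, 3, 4]   # hypothetical input samples
g = [1, 0, -1]     # hypothetical filter taps
assert winograd_f23(d, g) == direct_conv_1d(d, g)
print(transformed_input_bits(8))  # operand width after the input transform
```

In this sketch the two methods agree exactly, while an 8-bit signed input needs 9 bits after the one-dimensional input transform. Fewer but wider multipliers are consistent with the abstract's observation that the gate saving declines as input bitwidth falls, which is the motivation for quantizing in the Winograd domain.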


Pages: 395-398

Page count: 4

