A Winograd-Based Highly-Parallel Convolution Engine for 8-bit CNN Acceleration

Citations: 2
Authors
Chen, Yong-Tai [1]
Ou, Yu-Feng [1]
Huang, Chao-Tsung [1]
Affiliations
[1] Natl Tsing Hua Univ, Dept Elect Engn, Hsinchu, Taiwan
Keywords
Winograd convolution; highly-parallel; computational imaging; CNN; quantization
DOI
10.1109/AICAS54282.2022.9869911
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Convolutional neural network (CNN) accelerators for computational imaging typically use 8-bit fixed-point models for efficient computation, but the convolution engine still dominates the chip area. Quantizing models to lower bitwidths can cut resource demand effectively, but it causes a significant loss of output quality. Another way to reduce computational complexity is Winograd convolution, which lessens the demand for logic gates without diminishing model quality. Nevertheless, the resource reduction ratio of Winograd convolution declines as the input bitwidth decreases, and at 8-bit it needs even more gates than direct convolution. In this paper, we realize an area-efficient convolution engine for 8-bit computational imaging models by considering Winograd convolution and quantization jointly. First, we elaborate hardware sharing techniques for highly-parallel Winograd convolution. Then we propose an uneven scheme for Winograd-domain quantization that yields at most 0.16 dB of PSNR drop on computational imaging models. Finally, we implement a highly-parallel Winograd convolution engine for 8-bit CNN inference. Synthesized with TSMC 40nm technology, the engine uses 2.17M logic gates to deliver 5.12 TOPS of inference capability, saving 29.5% and 41.1% of logic gates compared to a direct convolution engine and a naive Winograd implementation, respectively. On modified FFDNet and EDSR baselines, it achieves up to Full HD at 20 fps with merely 0.09 dB of PSNR drop on average.
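As a rough illustration of the trade-off the abstract describes, the following is a minimal sketch of the textbook 1D Winograd algorithm F(2,3), which produces two outputs of a 3-tap filter with 4 multiplications instead of 6. The tile size, the function names, and the integer scaling of the filter transform are assumptions made for this example; the abstract does not state which Winograd variant or transform matrices the engine uses. Note that each transformed input is a sum or difference of two samples, so signed 8-bit inputs need 9 bits after this 1D transform. Wider multiplier operands of this kind are one plausible reason the gate savings shrink at low bitwidths.

```python
def direct_conv(d, g):
    """Direct 3-tap filtering of a 4-sample tile: 2 outputs, 6 multiplications."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


def winograd_f23(d, g):
    """Winograd F(2,3): the same 2 outputs using only 4 multiplications.

    The filter transform is scaled by 2 (an assumption of this sketch) so
    that all intermediate values stay integers; the scale is removed at the end.
    """
    # Input transform: additions/subtractions only (range grows by one bit).
    v = [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]
    # Filter transform, scaled by 2; it can be precomputed once per filter.
    u = [2 * g[0], g[0] + g[1] + g[2], g[0] - g[1] + g[2], 2 * g[2]]
    # Element-wise products: the only 4 multiplications.
    m = [vi * ui for vi, ui in zip(v, u)]
    # Output transform; both sums are always even, so // 2 is exact.
    return [(m[0] + m[1] + m[2]) // 2, (m[1] - m[2] - m[3]) // 2]


if __name__ == "__main__":
    tile, taps = [1, 2, 3, 4], [1, 0, -1]
    print(direct_conv(tile, taps), winograd_f23(tile, taps))
```

The 2D form used for 3x3 kernels nests this transform along rows and columns, which is where the usual 36-to-16 multiplication reduction for F(2x2, 3x3) comes from.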
Pages: 395-398
Page count: 4