DoubleQExt: Hardware and Memory Efficient CNN Through Two Levels of Quantization

Cited by: 0
|
Authors
See, Jin-Chuan [1 ]
Ng, Hui-Fuang [1 ]
Tan, Hung-Khoon [1 ]
Chang, Jing-Jing [1 ]
Lee, Wai-Kong [2 ]
Hwang, Seong Oun [2 ]
Affiliations
[1] Faculty of Information and Communication Technology (FICT), Universiti Tunku Abdul Rahman, Kampar, Petaling Jaya, 31900, Malaysia
[2] Department of Computer Engineering, Gachon University, Seongnam 13120, Republic of Korea
Keywords
Convolutional neural network; Deep learning; Hardware; Memory consumption; Memory storage; Memory management; Power-of-two; Quantisation; Quantization (signal)
DOI: Not available
Abstract
To fulfil the tight area and memory constraints of IoT applications, the design of efficient Convolutional Neural Network (CNN) hardware becomes crucial. Quantization is one of the promising approaches for compressing a large CNN into a much smaller one, making it well suited to IoT applications. Among the various quantization schemes proposed, Power-of-Two (PoT) quantization enables efficient hardware implementation and small memory consumption for CNN accelerators, but requires retraining of the CNN to retain its accuracy. This paper proposes a two-level post-training static quantization technique (DoubleQ) that combines 8-bit and PoT weight quantization. The CNN weights are first quantized to 8-bit (level one), then further quantized to PoT (level two). By expressing the weights in their PoT exponent form, multiplication can be carried out using shifters. DoubleQ also reduces the memory storage requirement of the CNN, as only the exponents of the weights need to be stored. However, DoubleQ trades network accuracy for the reduced memory storage. To recover the accuracy, a selection process (DoubleQExt) is proposed that strategically selects some of the less informative layers in the network to be quantized with PoT at the second level. On ResNet-20, the proposed DoubleQ reduces memory consumption by 37.50% with 7.28% accuracy degradation compared to 8-bit quantization. By applying DoubleQExt, accuracy degrades by only 1.19% compared to the 8-bit version while achieving a memory reduction of 23.05%. This result is also 1% more accurate than the state-of-the-art work SegLog. The proposed DoubleQExt also allows flexible configuration to trade off memory consumption against accuracy, which is not found in other state-of-the-art works. With the proposed two-level weight quantization, a more efficient hardware architecture for CNN can be achieved with minimal impact on accuracy, which is crucial for IoT applications. © 2013 IEEE.
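The two-level scheme can be illustrated with a minimal Python/NumPy sketch. This is not the authors' implementation; the function names, the symmetric 8-bit scaling rule, and the log-domain rounding to the nearest power of two are illustrative assumptions. Weights are first quantized to signed 8-bit integers, then each 8-bit value is replaced by the nearest power of two, so that only a sign and an exponent need to be stored and multiplication by an activation reduces to a bit shift.

import numpy as np

def quantize_int8(w):
    # Level one: symmetric uniform quantization of a float weight tensor to 8-bit integers.
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def quantize_pot(q):
    # Level two: round each 8-bit weight to the nearest power of two (nearest in the log domain).
    # Only the sign and the exponent need to be stored; multiplying an activation by the weight
    # then reduces to a left shift by 'exp' plus a possible sign flip.
    sign = np.sign(q).astype(np.int8)
    mag = np.abs(q).astype(np.float64)
    exp = np.where(mag > 0, np.rint(np.log2(np.maximum(mag, 1.0))), 0).astype(np.int8)
    pot = sign.astype(np.int32) * (1 << exp.astype(np.int32))
    return sign, exp, pot

w = np.random.randn(4, 4).astype(np.float32) * 0.1   # toy weight tensor
q8, scale = quantize_int8(w)
sign, exp, pot = quantize_pot(q8)
print(q8)                # level-one (8-bit) weights
print(pot * scale)       # level-two (PoT) approximation mapped back to the float domain

In the paper's DoubleQExt variant, only selected (less informative) layers would pass through the second step; the others keep their 8-bit weights, which is the knob that trades memory for accuracy.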
Pages: 169082-169091
Related Papers (50 in total; 10 shown)
  • [1] DoubleQExt: Hardware and Memory Efficient CNN Through Two Levels of Quantization. See, Jin-Chuan; Ng, Hui-Fuang; Tan, Hung-Khoon; Chang, Jing-Jing; Lee, Wai-Kong; Hwang, Seong Oun. IEEE ACCESS, 2021, 9: 169082-169091.
  • [2] Research on Efficient CNN Acceleration Through Mixed Precision Quantization: A Comprehensive Methodology. He, Yizhi; Liu, Wenlong; Tahir, Muhammad; Li, Zhao; Zhang, Shaoshuang; Amur, Hussain Bux. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14(12): 806-817.
  • [3] Energy-Efficient Hardware Acceleration through Computing in the Memory. Paul, Somnath; Karam, Robert; Bhunia, Swarup; Puri, Ruchir. 2014 DESIGN, AUTOMATION AND TEST IN EUROPE CONFERENCE AND EXHIBITION (DATE), 2014.
  • [4] An Energy-and-Area-Efficient CNN Accelerator for Universal Powers-of-Two Quantization. Xia, Tian; Zhao, Boran; Ma, Jian; Fu, Gelin; Zhao, Wenzhe; Zheng, Nanning; Ren, Pengju. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2023, 70(03): 1242-1255.
  • [5] Optimization for Efficient Hardware Implementation of CNN on FPGA. Farrukh, Fasih Ud Din; Xie, Tuo; Zhang, Chun; Wang, Zhihua. PROCEEDINGS OF 2018 IEEE INTERNATIONAL CONFERENCE ON INTEGRATED CIRCUITS, TECHNOLOGIES AND APPLICATIONS (ICTA 2018), 2018: 88-89.
  • [6] Hardware Acceleration of CNN with One-Hot Quantization of Weights and Activations. Li, Gang; Wang, Peisong; Liu, Zejian; Leng, Cong; Cheng, Jian. PROCEEDINGS OF THE 2020 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE 2020), 2020: 971-974.
  • [7] Efficient GPU Hardware Transactional Memory through Early Conflict Resolution. Chen, Sui; Peng, Lu. PROCEEDINGS OF THE 2016 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA-22), 2016: 274-284.
  • [8] A Memory-Efficient CNN Accelerator Using Segmented Logarithmic Quantization and Multi-Cluster Architecture. Xu, Jiawei; Huan, Yuxiang; Huang, Boming; Chu, Haoming; Jin, Yi; Zheng, Li-Rong; Zou, Zhuo. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2021, 68(06): 2142-2146.
  • [9] CIMQ: A Hardware-Efficient Quantization Framework for Computing-In-Memory-Based Neural Network Accelerators. Bai, Jinyu; Sun, Sifan; Zhao, Weisheng; Kang, Wang. IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2024, 43(01): 189-202.
  • [10] Efficient Hardware Implementation of Cellular Neural Networks with Powers-of-Two Based Incremental Quantization. Xu, Xiaowei; Lu, Qing; Wang, Tianchen; Liu, Jinglan; Hu, Yu; Shi, Yiyu. PROCEEDINGS OF NEUROMORPHIC COMPUTING SYMPOSIUM (NCS 2017), 2017.