DoubleQExt: Hardware and Memory Efficient CNN Through Two Levels of Quantization

Cited by: 0
Authors
See, Jin-Chuan [1 ]
Ng, Hui-Fuang [1 ]
Tan, Hung-Khoon [1 ]
Chang, Jing-Jing [1 ]
Lee, Wai-Kong [2 ]
Hwang, Seong Oun [2 ]
Affiliations
[1] Faculty of Information and Communication Technology (FICT), Universiti Tunku Abdul Rahman, Kampar, Petaling Jaya, 31900, Malaysia
[2] Department of Computer Engineering, Gachon University, Seongnam, 13120, Republic of Korea
Keywords
Convolutional neural network - Deep learning - Hardware - Memory consumption - Memory storage - Memory-management - Power-of-two - Quantisation - Quantization (signal);
DOI
Not available
Abstract
To meet the tight area and memory constraints of IoT applications, the design of efficient Convolutional Neural Network (CNN) hardware becomes crucial. Quantization is a promising approach that compresses a large CNN into a much smaller one, making it well suited to IoT applications. Among the various quantization schemes proposed, Power-of-Two (PoT) quantization enables efficient hardware implementation and low memory consumption for CNN accelerators, but requires retraining of the CNN to retain its accuracy. This paper proposes a two-level post-training static quantization technique (DoubleQ) that combines 8-bit and PoT weight quantization. The CNN weights are first quantized to 8-bit (level one), then further quantized to PoT (level two). Expressing the weights in their PoT exponent form allows multiplication to be carried out with shifters. DoubleQ also reduces the memory storage requirement of the CNN, as only the exponents of the weights need to be stored. However, DoubleQ trades network accuracy for reduced memory storage. To recover the accuracy, a selection process (DoubleQExt) is proposed that strategically selects some of the less informative layers of the network to be quantized with PoT at the second level. On ResNet-20, the proposed DoubleQ reduces memory consumption by 37.50% with 7.28% accuracy degradation compared to 8-bit quantization. By applying DoubleQExt, accuracy degrades by only 1.19% compared to the 8-bit version while achieving a memory reduction of 23.05%. This result is also 1% more accurate than the state-of-the-art work (SegLog). The proposed DoubleQExt further allows flexible configuration to trade memory consumption for better accuracy, which is not found in other state-of-the-art works.
With the proposed two-level weight quantization, one can achieve a more efficient hardware architecture for CNN with minimal impact on accuracy, which is crucial for IoT applications. © 2013 IEEE.
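The two-level idea in the abstract can be illustrated with a minimal sketch: quantize a weight to an 8-bit integer (level one), snap that integer to the nearest power of two and keep only its sign and exponent (level two), so multiplication by the weight reduces to a left shift. This is an assumption-laden toy illustration, not the authors' exact procedure; the function names, the example scale, and the handling of zero weights are all choices made here for clarity.

```python
import math

# Illustrative sketch of the two-level (8-bit -> PoT) quantization described
# in the abstract. The scale value and zero-weight handling are assumptions.

def quantize_8bit(w, scale):
    """Level one: uniform 8-bit quantization (round to the nearest level)."""
    return max(-128, min(127, round(w / scale)))

def quantize_pot(q):
    """Level two: snap an 8-bit integer to the nearest power of two.

    Only the sign and exponent are returned, which is why PoT storage is
    so compact: the exponent of an 8-bit magnitude fits in 3 bits."""
    if q == 0:
        return 0, None          # zero weight: nothing to shift
    exp = round(math.log2(abs(q)))
    return (1 if q > 0 else -1), exp

def pot_multiply(x, sign, exp):
    """Multiplying an integer activation by a PoT weight is a left shift,
    so no hardware multiplier is needed."""
    if exp is None:
        return 0
    return sign * (x << exp)

weights = [0.30, -0.07, 0.002, 0.51]
q8 = [quantize_8bit(w, 0.005) for w in weights]        # level one: [60, -14, 0, 102]
pot = [quantize_pot(q) for q in q8]                    # level two: (sign, exponent) pairs
print([pot_multiply(3, s, e) for s, e in pot])         # shift-based products
```

Note the accuracy cost the abstract describes: snapping 60 to 64 (exponent 6) and 14 to 16 (exponent 4) introduces exactly the kind of rounding error that motivates applying PoT only to selected, less informative layers (DoubleQExt).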
Pages: 169082 - 169091