DoubleQExt: Hardware and Memory Efficient CNN Through Two Levels of Quantization

Citations: 0
Authors
See, Jin-Chuan [1 ]
Ng, Hui-Fuang [1 ]
Tan, Hung-Khoon [1 ]
Chang, Jing-Jing [1 ]
Lee, Wai-Kong [2 ]
Hwang, Seong Oun [2 ]
Affiliations
[1] Faculty of Information and Communication Technology (FICT), Universiti Tunku Abdul Rahman, Kampar, Petaling Jaya, 31900, Malaysia
[2] Department of Computer Engineering, Gachon University, Seongnam, 13120, Republic of Korea
Keywords
Convolutional neural network; Deep learning; Hardware; Memory consumption; Memory storage; Memory management; Power-of-two; Quantization (signal)
DOI: not available
Abstract
To meet the tight area and memory constraints of IoT applications, the design of efficient Convolutional Neural Network (CNN) hardware is crucial. Quantization is one of the most promising approaches for compressing a large CNN into a much smaller one, making it well suited to IoT applications. Among the various quantization schemes proposed, power-of-two (PoT) quantization enables efficient hardware implementation and low memory consumption for CNN accelerators, but requires retraining of the CNN to retain its accuracy. This paper proposes a two-level post-training static quantization technique (DoubleQ) that combines 8-bit and PoT weight quantization. The CNN weights are first quantized to 8 bits (level one), then further quantized to PoT (level two). Expressing the weights in their PoT exponent form allows multiplication to be carried out with shifters. DoubleQ also reduces the memory storage requirement of the CNN, as only the exponents of the weights need to be stored. However, DoubleQ trades network accuracy for reduced memory storage. To recover the accuracy, a selection process (DoubleQExt) is proposed that strategically selects some of the less informative layers of the network for PoT quantization at the second level. On ResNet-20, the proposed DoubleQ reduces memory consumption by 37.50% with 7.28% accuracy degradation compared to 8-bit quantization. By applying DoubleQExt, accuracy degrades by only 1.19% relative to the 8-bit version while achieving a memory reduction of 23.05%. This result is also 1% more accurate than the state-of-the-art work SegLog. The proposed DoubleQExt further allows flexible configuration to trade off memory consumption against accuracy, which is not found in other state-of-the-art works.
With the proposed two-level weight quantization, one can achieve a more efficient hardware architecture for CNN with minimal impact on accuracy, which is crucial for IoT applications. © 2013 IEEE.
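The two-level scheme described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the function names, the symmetric per-tensor scale, and the round-to-nearest-exponent rule (rounding in the log2 domain) are assumptions made for the sketch.

```python
import numpy as np

def quantize_8bit(w, scale):
    """Level one: symmetric uniform 8-bit quantization (assumed per-tensor scale)."""
    return np.clip(np.round(w / scale), -127, 127).astype(np.int8)

def quantize_pot(q):
    """Level two: round each 8-bit weight to the nearest power of two.
    Only the sign and the exponent need to be stored."""
    sign = np.sign(q).astype(np.int8)
    mag = np.abs(q).astype(np.float64)
    # Round the magnitude to the nearest power of two in the log2 domain;
    # zeros are kept at zero via the sign factor.
    exp = np.where(mag > 0, np.round(np.log2(np.maximum(mag, 1))), 0).astype(np.int8)
    pot = (sign * (2 ** exp)).astype(np.int32)
    return sign, exp, pot

# A PoT weight turns multiplication into a shift:
# x * (sign * 2**e) == sign * (x << e)
x, sign, exp = 5, 1, 3          # weight +8 has exponent 3
assert sign * (x << exp) == x * 8
```

Storing only the sign and a small exponent per weight is what yields the memory reduction reported in the abstract, and replacing multipliers with shifters is what simplifies the accelerator datapath.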
Pages: 169082–169091