Ultra-low Loss Quantization Method for Deep Neural Network Compression

Cited: 0
Authors
Gong C. [1 ,2 ]
Lu Y. [1 ,2 ]
Dai S.-R. [1 ,2 ]
Liu F.-X. [1 ,2 ]
Chen X.-W. [3 ]
Li T. [1 ,2 ,4 ]
Affiliations
[1] College of Computer Science, Nankai University, Tianjin
[2] Tianjin Key Laboratory of Network and Data Security Technology (Nankai University), Tianjin
[3] State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing
Source
Ruan Jian Xue Bao/Journal of Software | 2021, Vol. 32, No. 8
Funding
National Natural Science Foundation of China; Natural Science Foundation of Shanghai;
Keywords
Extremum of quantization loss; Neural network compression; Neural network quantization; Uniform quantization; Weight distribution;
DOI
10.13328/j.cnki.jos.006189
Abstract
Deep neural network (DNN) quantization is an efficient model compression method in which parameters and intermediate results are expressed with low bit widths. The bit width of the data directly affects memory footprint, computing power, and energy consumption. Previous research on model quantization lacks effective quantitative analysis, leaving the quantization loss of these methods unpredictable. This study proposes an ultra-low loss quantization (μL2Q) method for DNN compression, which reveals the internal relationship between quantization bit width and quantization loss, effectively guiding the selection of the quantization bit width and reducing quantization loss. First, the original data are mapped to a standard normal distribution, and then the optimal parameter configuration is sought to minimize the quantization loss under the target bit width. Finally, μL2Q has been encapsulated and integrated into two popular deep learning training frameworks, Caffe and Keras, to support the design and training of end-to-end compressed models. Experimental results show that, compared with three state-of-the-art clusters of quantization solutions at the same quantization bit width, μL2Q preserves accuracy and delivers accuracy improvements of 1.94%, 3.73%, and 8.24%, respectively, on typical neural networks. In addition, salient object detection experiments verify that μL2Q can also handle more complex computer vision tasks. © Copyright 2021, Institute of Software, the Chinese Academy of Sciences. All rights reserved.
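The two steps the abstract describes (normalizing weights toward a standard normal distribution, then choosing the quantization parameter that minimizes L2 loss for the target bit width) can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the function names are invented, and the step size is found by a simple grid search standing in for the paper's analytically derived optimum.

```python
import numpy as np

def uniform_quantize(w, bits, alpha):
    # Snap each value to the nearest of 2**bits uniformly spaced levels
    # with step alpha, symmetric around zero: alpha * (i - (L-1)/2).
    L = 2 ** bits
    idx = np.clip(np.round(w / alpha + (L - 1) / 2), 0, L - 1)
    return alpha * (idx - (L - 1) / 2)

def ul2q_sketch(w, bits, num_steps=400):
    # Step 1: map the weights toward a standard normal distribution.
    mu, sigma = w.mean(), w.std()
    wn = (w - mu) / sigma
    # Step 2: search for the step size alpha that minimizes the L2
    # quantization loss at the target bit width (a numerical stand-in
    # for the closed-form optimal configuration in the paper).
    best_alpha, best_loss = None, np.inf
    for alpha in np.linspace(0.05, 2.0, num_steps):
        loss = np.mean((wn - uniform_quantize(wn, bits, alpha)) ** 2)
        if loss < best_loss:
            best_alpha, best_loss = alpha, loss
    # Step 3: de-normalize the quantized values back to the original scale.
    wq = sigma * uniform_quantize(wn, bits, best_alpha) + mu
    return wq, best_alpha, best_loss

rng = np.random.default_rng(0)
weights = rng.normal(0.1, 0.5, size=10_000)   # synthetic "layer weights"
wq, alpha, loss = ul2q_sketch(weights, bits=2)
```

At 2 bits the quantized tensor holds at most four distinct values, and the searched alpha keeps the per-element L2 loss small on normally distributed weights; in a real framework integration this quantizer would sit inside the training loop so the network learns under the quantized representation.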
Pages: 2391-2407
Page count: 16