HFPQ: deep neural network compression by hardware-friendly pruning-quantization

Cited by: 0
Authors
YingBo Fan
Wei Pang
ShengLi Lu
Affiliations
[1] Southeast University, National ASIC System Engineering Research Center
Source
Applied Intelligence | 2021 / Vol. 51
Keywords
Neural network; Network compression; Exponential quantization; Channel pruning
DOI
Not available
Abstract
This paper presents a hardware-friendly compression method for deep neural networks that effectively combines layered channel pruning with power-of-two exponential quantization. While incurring only a small decrease in model accuracy, the method greatly reduces the computational resources needed to deploy neural networks on hardware, including memory, multiply-accumulate units (MACs), and logic gates. Layered channel pruning groups the layers according to how much pruning each one degrades model accuracy; after the layers are pruned in a specific order, the network is retrained. The pruning method sets a parameter that can be adjusted to meet different pruning rates in practical applications. The quantization method converts high-precision weights into low-precision weights composed entirely of zeros and powers of 2. Likewise, a second parameter controls the quantized bit width and can be adjusted to meet different quantization precisions. The hardware-friendly pruning-quantization (HFPQ) method proposed in this paper retrains the network after pruning and then quantizes the weights. Experimental results show that HFPQ compresses VGGNet, ResNet, and GoogLeNet by more than 30 times while reducing the number of FLOPs by more than 85%.
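The power-of-two quantization described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: the function name, the way the exponent range is derived from the bit width, and the snap-to-zero rule are all assumptions. What it does reproduce is the stated property that every quantized weight is either 0 or a signed power of 2.

```python
import numpy as np

def pow2_quantize(w, bits=4):
    """Quantize weights to {0} union {+/- 2^k}: a sketch of power-of-two
    quantization in the spirit of HFPQ (the exact scheme is an assumption).

    `bits` plays the role of the bit-width parameter from the abstract:
    one bit for the sign, the rest index 2**(bits-1) - 1 exponent levels,
    with one code reserved for zero."""
    w = np.asarray(w, dtype=np.float64)
    out = np.zeros_like(w)
    nz = np.abs(w) > 0
    if not nz.any():
        return out
    mag = np.abs(w[nz])
    e_max = np.floor(np.log2(mag.max()))   # largest kept exponent
    e_min = e_max - (2 ** (bits - 1) - 2)  # smallest kept exponent
    exp = np.round(np.log2(mag))           # nearest power-of-two exponent
    keep = exp >= e_min                    # weights too small snap to 0
    q = np.where(keep, np.sign(w[nz]) * np.exp2(np.minimum(exp, e_max)), 0.0)
    out[nz] = q
    return out
```

Restricting weights to zeros and powers of 2 is what makes the format hardware-friendly: each multiplication in a convolution or fully connected layer reduces to a bit shift, so MACs and logic gates are saved alongside memory.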
Pages: 7016-7028
Page count: 12
Related papers
50 in total
  • [1] HFPQ: deep neural network compression by hardware-friendly pruning-quantization
    Fan, YingBo
    Pang, Wei
    Lu, ShengLi
    Applied Intelligence, 2021, 51(10): 7016-7028
  • [2] Deep Neural Network Compression by In-Parallel Pruning-Quantization
    Tung, Frederick
    Mori, Greg
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(3): 568-579
  • [3] Single-shot pruning and quantization for hardware-friendly neural network acceleration
    Jiang, Bofeng
    Chen, Jun
    Liu, Yong
    Engineering Applications of Artificial Intelligence, 2023, 126
  • [4] Hardware-friendly Deep Learning by Network Quantization and Binarization
    Qin, Haotong
    Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI 2021), 2021: 4911-4912
  • [5] CLIP-Q: Deep Network Compression Learning by In-Parallel Pruning-Quantization
    Tung, Frederick
    Mori, Greg
    2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018: 7873-7882
  • [6] Octave Deep Compression: In-Parallel Pruning-Quantization on Different Frequencies
    He, Qisheng
    Dong, Ming
    Schwiebert, Loren
    2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science (IRI 2021), 2021: 184-192
  • [7] Hardware-Friendly Acceleration for Deep Neural Networks with Micro-Structured Compression
    Sun, Mengshu
    Lin, Sheng
    Liu, Shan
    Li, Songnan
    Wang, Yanzhi
    Jiang, Wei
    Wang, Wei
    2022 IEEE 30th International Symposium on Field-Programmable Custom Computing Machines (FCCM 2022), 2022: 229
  • [8] OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization
    Hu, Peng
    Peng, Xi
    Zhu, Hongyuan
    Aly, Mohamed M. Sabry
    Lin, Jie
    Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2021), 2021, 35: 7780-7788
  • [9] Float-Fix: An Efficient and Hardware-Friendly Data Type for Deep Neural Network
    Dong Han
    Shengyuan Zhou
    Tian Zhi
    Yibo Wang
    Shaoli Liu
    International Journal of Parallel Programming, 2019, 47: 345-359
  • [10] Hardware-friendly compression and hardware acceleration for transformer: A survey
    Huang, Shizhen
    Tang, Enhao
    Li, Shun
    Ping, Xiangzhan
    Chen, Ruiqi
    Electronic Research Archive, 2022, 30(10): 3755-3785