HFPQ: deep neural network compression by hardware-friendly pruning-quantization

Cited by: 0
Authors
YingBo Fan
Wei Pang
ShengLi Lu
Affiliations
[1] Southeast University, National ASIC System Engineering Research Center
Source
Applied Intelligence | 2021 / Vol. 51
Keywords
Neural network; Network compression; Exponential quantization; Channel pruning
DOI
Not available
Abstract
This paper presents a hardware-friendly compression method for deep neural networks that effectively combines layered channel pruning with power-of-two exponential quantization. While incurring only a small decrease in model accuracy, the method greatly reduces the computational resources needed to deploy neural networks on hardware, including memory, multiply-accumulate units (MACs), and logic gates. Layered channel pruning groups the layers according to how much pruning each one degrades model accuracy; after the layers are pruned in a specific order, the network is retrained. The pruning method sets a parameter that can be adjusted to meet different pruning rates in practical applications. The quantization method converts high-precision weights into low-precision weights composed entirely of zeros and powers of 2. Likewise, a second parameter controls the quantized bit width and can be adjusted to meet different quantization precisions. The hardware-friendly pruning-quantization (HFPQ) method proposed in this paper retrains the network after pruning and then quantizes the weights. Experimental results show that HFPQ compresses VGGNet, ResNet, and GoogLeNet by more than 30 times while reducing the number of FLOPs by more than 85%.
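The power-of-two quantization described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: the function name, the way the exponent range is derived from the bit width, and the snap-to-zero rule are all assumptions. What it does reproduce is the stated property that every quantized weight is either 0 or a signed power of 2.

```python
import numpy as np

def pow2_quantize(w, bits=4):
    """Quantize weights to {0} union {+/- 2^k}: a sketch of power-of-two
    quantization in the spirit of HFPQ (the exact scheme is an assumption).

    `bits` plays the role of the bit-width parameter from the abstract:
    one bit for the sign, the rest index 2**(bits-1) - 1 exponent levels,
    with one code reserved for zero."""
    w = np.asarray(w, dtype=np.float64)
    out = np.zeros_like(w)
    nz = np.abs(w) > 0
    if not nz.any():
        return out
    mag = np.abs(w[nz])
    e_max = np.floor(np.log2(mag.max()))   # largest kept exponent
    e_min = e_max - (2 ** (bits - 1) - 2)  # smallest kept exponent
    exp = np.round(np.log2(mag))           # nearest power-of-two exponent
    keep = exp >= e_min                    # weights too small snap to 0
    q = np.where(keep, np.sign(w[nz]) * np.exp2(np.minimum(exp, e_max)), 0.0)
    out[nz] = q
    return out
```

Restricting weights to zeros and powers of 2 is what makes the format hardware-friendly: each multiplication in a convolution or fully connected layer reduces to a bit shift, so MACs and logic gates are saved alongside memory.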
Pages: 7016-7028
Page count: 12
Related papers
50 in total
  • [1] HFPQ: deep neural network compression by hardware-friendly pruning-quantization
    Fan, YingBo
    Pang, Wei
    Lu, ShengLi
    Applied Intelligence, 2021, 51(10): 7016-7028
  • [2] Deep Neural Network Compression by In-Parallel Pruning-Quantization
    Tung, Frederick
    Mori, Greg
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(3): 568-579
  • [3] Single-shot pruning and quantization for hardware-friendly neural network acceleration
    Jiang, Bofeng
    Chen, Jun
    Liu, Yong
    Engineering Applications of Artificial Intelligence, 2023, 126
  • [4] Hardware-friendly Deep Learning by Network Quantization and Binarization
    Qin, Haotong
    Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI 2021), 2021: 4911-4912
  • [5] CLIP-Q: Deep Network Compression Learning by In-Parallel Pruning-Quantization
    Tung, Frederick
    Mori, Greg
    2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018: 7873-7882
  • [6] Octave Deep Compression: In-Parallel Pruning-Quantization on Different Frequencies
    He, Qisheng
    Dong, Ming
    Schwiebert, Loren
    2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science (IRI 2021), 2021: 184-192
  • [7] Hardware-Friendly Acceleration for Deep Neural Networks with Micro-Structured Compression
    Sun, Mengshu
    Lin, Sheng
    Liu, Shan
    Li, Songnan
    Wang, Yanzhi
    Jiang, Wei
    Wang, Wei
    2022 IEEE 30th International Symposium on Field-Programmable Custom Computing Machines (FCCM 2022), 2022: 229
  • [8] OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization
    Hu, Peng
    Peng, Xi
    Zhu, Hongyuan
    Aly, Mohamed M. Sabry
    Lin, Jie
    Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2021), 2021, 35: 7780-7788
  • [9] Float-Fix: An Efficient and Hardware-Friendly Data Type for Deep Neural Network
    Dong Han
    Shengyuan Zhou
    Tian Zhi
    Yibo Wang
    Shaoli Liu
    International Journal of Parallel Programming, 2019, 47: 345-359
  • [10] Hardware-friendly compression and hardware acceleration for transformer: A survey
    Huang, Shizhen
    Tang, Enhao
    Li, Shun
    Ping, Xiangzhan
    Chen, Ruiqi
    Electronic Research Archive, 2022, 30(10): 3755-3785