Pruning and quantization for deep neural network acceleration: A survey

Cited by: 283
Authors
Liang, Tailin [1 ,2 ]
Glossner, John [1 ,2 ,3 ]
Wang, Lei [1 ]
Shi, Shaobo [1 ,2 ]
Zhang, Xiaotong [1 ]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Comp & Commun Engn, Beijing 100083, Peoples R China
[2] Hua Xia Gen Processor Technol, Beijing 100080, Peoples R China
[3] Gen Proc Technol, Tarrytown, NY 10591 USA
Keywords
Convolutional neural network; Neural network acceleration; Neural network quantization; Neural network pruning; Low-bit mathematics;
DOI
10.1016/j.neucom.2021.07.045
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Deep neural networks have been applied in many applications, exhibiting extraordinary abilities in the field of computer vision. However, complex network architectures challenge efficient real-time deployment and require significant computation resources and energy costs. These challenges can be overcome through optimizations such as network compression. Network compression can often be realized with little loss of accuracy. In some cases, accuracy may even improve. This paper provides a survey on two types of network compression: pruning and quantization. Pruning can be categorized as static if it is performed offline or dynamic if it is performed at run-time. We compare pruning techniques and describe criteria used to remove redundant computations. We discuss trade-offs in element-wise, channel-wise, shape-wise, filter-wise, layer-wise and even network-wise pruning. Quantization reduces computations by reducing the precision of the datatype. Weights, biases, and activations are typically quantized to 8-bit integers, although lower bit-width implementations are also discussed, including binary neural networks. Pruning and quantization can be used independently or combined. We compare current techniques, analyze their strengths and weaknesses, present compressed network accuracy results on a number of frameworks, and provide practical guidance for compressing networks. (c) 2021 Elsevier B.V. All rights reserved.
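The two techniques in the abstract can be illustrated numerically. The sketch below (not from the surveyed paper; a minimal NumPy illustration) applies static element-wise magnitude pruning to a weight matrix, then symmetric per-tensor 8-bit post-training quantization of the surviving weights:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 4)).astype(np.float32)

# Static element-wise pruning: zero out the weights whose magnitude
# falls below a threshold chosen offline (here, the median magnitude).
threshold = np.quantile(np.abs(weights), 0.5)
pruned = np.where(np.abs(weights) >= threshold, weights, 0.0).astype(np.float32)

# Symmetric per-tensor 8-bit quantization: one scale maps the float
# range [-max|w|, +max|w|] onto int8 values in [-127, 127].
scale = np.abs(pruned).max() / 127.0
q = np.clip(np.round(pruned / scale), -127, 127).astype(np.int8)
dequantized = q.astype(np.float32) * scale

sparsity = float((pruned == 0).mean())               # fraction of zeroed weights
max_err = float(np.abs(pruned - dequantized).max())  # worst-case rounding error
```

Because rounding to the nearest integer grid point introduces at most half a step, the reconstruction error per weight is bounded by `scale / 2`; the zeroed weights cost nothing at inference if the hardware or kernel exploits sparsity.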
Pages: 370 - 403
Page count: 34
Related Papers
50 records in total
  • [1] An FSCV Deep Neural Network: Development, Pruning, and Acceleration on an FPGA
    Zhang, Zhichao
    Oh, Yoonbae
    Adams, Scott D.
    Bennet, Kevin E.
    Kouzani, Abbas Z.
    [J]. IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2021, 25 (06) : 2248 - 2259
  • [2] Deep Neural Network Compression by In-Parallel Pruning-Quantization
    Tung, Frederick
    Mori, Greg
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2020, 42 (03) : 568 - 579
  • [3] Single-shot pruning and quantization for hardware-friendly neural network acceleration
    Jiang, Bofeng
    Chen, Jun
    Liu, Yong
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 126
  • [4] Neural network pruning and hardware acceleration
    Jeong, Taehee
    Ghasemi, Ehsan
    Tuyls, Jorn
    Delaye, Elliott
    Sirasao, Ashish
    [J]. 2020 IEEE/ACM 13TH INTERNATIONAL CONFERENCE ON UTILITY AND CLOUD COMPUTING (UCC 2020), 2020, : 440 - 445
  • [5] HFPQ: deep neural network compression by hardware-friendly pruning-quantization
    YingBo Fan
    Wei Pang
    ShengLi Lu
    [J]. Applied Intelligence, 2021, 51 : 7016 - 7028
  • [6] HFPQ: deep neural network compression by hardware-friendly pruning-quantization
    Fan, YingBo
    Pang, Wei
    Lu, ShengLi
    [J]. APPLIED INTELLIGENCE, 2021, 51 (10) : 7016 - 7028
  • [7] Neural Network Compression and Acceleration by Federated Pruning
    Pei, Songwen
    Wu, Yusheng
    Qiu, Meikang
    [J]. ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2020, PT II, 2020, 12453 : 173 - 183
  • [8] DRQ: Dynamic Region-based Quantization for Deep Neural Network Acceleration
    Song, Zhuoran
    Fu, Bangqi
    Wu, Feiyang
    Jiang, Zhaoming
    Jiang, Li
    Jing, Naifeng
    Liang, Xiaoyao
    [J]. 2020 ACM/IEEE 47TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2020), 2020, : 1010 - 1021
  • [9] Deep Neural Network Acceleration Based on Low-Rank Approximated Channel Pruning
    Chen, Zhen
    Chen, Zhibo
    Lin, Jianxin
    Liu, Sen
    Li, Weiping
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2020, 67 (04) : 1232 - 1244
  • [10] Convolutional Neural Network Pruning: A Survey
    Xu, Sheng
    Huang, Anran
    Chen, Lei
    Zhang, Baochang
    [J]. PROCEEDINGS OF THE 39TH CHINESE CONTROL CONFERENCE, 2020, : 7458 - 7463