Pruning and quantization for deep neural network acceleration: A survey

Cited by: 283
Authors
Liang, Tailin [1 ,2 ]
Glossner, John [1 ,2 ,3 ]
Wang, Lei [1 ]
Shi, Shaobo [1 ,2 ]
Zhang, Xiaotong [1 ]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Comp & Commun Engn, Beijing 100083, Peoples R China
[2] Hua Xia Gen Processor Technol, Beijing 100080, Peoples R China
[3] Gen Proc Technol, Tarrytown, NY 10591 USA
Keywords
Convolutional neural network; Neural network acceleration; Neural network quantization; Neural network pruning; Low-bit mathematics;
DOI
10.1016/j.neucom.2021.07.045
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Deep neural networks have been applied in many applications, exhibiting extraordinary abilities in the field of computer vision. However, complex network architectures challenge efficient real-time deployment and require significant computation resources and energy costs. These challenges can be overcome through optimizations such as network compression. Network compression can often be realized with little loss of accuracy. In some cases, accuracy may even improve. This paper provides a survey on two types of network compression: pruning and quantization. Pruning can be categorized as static if it is performed offline or dynamic if it is performed at run-time. We compare pruning techniques and describe criteria used to remove redundant computations. We discuss trade-offs in element-wise, channel-wise, shape-wise, filter-wise, layer-wise and even network-wise pruning. Quantization reduces computations by reducing the precision of the datatype. Weights, biases, and activations are typically quantized to 8-bit integers, although lower bit-width implementations are also discussed, including binary neural networks. Pruning and quantization can be used independently or combined. We compare current techniques, analyze their strengths and weaknesses, present compressed network accuracy results on a number of frameworks, and provide practical guidance for compressing networks. (c) 2021 Elsevier B.V. All rights reserved.
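The two techniques in the abstract can be illustrated numerically. The sketch below (not from the surveyed paper; a minimal NumPy illustration) applies static element-wise magnitude pruning to a weight matrix, then symmetric per-tensor 8-bit post-training quantization of the surviving weights:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 4)).astype(np.float32)

# Static element-wise pruning: zero out the weights whose magnitude
# falls below a threshold chosen offline (here, the median magnitude).
threshold = np.quantile(np.abs(weights), 0.5)
pruned = np.where(np.abs(weights) >= threshold, weights, 0.0).astype(np.float32)

# Symmetric per-tensor 8-bit quantization: one scale maps the float
# range [-max|w|, +max|w|] onto int8 values in [-127, 127].
scale = np.abs(pruned).max() / 127.0
q = np.clip(np.round(pruned / scale), -127, 127).astype(np.int8)
dequantized = q.astype(np.float32) * scale

sparsity = float((pruned == 0).mean())               # fraction of zeroed weights
max_err = float(np.abs(pruned - dequantized).max())  # worst-case rounding error
```

Because rounding to the nearest integer grid point introduces at most half a step, the reconstruction error per weight is bounded by `scale / 2`; the zeroed weights cost nothing at inference if the hardware or kernel exploits sparsity.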
Pages: 370 - 403
Page count: 34
Related Papers
50 records in total
  • [1] An FSCV Deep Neural Network: Development, Pruning, and Acceleration on an FPGA
    Zhang, Zhichao
    Oh, Yoonbae
    Adams, Scott D.
    Bennet, Kevin E.
    Kouzani, Abbas Z.
    [J]. IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2021, 25 (06) : 2248 - 2259
  • [2] Deep Neural Network Compression by In-Parallel Pruning-Quantization
    Tung, Frederick
    Mori, Greg
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2020, 42 (03) : 568 - 579
  • [3] Single-shot pruning and quantization for hardware-friendly neural network acceleration
    Jiang, Bofeng
    Chen, Jun
    Liu, Yong
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 126
  • [4] Neural network pruning and hardware acceleration
    Jeong, Taehee
    Ghasemi, Ehsan
    Tuyls, Jorn
    Delaye, Elliott
    Sirasao, Ashish
    [J]. 2020 IEEE/ACM 13TH INTERNATIONAL CONFERENCE ON UTILITY AND CLOUD COMPUTING (UCC 2020), 2020, : 440 - 445
  • [5] HFPQ: deep neural network compression by hardware-friendly pruning-quantization
    YingBo Fan
    Wei Pang
    ShengLi Lu
    [J]. Applied Intelligence, 2021, 51 : 7016 - 7028
  • [6] HFPQ: deep neural network compression by hardware-friendly pruning-quantization
    Fan, YingBo
    Pang, Wei
    Lu, ShengLi
    [J]. APPLIED INTELLIGENCE, 2021, 51 (10) : 7016 - 7028
  • [7] Neural Network Compression and Acceleration by Federated Pruning
    Pei, Songwen
    Wu, Yusheng
    Qiu, Meikang
    [J]. ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2020, PT II, 2020, 12453 : 173 - 183
  • [8] DRQ: Dynamic Region-based Quantization for Deep Neural Network Acceleration
    Song, Zhuoran
    Fu, Bangqi
    Wu, Feiyang
    Jiang, Zhaoming
    Jiang, Li
    Jing, Naifeng
    Liang, Xiaoyao
    [J]. 2020 ACM/IEEE 47TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2020), 2020, : 1010 - 1021
  • [9] Deep Neural Network Acceleration Based on Low-Rank Approximated Channel Pruning
    Chen, Zhen
    Chen, Zhibo
    Lin, Jianxin
    Liu, Sen
    Li, Weiping
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2020, 67 (04) : 1232 - 1244
  • [10] Convolutional Neural Network Pruning: A Survey
    Xu, Sheng
    Huang, Anran
    Chen, Lei
    Zhang, Baochang
    [J]. PROCEEDINGS OF THE 39TH CHINESE CONTROL CONFERENCE, 2020, : 7458 - 7463