HFPQ: deep neural network compression by hardware-friendly pruning-quantization

Cited by: 11
Authors
Fan, YingBo [1 ]
Pang, Wei [1 ]
Lu, ShengLi [1 ]
Affiliations
[1] Southeast Univ, Natl ASIC Syst Engn Res Ctr, 2 Sipailou Rd, Nanjing, Jiangsu, Peoples R China
Keywords
Neural network; Network compression; Exponential quantization; Channel pruning;
DOI
10.1007/s10489-020-01968-x
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a hardware-friendly compression method for deep neural networks. The method combines layered channel pruning with power-of-two exponential quantization. While incurring only a small loss in model accuracy, it greatly reduces the computational resources required to deploy neural networks on hardware, including memory, multiply-accumulate units (MACs), and logic gates. Layered channel pruning groups the layers by how much pruning each one degrades model accuracy; the layers are pruned in a specific order, and the network is retrained after each layer is pruned. The pruning method introduces an adjustable parameter that accommodates different pruning rates in practical applications. The quantization method converts high-precision weights into low-precision weights composed only of 0 and powers of 2; a second adjustable parameter controls the quantized bit width, accommodating different quantization precisions. The proposed hardware-friendly pruning-quantization (HFPQ) method retrains the network after pruning and then quantizes the weights. Experimental results show that HFPQ compresses VGGNet, ResNet and GoogLeNet by more than 30 times while reducing the number of FLOPs by more than 85%.
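The power-of-two weight quantization described in the abstract can be sketched as follows. This is a minimal illustration of the general idea, not the paper's exact HFPQ scheme: the function name, the exponent-range convention derived from the bit width, and the zero threshold are all assumptions.

```python
import numpy as np

def quantize_pow2(weights, bit_width=4):
    """Map each weight to 0 or a signed power of two.

    Each nonzero weight w becomes sign(w) * 2^e, where e is the
    nearest integer exponent inside a range determined by bit_width.
    """
    w = np.asarray(weights, dtype=np.float64)
    sign = np.sign(w)
    mag = np.abs(w)

    # Largest exponent, anchored to the largest-magnitude weight.
    e_max = int(np.floor(np.log2(mag.max())))
    # bit_width codes cover 2**bit_width - 1 exponent levels,
    # with one code reserved for zero (an assumed convention).
    e_min = e_max - (2 ** bit_width - 2)

    with np.errstate(divide="ignore"):
        e = np.round(np.log2(mag))     # log2(0) = -inf, handled below
    e = np.clip(e, e_min, e_max)

    q = sign * 2.0 ** e
    # Weights too small to represent collapse to the zero code.
    q[mag < 2.0 ** (e_min - 1)] = 0.0
    return q
```

Because every surviving weight is an exact power of two, each multiplication in a convolution or fully connected layer reduces to a bit shift on hardware, which is what makes this family of quantizers MAC- and logic-gate-friendly.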
Pages: 7016 - 7028
Page count: 13
Related Papers
50 records in total
  • [21] UCViT: Hardware-Friendly Vision Transformer via Unified Compression
    Song, HongRui
    Wang, Ya
    Wang, Meiqi
    Wang, Zhongfeng
    2022 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS 22), 2022, : 2022 - 2026
  • [22] HLC: A Hardware-friendly Quantization and Cache-based Accelerator for Transformer
    Sun, Xiangfeng
    Zhang, Yuanting
    Jiang, Yunchang
    Li, Zheng
    Han, Bingjin
    Mai, Junyi
    Luo, Zhibin
    Yao, Enyi
    2024 IEEE 6TH INTERNATIONAL CONFERENCE ON AI CIRCUITS AND SYSTEMS, AICAS 2024, 2024, : 447 - 451
  • [23] SMOF: Squeezing More Out of Filters Yields Hardware-Friendly CNN Pruning
    Liu, Yanli
    Guan, Bochen
    Li, Weiyi
    Xu, Qinwen
    Quan, Shuxue
    ARTIFICIAL INTELLIGENCE, CICAI 2022, PT I, 2022, 13604 : 242 - 254
  • [24] ADAPTIVE LAYERWISE QUANTIZATION FOR DEEP NEURAL NETWORK COMPRESSION
    Zhu, Xiaotian
    Zhou, Wengang
    Li, Houqiang
    2018 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2018,
  • [25] Quantization Aware Factorization for Deep Neural Network Compression
    Cherniuk, Daria
    Abukhovich, Stanislav
    Phan, Anh-Huy
    Oseledets, Ivan
    Cichocki, Andrzej
    Gusak, Julia
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2024, 81 : 973 - 988
  • [26] Parallel evolutionary training algorithms for “hardware-friendly” neural networks
    Plagianakos, Vassilis P.
    Vrahatis, Michael N.
    NATURAL COMPUTING, 2002, 1 (2-3) : 307 - 322
  • [27] Hardware-Friendly Logarithmic Quantization with Mixed-Precision for MobileNetV2
    Choi, Dahun
    Kim, Hyun
    2022 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2022): INTELLIGENT TECHNOLOGY IN THE POST-PANDEMIC ERA, 2022, : 348 - 351
  • [28] EBSP: Evolving Bit Sparsity Patterns for Hardware-Friendly Inference of Quantized Deep Neural Networks
    Liu, Fangxin
    Zhao, Wenbo
    Wang, Zongwu
    Chen, Yongbiao
    He, Zhezhi
    Jing, Naifeng
    Liang, Xiaoyao
    Jiang, Li
    PROCEEDINGS OF THE 59TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC 2022, 2022, : 259 - 264
  • [29] Hardware-friendly Higher-Order Neural Network Training using Distributed Evolutionary Algorithms
    Epitropakis, M. G.
    Plagianakos, V. P.
    Vrahatis, M. N.
    APPLIED SOFT COMPUTING, 2010, 10 (02) : 398 - 408
  • [30] A Hardware-Friendly High-Precision CNN Pruning Method and Its FPGA Implementation
    Sui, Xuefu
    Lv, Qunbo
    Zhi, Liangjie
    Zhu, Baoyu
    Yang, Yuanbo
    Zhang, Yu
    Tan, Zheng
    SENSORS, 2023, 23 (02)