An adaptive joint optimization framework for pruning and quantization

Cited by: 0
Authors
Li, Xiaohai [1 ,2 ,3 ]
Yang, Xiaodong [1 ,2 ,3 ]
Zhang, Yingwei [1 ,2 ,3 ]
Yang, Jianrong [4 ,5 ]
Chen, Yiqiang [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
[2] Beijing Key Lab Mobile Comp & Pervas Device, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Beijing, Peoples R China
[4] Guangxi Acad Med Sci, Peoples Hosp Guangxi Zhuang Autonomous Reg, Dept Hepatobiliary Pancreas & Spleen Surg, Nanning, Peoples R China
[5] Peoples Hosp Guangxi Zhuang Autonomous Reg, Guangxi Clin Res Ctr Sleep Med, Nanning, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Model compression; Network pruning; Quantization; Mutual learning; Multi-teacher knowledge distillation;
DOI
10.1007/s13042-024-02229-w
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Pruning and quantization are among the most widely used techniques for deep learning model compression, and applying them together holds the potential for even greater gains. Most existing works, however, combine pruning and quantization sequentially; this separation makes it difficult to fully leverage their complementarity and to exploit the potential benefits of joint optimization. To address these limitations, we propose A-JOPQ (adaptive joint optimization of pruning and quantization). Starting from a deep neural network, A-JOPQ first constructs a pruning network through adaptive mutual learning with a quantization network, which compensates for the structural information lost during pruning. The pruning network is then incrementally quantized using adaptive multi-teacher knowledge distillation, with the pruning network itself and the original uncompressed model serving as teachers; this effectively mitigates the adverse effects of quantization. The result is a pruning-quantization network that achieves significant model compression while maintaining high accuracy. Extensive experiments on several public datasets demonstrate the superiority of the proposed method: compared to existing methods, A-JOPQ achieves higher accuracy with a smaller model size. We also extend A-JOPQ to federated learning (FL) settings, where simulation experiments show that A-JOPQ enables resource-limited clients to participate effectively.
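The abstract outlines a two-stage recipe: mutual learning between a pruning network and a quantization network, followed by incremental quantization of the pruned model guided by multi-teacher distillation. The sketch below illustrates how such loss signals are commonly assembled in PyTorch; every name and hyperparameter here (kl_soft, alpha, T, the fixed teacher weights) is an illustrative assumption, not the paper's implementation, and the paper's adaptive weighting schemes are replaced by fixed constants.

```python
# Minimal PyTorch sketch of the two training signals described in the
# abstract. All names and hyperparameters (kl_soft, alpha, T, the fixed
# teacher weights) are illustrative assumptions, not the paper's method.
import torch
import torch.nn.functional as F

def kl_soft(student_logits, teacher_logits, T=4.0):
    # Standard temperature-softened distillation loss.
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

def mutual_learning_losses(pruned_net, quant_net, x, y, alpha=0.5):
    # Stage 1: the pruning network and a quantization network train
    # together, each imitating the other's softened predictions, so the
    # pruned model can recover structural information lost to pruning.
    logits_p = pruned_net(x)
    logits_q = quant_net(x)
    loss_p = F.cross_entropy(logits_p, y) + alpha * kl_soft(logits_p, logits_q.detach())
    loss_q = F.cross_entropy(logits_q, y) + alpha * kl_soft(logits_q, logits_p.detach())
    return loss_p, loss_q

def multi_teacher_kd_loss(student, teachers, weights, x, y, alpha=0.5):
    # Stage 2: the incrementally quantized student distills from several
    # teachers at once (here: the pruned network and the original
    # uncompressed model). A fixed `weights` vector stands in for the
    # paper's adaptive teacher weighting.
    logits_s = student(x)
    with torch.no_grad():
        teacher_logits = [t(x) for t in teachers]
    kd = sum(w * kl_soft(logits_s, tl) for w, tl in zip(weights, teacher_logits))
    return F.cross_entropy(logits_s, y) + alpha * kd
```

In a full training loop, loss_p and loss_q would each be backpropagated into their own network during stage 1, and the stage-2 teacher weights would be adapted per batch rather than held fixed.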
Pages: 5199-5215
Page count: 17