An adaptive joint optimization framework for pruning and quantization

Cited by: 0
Authors
Li, Xiaohai [1 ,2 ,3 ]
Yang, Xiaodong [1 ,2 ,3 ]
Zhang, Yingwei [1 ,2 ,3 ]
Yang, Jianrong [4 ,5 ]
Chen, Yiqiang [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
[2] Beijing Key Lab Mobile Comp & Pervas Device, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Beijing, Peoples R China
[4] Guangxi Acad Med Sci, Peoples Hosp Guangxi Zhuang Autonomous Reg, Dept Hepatobiliary Pancreas & Spleen Surg, Nanning, Peoples R China
[5] Peoples Hosp Guangxi Zhuang Autonomous Reg, Guangxi Clin Res Ctr Sleep Med, Nanning, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Model compression; Network pruning; Quantization; Mutual learning; Multi-teacher knowledge distillation;
DOI
10.1007/s13042-024-02229-w
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Pruning and quantization are among the most widely used techniques for deep learning model compression, and applying them together holds the potential for even greater gains. Most existing works, however, combine pruning and quantization sequentially; this separation makes it difficult to fully leverage their complementarity and to exploit the potential benefits of joint optimization. To address these limitations, we propose A-JOPQ (adaptive joint optimization of pruning and quantization). Starting from a deep neural network, A-JOPQ first constructs a pruning network through adaptive mutual learning with a quantization network, which compensates for the structural information lost during pruning. The pruning network is then incrementally quantized using adaptive multi-teacher knowledge distillation, with the pruning network itself and the original uncompressed model serving as teachers; this effectively mitigates the adverse effects of quantization. The result is a pruning-quantization network that achieves significant model compression while maintaining high accuracy. Extensive experiments on several public datasets demonstrate the superiority of the proposed method: compared to existing methods, A-JOPQ achieves higher accuracy with a smaller model size. We also extend A-JOPQ to federated learning (FL) settings, where simulation experiments show that A-JOPQ enables resource-limited clients to participate effectively.
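The abstract outlines a two-stage recipe: mutual learning between a pruning network and a quantization network, followed by incremental quantization of the pruned model guided by multi-teacher distillation. The sketch below illustrates how such loss signals are commonly assembled in PyTorch; every name and hyperparameter here (kl_soft, alpha, T, the fixed teacher weights) is an illustrative assumption, not the paper's implementation, and the paper's adaptive weighting schemes are replaced by fixed constants.

```python
# Minimal PyTorch sketch of the two training signals described in the
# abstract. All names and hyperparameters (kl_soft, alpha, T, the fixed
# teacher weights) are illustrative assumptions, not the paper's method.
import torch
import torch.nn.functional as F

def kl_soft(student_logits, teacher_logits, T=4.0):
    # Standard temperature-softened distillation loss.
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

def mutual_learning_losses(pruned_net, quant_net, x, y, alpha=0.5):
    # Stage 1: the pruning network and a quantization network train
    # together, each imitating the other's softened predictions, so the
    # pruned model can recover structural information lost to pruning.
    logits_p = pruned_net(x)
    logits_q = quant_net(x)
    loss_p = F.cross_entropy(logits_p, y) + alpha * kl_soft(logits_p, logits_q.detach())
    loss_q = F.cross_entropy(logits_q, y) + alpha * kl_soft(logits_q, logits_p.detach())
    return loss_p, loss_q

def multi_teacher_kd_loss(student, teachers, weights, x, y, alpha=0.5):
    # Stage 2: the incrementally quantized student distills from several
    # teachers at once (here: the pruned network and the original
    # uncompressed model). A fixed `weights` vector stands in for the
    # paper's adaptive teacher weighting.
    logits_s = student(x)
    with torch.no_grad():
        teacher_logits = [t(x) for t in teachers]
    kd = sum(w * kl_soft(logits_s, tl) for w, tl in zip(weights, teacher_logits))
    return F.cross_entropy(logits_s, y) + alpha * kd
```

In a full training loop, loss_p and loss_q would each be backpropagated into their own network during stage 1, and the stage-2 teacher weights would be adapted per batch rather than held fixed.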
Pages: 5199-5215
Page count: 17