Multi-Stage Model Compression using Teacher Assistant and Distillation with Hint-Based Training

Cited: 0
Authors
Morikawa, Takumi [1 ]
Kameyama, Keisuke [2 ]
Affiliations
[1] Univ Tsukuba, Grad Sch Sci & Technol, Degree Programs Syst & Informat Engn, Tsukuba, Ibaraki, Japan
[2] Univ Tsukuba, Fac Engn Informat & Syst, Tsukuba, Ibaraki, Japan
Keywords
Distillation; Hint-Based Training; Model compression; Image classification;
DOI
10.1109/PerComWorkshops53856.2022.9767229
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Large neural networks have shown high performance in various applications; however, they are not suitable for small devices such as smartphones. There is therefore a need for small networks that are easy to deploy on small devices while retaining high performance. One approach to this problem is distillation, which obtains a small, high-performance neural network by transferring knowledge from a large, high-performance teacher model. However, when the gap in the number of parameters between the teacher model and the student model is large, distillation may not work well. In this paper, we use a Teacher Assistant (TA) model, whose number of layers lies between those of the teacher and the student, to perform multi-stage compression that distills both the hidden layers (a technique known as Hint-Based Training) and the output layer. First, we optimize the TA model by distilling the hidden and output layers of the teacher model. Then, using the TA model as the teacher, we apply the same hidden- and output-layer distillation to the student model. In this way, we improve the performance of the student model by reducing the model size while increasing the depth of the layers step by step. Experiments show that the proposed method can compress a simple CNN model to about 1/7 of the original number of parameters while maintaining the same classification accuracy on the test dataset. For a student model using ResNet with the bottleneck architecture, the proposed method outperformed the teacher model, which had about 8 times more parameters. In addition, the proposed method achieved the best student-model performance among the compared existing studies.
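The two-stage procedure described above can be illustrated with a short sketch. The following PyTorch snippet is an illustrative reconstruction, not the authors' implementation: the assumption that each model returns a (hidden-feature, logits) pair, the 1x1-convolution regressors that match channel counts, and all hyperparameters (temperature, loss weights, learning rate, epochs) are placeholders chosen for the example.

```python
# Illustrative sketch (not the authors' code): one distillation stage that combines
# hint-based training (MSE on an intermediate feature map, via a regressor that
# matches channel counts) with soft-target distillation on the output layer.
# Applying this stage twice -- teacher -> TA, then TA -> student -- gives the
# multi-stage compression described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

def hint_loss(student_feat, teacher_feat, regressor):
    """Hint-based training loss: MSE between the teacher's hidden features
    and the student's regressed hidden features."""
    return F.mse_loss(regressor(student_feat), teacher_feat)

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Soft-target distillation loss on the output layer (temperature T is a placeholder)."""
    p_t = F.softmax(teacher_logits / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T

def distill_stage(teacher, student, regressor, loader, epochs=10,
                  alpha=0.5, beta=0.5, lr=1e-3, device="cpu"):
    """One compression stage; `teacher` may be the original teacher or the TA.
    Both models are assumed to return a (hidden_features, logits) pair."""
    teacher.eval()
    student.train()
    opt = torch.optim.Adam(list(student.parameters()) + list(regressor.parameters()), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                t_feat, t_logits = teacher(x)
            s_feat, s_logits = student(x)
            loss = (F.cross_entropy(s_logits, y)          # hard labels
                    + alpha * kd_loss(s_logits, t_logits)  # output-layer distillation
                    + beta * hint_loss(s_feat, t_feat, regressor))  # hidden-layer hint
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student

# Multi-stage usage (models, loaders, and the 1x1-conv regressors are placeholders):
# ta      = distill_stage(teacher, ta,      nn.Conv2d(64, 128, 1), train_loader)
# student = distill_stage(ta,      student, nn.Conv2d(32,  64, 1), train_loader)
```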
Pages: 7
Related Papers
50 records in total
  • [1] Bae, Ji-Hoon; Yim, Junho; Kim, Nae-Soo; Pyo, Cheol-Sig; Kim, Junmo. Layer-wise hint-based training for knowledge transfer in a teacher-student framework. ETRI Journal, 2019, 41(2): 242-253.
  • [2] Sakalli, M; Yan, H; Lam, KM; Kondo, T. Model-based multi-stage compression of human face images. Fourteenth International Conference on Pattern Recognition, Vols 1 and 2, 1998: 1278-1280.
  • [3] Zhao, Zhuo; Xie, Zhiwen; Zhou, Guangyou; Huang, Jimmy Xiangji. MTMS: Multi-teacher Multi-stage Knowledge Distillation for Reasoning-Based Machine Reading Comprehension. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024), 2024: 1995-2005.
  • [4] Yang, Ze; Shou, Linjun; Gong, Ming; Lin, Wutao; Jiang, Daxin. Model Compression with Two-stage Multi-teacher Knowledge Distillation for Web Question Answering System. Proceedings of the 13th International Conference on Web Search and Data Mining (WSDM '20), 2020: 690-698.
  • [5] Farwati, MA. Theoretical study of multi-stage flash distillation using solar energy. Energy, 1997, 22(1): 1-5.
  • [6] Kim, Jeongmin; Jung, Yong Ju. Multi-Stage Network for Event-Based Video Deblurring with Residual Hint Attention. Sensors, 2023, 23(6).
  • [7] Stewardson, HR; Novac, BM; Smith, R; Enache, MC. Multi-stage pulse compression using an exploding foil and a PEOS. 11th IEEE International Pulsed Power Conference - Digest of Technical Papers, Vols. 1 & 2, 1997: 1227-1232.
  • [8] Jeong, Gyu-Hyeok; Lee, In-Sung. Wavelet-based ECG Compression using Dynamic Multi-stage Vector Quantization. ICIEA: 2009 4th IEEE Conference on Industrial Electronics and Applications, Vols 1-6, 2009: 2091-2096.
  • [9] Lin, Bosong; Malmali, Mahdi. Energy and exergy analysis of multi-stage vacuum membrane distillation integrated with mechanical vapor compression. Separation and Purification Technology, 2023, 306.
  • [10] Huang, Zhonghao; Zhou, Yimin; Yang, Xingyao. Online Knowledge Distillation Based on Multi-stage Multi-generative Adversarial Network. IECON 2021 - 47th Annual Conference of the IEEE Industrial Electronics Society, 2021.