Multi-Stage Model Compression using Teacher Assistant and Distillation with Hint-Based Training

Cited by: 0

Authors:

Morikawa, Takumi [1]

Kameyama, Keisuke [2]

Affiliations:

[1] Univ Tsukuba, Grad Sch Sci & Technol, Degree Programs Syst & Informat Engn, Tsukuba, Ibaraki, Japan

[2] Univ Tsukuba, Fac Engn Informat & Syst, Tsukuba, Ibaraki, Japan

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON PERVASIVE COMPUTING AND COMMUNICATIONS WORKSHOPS AND OTHER AFFILIATED EVENTS (PERCOM WORKSHOPS) | 2022

Keywords:

Distillation; Hint-Based Training; Model compression; Image classification;

DOI:

10.1109/PerComWorkshops53856.2022.9767229

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Large neural networks have shown high performance in various applications, but they are not suitable for small devices such as smartphones. There is therefore a need for small networks that are easy to deploy on such devices and still perform well. One method for addressing this problem is distillation, which yields a small, high-performance neural network by transferring knowledge from a large, high-performance teacher model. However, if the teacher model and the student model differ greatly in the number of parameters, distillation may not work well. In this paper, we use a Teacher Assistant (TA) model, which is intermediate in the number of layers between the teacher model and the student model, to perform multi-step compression with distillation of both the hidden and output layers, a technique known as Hint-Based Training. First, we optimize the TA model by using the teacher model and performing distillation focusing on the hidden and output layers. Then, using the TA model as a teacher, we perform the same distillation of the hidden and output layers on the student model. In this way, we improve the performance of the student model by reducing the size of the model while increasing the depth of the layers step by step. Experiments show that the proposed method can compress a simple CNN model to about 1/7 of the parameters of the original neural network while maintaining the same classification accuracy on the test dataset. With a student model using ResNet with the bottleneck architecture, the proposed method outperformed the teacher model, which had about 8 times as many parameters. In addition, the proposed method achieved the best performance for the student model when compared with existing studies.
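The two-stage procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the loss formulation, so the sketch assumes a mean-squared-error hint loss on hidden features, a temperature-softened KL divergence on the output layer, and a weighting factor `beta`. All function names, `beta`, and `temperature` are hypothetical, and the hidden features are assumed to have equal length (in practice, layers of different width would need some projection to make them comparable).

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(learner_logits, teacher_logits, temperature=4.0):
    """Output-layer distillation: KL(teacher || learner) on softened
    distributions, scaled by T^2 (an assumed, commonly used form)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(learner_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

def hint_loss(learner_hidden, teacher_hidden):
    """Hidden-layer hint: mean squared error between feature vectors.
    Assumes both vectors already have the same length."""
    assert len(learner_hidden) == len(teacher_hidden)
    n = len(teacher_hidden)
    return sum((s - t) ** 2 for s, t in zip(learner_hidden, teacher_hidden)) / n

def combined_loss(learner_hidden, learner_logits,
                  teacher_hidden, teacher_logits,
                  beta=0.5, temperature=4.0):
    """Weighted sum of the hidden-layer and output-layer terms.
    The weighting scheme (beta) is an assumption for illustration."""
    return (beta * hint_loss(learner_hidden, teacher_hidden)
            + (1.0 - beta) * kd_loss(learner_logits, teacher_logits, temperature))

def compression_stages(models):
    """Pair each model with the next smaller one as (teacher, learner)."""
    return list(zip(models, models[1:]))

# Stage 1 distills teacher -> TA; stage 2 distills TA -> student.
stages = compression_stages(["teacher", "TA", "student"])
```

In each stage the smaller model of the pair would be trained to minimize `combined_loss` against the larger one, so the TA trained in the first stage acts as the teacher in the second.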

Pages: 7
