Multi-Stage Model Compression using Teacher Assistant and Distillation with Hint-Based Training

Authors
Morikawa, Takumi [1]
Kameyama, Keisuke [2]
Affiliations
[1] Univ Tsukuba, Grad Sch Sci & Technol, Degree Programs Syst & Informat Engn, Tsukuba, Ibaraki, Japan
[2] Univ Tsukuba, Fac Engn Informat & Syst, Tsukuba, Ibaraki, Japan
Keywords
Distillation; Hint-Based Training; Model compression; Image classification
DOI
10.1109/PerComWorkshops53856.2022.9767229
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Large neural networks have shown high performance in various applications, but they are not suitable for small devices such as smartphones. There is therefore a need for small networks that are easy to deploy on such devices and still perform well. One way to address this problem is distillation, which obtains a small, high-performance neural network by transferring knowledge from a large, high-performance teacher model. However, if the teacher model and the student model differ greatly in the number of parameters, distillation may not work well. In this paper, we use a Teacher Assistant (TA) model, whose number of layers lies between those of the teacher model and the student model, to perform multi-step compression with distillation of both the hidden and output layers, a technique known as Hint-Based Training. First, we optimize the TA model by distilling from the teacher model, focusing on the hidden and output layers. Then, using the TA model as a teacher, we perform the same distillation of the hidden and output layers on the student model. In this way, we improve the performance of the student model by reducing the size of the model while increasing the depth of the layers step by step. Experiments show that the proposed method can compress a simple CNN model to about 1/7 of the original network's parameter count while maintaining the same classification accuracy on the test dataset. With a student model using ResNet with the bottleneck architecture, the proposed method outperformed the teacher model, which had about 8 times as many parameters. In addition, the proposed method achieved the best student-model performance when compared with existing studies.
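The two-stage procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss formulation, so everything below is an assumption for illustration: the function names, the `hint_weight` and `temperature` parameters, the use of a temperature-softened KL term for the output layer, a mean-squared-error term for the hidden layer, and the simplification that student and teacher hidden features have the same length (in practice a learned regressor is often used to match dimensions).

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_target_loss(student_logits, teacher_logits, temperature=4.0):
    """Output-layer term: KL(teacher || student) on softened outputs, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

def hint_loss(student_hidden, teacher_hidden):
    """Hidden-layer (hint) term: mean squared error between feature vectors.

    Assumes both vectors already have the same length (hypothetical simplification).
    """
    if len(student_hidden) != len(teacher_hidden):
        raise ValueError("hidden features must have equal length in this sketch")
    squared = [(s - t) ** 2 for s, t in zip(student_hidden, teacher_hidden)]
    return sum(squared) / len(squared)

def stage_loss(student_out, teacher_out, hint_weight=0.5, temperature=4.0):
    """Weighted sum of the hidden-layer and output-layer terms for one stage.

    Each *_out argument is a (hidden_features, logits) pair.
    """
    hidden_s, logits_s = student_out
    hidden_t, logits_t = teacher_out
    return (hint_weight * hint_loss(hidden_s, hidden_t)
            + (1.0 - hint_weight) * soft_target_loss(logits_s, logits_t, temperature))

# Multi-stage schedule: the TA learns from the teacher, then the student learns from the TA.
stages = [("teacher", "assistant"), ("assistant", "student")]

if __name__ == "__main__":
    # Toy (hidden_features, logits) pairs standing in for real model outputs.
    outputs = {
        "teacher": ([0.9, 0.1, 0.4], [2.0, 0.5, -1.0]),
        "assistant": ([0.8, 0.2, 0.5], [1.5, 0.7, -0.8]),
        "student": ([0.5, 0.3, 0.6], [1.0, 0.9, -0.2]),
    }
    for source, target in stages:
        print(source, "->", target, stage_loss(outputs[target], outputs[source]))
```

In a real training loop, each stage would minimize `stage_loss` with respect to the smaller model's parameters while the larger model stays fixed; the toy values here only show how the two terms are combined.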
Pages: 7