Multi-Stage Model Compression using Teacher Assistant and Distillation with Hint-Based Training

Cited: 0
Authors
Morikawa, Takumi [1 ]
Kameyama, Keisuke [2 ]
Affiliations
[1] Univ Tsukuba, Grad Sch Sci & Technol, Degree Programs Syst & Informat Engn, Tsukuba, Ibaraki, Japan
[2] Univ Tsukuba, Fac Engn Informat & Syst, Tsukuba, Ibaraki, Japan
Keywords
Distillation; Hint-Based Training; Model compression; Image classification;
DOI
10.1109/PerComWorkshops53856.2022.9767229
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Large neural networks have shown high performance in various applications; however, they are not suitable for small devices such as smartphones. There is therefore a need for small networks that are easy to deploy on small devices while retaining high performance. One approach to this problem is distillation, which obtains a small, high-performance neural network by transferring knowledge from a large, high-performance teacher model. However, when the gap in the number of parameters between the teacher model and the student model is large, distillation may not work well. In this paper, we use a Teacher Assistant (TA) model, whose number of layers lies between those of the teacher and the student, to perform multi-stage compression that distills both the hidden layers (a technique known as Hint-Based Training) and the output layer. First, we optimize the TA model by distilling the hidden and output layers of the teacher model. Then, using the TA model as the teacher, we apply the same hidden- and output-layer distillation to the student model. In this way, we improve the performance of the student model by reducing the model size while increasing the depth of the layers step by step. Experiments show that the proposed method can compress a simple CNN model to about 1/7 of the original number of parameters while maintaining the same classification accuracy on the test dataset. For a student model using ResNet with the bottleneck architecture, the proposed method outperformed the teacher model, which had about 8 times more parameters. In addition, the proposed method achieved the best student-model performance among the compared existing studies.
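The two-stage procedure described above can be illustrated with a short sketch. The following PyTorch snippet is an illustrative reconstruction, not the authors' implementation: the assumption that each model returns a (hidden-feature, logits) pair, the 1x1-convolution regressors that match channel counts, and all hyperparameters (temperature, loss weights, learning rate, epochs) are placeholders chosen for the example.

```python
# Illustrative sketch (not the authors' code): one distillation stage that combines
# hint-based training (MSE on an intermediate feature map, via a regressor that
# matches channel counts) with soft-target distillation on the output layer.
# Applying this stage twice -- teacher -> TA, then TA -> student -- gives the
# multi-stage compression described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

def hint_loss(student_feat, teacher_feat, regressor):
    """Hint-based training loss: MSE between the teacher's hidden features
    and the student's regressed hidden features."""
    return F.mse_loss(regressor(student_feat), teacher_feat)

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Soft-target distillation loss on the output layer (temperature T is a placeholder)."""
    p_t = F.softmax(teacher_logits / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T

def distill_stage(teacher, student, regressor, loader, epochs=10,
                  alpha=0.5, beta=0.5, lr=1e-3, device="cpu"):
    """One compression stage; `teacher` may be the original teacher or the TA.
    Both models are assumed to return a (hidden_features, logits) pair."""
    teacher.eval()
    student.train()
    opt = torch.optim.Adam(list(student.parameters()) + list(regressor.parameters()), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                t_feat, t_logits = teacher(x)
            s_feat, s_logits = student(x)
            loss = (F.cross_entropy(s_logits, y)          # hard labels
                    + alpha * kd_loss(s_logits, t_logits)  # output-layer distillation
                    + beta * hint_loss(s_feat, t_feat, regressor))  # hidden-layer hint
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student

# Multi-stage usage (models, loaders, and the 1x1-conv regressors are placeholders):
# ta      = distill_stage(teacher, ta,      nn.Conv2d(64, 128, 1), train_loader)
# student = distill_stage(ta,      student, nn.Conv2d(32,  64, 1), train_loader)
```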
Pages: 7
Related Papers
50 records in total
  • [1] Bae, Ji-Hoon; Yim, Junho; Kim, Nae-Soo; Pyo, Cheol-Sig; Kim, Junmo. Layer-wise hint-based training for knowledge transfer in a teacher-student framework. ETRI Journal, 2019, 41(2): 242-253.
  • [2] Sakalli, M; Yan, H; Lam, KM; Kondo, T. Model-based multi-stage compression of human face images. Fourteenth International Conference on Pattern Recognition, Vols 1 and 2, 1998: 1278-1280.
  • [3] Zhao, Zhuo; Xie, Zhiwen; Zhou, Guangyou; Huang, Jimmy Xiangji. MTMS: Multi-teacher Multi-stage Knowledge Distillation for Reasoning-Based Machine Reading Comprehension. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024), 2024: 1995-2005.
  • [4] Yang, Ze; Shou, Linjun; Gong, Ming; Lin, Wutao; Jiang, Daxin. Model Compression with Two-stage Multi-teacher Knowledge Distillation for Web Question Answering System. Proceedings of the 13th International Conference on Web Search and Data Mining (WSDM '20), 2020: 690-698.
  • [5] Farwati, MA. Theoretical study of multi-stage flash distillation using solar energy. Energy, 1997, 22(1): 1-5.
  • [6] Kim, Jeongmin; Jung, Yong Ju. Multi-Stage Network for Event-Based Video Deblurring with Residual Hint Attention. Sensors, 2023, 23(6).
  • [7] Stewardson, HR; Novac, BM; Smith, R; Enache, MC. Multi-stage pulse compression using an exploding foil and a PEOS. 11th IEEE International Pulsed Power Conference - Digest of Technical Papers, Vols. 1 & 2, 1997: 1227-1232.
  • [8] Jeong, Gyu-Hyeok; Lee, In-Sung. Wavelet-based ECG Compression using Dynamic Multi-stage Vector Quantization. ICIEA: 2009 4th IEEE Conference on Industrial Electronics and Applications, Vols 1-6, 2009: 2091-2096.
  • [9] Lin, Bosong; Malmali, Mahdi. Energy and exergy analysis of multi-stage vacuum membrane distillation integrated with mechanical vapor compression. Separation and Purification Technology, 2023, 306.
  • [10] Huang, Zhonghao; Zhou, Yimin; Yang, Xingyao. Online Knowledge Distillation Based on Multi-stage Multi-generative Adversarial Network. IECON 2021 - 47th Annual Conference of the IEEE Industrial Electronics Society, 2021.