Inplace knowledge distillation with teacher assistant for improved training of flexible deep neural networks

Cited by: 3
Authors
Ozerov, Alexey [1 ]
Duong, Ngoc Q. K. [1 ]
Affiliation
[1] InterDigital R&D France, Cesson Sevigne, France
Funding
European Union Horizon 2020
Keywords
Deep Neural Networks; Flexible Models; Inplace Knowledge Distillation with Teacher Assistant;
DOI
10.23919/EUSIPCO54536.2021.9616244
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Deep neural networks (DNNs) have achieved great success in various machine learning tasks. However, most existing powerful DNN models are computationally expensive and memory demanding, hindering their deployment on devices with limited memory and computational resources or in applications with strict latency requirements. Several resource-adaptable or flexible approaches have therefore been proposed recently that simultaneously train a big model and several resource-specific sub-models. Inplace knowledge distillation (IPKD) has become a popular method for training such models; it distills the knowledge from the largest model (teacher) to all other sub-models (students). In this work, a novel generic training method called IPKD with teacher assistant (IPKD-TA) is introduced, in which the sub-models themselves act as teacher assistants that teach smaller sub-models. We evaluated the proposed IPKD-TA training method on two state-of-the-art flexible models (MSDNet and Slimmable MobileNet-V1) with two popular image classification benchmarks (CIFAR-10 and CIFAR-100). Our results demonstrate that IPKD-TA is on par with the existing state of the art and improves upon it in most cases.
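To illustrate the difference between IPKD and the IPKD-TA scheme described in the abstract, the following minimal sketch shows one training step for a flexible (e.g., slimmable) network with several width sub-models. It is a sketch under stated assumptions, not the authors' implementation: the model.forward_at(x, width=w) call, the width list, the temperature T, and the loss weight alpha are illustrative placeholders.

# Minimal illustrative sketch of IPKD-TA (PyTorch-style); names marked below are assumptions.
import torch
import torch.nn.functional as F

def ipkd_ta_step(model, x, y, widths=(1.0, 0.75, 0.5, 0.25), T=3.0, alpha=0.5):
    """One training step: the largest sub-model learns from the labels only,
    while each smaller sub-model is distilled from the next-larger sub-model
    (its teacher assistant). Plain IPKD would instead distill every smaller
    sub-model directly from the largest one."""
    widths = sorted(widths, reverse=True)              # largest sub-model first
    logits = {}
    loss = 0.0
    for i, w in enumerate(widths):
        out = model.forward_at(x, width=w)             # assumed sub-model forward API
        logits[w] = out
        if i == 0:
            loss = loss + F.cross_entropy(out, y)      # teacher: hard labels only
        else:
            teacher = logits[widths[i - 1]].detach()   # IPKD-TA: next-larger sub-model
            kd = F.kl_div(F.log_softmax(out / T, dim=1),
                          F.softmax(teacher / T, dim=1),
                          reduction="batchmean") * (T * T)
            loss = loss + alpha * F.cross_entropy(out, y) + (1 - alpha) * kd
    return loss

Replacing the teacher line with logits[widths[0]].detach() would recover plain IPKD, which makes the two schemes easy to compare within the same training loop.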
Pages: 1356 - 1360
Page count: 5
Related papers
50 items in total
  • [21] Layer-by-Layer Knowledge Distillation for Training Simplified Bipolar Morphological Neural Networks
    Zingerenko, M. V.
    Limonova, E. E.
    PROGRAMMING AND COMPUTER SOFTWARE, 2023, 49 (SUPPL 2) : S108 - S114
  • [22] Layer-by-Layer Knowledge Distillation for Training Simplified Bipolar Morphological Neural Networks
    M. V. Zingerenko
    E. E. Limonova
    Programming and Computer Software, 2023, 49 : S108 - S114
  • [23] Knowledge distillation on neural networks for evolving graphs
    Antaris, Stefanos
    Rafailidis, Dimitrios
    Girdzijauskas, Sarunas
    SOCIAL NETWORK ANALYSIS AND MINING, 2021, 11 (01)
  • [24] On Representation Knowledge Distillation for Graph Neural Networks
    Joshi, Chaitanya K.
    Liu, Fayao
    Xun, Xu
    Lin, Jie
    Foo, Chuan Sheng
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (04) : 4656 - 4667
  • [25] Knowledge distillation on neural networks for evolving graphs
    Stefanos Antaris
    Dimitrios Rafailidis
    Sarunas Girdzijauskas
    Social Network Analysis and Mining, 2021, 11
  • [26] Online cross-layer knowledge distillation on graph neural networks with deep supervision
    Jiongyu Guo
    Defang Chen
    Can Wang
    Neural Computing and Applications, 2023, 35 : 22359 - 22374
  • [27] Online cross-layer knowledge distillation on graph neural networks with deep supervision
    Guo, Jiongyu
    Chen, Defang
    Wang, Can
NEURAL COMPUTING & APPLICATIONS, 2023, 35 (30): 22359 - 22374
  • [28] Stochastic Precision Ensemble: Self-Knowledge Distillation for Quantized Deep Neural Networks
    Boo, Yoonho
    Shin, Sungho
    Choi, Jungwook
    Sung, Wonyong
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 6794 - 6802
  • [29] TAKDSR: Teacher Assistant Knowledge Distillation Framework for Graphics Image Super-Resolution
    Yoon, Min
    Lee, Seunghyun
    Song, Byung Cheol
    IEEE ACCESS, 2023, 11 : 112015 - 112026
  • [30] On Mixup Training: Improved Calibration and Predictive Uncertainty for Deep Neural Networks
    Thulasidasan, Sunil
    Chennupati, Gopinath
    Bilmes, Jeff
    Bhattacharya, Tanmoy
    Michalak, Sarah
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32