Inplace knowledge distillation with teacher assistant for improved training of flexible deep neural networks

Cited by: 3
Authors
Ozerov, Alexey [1 ]
Duong, Ngoc Q. K. [1 ]
Affiliation
[1] InterDigital R&D France, Cesson Sevigne, France
Funding
European Union Horizon 2020
Keywords
Deep Neural Networks; Flexible Models; Inplace Knowledge Distillation with Teacher Assistant;
DOI
10.23919/EUSIPCO54536.2021.9616244
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Deep neural networks (DNNs) have achieved great success in various machine learning tasks. However, most existing powerful DNN models are computationally expensive and memory demanding, hindering their deployment on devices with limited memory and computational resources or in applications with strict latency requirements. Several resource-adaptable or flexible approaches have therefore been proposed recently that simultaneously train a big model and several resource-specific sub-models. Inplace knowledge distillation (IPKD) has become a popular method for training such models; it distills the knowledge from the largest model (teacher) to all other sub-models (students). In this work, a novel generic training method called IPKD with teacher assistant (IPKD-TA) is introduced, in which the sub-models themselves act as teacher assistants that teach smaller sub-models. We evaluated the proposed IPKD-TA training method on two state-of-the-art flexible models (MSDNet and Slimmable MobileNet-V1) with two popular image classification benchmarks (CIFAR-10 and CIFAR-100). Our results demonstrate that IPKD-TA is on par with the existing state of the art and improves upon it in most cases.
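To illustrate the difference between IPKD and the IPKD-TA scheme described in the abstract, the following minimal sketch shows one training step for a flexible (e.g., slimmable) network with several width sub-models. It is a sketch under stated assumptions, not the authors' implementation: the model.forward_at(x, width=w) call, the width list, the temperature T, and the loss weight alpha are illustrative placeholders.

# Minimal illustrative sketch of IPKD-TA (PyTorch-style); names marked below are assumptions.
import torch
import torch.nn.functional as F

def ipkd_ta_step(model, x, y, widths=(1.0, 0.75, 0.5, 0.25), T=3.0, alpha=0.5):
    """One training step: the largest sub-model learns from the labels only,
    while each smaller sub-model is distilled from the next-larger sub-model
    (its teacher assistant). Plain IPKD would instead distill every smaller
    sub-model directly from the largest one."""
    widths = sorted(widths, reverse=True)              # largest sub-model first
    logits = {}
    loss = 0.0
    for i, w in enumerate(widths):
        out = model.forward_at(x, width=w)             # assumed sub-model forward API
        logits[w] = out
        if i == 0:
            loss = loss + F.cross_entropy(out, y)      # teacher: hard labels only
        else:
            teacher = logits[widths[i - 1]].detach()   # IPKD-TA: next-larger sub-model
            kd = F.kl_div(F.log_softmax(out / T, dim=1),
                          F.softmax(teacher / T, dim=1),
                          reduction="batchmean") * (T * T)
            loss = loss + alpha * F.cross_entropy(out, y) + (1 - alpha) * kd
    return loss

Replacing the teacher line with logits[widths[0]].detach() would recover plain IPKD, which makes the two schemes easy to compare within the same training loop.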
Pages: 1356 - 1360
Page count: 5
Related papers
50 items in total
  • [21] Layer-by-Layer Knowledge Distillation for Training Simplified Bipolar Morphological Neural Networks
    Zingerenko, M. V.
    Limonova, E. E.
    PROGRAMMING AND COMPUTER SOFTWARE, 2023, 49 (SUPPL 2) : S108 - S114
  • [22] Layer-by-Layer Knowledge Distillation for Training Simplified Bipolar Morphological Neural Networks
    M. V. Zingerenko
    E. E. Limonova
    Programming and Computer Software, 2023, 49 : S108 - S114
  • [23] Knowledge distillation on neural networks for evolving graphs
    Antaris, Stefanos
    Rafailidis, Dimitrios
    Girdzijauskas, Sarunas
    SOCIAL NETWORK ANALYSIS AND MINING, 2021, 11 (01)
  • [24] On Representation Knowledge Distillation for Graph Neural Networks
    Joshi, Chaitanya K.
    Liu, Fayao
    Xun, Xu
    Lin, Jie
    Foo, Chuan Sheng
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (04) : 4656 - 4667
  • [25] Knowledge distillation on neural networks for evolving graphs
    Stefanos Antaris
    Dimitrios Rafailidis
    Sarunas Girdzijauskas
    Social Network Analysis and Mining, 2021, 11
  • [26] Online cross-layer knowledge distillation on graph neural networks with deep supervision
    Jiongyu Guo
    Defang Chen
    Can Wang
    Neural Computing and Applications, 2023, 35 : 22359 - 22374
  • [27] Online cross-layer knowledge distillation on graph neural networks with deep supervision
    Guo, Jiongyu
    Chen, Defang
    Wang, Can
NEURAL COMPUTING & APPLICATIONS, 2023, 35 (30): 22359 - 22374
  • [28] Stochastic Precision Ensemble: Self-Knowledge Distillation for Quantized Deep Neural Networks
    Boo, Yoonho
    Shin, Sungho
    Choi, Jungwook
    Sung, Wonyong
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 6794 - 6802
  • [29] TAKDSR: Teacher Assistant Knowledge Distillation Framework for Graphics Image Super-Resolution
    Yoon, Min
    Lee, Seunghyun
    Song, Byung Cheol
    IEEE ACCESS, 2023, 11 : 112015 - 112026
  • [30] On Mixup Training: Improved Calibration and Predictive Uncertainty for Deep Neural Networks
    Thulasidasan, Sunil
    Chennupati, Gopinath
    Bilmes, Jeff
    Bhattacharya, Tanmoy
    Michalak, Sarah
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32