Improved Knowledge Distillation via Teacher Assistant

Cited by: 0
Authors
Mirzadeh, Seyed Iman [1 ]
Farajtabar, Mehrdad [2 ]
Li, Ang [2 ]
Levine, Nir [2 ]
Matsukawa, Akihiro [3 ]
Ghasemzadeh, Hassan [1 ]
Affiliations
[1] Washington State Univ, Pullman, WA 99164 USA
[2] DeepMind, Mountain View, CA USA
[3] DE Shaw, New York, NY USA
Funding
U.S. National Science Foundation
Keywords: (not listed)
DOI: Not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline classification codes: 081104; 0812; 0835; 1405
Abstract
Despite the fact that deep neural networks are powerful models and achieve appealing results on many tasks, they are too large to be deployed on edge devices like smartphones or embedded sensor nodes. There have been efforts to compress these networks, and a popular method is knowledge distillation, where a large (teacher) pre-trained network is used to train a smaller (student) network. However, in this paper, we show that the student network performance degrades when the gap between student and teacher is large. Given a fixed student network, one cannot employ an arbitrarily large teacher; in other words, a teacher can effectively transfer its knowledge only to students that are not too much smaller than itself. To alleviate this shortcoming, we introduce multi-step knowledge distillation, which employs an intermediate-sized network (teacher assistant) to bridge the gap between the student and the teacher. Moreover, we study the effect of teacher assistant size and extend the framework to multi-step distillation. Theoretical analysis and extensive experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets and on CNN and ResNet architectures substantiate the effectiveness of our proposed approach.
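To connect the abstract to an implementation, the sketch below (assuming PyTorch) illustrates the two-stage teacher assistant distillation flow using the standard softened-softmax distillation loss. The toy MLP models, the fake CIFAR-10-shaped batch, and the temperature and weighting values are illustrative assumptions for this sketch, not the paper's actual architectures or hyperparameters.

```python
# Minimal sketch of teacher assistant knowledge distillation (TAKD),
# assuming PyTorch. Model sizes, temperature T, and weight alpha are
# illustrative placeholders, not the paper's exact settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Standard distillation objective: weighted sum of the softened
    teacher-matching (KL) term and the hard cross-entropy term."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def distill(teacher, student, loader, epochs=1, lr=1e-3):
    """Train `student` to mimic a frozen `teacher` on `loader`."""
    teacher.eval()
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():
                t_logits = teacher(x)
            loss = kd_loss(student(x), t_logits, y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student

# Toy fully connected models standing in for the CNN/ResNet families used
# in the paper; widths only convey the capacity ordering
# teacher > teacher assistant > student.
def mlp(width):
    return nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, width),
                         nn.ReLU(), nn.Linear(width, 10))

teacher, assistant, student = mlp(1024), mlp(256), mlp(64)

# One fake CIFAR-10-shaped batch so the sketch runs end to end; in practice
# the teacher would already be pre-trained on the real training set.
loader = [(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,)))]

# Step 1: distill the teacher into the intermediate-sized teacher assistant.
distill(teacher, assistant, loader)
# Step 2: distill the teacher assistant into the small student, bridging the
# capacity gap instead of distilling teacher -> student directly.
distill(assistant, student, loader)
```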
Pages: 5191-5198
Page count: 8
Related papers (showing items 41-50 of 50)
  • [41] Decoupled Multi-teacher Knowledge Distillation based on Entropy. Cheng, Xin; Tang, Jialiang; Zhang, Zhiqiang; Yu, Wenxin; Jiang, Ning; Zhou, Jinjia. 2024 IEEE International Symposium on Circuits and Systems (ISCAS 2024), 2024.
  • [42] Hybrid Learning with Teacher-student Knowledge Distillation for Recommenders. Zhang, Hangbin; Wong, Raymond K.; Chu, Victor W. 20th IEEE International Conference on Data Mining Workshops (ICDMW 2020), 2020: 227-235.
  • [43] Data-Free Low-Bit Quantization via Dynamic Multi-teacher Knowledge Distillation. Huang, Chong; Lin, Shaohui; Zhang, Yan; Li, Ke; Zhang, Baochang. Pattern Recognition and Computer Vision (PRCV 2023), Part VIII, 2024, 14432: 28-41.
  • [44] Knowledge Distillation via Route Constrained Optimization. Jin, Xiao; Peng, Baoyun; Wu, Yichao; Liu, Yu; Liu, Jiaheng; Liang, Ding; Yan, Junjie; Hu, Xiaolin. 2019 IEEE/CVF International Conference on Computer Vision (ICCV 2019), 2019: 1345-1354.
  • [45] A Virtual Knowledge Distillation via Conditional GAN. Kim, Sihwan. IEEE Access, 2022, 10: 34766-34778.
  • [46] Collaborative Knowledge Distillation via Multiknowledge Transfer. Gou, Jianping; Sun, Liyuan; Yu, Baosheng; Du, Lan; Ramamohanarao, Kotagiri; Tao, Dacheng. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35 (5): 6718-6730.
  • [47] Knowledge distillation via Noisy Feature Reconstruction. Shi, Chaokun; Hao, Yuexing; Li, Gongyan; Xu, Shaoyun. Expert Systems with Applications, 2024, 257.
  • [48] Private Model Compression via Knowledge Distillation. Wang, Ji; Bao, Weidong; Sun, Lichao; Zhu, Xiaomin; Cao, Bokai; Yu, Philip S. Thirty-Third AAAI Conference on Artificial Intelligence / Thirty-First Innovative Applications of Artificial Intelligence Conference / Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, 2019: 1190+.
  • [50] Self-knowledge distillation via dropout. Lee, Hyoje; Park, Yeachan; Seo, Hyun; Kang, Myungjoo. Computer Vision and Image Understanding, 2023, 233.