Reinforced Multi-Teacher Selection for Knowledge Distillation

Authors
Yuan, Fei [1, 2]
Shou, Linjun [2]
Pei, Jian [3]
Lin, Wutao [2]
Gong, Ming [2]
Fu, Yan [1]
Jiang, Daxin [2]
Affiliations
[1] Univ Elect Sci & Technol China, Chengdu, Peoples R China
[2] Microsoft STCA NLP Grp, Beijing, Peoples R China
[3] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
In natural language processing (NLP) tasks, slow inference and heavy GPU usage remain the bottlenecks to applying pre-trained deep models in production. As a popular method for model compression, knowledge distillation transfers knowledge from one or more large (teacher) models to a small (student) model. When multiple teacher models are available, state-of-the-art methods assign each teacher a fixed weight for the whole distillation, and most existing methods give every teacher an equal weight. In this paper, we observe that, because training examples vary in complexity and student models differ in capability, learning differentially from the teacher models can lead to better-performing distilled student models. We systematically develop a reinforced method that dynamically assigns weights to teacher models for different training instances and optimizes the performance of the student model. Our extensive experimental results on several NLP tasks clearly verify the feasibility and effectiveness of our approach.
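The per-instance weighting described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the temperature parameter, and the choice of soft-label cross-entropy as the distillation loss are all assumptions made for illustration. The reinforcement-learned policy that would produce the weights is not shown; the weights are simply passed in by the caller.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, student_probs, eps=1e-12):
    """Cross-entropy of the student distribution against soft targets."""
    return -sum(t * math.log(s + eps) for t, s in zip(target_probs, student_probs))


def weighted_distillation_loss(student_logits, teacher_logits_list,
                               teacher_weights, temperature=1.0):
    """Distillation loss for ONE training instance (hypothetical helper).

    teacher_weights holds one non-negative weight per teacher for this
    instance; in a reinforced setup they would come from a learned policy
    (an assumption here, not implemented). Weights are normalised to sum to 1.
    """
    if len(teacher_logits_list) != len(teacher_weights):
        raise ValueError("need exactly one weight per teacher")
    total_weight = sum(teacher_weights)
    if total_weight <= 0:
        raise ValueError("at least one teacher weight must be positive")

    student_probs = softmax(student_logits, temperature)
    loss = 0.0
    for logits, weight in zip(teacher_logits_list, teacher_weights):
        teacher_probs = softmax(logits, temperature)
        loss += (weight / total_weight) * cross_entropy(teacher_probs, student_probs)
    return loss


# Two hypothetical teachers for a binary task; the second is ignored here.
student = [2.0, 0.5]
teachers = [[2.5, 0.2], [0.1, 1.0]]
print(weighted_distillation_loss(student, teachers, [1.0, 0.0]))
```

A fixed, equal weighting corresponds to passing the same weight list for every instance; the dynamic scheme differs only in that the list changes from one training instance to the next.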
Pages: 14284-14291
Page count: 8