Exploring All-In-One Knowledge Distillation Framework for Neural Machine Translation

Cited: 0
Authors
Miao, Zhongjian [1 ,2 ]
Zhang, Wen [2 ]
Su, Jinsong [1 ]
Li, Xiang [2 ]
Luan, Jian [2 ]
Chen, Yidong [1 ]
Wang, Bin [2 ]
Zhang, Min [3 ]
Affiliations
[1] Xiamen Univ, Sch Informat, Xiamen, Peoples R China
[2] Xiaomi AI Lab, Beijing, Peoples R China
[3] Soochow Univ, Inst Comp Sci & Technol, Suzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
None available
CLC classification
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Conventional knowledge distillation (KD) approaches are commonly employed to compress neural machine translation (NMT) models. However, they produce only one lightweight student per run, so KD must be conducted multiple times when several students are required simultaneously, which can be resource-intensive. Moreover, these students are optimized individually and thus never interact with one another, leaving their potential underexploited. In this work, we propose a novel All-In-One Knowledge Distillation (AIO-KD) framework for NMT, which generates multiple satisfactory students at once. Under AIO-KD, we first randomly extract fewer-layer subnetworks from the teacher as sample students. We then jointly optimize the teacher and these students, with the students simultaneously learning knowledge from the teacher and interacting with each other via mutual learning. At deployment time, we re-extract candidate students that satisfy the specifications of various devices. In particular, we adopt two carefully designed strategies for AIO-KD: 1) we dynamically detach gradients to prevent poorly performing students from negatively affecting the teacher during knowledge transfer, which would in turn impact the other students; 2) we design a two-stage mutual learning strategy that alleviates the negative impact of poorly performing students on early-stage student interactions. Extensive experiments and in-depth analyses on three benchmarks demonstrate the effectiveness and eco-friendliness of AIO-KD. Our source code is available at https://github.com/DeepLearnXMU/AIO-KD.
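To make the abstract's mechanics concrete, here is a minimal NumPy sketch of the core ideas: students are shallower subnetworks sharing the teacher's bottom layers and output projection, the KD term matches each student to the teacher's (fixed, "detached") distribution, and a mutual-learning term matches students to one another. The toy feed-forward stack, the two-student setup, the temperature, and all names are illustrative assumptions, not the paper's actual Transformer implementation.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p, q, eps=1e-9):
    """Mean KL(p || q) over positions (small eps for numerical safety)."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

rng = np.random.default_rng(0)
n_layers, d_model, vocab = 6, 8, 16
layers = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_layers)]
out_proj = rng.standard_normal((d_model, vocab)) * 0.1  # shared by all students

def forward(x, depth):
    """Run the first `depth` layers, then the shared output projection."""
    h = x
    for W in layers[:depth]:
        h = np.tanh(h @ W)
    return h @ out_proj

x = rng.standard_normal((4, d_model))     # 4 toy token positions
teacher_logits = forward(x, n_layers)     # full-depth network = teacher

# Randomly sample two student depths (shallower subnetworks of the teacher).
depths = sorted(rng.choice(np.arange(1, n_layers), size=2, replace=False).tolist())
student_logits = [forward(x, k) for k in depths]

# KD term: each student matches the teacher distribution, treated as a fixed
# target (i.e. gradients would be detached so student error cannot reach the
# teacher, mirroring the paper's dynamic gradient detaching).
p_teacher = softmax(teacher_logits, T=2.0)
kd_losses = [kl_div(p_teacher, softmax(s, T=2.0)) for s in student_logits]

# Mutual-learning term: students also match each other's distributions.
mutual_loss = kl_div(softmax(student_logits[0]), softmax(student_logits[1]))
```

In a real training loop these terms would be combined with the translation cross-entropy and back-propagated; the sketch only shows how the shared-subnetwork extraction and the two loss families fit together.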
Pages: 2929-2940 (12 pages)
Related Papers (50 records)
  • [31] Exploiting Knowledge Graph in Neural Machine Translation
    Lu, Yu
    Zhang, Jiajun
    Zong, Chengqing
    MACHINE TRANSLATION, CWMT 2018, 2019, 954 : 27 - 38
  • [32] SIMPLE: All-in-one programs for exploring interactions in moderated multiple regression
    O'Connor, BP
    EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, 1998, 58 (05) : 836 - 840
  • [33] OneVOS: Unifying Video Object Segmentation with All-in-One Transformer Framework
    Li, Wanyun
    Guo, Pinxue
    Zhou, Xinyu
    Hong, Lingyi
    He, Yangji
    Zhang, Xiangyu
    Zhang, Wei
    Zhang, Wenqiang
    COMPUTER VISION - ECCV 2024, PT LVIII, 2025, 15116 : 20 - 40
  • [34] AvatarGPT: All-in-One Framework for Motion Understanding, Planning, Generation and Beyond
    Zhou, Zixiang
    Wan, Yu
    Wang, Baoyuan
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 1357 - 1366
  • [35] Multilingual Non-Autoregressive Machine Translation without Knowledge Distillation
    Huang, Chenyang
    Huang, Fei
    Zheng, Zaixiang
    Zaiane, Osmar
    Zhou, Hao
    Mou, Lili
    13TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING AND THE 3RD CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, IJCNLP-AACL 2023, 2023, : 161 - 170
  • [36] EXPLORING THE USE OF ACOUSTIC EMBEDDINGS IN NEURAL MACHINE TRANSLATION
    Deena, Salil
    Ng, Raymond W. M.
    Madhyastha, Pranava
    Specia, Lucia
    Hain, Thomas
    2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 450 - 457
  • [37] NetIDE: All-in-one framework for next generation, composed SDN applications
    Aranda Gutierrez, P. A.
    Rojas, E.
    Schwabe, A.
    Stritzke, C.
    Doriguzzi-Corin, R.
    Leckey, A.
    Petralia, G.
    Marsico, A.
    Phemius, K.
    Tamurejo, S.
    2016 IEEE NETSOFT CONFERENCE AND WORKSHOPS (NETSOFT), 2016, : 355 - 356
  • [38] Exploring Recombination for Efficient Decoding of Neural Machine Translation
    Zhang, Zhisong
    Wang, Rui
    Utiyama, Masao
    Sumita, Eiichiro
    Zhao, Hai
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 4785 - 4790
  • [39] Multimodal Machine Translation Based on Enhanced Knowledge Distillation and Feature Fusion
    Tian, Erlin
    Zhu, Zengchao
    Liu, Fangmei
    Li, Zuhe
    Gu, Ran
    Zhao, Shuai
    ELECTRONICS, 2024, 13 (15)
  • [40] Resource-Adaptive Federated Learning with All-In-One Neural Composition
    Mei, Yiqun
    Guo, Pengfei
    Zhou, Mo
    Patel, Vishal M.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,