Exploring All-In-One Knowledge Distillation Framework for Neural Machine Translation

Cited by: 0
Authors
Miao, Zhongjian [1 ,2 ]
Zhang, Wen [2 ]
Su, Jinsong [1 ]
Li, Xiang [2 ]
Luan, Jian [2 ]
Chen, Yidong [1 ]
Wang, Bin [2 ]
Zhang, Min [3 ]
Affiliations
[1] Xiamen Univ, Sch Informat, Xiamen, Peoples R China
[2] Xiaomi AI Lab, Beijing, Peoples R China
[3] Soochow Univ, Inst Comp Sci & Technol, Suzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Conventional knowledge distillation (KD) approaches are commonly employed to compress neural machine translation (NMT) models. However, they produce only one lightweight student at a time, so KD must be conducted repeatedly when several students are needed simultaneously, which can be resource-intensive. Moreover, these students are optimized individually and thus lack interactions with each other, leaving their potential not fully realized. In this work, we propose a novel All-In-One Knowledge Distillation (AIO-KD) framework for NMT, which generates multiple satisfactory students at once. Under AIO-KD, we first randomly extract fewer-layer subnetworks from the teacher as the sampled students. We then jointly optimize the teacher and these students, where the students simultaneously learn knowledge from the teacher and interact with one another via mutual learning. At deployment time, we re-extract candidate students that satisfy the specifications of various devices. In particular, we adopt two carefully designed strategies for AIO-KD: 1) we dynamically detach gradients to prevent poorly performing students from negatively affecting the teacher during knowledge transfer, which could in turn harm the other students; 2) we design a two-stage mutual learning strategy that alleviates the negative impact of poorly performing students on early-stage student interactions. Extensive experiments and in-depth analyses on three benchmarks demonstrate the effectiveness and eco-friendliness of AIO-KD. Our source code is available at https://github.com/DeepLearnXMU/AIO-KD.
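As a rough illustration of the training objective sketched in the abstract, the snippet below combines a teacher cross-entropy term, teacher-to-student KD terms with the teacher's distribution detached (so weak students cannot pull the teacher toward them), and a mutual-learning term among the sampled students that is switched on only in the second stage. This is a minimal sketch under assumed names and equal loss weights; kl_to_target, aio_kd_loss, the logits shapes, and the boolean stage switch are illustrative, not the authors' API. The actual implementation, including student sampling and the dynamic gradient-detaching schedule, is in the linked repository.

```python
# Illustrative sketch only; names and equal loss weights are assumptions,
# not the authors' implementation (see https://github.com/DeepLearnXMU/AIO-KD).
import torch
import torch.nn.functional as F

def kl_to_target(student_logits, target_logits, temperature=1.0):
    # KL(target || student) with the target side detached, so this term only
    # updates the student and never pushes the target toward a weak student.
    p_t = F.softmax(target_logits.detach() / temperature, dim=-1)
    log_p_s = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * temperature ** 2

def aio_kd_loss(teacher_logits, student_logits_list, gold, mutual_learning=True):
    # teacher_logits: (tokens, vocab) from the full teacher;
    # student_logits_list: one (tokens, vocab) tensor per sampled fewer-layer student;
    # gold: (tokens,) reference token ids; mutual_learning mimics the second stage.
    loss = F.cross_entropy(teacher_logits, gold)              # teacher objective
    for s in student_logits_list:
        loss = loss + F.cross_entropy(s, gold)                # student objective
        loss = loss + kl_to_target(s, teacher_logits)         # KD from detached teacher
    if mutual_learning:                                       # stage two: students also
        for i, s_i in enumerate(student_logits_list):         # learn from one another
            for j, s_j in enumerate(student_logits_list):
                if i != j:
                    loss = loss + kl_to_target(s_i, s_j)
    return loss

# Toy check that the pieces fit together: one teacher and two students
# over a batch of 8 target tokens with a 100-word vocabulary.
vocab = 100
gold = torch.randint(0, vocab, (8,))
teacher = torch.randn(8, vocab, requires_grad=True)
students = [torch.randn(8, vocab, requires_grad=True) for _ in range(2)]
aio_kd_loss(teacher, students, gold).backward()
```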
Pages: 2929-2940
Page count: 12
Related Papers
50 records in total
  • [11] The Design of Hearing and hypnosis all-in-one Machine
    Zheng, Shiyong
    Li, Zhao
    Li, Biqing
    PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON MECHANICAL, ELECTRONIC, CONTROL AND AUTOMATION ENGINEERING (MECAE 2017), 2017, 61 : 306 - 309
  • [12] An All-in-One Bioinspired Neural Network
    Radhakrishnan, Shiva Subbulakshmi
    Dodda, Akhil
    Das, Saptarshi
    ACS NANO, 2022, 16 (12) : 20100 - 20115
  • [13] Building a Multi-Domain Neural Machine Translation Model Using Knowledge Distillation
    Mghabbar, Idriss
    Ratnamogan, Pirashanth
    ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, 325 : 2116 - 2123
  • [14] Incorporating Multilingual Knowledge Distillation into Machine Translation Evaluation
    Zhang, Min
    Yang, Hao
    Tao, Shimin
    Zhao, Yanqing
    Qiao, Xiaosong
    Li, Yinlu
    Su, Chang
    Wang, Minghan
    Guo, Jiaxin
    Liu, Yilun
    Qin, Ying
    KNOWLEDGE GRAPH AND SEMANTIC COMPUTING: KNOWLEDGE GRAPH EMPOWERS THE DIGITAL ECONOMY, CCKS 2022, 2022, 1669 : 148 - 160
  • [15] CoRe optimizer: an all-in-one solution for machine learning
    Eckhoff, Marco
    Reiher, Markus
    MACHINE LEARNING-SCIENCE AND TECHNOLOGY, 2024, 5 (01):
  • [16] ESPnet-ST: All-in-One Speech Translation Toolkit
    Inaguma, Hirofumi
    Kiyono, Shun
    Duh, Kevin
    Karita, Shigeki
    Yalta, Nelson
    Hayashi, Tomoki
    Watanabe, Shinji
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020): SYSTEM DEMONSTRATIONS, 2020, : 302 - 311
  • [17] An All-In-One Convolutional Neural Network for Face Analysis
    Ranjan, Rajeev
    Sankaranarayanan, Swami
    Castillo, Carlos D.
    Chellappa, Rama
    2017 12TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2017), 2017, : 17 - 24
  • [18] Improving Low-Resource Neural Machine Translation With Teacher-Free Knowledge Distillation
    Zhang, Xinlu
    Li, Xiao
    Yang, Yating
    Dong, Rui
    IEEE ACCESS, 2020, 8 : 206638 - 206645
  • [19] Sampling and Filtering of Neural Machine Translation Distillation Data
    Zouhar, Vilem
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 1 - 8
  • [20] Integrating Prior Translation Knowledge Into Neural Machine Translation
    Chen, Kehai
    Wang, Rui
    Utiyama, Masao
    Sumita, Eiichiro
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 330 - 339