Exploring All-In-One Knowledge Distillation Framework for Neural Machine Translation

Cited: 0
Authors
Miao, Zhongjian [1 ,2 ]
Zhang, Wen [2 ]
Su, Jinsong [1 ]
Li, Xiang [2 ]
Luan, Jian [2 ]
Chen, Yidong [1 ]
Wang, Bin [2 ]
Zhang, Min [3 ]
Affiliations
[1] Xiamen Univ, Sch Informat, Xiamen, Peoples R China
[2] Xiaomi AI Lab, Beijing, Peoples R China
[3] Soochow Univ, Inst Comp Sci & Technol, Suzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Conventional knowledge distillation (KD) approaches are commonly employed to compress neural machine translation (NMT) models. However, they yield only one lightweight student per run, so KD must be repeated whenever several different students are required at the same time, which can be resource-intensive. Moreover, these students are optimized independently and never interact with one another, so their potential is not fully realized. In this work, we propose a novel All-In-One Knowledge Distillation (AIO-KD) framework for NMT, which generates multiple satisfactory students at once. Under AIO-KD, we first randomly extract fewer-layer subnetworks from the teacher as the sample students. We then jointly optimize the teacher and these students, where the students simultaneously learn knowledge from the teacher and interact with one another via mutual learning. At deployment time, we re-extract candidate students that satisfy the specifications of various devices. In particular, we adopt two carefully designed strategies for AIO-KD: 1) we dynamically detach gradients to prevent poorly performing students from negatively affecting the teacher during knowledge transfer, which could in turn harm the other students; 2) we design a two-stage mutual-learning strategy that alleviates the negative impact of poorly performing students on early-stage student interactions. Extensive experiments and in-depth analyses on three benchmarks demonstrate the effectiveness and eco-friendliness of AIO-KD. Our source code is available at https://github.com/DeepLearnXMU/AIO-KD.
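To make the training objective concrete, below is a minimal PyTorch-style sketch of a joint loss of the kind the abstract describes. It is an illustration under stated assumptions, not the authors' implementation: the function names (`kl_from_logits`, `aio_kd_loss`) are hypothetical, students are assumed to emit token-level logits over a shared vocabulary, and where the paper detaches gradients dynamically based on student quality, this sketch simply always detaches the teacher's outputs.

```python
# Minimal sketch of an AIO-KD-style joint objective (illustrative only).
import torch
import torch.nn.functional as F

def kl_from_logits(p_logits: torch.Tensor, q_logits: torch.Tensor) -> torch.Tensor:
    """KL(p || q) computed from raw logits, averaged over the batch."""
    return F.kl_div(
        F.log_softmax(q_logits, dim=-1),   # input: log q
        F.log_softmax(p_logits, dim=-1),   # target: log p
        log_target=True,
        reduction="batchmean",
    )

def aio_kd_loss(teacher_logits, student_logits_list, targets, stage_two=False):
    """Joint loss: teacher CE + teacher-to-student KD + (stage 2) mutual learning.

    teacher_logits:      (num_tokens, vocab) logits of the full teacher
    student_logits_list: one (num_tokens, vocab) tensor per sampled student
    targets:             (num_tokens,) gold token ids
    """
    # The teacher keeps training on the gold references.
    loss = F.cross_entropy(teacher_logits, targets)

    for s_logits in student_logits_list:
        # Each student learns from the references and from the teacher.
        # Detaching the teacher's logits blocks gradients from weak students
        # flowing back into the teacher (the paper does this dynamically;
        # here we always detach for simplicity).
        loss = loss + F.cross_entropy(s_logits, targets)
        loss = loss + kl_from_logits(teacher_logits.detach(), s_logits)

    # Two-stage mutual learning: only in the second stage do students
    # imitate one another, avoiding noise from still-weak early students.
    if stage_two:
        for i, s_i in enumerate(student_logits_list):
            for j, s_j in enumerate(student_logits_list):
                if i != j:
                    loss = loss + kl_from_logits(s_i.detach(), s_j)
    return loss
```

In this reading, each sampled student could be, for example, the subnetwork formed by the bottom k layers of a 6-layer Transformer teacher, so a single forward pass per model supplies all the logits consumed by the loss above.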
Pages: 2929-2940
Page count: 12