Exploring All-In-One Knowledge Distillation Framework for Neural Machine Translation

Cited: 0
Authors
Miao, Zhongjian [1 ,2 ]
Zhang, Wen [2 ]
Su, Jinsong [1 ]
Li, Xiang [2 ]
Luan, Jian [2 ]
Chen, Yidong [1 ]
Wang, Bin [2 ]
Zhang, Min [3 ]
Affiliations
[1] Xiamen Univ, Sch Informat, Xiamen, Peoples R China
[2] Xiaomi AI Lab, Beijing, Peoples R China
[3] Soochow Univ, Inst Comp Sci & Technol, Suzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Conventional knowledge distillation (KD) approaches are commonly employed to compress neural machine translation (NMT) models. However, they yield only one lightweight student at a time. Consequently, KD must be conducted multiple times when several different students are required simultaneously, which can be resource-intensive. Additionally, these students are optimized individually and thus lack interaction with each other, so their potential is not fully realized. In this work, we propose a novel All-In-One Knowledge Distillation (AIO-KD) framework for NMT, which generates multiple satisfactory students at once. Under AIO-KD, we first randomly extract fewer-layer subnetworks from the teacher as sample students. Then, we jointly optimize the teacher and these students, where the students simultaneously learn knowledge from the teacher and interact with one another via mutual learning. At deployment, we re-extract candidate students that satisfy the specifications of various devices. In particular, we adopt two carefully designed strategies for AIO-KD: 1) we dynamically detach gradients to prevent poorly performing students from negatively affecting the teacher during knowledge transfer, which could subsequently impact the other students; 2) we design a two-stage mutual learning strategy, which alleviates the negative impacts of poorly performing students on early-stage student interactions. Extensive experiments and in-depth analyses on three benchmarks demonstrate the effectiveness and eco-friendliness of AIO-KD. Our source code is available at https://github.com/DeepLearnXMU/AIO-KD.
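The abstract outlines the whole training recipe: students are weight-shared, fewer-layer subnetworks of the teacher; teacher and students are optimized jointly with KD plus mutual learning; gradients from weak students are dynamically detached; and mutual learning is enabled only in a second stage. The following is a minimal PyTorch sketch of one such training step, reconstructed from the abstract alone. It is not the authors' implementation (see the repository above): the toy model, the first-k-layers weight-sharing scheme, the loss weights alpha/beta, and the detach_ratio criterion are all illustrative assumptions.

```python
# Minimal sketch of one AIO-KD training step, reconstructed from the abstract
# alone; NOT the authors' implementation. ToyModel, the first-k-layers
# weight-sharing scheme, alpha/beta, and detach_ratio are all assumptions.
import random

import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyModel(nn.Module):
    """Stand-in for an NMT model: a layer stack plus a shared output head.

    A "student" here is the subnetwork formed by the first k layers, so every
    student shares its weights with the teacher rather than being a
    separately parameterized network.
    """

    def __init__(self, dim=64, vocab=100, n_layers=6):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
             for _ in range(n_layers)])
        self.head = nn.Linear(dim, vocab)

    def forward(self, x, k=None):
        for layer in self.layers[:k or len(self.layers)]:
            x = layer(x)
        return self.head(x)


def aio_kd_step(model, x, y, num_students=2, alpha=1.0, beta=0.5,
                mutual=True, detach_ratio=1.5):
    """One joint optimization step: teacher cross-entropy plus student KD,
    with optional mutual learning. mutual=False corresponds to the first
    stage of the two-stage strategy, where student interaction is off."""
    teacher_logits = model(x)  # the full network acts as the teacher
    t_ce = F.cross_entropy(teacher_logits.flatten(0, 1), y.flatten())
    loss = t_ce

    # Randomly extract fewer-layer subnetworks as this step's students.
    depths = random.sample(range(1, len(model.layers)), k=num_students)
    kept = []
    for k in depths:
        s_logits = model(x, k=k)
        s_ce = F.cross_entropy(s_logits.flatten(0, 1), y.flatten())
        # Dynamic gradient detaching (assumed criterion): a student doing far
        # worse than the teacher contributes no gradient at all, so it cannot
        # degrade the shared teacher weights via knowledge transfer.
        if s_ce.item() > detach_ratio * t_ce.item():
            continue
        # Teacher logits are detached so distillation only updates students.
        kd = F.kl_div(F.log_softmax(s_logits, dim=-1),
                      F.softmax(teacher_logits.detach(), dim=-1),
                      reduction="batchmean")
        loss = loss + s_ce + alpha * kd
        kept.append(s_logits)

    if mutual:  # stage two: surviving students also learn from one another
        for i, si in enumerate(kept):
            for j, sj in enumerate(kept):
                if i != j:
                    loss = loss + beta * F.kl_div(
                        F.log_softmax(si, dim=-1),
                        F.softmax(sj.detach(), dim=-1),
                        reduction="batchmean")
    return loss


if __name__ == "__main__":
    model = ToyModel()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    x = torch.randn(8, 10, 64)          # dummy (batch, length, dim) features
    y = torch.randint(0, 100, (8, 10))  # dummy target token ids
    opt.zero_grad()
    aio_kd_step(model, x, y, mutual=False).backward()  # stage one
    opt.step()
```

Under this reading, deployment needs no further training: a candidate student for a given device is obtained simply by running the shared model at whatever depth fits that device's budget (model(x, k=k) above), which is what makes the framework "all-in-one".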
Pages: 2929-2940
Page count: 12
Related Papers
50 items in total
  • [41] Multi-Teacher Distillation With Single Model for Neural Machine Translation
    Liang, Xiaobo
    Wu, Lijun
    Li, Juntao
    Qin, Tao
    Zhang, Min
    Liu, Tie-Yan
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 992 - 1002
  • [42] IMPROVING ADVERSARIAL NEURAL MACHINE TRANSLATION WITH PRIOR KNOWLEDGE
    Yang, Yating
    Li, Xiao
    Jiang, Tonghai
    Kong, Jinying
    Ma, Bo
    Zhou, Xi
    Wang, Lei
    2017 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP 2017), 2017, : 1373 - 1377
  • [43] Translating with Bilingual Topic Knowledge for Neural Machine Translation
    Wei, Xiangpeng
    Hu, Yue
    Xing, Luxi
    Wang, Yipeng
    Gao, Li
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 7257 - 7264
  • [44] Utilizing Knowledge Graphs for Neural Machine Translation Augmentation
    Moussallem, Diego
    Ngomo, Axel-Cyrille Ngonga
    Buitelaar, Paul
    Arcan, Mihael
    PROCEEDINGS OF THE 10TH INTERNATIONAL CONFERENCE ON KNOWLEDGE CAPTURE (K-CAP '19), 2019, : 139 - 146
  • [45] KNOWLEDGE GRAPHS EFFECTIVENESS IN NEURAL MACHINE TRANSLATION IMPROVEMENT
    Ahmadnia, Benyamin
    Dorr, Bonnie J.
    Kordjamshidi, Parisa
    COMPUTER SCIENCE-AGH, 2020, 21 (03): : 287 - 306
  • [46] Linguistic Knowledge-Aware Neural Machine Translation
    Li, Qiang
    Wong, Derek F.
    Chao, Lidia S.
    Zhu, Muhua
    Xiao, Tong
    Zhu, Jingbo
    Zhang, Min
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (12) : 2341 - 2354
  • [47] Restricted or Not: A General Training Framework for Neural Machine Translation
    Li, Zuchao
    Utiyama, Masao
    Sumita, Eiichiro
    Zhao, Hai
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022): STUDENT RESEARCH WORKSHOP, 2022, : 245 - 251
  • [48] Neural Machine Translation with Heterogeneous Topic Knowledge Embeddings
    Wang, Weixuan
    Peng, Wei
    Zhang, Meng
    Liu, Qun
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 3197 - 3202
  • [49] DeepAdjoint: An All-in-One Photonic Inverse Design Framework Integrating Data-Driven Machine Learning with Optimization Algorithms
    Yeung, Christopher
    Pham, Benjamin
    Tsai, Ryan
    Fountaine, Katherine T.
    Raman, Aaswath P.
    ACS PHOTONICS, 2023, 10 (04): : 884 - 891
  • [50] All-in-One Mobile Outdoor Augmented Reality Framework for Cultural Heritage Site
    Park, Noh-Young
    Kim, Eunseok
    Lee, Jongwon
    Woo, Woontack
    2016 12TH INTERNATIONAL CONFERENCE ON SIGNAL-IMAGE TECHNOLOGY & INTERNET-BASED SYSTEMS (SITIS), 2016, : 484 - 489