Improve Shallow Decoder Based Transformer with Structured Expert Prediction

Cited by: 0
Authors
Wang, Zongbing [1]
Han, Jingru [2]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Software Engn, Wuhan, Peoples R China
[2] Dalian Univ Technol, Sch Comp Sci & Technol, Dalian, Peoples R China
Keywords
Structured Expert; Transformer; Machine Translation; Mixtures
DOI
10.1007/978-3-031-72350-6_15
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent work shows that a single-layer autoregressive decoder paired with a sufficiently deep encoder (the 12-1 transformer) can maintain excellent translation quality while achieving significant inference acceleration. However, we notice that simply decreasing the number of decoder layers may cause some performance degradation, which we attribute to the reduced decoder capacity. To remedy this, we propose to improve the 12-1 transformer with structured expert prediction. Our approach extends the traditional fixed single-layer autoregressive decoder to a set of experts, thereby increasing decoder capacity so that it can handle the varied and complex generation patterns found in practical data. Meanwhile, only one expert is adaptively activated for each instance during inference, so the speed remains nearly comparable to that of the 12-1 transformer. Extensive results demonstrate that our approach achieves consistent improvements over the 12-1 transformer and also helps generate diverse translations.
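The idea of activating a single expert per instance can be illustrated with a toy sketch. The abstract does not say how an expert is chosen or how the experts are trained, so the linear gate (`GATE_WEIGHTS`, `select_expert`) and the stand-in experts below are hypothetical and are not the authors' method. The sketch shows only the control flow: score all experts cheaply, then run exactly one of them.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in: each "expert" plays the role of one single-layer decoder.
Expert = Callable[[Sequence[int]], str]

# Assumed gate parameters (one weight row per expert); purely illustrative.
GATE_WEIGHTS: List[List[int]] = [
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
]


def make_expert(name: str, calls: List[str]) -> Expert:
    """Build a toy expert that records every invocation in `calls`."""
    def decode(encoder_state: Sequence[int]) -> str:
        calls.append(name)
        return f"{name}:{len(encoder_state)}"
    return decode


def select_expert(encoder_state: Sequence[int]) -> int:
    """Assumed gating rule: pick the expert with the highest linear score."""
    scores = [sum(w * x for w, x in zip(row, encoder_state)) for row in GATE_WEIGHTS]
    return max(range(len(scores)), key=scores.__getitem__)


def translate(encoder_state: Sequence[int], experts: Sequence[Expert]) -> Tuple[int, str]:
    """Run only the selected expert, so per-instance cost matches one decoder."""
    index = select_expert(encoder_state)
    return index, experts[index](encoder_state)


calls: List[str] = []
experts = [make_expert(f"expert{i}", calls) for i in range(len(GATE_WEIGHTS))]
index, output = translate([1, 2, 3], experts)
print(index, output, calls)
```

Because only the chosen expert is invoked, adding experts increases the number of parameters without adding decoder passes per instance. This is consistent with the abstract's claim that speed stays close to that of the 12-1 transformer.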
Pages: 224-234
Number of pages: 11