Improve Shallow Decoder Based Transformer with Structured Expert Prediction

Authors
Wang, Zongbing [1]
Han, Jingru [2]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Software Engn, Wuhan, Peoples R China
[2] Dalian Univ Technol, Sch Comp Sci & Technol, Dalian, Peoples R China
Keywords
Structured Expert; Transformer; Machine Translation; Mixtures
DOI
10.1007/978-3-031-72350-6_15
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent work shows that a single-layer autoregressive decoder paired with a sufficiently deep encoder (the 12-1 transformer) can maintain excellent translation quality while achieving significant inference acceleration. However, we observe that simply reducing the number of decoder layers can cause some performance degradation, which we attribute to the reduced decoder capacity. To remedy this, we propose to improve the 12-1 transformer with structured expert prediction. Our approach extends the traditional fixed single-layer autoregressive decoder to a set of experts, increasing decoder capacity so that it can handle the varied and complex generation patterns found in practical data. Meanwhile, only one expert is adaptively activated for each instance during inference, so the model retains speed nearly comparable to that of the 12-1 transformer. Extensive results demonstrate that our approach achieves consistent improvements over the 12-1 transformer and also helps generate diverse translations.
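The routing idea in the abstract (a set of expert decoders, with only one activated per instance) can be illustrated with a minimal sketch. This is not the authors' implementation: the mean-pooled gate, the linear "experts", and every name below (`make_expert`, `select_expert`, `decode_with_one_expert`) are hypothetical assumptions chosen only to show instance-level hard routing, in which the cost of applying the decoder stays close to that of a single layer regardless of how many experts exist.

```python
import random


def make_expert(dim, seed):
    """Hypothetical stand-in for one single-layer decoder: a dim x dim linear map."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def mean_pool(encoder_states):
    """Average the per-token encoder vectors into one instance-level vector."""
    n = len(encoder_states)
    return [sum(col) / n for col in zip(*encoder_states)]


def select_expert(gate_weights, pooled):
    """Hard routing (assumed): score every expert, return the index of the best."""
    scores = matvec(gate_weights, pooled)
    return max(range(len(scores)), key=scores.__getitem__)


def decode_with_one_expert(encoder_states, experts, gate_weights):
    """Route the whole instance to one expert and apply only that expert."""
    pooled = mean_pool(encoder_states)
    k = select_expert(gate_weights, pooled)
    outputs = [matvec(experts[k], state) for state in encoder_states]
    return k, outputs


dim, num_experts = 4, 3
experts = [make_expert(dim, seed=i) for i in range(num_experts)]
# Gate matrix of shape num_experts x dim (random here; learned in a real model).
gate_weights = make_expert(dim, seed=100)[:num_experts]
encoder_states = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, -0.2, 0.0]]

chosen, outputs = decode_with_one_expert(encoder_states, experts, gate_weights)
print("activated expert:", chosen)
```

In this sketch the unselected experts are never evaluated, which is the property the abstract credits for keeping inference speed close to the 12-1 baseline; how the paper actually trains and structures its experts is not specified here.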
Pages: 224-234
Page count: 11