Improve Shallow Decoder Based Transformer with Structured Expert Prediction

Cited by: 0
Authors
Wang, Zongbing [1]
Han, Jingru [2]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Software Engn, Wuhan, Peoples R China
[2] Dalian Univ Technol, Sch Comp Sci & Technol, Dalian, Peoples R China
Keywords
Structured Expert; Transformer; Machine Translation; MIXTURES
DOI
10.1007/978-3-031-72350-6_15
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent work shows that a single-layer autoregressive decoder paired with a sufficiently deep encoder (the 12-1 transformer) can maintain excellent translation quality while substantially accelerating inference. However, we observe that simply reducing the number of decoder layers can degrade performance to some degree, and we attribute this to the reduced decoder capacity. To remedy this, we propose improving the 12-1 transformer with structured expert prediction. Our approach extends the conventional fixed single-layer autoregressive decoder to a set of experts, increasing decoder capacity so that it can handle the varied, complex generation patterns found in practical data. Because only one expert is adaptively activated for each instance during inference, the model retains a speed nearly comparable to that of the 12-1 transformer. Extensive results demonstrate that our approach achieves consistent improvements over the 12-1 transformer and also helps generate diverse translations.
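The idea of activating only one expert decoder per instance can be illustrated with a minimal sketch. The abstract does not say how the expert is chosen, so everything here is an assumption made for illustration: the mean-pooling of encoder states, the linear gate (`gate_weights`), the argmax selection, and the toy `make_expert` stand-ins that replace real single-layer decoders. It is not the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def mean_pool(encoder_states: Sequence[Vector]) -> Vector:
    """Average the per-token encoder states into one sentence vector."""
    n = len(encoder_states)
    dim = len(encoder_states[0])
    return [sum(state[d] for state in encoder_states) / n for d in range(dim)]


def select_expert(sentence_vec: Vector, gate_weights: Sequence[Vector]) -> int:
    """Hypothetical hard routing: pick the expert with the highest gate score."""
    scores = [sum(w * x for w, x in zip(row, sentence_vec)) for row in gate_weights]
    return max(range(len(scores)), key=scores.__getitem__)


def decode_with_one_expert(
    encoder_states: Sequence[Vector],
    gate_weights: Sequence[Vector],
    experts: Sequence[Callable[[Sequence[Vector]], str]],
) -> Tuple[int, str]:
    """Run only the selected expert; the other experts are never called."""
    pooled = mean_pool(encoder_states)
    k = select_expert(pooled, gate_weights)
    return k, experts[k](encoder_states)


# Toy stand-ins for single-layer decoders (assumed, for illustration only).
calls: List[str] = []


def make_expert(name: str) -> Callable[[Sequence[Vector]], str]:
    def expert(encoder_states: Sequence[Vector]) -> str:
        calls.append(name)  # record which expert actually ran
        return f"{name} decoded {len(encoder_states)} source states"

    return expert


experts = [make_expert("expert_0"), make_expert("expert_1"), make_expert("expert_2")]
gate_weights = [[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # one gate row per expert
encoder_states = [[1.0, 0.0], [3.0, 0.0]]  # two source tokens, hidden size 2

chosen, output = decode_with_one_expert(encoder_states, gate_weights, experts)
print(chosen, output)
```

Under these assumptions, the parameter count grows with the number of experts, while the per-instance decoding cost stays that of a single decoder plus one small gating step, which is consistent with the speed claim in the abstract.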
Pages: 224-234
Page count: 11