Improve Shallow Decoder Based Transformer with Structured Expert Prediction

Cited: 0
Authors
Wang, Zongbing [1 ]
Han, Jingru [2 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Software Engn, Wuhan, Peoples R China
[2] Dalian Univ Technol, Sch Comp Sci & Technol, Dalian, Peoples R China
Keywords
Structured Expert; Transformer; Machine Translation; Mixtures
DOI
10.1007/978-3-031-72350-6_15
CLC number
TP18 (Theory of Artificial Intelligence)
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recent work shows that a single-layer autoregressive decoder paired with a sufficiently deep encoder (the 12-1 transformer) can maintain excellent translation quality while achieving significant inference acceleration. However, we observe that simply reducing the number of decoder layers causes a certain degree of performance degradation, which we attribute to the reduced decoder capacity. To remedy this, we propose to improve the 12-1 transformer with structured expert prediction. Our approach extends the conventional fixed single-layer autoregressive decoder to a set of experts, increasing the decoder capacity so that the model can handle the various complex generation patterns found in practical data. Meanwhile, only one expert is adaptively activated for each instance during inference, so the model retains nearly the same speed as the 12-1 transformer. Extensive results demonstrate that our approach achieves consistent improvements over the 12-1 transformer and also helps generate diverse translations.
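The routing mechanism described in the abstract can be pictured with a short sketch. The PyTorch code below is a minimal illustration written from the abstract alone, not the authors' implementation: the class name `StructuredExpertDecoder`, the `num_experts` pool size, and the mean-pooled router over the encoder output are all assumptions, and training-time details such as how experts are assigned or load-balanced are omitted.

```python
import torch
import torch.nn as nn

class StructuredExpertDecoder(nn.Module):
    """Hypothetical sketch of a deep-encoder/shallow-decoder model: the
    single decoder layer of a 12-1 transformer is replaced by a pool of
    expert layers, and exactly one expert is hard-selected per instance."""

    def __init__(self, vocab_size=32000, d_model=512, nhead=8,
                 num_encoder_layers=12, num_experts=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True),
            num_encoder_layers)
        # Pool of single-layer autoregressive decoders ("experts").
        self.experts = nn.ModuleList(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
            for _ in range(num_experts))
        # Router: scores experts from the mean-pooled encoder output
        # (an assumed instance-level routing signal, not from the paper).
        self.router = nn.Linear(d_model, num_experts)
        self.out_proj = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        memory = self.encoder(self.embed(src_ids))       # (B, S, D)
        gate_logits = self.router(memory.mean(dim=1))    # (B, E)
        choice = gate_logits.argmax(dim=-1)              # one expert per instance
        tgt = self.embed(tgt_ids)                        # (B, T, D)
        mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        out = torch.empty_like(tgt)
        for e, expert in enumerate(self.experts):
            sel = choice == e                            # instances routed to expert e
            if sel.any():
                out[sel] = expert(tgt[sel], memory[sel], tgt_mask=mask)
        return self.out_proj(out), gate_logits           # (B, T, V), (B, E)

model = StructuredExpertDecoder()
src = torch.randint(0, 32000, (2, 10))   # toy source token ids
tgt = torch.randint(0, 32000, (2, 7))    # shifted target token ids
logits, gates = model(src, tgt)
print(logits.shape)                      # torch.Size([2, 7, 32000])
```

The property the sketch preserves is the one the abstract emphasizes: each instance passes through exactly one single-layer expert, so per-instance decoding cost stays close to the 12-1 baseline while the total decoder capacity grows with the number of experts.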
Pages: 224-234
Page count: 11
Related Papers
50 records in total
  • [1] TransCFD: A transformer-based decoder for flow field prediction
    Jiang, Jundou; Li, Guanxiong; Jiang, Yi; Zhang, Laiping; Deng, Xiaogang
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 123
  • [2] SATD: syntax-aware handwritten mathematical expression recognition based on tree-structured transformer decoder
    Fu, Pengbin; Xiao, Ganyun; Yang, Huirong
    VISUAL COMPUTER, 2025, 41 (02): 883-900
  • [3] Clairvoyant: A Log-Based Transformer-Decoder for Failure Prediction in Large-Scale Systems
    Alharthi, Khalid Ayedh; Jhumka, Arshad; Di, Sheng; Cappello, Franck
    PROCEEDINGS OF THE 36TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ICS 2022, 2022
  • [4] Recurrent Glimpse-based Decoder for Detection with Transformer
    Chen, Zhe; Zhang, Jing; Tao, Dacheng
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 5250-5259
  • [5] News headline generation based on improved decoder from transformer
    Li, Zhengpeng; Wu, Jiansheng; Miao, Jiawei; Yu, Xinmiao
    SCIENTIFIC REPORTS, 2022, 12 (01)
  • [6] Going "Deeper": Structured Sememe Prediction via Transformer with Tree Attention
    Ye, Yining; Qi, Fanchao; Liu, Zhiyuan; Sun, Maosong
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022: 128-138
  • [7] Expert System Based Fault Detection of Power Transformer
    Nagpal, Tapsi; Brar, Yadwinder Singh
    JOURNAL OF COMPUTATIONAL AND THEORETICAL NANOSCIENCE, 2015, 12 (02): 208-214
  • [8] Transformer Decoder Based Reinforcement Learning Approach for Conversational Response Generation
    Faal, Farshid; Yu, Jia Yuan; Schmitt, Ketra
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
  • [9] A confidence based tree structured decoder for private automatic branch exchange
    Liu, J; Liu, J; Zhu, X; Xu, B
    2000 5TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS I-III, 2000: 772-775