Improve Shallow Decoder Based Transformer with Structured Expert Prediction

Cited: 0
Authors
Wang, Zongbing [1 ]
Han, Jingru [2 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Software Engn, Wuhan, Peoples R China
[2] Dalian Univ Technol, Sch Comp Sci & Technol, Dalian, Peoples R China
Keywords
Structured Expert; Transformer; Machine Translation; Mixtures
DOI
10.1007/978-3-031-72350-6_15
CLC number
TP18 (Theory of Artificial Intelligence)
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recent work shows that a single-layer autoregressive decoder paired with a sufficiently deep encoder (the 12-1 transformer) can maintain excellent translation quality while achieving significant inference acceleration. However, we observe that simply reducing the number of decoder layers causes a certain degree of performance degradation, which we attribute to the reduced decoder capacity. To remedy this, we propose to improve the 12-1 transformer with structured expert prediction. Our approach extends the conventional fixed single-layer autoregressive decoder to a set of experts, increasing the decoder capacity so that the model can handle the various complex generation patterns found in practical data. Meanwhile, only one expert is adaptively activated for each instance during inference, so the model retains nearly the same speed as the 12-1 transformer. Extensive results demonstrate that our approach achieves consistent improvements over the 12-1 transformer and also helps generate diverse translations.
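The routing mechanism described in the abstract can be pictured with a short sketch. The PyTorch code below is a minimal illustration written from the abstract alone, not the authors' implementation: the class name `StructuredExpertDecoder`, the `num_experts` pool size, and the mean-pooled router over the encoder output are all assumptions, and training-time details such as how experts are assigned or load-balanced are omitted.

```python
import torch
import torch.nn as nn

class StructuredExpertDecoder(nn.Module):
    """Hypothetical sketch of a deep-encoder/shallow-decoder model: the
    single decoder layer of a 12-1 transformer is replaced by a pool of
    expert layers, and exactly one expert is hard-selected per instance."""

    def __init__(self, vocab_size=32000, d_model=512, nhead=8,
                 num_encoder_layers=12, num_experts=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True),
            num_encoder_layers)
        # Pool of single-layer autoregressive decoders ("experts").
        self.experts = nn.ModuleList(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
            for _ in range(num_experts))
        # Router: scores experts from the mean-pooled encoder output
        # (an assumed instance-level routing signal, not from the paper).
        self.router = nn.Linear(d_model, num_experts)
        self.out_proj = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        memory = self.encoder(self.embed(src_ids))       # (B, S, D)
        gate_logits = self.router(memory.mean(dim=1))    # (B, E)
        choice = gate_logits.argmax(dim=-1)              # one expert per instance
        tgt = self.embed(tgt_ids)                        # (B, T, D)
        mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        out = torch.empty_like(tgt)
        for e, expert in enumerate(self.experts):
            sel = choice == e                            # instances routed to expert e
            if sel.any():
                out[sel] = expert(tgt[sel], memory[sel], tgt_mask=mask)
        return self.out_proj(out), gate_logits           # (B, T, V), (B, E)

model = StructuredExpertDecoder()
src = torch.randint(0, 32000, (2, 10))   # toy source token ids
tgt = torch.randint(0, 32000, (2, 7))    # shifted target token ids
logits, gates = model(src, tgt)
print(logits.shape)                      # torch.Size([2, 7, 32000])
```

The property the sketch preserves is the one the abstract emphasizes: each instance passes through exactly one single-layer expert, so per-instance decoding cost stays close to the 12-1 baseline while the total decoder capacity grows with the number of experts.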
Pages: 224-234
Page count: 11
Related Papers
50 records in total
  • [1] TransCFD: A transformer-based decoder for flow field prediction
    Jiang, Jundou; Li, Guanxiong; Jiang, Yi; Zhang, Laiping; Deng, Xiaogang
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 123
  • [2] SATD: syntax-aware handwritten mathematical expression recognition based on tree-structured transformer decoder
    Fu, Pengbin; Xiao, Ganyun; Yang, Huirong
    VISUAL COMPUTER, 2025, 41 (02): 883-900
  • [3] Clairvoyant: A Log-Based Transformer-Decoder for Failure Prediction in Large-Scale Systems
    Alharthi, Khalid Ayedh; Jhumka, Arshad; Di, Sheng; Cappello, Franck
    PROCEEDINGS OF THE 36TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ICS 2022, 2022
  • [4] Recurrent Glimpse-based Decoder for Detection with Transformer
    Chen, Zhe; Zhang, Jing; Tao, Dacheng
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 5250-5259
  • [5] News headline generation based on improved decoder from transformer
    Li, Zhengpeng; Wu, Jiansheng; Miao, Jiawei; Yu, Xinmiao
    SCIENTIFIC REPORTS, 2022, 12 (01)
  • [6] Going "Deeper": Structured Sememe Prediction via Transformer with Tree Attention
    Ye, Yining; Qi, Fanchao; Liu, Zhiyuan; Sun, Maosong
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022: 128-138
  • [7] Expert System Based Fault Detection of Power Transformer
    Nagpal, Tapsi; Brar, Yadwinder Singh
    JOURNAL OF COMPUTATIONAL AND THEORETICAL NANOSCIENCE, 2015, 12 (02): 208-214
  • [8] Transformer Decoder Based Reinforcement Learning Approach for Conversational Response Generation
    Faal, Farshid; Yu, Jia Yuan; Schmitt, Ketra
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
  • [9] A confidence based tree structured decoder for private automatic branch exchange
    Liu, J; Liu, J; Zhu, X; Xu, B
    2000 5TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS I-III, 2000: 772-775