Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures

Cited: 0
Authors
Tang, Gongbo [1 ]
Mueller, Mathias [2 ]
Rios, Annette [2 ]
Sennrich, Rico [2 ,3 ]
Affiliations
[1] Uppsala Univ, Dept Linguist & Philol, Uppsala, Sweden
[2] Univ Zurich, Inst Computat Linguist, Zurich, Switzerland
[3] Univ Edinburgh, Sch Informat, Edinburgh, Midlothian, Scotland
Funding
Swiss National Science Foundation;
Keywords
DOI
None available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, non-recurrent architectures (convolutional, self-attentional) have outperformed RNNs in neural machine translation. CNNs and self-attentional networks can connect distant words via shorter network paths than RNNs, and it has been speculated that this improves their ability to model long-range dependencies. However, this theoretical argument has not been tested empirically, nor have alternative explanations for their strong performance been explored in depth. We hypothesize that the strong performance of CNNs and self-attentional networks could also be due to their ability to extract semantic features from the source text, and we evaluate RNNs, CNNs, and self-attentional networks on two tasks: subject-verb agreement (where capturing long-range dependencies is required) and word sense disambiguation (where semantic feature extraction is required). Our experimental results show that: 1) self-attentional networks and CNNs do not outperform RNNs in modeling subject-verb agreement over long distances; 2) self-attentional networks perform distinctly better than RNNs and CNNs on word sense disambiguation.
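The abstract's path-length argument (under self-attention, any two positions are connected by a single attention edge, whereas an RNN must propagate information through a number of recurrent steps proportional to their distance) can be illustrated with a minimal NumPy sketch. This is an illustrative toy, not the authors' evaluation code; the unprojected single-head attention (no learned query/key/value matrices) is a simplifying assumption made here for brevity.

```python
import numpy as np

def self_attention(x):
    """Single-head scaled dot-product self-attention without learned
    projections (illustration only): every position attends directly
    to every other position in one step."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                     # (n, n) pairwise scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ x, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))        # 8 token vectors of dimension 4
out, w = self_attention(x)

# The path between token 0 and token 7 is a single attention edge:
# w[0, 7] is strictly positive regardless of distance. An RNN would
# need 7 recurrent steps to carry the same information.
print(w[0, 7] > 0)   # True
```

Note that a short path only makes long-range interactions *possible*; the paper's finding is that this alone does not translate into better subject-verb agreement over long distances in practice.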
Pages: 4263 - 4272
Page count: 10