Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures

Cited: 0
Authors
Tang, Gongbo [1 ]
Mueller, Mathias [2 ]
Rios, Annette [2 ]
Sennrich, Rico [2 ,3 ]
Affiliations
[1] Uppsala Univ, Dept Linguist & Philol, Uppsala, Sweden
[2] Univ Zurich, Inst Computat Linguist, Zurich, Switzerland
[3] Univ Edinburgh, Sch Informat, Edinburgh, Midlothian, Scotland
Funding
Swiss National Science Foundation;
Keywords
DOI
None available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, non-recurrent architectures (convolutional, self-attentional) have outperformed RNNs in neural machine translation. CNNs and self-attentional networks can connect distant words via shorter network paths than RNNs, and it has been speculated that this improves their ability to model long-range dependencies. However, this theoretical argument has not been tested empirically, nor have alternative explanations for their strong performance been explored in depth. We hypothesize that the strong performance of CNNs and self-attentional networks could also be due to their ability to extract semantic features from the source text, and we evaluate RNNs, CNNs, and self-attentional networks on two tasks: subject-verb agreement (where capturing long-range dependencies is required) and word sense disambiguation (where semantic feature extraction is required). Our experimental results show that: 1) self-attentional networks and CNNs do not outperform RNNs in modeling subject-verb agreement over long distances; 2) self-attentional networks perform distinctly better than RNNs and CNNs on word sense disambiguation.
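The abstract's path-length argument (under self-attention, any two positions are connected by a single attention edge, whereas an RNN must propagate information through a number of recurrent steps proportional to their distance) can be illustrated with a minimal NumPy sketch. This is an illustrative toy, not the authors' evaluation code; the unprojected single-head attention (no learned query/key/value matrices) is a simplifying assumption made here for brevity.

```python
import numpy as np

def self_attention(x):
    """Single-head scaled dot-product self-attention without learned
    projections (illustration only): every position attends directly
    to every other position in one step."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                     # (n, n) pairwise scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ x, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))        # 8 token vectors of dimension 4
out, w = self_attention(x)

# The path between token 0 and token 7 is a single attention edge:
# w[0, 7] is strictly positive regardless of distance. An RNN would
# need 7 recurrent steps to carry the same information.
print(w[0, 7] > 0)   # True
```

Note that a short path only makes long-range interactions *possible*; the paper's finding is that this alone does not translate into better subject-verb agreement over long distances in practice.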
Pages: 4263 - 4272
Page count: 10