Recurrent Attention for Neural Machine Translation

Cited by: 0
Authors
Zeng, Jiali [1 ]
Wu, Shuangzhi [1 ]
Yin, Yongjing [2 ]
Jiang, Yufan [1 ]
Li, Mu [1 ]
Affiliations
[1] Tencent Cloud Xiaowei, Beijing, Peoples R China
[2] Zhejiang Univ, Westlake Univ, Hangzhou, Zhejiang, Peoples R China
Keywords: (none listed)
DOI: not available
Chinese Library Classification: TP18 [Theory of Artificial Intelligence]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Recent research questions the importance of dot-product self-attention in Transformer models and shows that most attention heads learn simple positional patterns. In this paper, we push this line of research further and propose a novel substitute for self-attention: Recurrent AtteNtion (RAN). RAN learns attention weights directly, without any token-to-token interaction, and further improves their capacity through layer-to-layer interaction. Across an extensive set of experiments on 10 machine translation tasks, we find that RAN models are competitive and outperform their Transformer counterparts in certain scenarios, with fewer parameters and less inference time. In particular, applying RAN to the Transformer decoder brings consistent improvements of about +0.5 BLEU on 6 translation tasks and +1.0 BLEU on the Turkish-English translation task. In addition, we conduct extensive analysis of RAN's attention weights to confirm their reasonableness. RAN is a promising alternative for building more effective and efficient NMT models.
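The abstract does not spell out RAN's exact parameterization, so the sketch below is only a minimal illustration of the stated idea: attention weights that are learned directly (no query-key dot product) and refined from layer to layer. The class name RecurrentAttentionLayer, the max_len parameter, and the per-head gated update are assumptions for illustration, not the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RecurrentAttentionLayer(nn.Module):
    """Sketch of a RAN-style substitute for dot-product self-attention
    (an assumed reconstruction, not the paper's formulation): attention
    weights are learned directly as parameters, with no token-to-token
    interaction, and each layer refines the weights handed down from the
    layer below via a gated recurrent update."""

    def __init__(self, d_model: int, n_heads: int, max_len: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # Position-to-position attention logits, learned directly per head.
        self.attn_logits = nn.Parameter(torch.zeros(n_heads, max_len, max_len))
        # Per-head gate controlling how much of the previous layer's
        # logits to carry over (the layer-to-layer recurrence).
        self.gate = nn.Parameter(torch.zeros(n_heads, 1, 1))
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x, prev_logits=None):
        # x: (batch, seq_len, d_model); prev_logits: (n_heads, seq_len, seq_len)
        B, T, D = x.shape
        logits = self.attn_logits[:, :T, :T]
        if prev_logits is not None:
            g = torch.sigmoid(self.gate)           # one scalar gate per head
            logits = g * logits + (1.0 - g) * prev_logits
        attn = F.softmax(logits, dim=-1)            # (n_heads, T, T), input-independent
        v = self.v_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        ctx = torch.einsum('hqk,bhkd->bhqd', attn, v)   # weighted sum of values
        ctx = ctx.transpose(1, 2).reshape(B, T, D)
        return self.out_proj(ctx), logits


# Stacking layers: each layer receives and refines the previous layer's logits.
layers = nn.ModuleList(
    [RecurrentAttentionLayer(d_model=512, n_heads=8, max_len=128) for _ in range(6)]
)
x = torch.randn(2, 32, 512)                         # (batch, seq_len, d_model)
prev = None
for layer in layers:
    x, prev = layer(x, prev)
```

Residual connections, layer normalization, feed-forward sublayers, and the causal mask that decoder-side use would require are omitted to keep the sketch focused on the attention substitute itself.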
Pages: 3216-3225
Number of pages: 10
Related Papers (50 in total)
  • [21] A Recursive Recurrent Neural Network for Statistical Machine Translation
    Liu, Shujie
    Yang, Nan
    Li, Mu
    Zhou, Ming
    [J]. PROCEEDINGS OF THE 52ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2014, : 1491 - 1500
  • [22] Recurrent Neural Network Techniques: Emphasis on Use in Neural Machine Translation
    Suleiman, Dima
    Etaiwi, Wael
    Awajan, Arafat
    [J]. INFORMATICA-AN INTERNATIONAL JOURNAL OF COMPUTING AND INFORMATICS, 2021, 45 (07): : 107 - 114
  • [23] Recurrent stacking of layers in neural networks: An application to neural machine translation
    Dabre, Prasanna Raj Noel
    Fujita, Atsushi
    [J]. arXiv, 2021,
  • [24] MTIL2017: Machine Translation Using Recurrent Neural Network on Statistical Machine Translation
    Mahata, Sainik Kumar
    Das, Dipankar
    Bandyopadhyay, Sivaji
    [J]. JOURNAL OF INTELLIGENT SYSTEMS, 2019, 28 (03) : 447 - 453
  • [25] Attention over Heads: A Multi-Hop Attention for Neural Machine Translation
    Iida, Shohei
    Kimura, Ryuichiro
    Cui, Hongyi
    Hung, Po-Hsuan
    Utsuro, Takehito
    Nagata, Masaaki
    [J]. 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019:): STUDENT RESEARCH WORKSHOP, 2019, : 217 - 222
  • [26] Look Harder: A Neural Machine Translation Model with Hard Attention
    Indurthi, Sathish
    Chung, Insoo
    Kim, Sangha
    [J]. 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 3037 - 3043
  • [27] Training Deeper Neural Machine Translation Models with Transparent Attention
    Bapna, Ankur
    Chen, Mia Xu
    Firat, Orhan
    Cao, Yuan
    Wu, Yonghui
    [J]. 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 3028 - 3033
  • [28] Recursive Annotations for Attention-Based Neural Machine Translation
    Ye, Shaolin
    Guo, Wu
    [J]. 2017 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2017, : 164 - 167
  • [29] Fine-grained attention mechanism for neural machine translation
    Choi, Heeyoul
    Cho, Kyunghyun
    Bengio, Yoshua
    [J]. NEUROCOMPUTING, 2018, 284 : 171 - 176
  • [30] Training with Adversaries to Improve Faithfulness of Attention in Neural Machine Translation
    Moradi, Pooya
    Kambhatla, Nishant
    Sarkar, Anoop
    [J]. AACL-IJCNLP 2020: THE 1ST CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 10TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING: PROCEEDINGS OF THE STUDENT RESEARCH WORKSHOP, 2020, : 86 - 93