Recurrent Attention for Neural Machine Translation

Cited by: 0
Authors
Zeng, Jiali [1 ]
Wu, Shuangzhi [1 ]
Yin, Yongjing [2 ]
Jiang, Yufan [1 ]
Li, Mu [1 ]
Affiliations
[1] Tencent Cloud Xiaowei, Beijing, Peoples R China
[2] Zhejiang Univ, Westlake Univ, Hangzhou, Zhejiang, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent research questions the importance of dot-product self-attention in Transformer models and shows that most attention heads learn simple positional patterns. In this paper, we push further along this research line and propose a novel substitute for self-attention: Recurrent AtteNtion (RAN). RAN learns attention weights directly, without any token-to-token interaction, and further improves their capacity through layer-to-layer interaction. Across an extensive set of experiments on 10 machine translation tasks, we find that RAN models are competitive and outperform their Transformer counterparts in certain scenarios, with fewer parameters and less inference time. In particular, applying RAN to the decoder of the Transformer brings consistent improvements of about +0.5 BLEU on 6 translation tasks and +1.0 BLEU on the Turkish-English translation task. In addition, we conduct an extensive analysis of the attention weights of RAN to confirm their reasonableness. RAN is a promising alternative for building more effective and efficient NMT models.
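Since this record gives only the abstract, the snippet below is a minimal PyTorch sketch of the mechanism the abstract describes, not the authors' actual RAN: attention weights are learned directly as position-based parameters (no token-to-token dot product) and are carried across layers through a recurrent update. The class name RecurrentAttentionSketch, the GRU-based layer-to-layer update, and the max_len cap are hypothetical choices made for illustration.

```python
# Minimal sketch of the idea in the abstract, NOT the authors' RAN implementation:
# attention weights are learned per position pair (no query-key dot product)
# and refined layer to layer by a shared GRU cell (an assumed update rule).
import torch
import torch.nn as nn
import torch.nn.functional as F


class RecurrentAttentionSketch(nn.Module):
    def __init__(self, d_model, max_len=128):
        super().__init__()
        self.max_len = max_len
        # Input-independent attention logits over position pairs.
        self.pos_logits = nn.Parameter(torch.zeros(max_len, max_len))
        # Layer-to-layer interaction: each row of logits is updated from the
        # corresponding row of the previous layer's logits.
        self.gru = nn.GRUCell(input_size=max_len, hidden_size=max_len)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, prev_logits=None):
        # x: (batch, seq_len, d_model) with seq_len <= max_len
        _, n, _ = x.shape
        logits = self.pos_logits
        if prev_logits is not None:
            # Hypothetical recurrent refinement across layers.
            logits = self.gru(logits, prev_logits)
        attn = F.softmax(logits[:n, :n], dim=-1)      # no token-to-token interaction
        ctx = torch.einsum("ij,bjd->bid", attn, x)    # position-weighted mixing
        return self.out(ctx), logits


# Stacking two such layers: the second reuses the first layer's logits.
x = torch.randn(2, 20, 64)
layer1, layer2 = RecurrentAttentionSketch(64), RecurrentAttentionSketch(64)
h1, logits1 = layer1(x)
h2, _ = layer2(h1, prev_logits=logits1)
```

The property this sketch illustrates is that the attention matrix is input-independent within a layer and evolves only through the layer-to-layer recurrence, which is what removes the quadratic token-to-token interaction of dot-product self-attention.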
Pages: 3216 - 3225
Page count: 10
Related Papers
50 records in total (entries [41]-[50] shown)
  • [41] English to Nepali Sentence Translation Using Recurrent Neural Network with Attention
    Nemkul, Kriti
    Shakya, Subarna
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION, AND INTELLIGENT SYSTEMS (ICCCIS), 2021, : 607 - 611
  • [42] Context based machine translation with recurrent neural network for English-Amharic translation
    Ashengo, Yeabsira Asefa
    Aga, Rosa Tsegaye
    Abebe, Surafel Lemma
    [J]. MACHINE TRANSLATION, 2021, 35 (01) : 19 - 36
  • [43] Hybrid Attention for Chinese Character-Level Neural Machine Translation
    Wang, Feng
    Chen, Wei
    Yang, Zhen
    Xu, Shuang
    Xu, Bo
    [J]. NEUROCOMPUTING, 2019, 358 : 44 - 52
  • [44] Neural Machine Translation Models with Attention-Based Dropout Layer
    Israr, Huma
    Khan, Safdar Abbas
    Tahir, Muhammad Ali
    Shahzad, Muhammad Khuram
    Ahmad, Muneer
    Zain, Jasni Mohamad
    [J]. CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 75 (02) : 2981 - 3009
  • [45] Attention Focusing for Neural Machine Translation by Bridging Source and Target Embeddings
    Kuang, Shaohui
    Li, Junhui
    Branco, Antonio
    Luo, Weihua
    Xiong, Deyi
    [J]. PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, : 1767 - 1776
  • [46] Losing Heads in the Lottery: Pruning Transformer Attention in Neural Machine Translation
    Behnke, Maximiliana
    Heafield, Kenneth
    [J]. PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 2664 - 2674
  • [47] Multi-Granularity Self-Attention for Neural Machine Translation
    Hao, Jie
    Wang, Xing
    Shi, Shuming
    Zhang, Jinfeng
    Tu, Zhaopeng
    [J]. 2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 887 - 897
  • [48] Neural Machine Translation with Attention Based on a New Syntactic Branch Distance
    Peng, Ru
    Chen, Zhitao
    Hao, Tianyong
    Fang, Yi
    [J]. MACHINE TRANSLATION, CCMT 2019, 2019, 1104 : 47 - 57
  • [49] An Effective Coverage Approach for Attention-based Neural Machine Translation
    Hoang-Quan Nguyen
    Thuan-Minh Nguyen
    Huy-Hien Vu
    Van-Vinh Nguyen
    Phuong-Thai Nguyen
    Thi-Nga-My Dao
    Kieu-Hue Tran
    Khac-Quy Dinh
    [J]. PROCEEDINGS OF 2019 6TH NATIONAL FOUNDATION FOR SCIENCE AND TECHNOLOGY DEVELOPMENT (NAFOSTED) CONFERENCE ON INFORMATION AND COMPUTER SCIENCE (NICS), 2019, : 240 - 245
  • [50] Self-Attention Neural Machine Translation for Automatic Software Repair
    Cao, He-Ling
    Liu, Yu
    Han, Dong
    [J]. Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2024, 52 (03) : 945 - 956