Recurrent Attention for Neural Machine Translation

Authors
Zeng, Jiali [1]
Wu, Shuangzhi [1]
Yin, Yongjing [2]
Jiang, Yufan [1]
Li, Mu [1]
Affiliations
[1] Tencent Cloud Xiaowei, Beijing, Peoples R China
[2] Zhejiang Univ, Westlake Univ, Hangzhou, Zhejiang, Peoples R China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recent research questions the importance of dot-product self-attention in Transformer models and shows that most attention heads learn simple positional patterns. In this paper, we push this line of research further and propose a novel substitute for self-attention: Recurrent AtteNtion (RAN). RAN learns attention weights directly, without any token-to-token interaction, and further improves their capacity through layer-to-layer interaction. Across an extensive set of experiments on 10 machine translation tasks, we find that RAN models are competitive with their Transformer counterparts and outperform them in certain scenarios, while using fewer parameters and less inference time. In particular, applying RAN to the Transformer decoder brings consistent improvements of about +0.5 BLEU on 6 translation tasks and +1.0 BLEU on the Turkish-English translation task. In addition, we conduct an extensive analysis of RAN's attention weights to confirm that they are reasonable. RAN is a promising alternative for building more effective and efficient NMT models.
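The abstract says RAN learns attention weights directly, with no token-to-token (query-key) interaction, and refines them through layer-to-layer interaction. The Python sketch below illustrates that general idea only, under loudly stated assumptions: the per-layer learnable logit matrices (`layer_params`), the gated recurrence with mixing coefficient `gate`, and every function name are hypothetical choices made for illustration, not the paper's formulation. The point it shows is that the weights depend only on positions and on the previous layer's logits, never on token contents.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]


def softmax_row(row: List[float]) -> List[float]:
    """Numerically stable softmax; -inf entries (masked positions) get weight 0."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def causal_mask(logits: Matrix) -> Matrix:
    """Hide future positions (j > i), as a Transformer decoder would."""
    n = len(logits)
    return [[logits[i][j] if j <= i else float("-inf") for j in range(n)]
            for i in range(n)]


def recurrent_attention_weights(layer_params: List[Matrix], gate: float,
                                causal: bool = False) -> List[Matrix]:
    """Return one n x n attention matrix per layer.

    Hypothetical recurrence (an assumption, not the paper's equation):
        logits_0 = layer_params[0]
        logits_l = gate * logits_{l-1} + (1 - gate) * layer_params[l]
    No query-key dot products are computed, so the weights never depend on
    token contents, only on positions and the previous layer's logits.
    """
    weights: List[Matrix] = []
    prev: Optional[Matrix] = None
    for params in layer_params:
        if prev is None:
            logits = [row[:] for row in params]
        else:
            logits = [[gate * p + (1.0 - gate) * q for p, q in zip(prow, qrow)]
                      for prow, qrow in zip(prev, params)]
        prev = logits  # unmasked logits carried to the next layer
        masked = causal_mask(logits) if causal else logits
        weights.append([softmax_row(r) for r in masked])
    return weights


def apply_attention(weights: Matrix, values: Matrix) -> Matrix:
    """Mix value vectors with fixed weights: out[i] = sum_j w[i][j] * v[j]."""
    n, d = len(values), len(values[0])
    return [[sum(weights[i][j] * values[j][k] for j in range(n)) for k in range(d)]
            for i in range(len(weights))]


if __name__ == "__main__":
    n, num_layers = 3, 2
    # Zero-initialised logits: every unmasked position starts equally weighted.
    params = [[[0.0] * n for _ in range(n)] for _ in range(num_layers)]
    attn = recurrent_attention_weights(params, gate=0.5, causal=True)
    for row in attn[-1]:
        print([round(w, 3) for w in row])
    values = [[3.0], [6.0], [9.0]]  # toy one-dimensional value vectors
    print(apply_attention(attn[-1], values))
```

With zero-initialised logits and the causal mask, each row spreads its weight uniformly over the current and earlier positions. Because the logit matrices are tied to a fixed sequence length `n`, this toy version cannot handle longer inputs; how the actual model addresses variable lengths is not stated in the abstract.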
Pages: 3216-3225
Page count: 10