Attention over Heads: A Multi-Hop Attention for Neural Machine Translation

Cited by: 0
Authors
Iida, Shohei [1 ]
Kimura, Ryuichiro [1 ]
Cui, Hongyi [1 ]
Hung, Po-Hsuan [1 ]
Utsuro, Takehito [1 ]
Nagata, Masaaki [2 ]
Affiliations
[1] Univ Tsukuba, Grad Sch Syst & Informat Engn, Tsukuba, Ibaraki, Japan
[2] NTT Corp, NTT Commun Sci Labs, Tokyo, Japan
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a multi-hop attention for the Transformer. It refines the attention for an output symbol by integrating that of each head, and consists of two hops. The first hop is the scaled dot-product attention used in the original Transformer. The second hop combines multi-layer perceptron (MLP) attention with a head gate, which efficiently increases model complexity by adding dependencies between heads. We demonstrate that the proposed multi-hop attention significantly outperforms the baseline Transformer in translation accuracy, by +0.85 BLEU points on the IWSLT-2017 German-to-English task and +2.58 BLEU points on the WMT-2017 German-to-English task. We also find that a multi-hop attention requires fewer parameters than stacking another self-attention layer, and that the proposed model converges significantly faster than the original Transformer.
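The abstract describes the mechanism only at a high level. As a rough illustration of what a "second hop" over heads could look like, the PyTorch-style sketch below scores each first-hop head output with a small MLP, softmax-normalizes those scores across heads, and applies a per-head sigmoid gate before recombining the heads. The module name, layer sizes, and the exact gating form are assumptions made for illustration; they are not taken from the paper's implementation.

```python
# Hedged sketch of a "second hop" over attention heads: an MLP scores each
# first-hop head output, the scores are softmax-normalized across heads, and a
# learned head gate reweights the heads before they are recombined. All names
# and shapes here are illustrative assumptions, not the authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SecondHopOverHeads(nn.Module):
    """Combine per-head outputs with MLP attention over heads plus a head gate."""

    def __init__(self, num_heads: int, d_head: int):
        super().__init__()
        # MLP attention: score each head's output vector (single hidden layer assumed).
        self.mlp = nn.Sequential(
            nn.Linear(d_head, d_head),
            nn.Tanh(),
            nn.Linear(d_head, 1),
        )
        # Head gate: one learned sigmoid gate per head (assumed input-independent here).
        self.gate = nn.Parameter(torch.zeros(num_heads))

    def forward(self, head_outputs: torch.Tensor) -> torch.Tensor:
        # head_outputs: (batch, seq_len, num_heads, d_head) -- the first-hop results.
        scores = self.mlp(head_outputs).squeeze(-1)       # (batch, seq, heads)
        weights = F.softmax(scores, dim=-1)               # attention over heads
        gated = weights * torch.sigmoid(self.gate)        # apply the head gate
        # Reweight each head's output, then flatten heads for the usual output projection.
        mixed = head_outputs * gated.unsqueeze(-1)        # (batch, seq, heads, d_head)
        return mixed.flatten(-2)                          # (batch, seq, heads * d_head)
```

In this sketch the module would slot in where standard multi-head attention concatenates its heads, replacing plain concatenation with a gated, head-weighted mixture before the usual output projection.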
Pages: 217-222
Page count: 6
Related Papers
50 results in total
  • [1] Multi-hop Attention Graph Neural Networks
    Wang, Guangtao
    Ying, Rex
    Huang, Jing
    Leskovec, Jure
    [J]. PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 3089 - 3096
  • [2] Towards Understanding Neural Machine Translation with Attention Heads' Importance
    Zhou, Zijie
    Zhu, Junguo
    Li, Weijiang
    [J]. APPLIED SCIENCES-BASEL, 2024, 14 (07):
  • [3] Explainable Neural Subgraph Matching With Learnable Multi-Hop Attention
    Nguyen, Duc Q.
    Toan Nguyen, Thanh
    Jo, Jun
    Poux, Florent
    Anirban, Shikha
    Quan, Tho T.
    [J]. IEEE ACCESS, 2024, 12 : 130474 - 130492
  • [4] Losing Heads in the Lottery: Pruning Transformer Attention in Neural Machine Translation
    Behnke, Maximiliana
    Heafield, Kenneth
    [J]. PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 2664 - 2674
  • [5] ClueReader: Heterogeneous Graph Attention Network for Multi-Hop Machine Reading Comprehension
    Gao, Peng
    Gao, Feng
    Wang, Peng
    Ni, Jian-Cheng
    Wang, Fei
    Fujita, Hamido
    [J]. ELECTRONICS, 2023, 12 (14)
  • [6] Attention-via-Attention Neural Machine Translation
    Zhao, Shenjian
    Zhang, Zhihua
    [J]. THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 563 - 570
  • [7] Multi-hop Attention GNN with Answer-Evidence Contrastive Loss for Multi-hop QA
    Yang, Ni
    Yang, Meng
    [J]. 2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [8] Ruminating Reader: Reasoning with Gated Multi-Hop Attention
    Gong, Yichen
    Bowman, Samuel R.
    [J]. MACHINE READING FOR QUESTION ANSWERING, 2018, : 1 - 11
  • [9] Recurrent Attention for Neural Machine Translation
    Zeng, Jiali
    Wu, Shuangzhi
    Yin, Yongjing
    Jiang, Yufan
    Li, Mu
    [J]. 2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 3216 - 3225
  • [10] Neural Machine Translation with Deep Attention
    Zhang, Biao
    Xiong, Deyi
    Su, Jinsong
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2020, 42 (01) : 154 - 163