Multi-Granularity Self-Attention for Neural Machine Translation

Citations: 0
Authors
Hao, Jie [1 ,2 ]
Wang, Xing [2 ]
Shi, Shuming [2 ]
Zhang, Jinfeng [1 ]
Tu, Zhaopeng [2 ]
Affiliations
[1] Florida State Univ, Tallahassee, FL 32306 USA
[2] Tencent AI Lab, Shenzhen, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Current state-of-the-art neural machine translation (NMT) uses a deep multi-head self-attention network with no explicit phrase information. However, prior work on statistical machine translation showed that extending the basic translation unit from words to phrases produces substantial improvements, suggesting that NMT performance could likewise be improved by explicitly modeling phrases. In this work, we present multi-granularity self-attention (MG-SA): a neural network that combines multi-head self-attention with phrase modeling. Specifically, we train several attention heads to attend to phrases in either n-gram or syntactic formalism. Moreover, we exploit interactions among phrases to enhance structure modeling - a commonly cited weakness of self-attention. Experimental results on the WMT14 English-to-German and NIST Chinese-to-English translation tasks show that the proposed approach consistently improves performance. Targeted linguistic analysis reveals that MG-SA indeed captures useful phrase information at various levels of granularity.
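The core idea in the abstract - dedicating some attention heads to phrase-level units rather than individual tokens - can be illustrated with a minimal sketch. The snippet below is a simplified, hypothetical reading of MG-SA, not the authors' implementation: it mean-pools consecutive n-grams into phrase representations (one of the two phrase formalisms mentioned; the syntactic variant would partition by parse-tree spans instead), then lets a head compute queries from tokens but keys and values from phrases. All function names and the pooling choice are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def ngram_phrases(tokens, n):
    """Mean-pool consecutive n-grams into phrase representations.
    tokens: (seq_len, d) -> (ceil(seq_len / n), d). Mean-pooling is an
    assumption; the paper's phrase composition may differ."""
    seq_len, d = tokens.shape
    pad = (-seq_len) % n
    if pad:  # zero-pad so the sequence splits evenly into n-grams
        tokens = np.vstack([tokens, np.zeros((pad, d))])
    return tokens.reshape(-1, n, d).mean(axis=1)

def phrase_attention_head(x, n, d_k=16, seed=0):
    """One hypothetical phrase-granularity head: token queries attend
    over n-gram phrase keys/values instead of token keys/values."""
    rng = np.random.default_rng(seed)
    seq_len, d = x.shape
    Wq, Wk, Wv = (rng.standard_normal((d, d_k)) / np.sqrt(d)
                  for _ in range(3))
    phrases = ngram_phrases(x, n)            # (num_phrases, d)
    q = x @ Wq                               # (seq_len, d_k)
    k = phrases @ Wk                         # (num_phrases, d_k)
    v = phrases @ Wv                         # (num_phrases, d_k)
    attn = softmax(q @ k.T / np.sqrt(d_k))   # token-to-phrase weights
    return attn @ v                          # (seq_len, d_k)
```

In a full multi-head layer, heads like this one would run alongside ordinary token-level heads, and their outputs would be concatenated as usual, so the model mixes word- and phrase-granularity views of the sentence.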
Pages: 887-897
Page count: 11