Multi-Granularity Self-Attention for Neural Machine Translation

Cited by: 0
Authors
Hao, Jie [1 ,2 ]
Wang, Xing [2 ]
Shi, Shuming [2 ]
Zhang, Jinfeng [1 ]
Tu, Zhaopeng [2 ]
Affiliations
[1] Florida State Univ, Tallahassee, FL 32306 USA
[2] Tencent AI Lab, Shenzhen, Peoples R China
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Current state-of-the-art neural machine translation (NMT) uses a deep multi-head self-attention network with no explicit phrase information. However, prior work on statistical machine translation has shown that extending the basic translation unit from words to phrases has produced substantial improvements, suggesting the possibility of improving NMT performance from explicit modeling of phrases. In this work, we present multi-granularity self-attention (MG-SA): a neural network that combines multi-head self-attention and phrase modeling. Specifically, we train several attention heads to attend to phrases in either n-gram or syntactic formalism. Moreover, we exploit interactions among phrases to enhance the strength of structure modeling - a commonly-cited weakness of self-attention. Experimental results on WMT14 English-to-German and NIST Chinese-to-English translation tasks show the proposed approach consistently improves performance. Targeted linguistic analysis reveals that MG-SA indeed captures useful phrase information at various levels of granularities.
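The abstract only sketches the mechanism, so the following is a minimal, illustrative reading of it in PyTorch: a subset of attention heads attends over mean-pooled n-gram phrase representations while the remaining heads perform ordinary token-level self-attention. Every name here (ngram_phrases, mg_self_attention, num_phrase_heads, phrase_size) is an assumption introduced for illustration, not the paper's actual implementation, and the syntactic phrases and phrase-interaction modeling mentioned in the abstract are omitted.

# Minimal sketch of one reading of multi-granularity self-attention (MG-SA).
# Assumption: some heads attend over n-gram phrase vectors (mean-pooled
# tokens), the rest over the usual token-level keys/values. Not the paper's
# reference implementation.
import torch
import torch.nn.functional as F


def ngram_phrases(x: torch.Tensor, phrase_size: int) -> torch.Tensor:
    """Mean-pool consecutive n-grams of x (batch, seq_len, d_model) into
    phrase vectors of shape (batch, ceil(seq_len / phrase_size), d_model)."""
    b, t, d = x.shape
    pad = (-t) % phrase_size
    if pad:
        x = F.pad(x, (0, 0, 0, pad))  # pad the sequence dimension at the end
    return x.reshape(b, -1, phrase_size, d).mean(dim=2)


def mg_self_attention(x, w_q, w_k, w_v, num_heads=8, num_phrase_heads=4, phrase_size=3):
    """Multi-head self-attention in which the first `num_phrase_heads` heads
    attend over phrase-level keys/values and the rest over token-level ones."""
    b, t, d = x.shape
    d_head = d // num_heads
    phrases = ngram_phrases(x, phrase_size)                               # (b, p, d)

    q = (x @ w_q).view(b, t, num_heads, d_head).transpose(1, 2)           # (b, h, t, d_head)
    k_tok = (x @ w_k).view(b, t, num_heads, d_head).transpose(1, 2)
    v_tok = (x @ w_v).view(b, t, num_heads, d_head).transpose(1, 2)
    k_phr = (phrases @ w_k).view(b, -1, num_heads, d_head).transpose(1, 2)
    v_phr = (phrases @ w_v).view(b, -1, num_heads, d_head).transpose(1, 2)

    outputs = []
    for h in range(num_heads):
        # phrase-level heads use phrase keys/values; the rest stay token-level
        k, v = (k_phr, v_phr) if h < num_phrase_heads else (k_tok, v_tok)
        scores = q[:, h] @ k[:, h].transpose(-2, -1) / d_head ** 0.5
        outputs.append(F.softmax(scores, dim=-1) @ v[:, h])
    return torch.cat(outputs, dim=-1)                                     # (b, t, d)

For example, with d_model = 64 and random projection matrices, mg_self_attention(torch.randn(2, 10, 64), torch.randn(64, 64), torch.randn(64, 64), torch.randn(64, 64)) returns a (2, 10, 64) tensor in which the first four heads attended over 3-gram phrase vectors and the other four over individual tokens.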
Pages: 887-897
Number of pages: 11
Related Papers
50 items in total
  • [1] Multi-granularity Metamorphic Testing for Neural Machine Translation System
    Zhong, Wen-Kang
    Ge, Ji-Dong
    Chen, Xiang
    Li, Chuan-Yi
    Tang, Ze
    Luo, Bin
    [J]. Ruan Jian Xue Bao/Journal of Software, 2021, 32 (04): 1051 - 1066
  • [2] MGSAN: A Multi-granularity Self-attention Network for Next POI Recommendation
    Li, Yepeng
    Xian, Xuefeng
    Zhao, Pengpeng
    Liu, Yanchi
    Sheng, Victor S.
    [J]. WEB INFORMATION SYSTEMS ENGINEERING - WISE 2021, PT II, 2021, 13081 : 193 - 208
  • [3] Multi-granularity Knowledge Sharing in Low-resource Neural Machine Translation
    Mi, Chenggang
    Xie, Shaoliang
    Fan, Yi
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (02)
  • [4] Multi-scaled self-attention for drug–target interaction prediction based on multi-granularity representation
    Zeng, Yuni
    Chen, Xiangru
    Peng, Dezhong
    Zhang, Lijun
    Huang, Haixiao
    [J]. BMC Bioinformatics, 2022, 23 (01)
  • [5] Domain-Aware Self-Attention for Multi-Domain Neural Machine Translation
    Zhang, Shiqi
    Liu, Yan
    Xiong, Deyi
    Zhang, Pei
    Chen, Boxing
    [J]. INTERSPEECH 2021, 2021, : 2047 - 2051
  • [6] Quality Estimation for Machine Translation with Multi-granularity Interaction
    Tian, Ke
    Zhang, Jiajun
    [J]. MACHINE TRANSLATION, CCMT 2020, 2020, 1328 : 55 - 65
  • [7] Microblog Sentiment Analysis with Multi-Head Self-Attention Pooling and Multi-Granularity Feature Interaction Fusion
    Yan, Shangyi
    Wang, Jingya
    Liu, Xiaowen
    Cui, Yumeng
    Tao, Zhizhong
    Zhang, Xiaofan
    [J]. Data Analysis and Knowledge Discovery, 2023, 7 (04): 32 - 45
  • [8] Multi-scaled self-attention for drug-target interaction prediction based on multi-granularity representation
    Zeng, Yuni
    Chen, Xiangru
    Peng, Dezhong
    Zhang, Lijun
    Huang, Haixiao
    [J]. BMC BIOINFORMATICS, 2022, 23 (01)
  • [9] Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures
    Tang, Gongbo
    Mueller, Mathias
    Rios, Annette
    Sennrich, Rico
    [J]. 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 4263 - 4272
  • [10] Self-Attention and Dynamic Convolution Hybrid Model for Neural Machine Translation
    Zhang, Zhebin
    Wu, Sai
    Chen, Gang
    Jiang, Dawei
    [J]. 11TH IEEE INTERNATIONAL CONFERENCE ON KNOWLEDGE GRAPH (ICKG 2020), 2020, : 352 - 359