Towards Understanding Neural Machine Translation with Attention Heads' Importance

Cited by: 0
Authors
Zhou, Zijie [1 ,2 ]
Zhu, Junguo [1 ,2 ]
Li, Weijiang [1 ,2 ]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650500, Peoples R China
[2] Kunming Univ Sci & Technol, Yunnan Prov Key Lab Artificial Intelligence, Kunming 650500, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2024, Vol. 14, Issue 7
Funding
National Natural Science Foundation of China;
Keywords
neural machine translation; interpretability; linguistics;
DOI
10.3390/app14072798
Chinese Library Classification (CLC)
O6 [Chemistry];
Discipline Code
0703;
Abstract
Although neural machine translation has made great progress, and the Transformer has advanced the state of the art across various language pairs, the decision-making process of the attention mechanism, a crucial component of the Transformer, remains unclear. In this paper, we propose to interpret the model's decisions through the importance of its attention heads. We explore the knowledge acquired by the attention heads, elucidating the decision-making process through the lens of linguistic understanding. Specifically, we quantify the importance of each attention head by assessing its contribution to neural machine translation performance, employing a Masking Attention Heads approach. We evaluate the method and investigate the distribution of attention heads' importance, as well as its correlation with part-of-speech contribution. To understand the diverse decisions made by attention heads, we concentrate on analyzing multi-granularity linguistic knowledge. Our findings indicate that specialized heads play a crucial role in learning linguistic knowledge. By retaining the important attention heads and removing the unimportant ones, the attention mechanism can be optimized, reducing the number of model parameters and increasing the model's speed. Moreover, by leveraging the connection between attention heads and multi-granular linguistic knowledge, we can enhance the model's interpretability. Consequently, our research provides valuable insights for the design of improved NMT models.
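The core procedure described in the abstract, masking one attention head at a time and measuring the resulting drop in translation quality, can be illustrated with a minimal sketch. The Python/NumPy snippet below is an assumption-based illustration rather than the authors' implementation: the names multi_head_attention, head_mask, and score_fn are hypothetical, and head importance is taken simply as the performance drop (e.g., in BLEU) when a single head's output is zeroed out.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, Wo, head_mask):
    # Toy multi-head self-attention over x of shape (seq_len, d_model).
    # Wq, Wk, Wv have shape (n_heads, d_model, d_head); Wo has shape (n_heads * d_head, d_model).
    # Setting head_mask[h] = 0 removes head h's contribution entirely.
    n_heads, _, d_head = Wq.shape
    outputs = []
    for h in range(n_heads):
        q, k, v = x @ Wq[h], x @ Wk[h], x @ Wv[h]
        attn = softmax(q @ k.T / np.sqrt(d_head))
        outputs.append(head_mask[h] * (attn @ v))
    return np.concatenate(outputs, axis=-1) @ Wo

def head_importance(score_fn, n_heads):
    # Importance of head h = score with all heads active - score with head h masked.
    baseline = score_fn(np.ones(n_heads))
    drops = []
    for h in range(n_heads):
        mask = np.ones(n_heads)
        mask[h] = 0.0
        drops.append(baseline - score_fn(mask))
    return np.array(drops)

In practice, score_fn would run the full NMT model on a held-out set under the given head mask and return a quality score such as BLEU; heads with near-zero importance are natural candidates for pruning, consistent with the abstract's observation that removing unimportant heads reduces parameters and speeds up the model.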
Pages: 22